2018 Fiscal Year Research-status Report

Research on Unsupervised Acoustic Pattern Discovery in Zero-Resource Settings

Research Project

Project/Area Number	17K00237
Research Institution	Nara Institute of Science and Technology
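Principal Investigator	Sakriani Sakti, Nara Institute of Science and Technology, Graduate School of Science and Technology, Specially Appointed Associate Professor (00395005)
Co-Investigator(Kenkyū-buntansha)	Satoshi Nakamura, Nara Institute of Science and Technology, Data Science Center, Professor (30263429)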
Project Period (FY)	2017-04-01 – 2020-03-31
Keywords	Speech recognition / Zero-resource speech technology / Electroencephalography (EEG)
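Outline of Annual Research Achievements	As the Tokyo 2020 Olympic and Paralympic Games approach, the language barrier with visitors from overseas is becoming an increasingly serious problem. Because current speech recognition and speech translation technology is already readily available for high-resource languages, this project targets zero-resource speech processing, where neither language-specific knowledge nor transcribed data is available. In FY2017, we succeeded in building zero-resource models for an African language (Xitsonga), as originally planned, and obtained successful results in the Zero Resource Speech Challenge 2017. Collaboration with Indonesian universities was also under way at that point but had not yet produced research results. In FY2018, we succeeded in building zero-resource models for Indonesian. This time, instead of a Dirichlet process Gaussian mixture model (DPGMM), we built the system on deep learning. The system can (1) discover subword units and (2) synthesize speech, both without supervision. We also took part in the Zero Resource Speech Challenge 2019, where the proposed method ranked among the top results. On the brain-analysis side, in FY2017 we ran experiments on classifying sentences from EEG recordings. In FY2018, we investigated the synchronization between EEG oscillations during speech imagination and the envelope of the corresponding overt speech. Specifically, we examined (1) whether the speech envelope regressed from EEG during speech imagination correlates with the overt speech envelope, and (2) whether EEG recorded during speech imagination can be used to classify speech stimuli with different envelopes. The results indicate synchronization between EEG during speech imagination and the envelope of the overt counterpart.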
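Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: This year we succeeded in building zero-resource models for Indonesian, which had not been possible in 2017. We also achieved top performance in the Zero Resource Speech Challenge 2019. Collaboration with Indonesian universities is ongoing, and we have also begun joint research with other Asian research institutions. We are also carrying out EEG analysis of speech imagination. Although no results are available yet, we have started EEG experiments in Indonesian as well.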
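Strategy for Future Research Activity	The following research activities will continue in 2019: (1) continuing the zero-resource modeling and the EEG experiments; (2) completing the proposed framework, with the aim of building a full-fledged system capable of speech translation from low-resource languages (Indonesian/Xitsonga) into major languages (Japanese/English).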

Research Products
(32 results)

All Int'l Joint Research (1 results) Journal Article (6 results) (of which Int'l Joint Research: 6 results, Peer Reviewed: 6 results, Open Access: 6 results) Presentation (25 results) (of which Int'l Joint Research: 15 results)

[Int'l Joint Research] University of Indonesia/Institute Technology Bandung (Indonesia)
- Country Name
  INDONESIA
- Counterpart Institution
  University of Indonesia/Institute Technology Bandung
[Journal Article] Neural Oscillation-Based Classification of Japanese Spoken Sentences During Speech Perception (2019)
- Author(s)
  Hiroki Watanabe, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E102.D, Issue: 2, Pages: 383-391
- DOI
  10.1587/transinf.2018EDP7293
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Electroencephalogram-Based Single Trial Detection of Language Expectation Violations in Listening to Speech (2019)
- Author(s)
  Hiroki Tanaka, Hiroki Watanabe, Hayato Maki, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  Frontiers in Computational Neuroscience
  Volume: 13
- DOI
  10.3389/fncom.2019.00015
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Quality Prediction of Synthesized Speech Based on Tensor Structured EEG Signals (2018)
- Author(s)
  Hayato Maki, Sakriani Sakti, Hiroki Tanaka, Satoshi Nakamura
- Journal Title
  PLoS ONE
  Volume: 13, Pages: 1-13
- DOI
  10.1371/journal.pone.0193521
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Construction of Spontaneous Emotion Corpus from Indonesian TV Talk Shows and Its Application on Multimodal Emotion Recognition (2018)
- Author(s)
  Nurul Lubis, Dessi Lestari, Sakriani Sakti, Ayu Purwarianti, Satoshi Nakamura
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E101-D, Pages: 2092-2100
- DOI
  10.1587/transinf.2017EDP7362
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Sequence-to-Sequence Models for Emphasis Speech Translation (2018)
- Author(s)
  Quoc Truong Do, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 26, Pages: 1873-1883
- DOI
  10.1109/TASLP.2018.2846402
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Dirichlet Process Mixture of Mixtures Model for Unsupervised Subword Modeling (2018)
- Author(s)
  Michael Heck, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 26, Pages: 2027-2042
- DOI
  10.1109/TASLP.2018.2852500
- Peer Reviewed / Open Access / Int'l Joint Research
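[Presentation] Proposal of a Training Strategy for Speech Translation Using Curriculum Learning (2019)
- Author(s)
  Takatomo Kano, Sakriani Sakti, Satoshi Nakamura
- Organizer
  25th Annual Meeting of the Association for Natural Language Processing (NLP2019)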
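[Presentation] Recognition of Japanese-English Code-Switching Speech Using Semi-Supervised Learning Based on Machine Speech Chain (2019)
- Author(s)
  Sahoko Nakayama, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  25th Annual Meeting of the Association for Natural Language Processing (NLP2019)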
[Presentation] Affect-sensitive Dialogue Response Generation for Positive Emotion Elicitation (2019)
- Author(s)
  Nurul Lubis, Sakriani Sakti, Koichiro Yoshino, Satoshi Nakamura
- Organizer
  25th Annual Meeting of the Association for Natural Language Processing (NLP2019)
[Presentation] Enhancing Neural Machine Translation with Image-based Paraphrase Augmentation (2019)
- Author(s)
  Johanes Effendi, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  25th Annual Meeting of the Association for Natural Language Processing (NLP2019)
[Presentation] Speaker and Emotion Recognition of TV-Series Data Using Multimodal and Multitask Deep Learning (2019)
- Author(s)
  Sashi Novitasari, Quoc Truong Do, Sakriani Sakti, Dessi Lestari, Satoshi Nakamura
- Organizer
  25th Annual Meeting of the Association for Natural Language Processing (NLP2019)
[Presentation] Unifying Speech Recognition and Generation with Machine Speech Chain (2019)
- Author(s)
  Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  25th Annual Meeting of the Association for Natural Language Processing (NLP2019)
[Presentation] Sequence-to-Sequence ASR Optimization via Reinforcement Learning (2018)
- Author(s)
  Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] Graph regularized tensor factorization for single-trial EEG analysis (2018)
- Author(s)
  Hayato Maki, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura
- Organizer
  2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] Construction of English-French Multimodal Affective Conversational Corpus from Drama TV Series (2018)
- Author(s)
  Sashi Novitasari, Quoc-Truong Do, Sakriani Sakti, Dessi Lestari, Satoshi Nakamura
- Organizer
  LREC 2018
- Int'l Joint Research
[Presentation] Multi-modal Multi-task Deep Learning for Speaker and Emotion Recognition of TV-series Data (2018)
- Author(s)
  Sashi Novitasari, Quoc-Truong Do, Sakriani Sakti, Dessi Lestari, Satoshi Nakamura
- Organizer
  Oriental COCOSDA 2018
- Int'l Joint Research
[Presentation] Japanese-English Code-Switching Speech Data Construction (2018)
- Author(s)
  Sahoko Nakayama, Takatomo Kano, Quoc-Truong Do, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Oriental COCOSDA 2018
- Int'l Joint Research
[Presentation] Single-trial Detection of Semantic Anomalies from EEG during Listening to Spoken Sentences (2018)
- Author(s)
  Hiroki Tanaka, Hiroki Watanabe, Hayato Maki, Sakriani Sakti, Satoshi Nakamura
- Organizer
  International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2018)
- Int'l Joint Research
[Presentation] Compressing End-to-End ASR Networks by Tensor-Train Decomposition (2018)
- Author(s)
  Takuma Mori, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Interspeech 2018
- Int'l Joint Research
[Presentation] Optimizing DPGMM Clustering in Zero Resource Setting Based on Functional Load (2018)
- Author(s)
  Bin Wu, Sakriani Sakti, Satoshi Nakamura
- Organizer
  SLTU 2018
- Int'l Joint Research
[Presentation] Incremental TTS for Japanese Language (2018)
- Author(s)
  Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Interspeech 2018
- Int'l Joint Research
[Presentation] Machine Speech Chain with One-shot Speaker Adaptation (2018)
- Author(s)
  Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Interspeech 2018
- Int'l Joint Research
[Presentation] Speech Chain for Semi-Supervised Learning of Japanese-English Code-Switching ASR and TTS (2018)
- Author(s)
  Sahoko Nakayama, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  IEEE SLT
- Int'l Joint Research
[Presentation] Multi-scale Alignment and Contextual History for Attention Mechanism in Sequence-to-Sequence Model (2018)
- Author(s)
  Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  IEEE SLT
- Int'l Joint Research
[Presentation] Toward Multi-features Emphases Speech Translation: Assessment of Human Emphases Production and Perception with Speech and Text Clues (2018)
- Author(s)
  Quoc-Truong Do, Sakriani Sakti, Satoshi Nakamura
- Organizer
  IEEE SLT
- Int'l Joint Research
[Presentation] Using Spoken Word Posterior Features in Neural Machine Translation (2018)
- Author(s)
  Kaho Osamura, Takatomo Kano, Sakriani Sakti, Satoshi Nakamura
- Organizer
  IWSLT 2018
- Int'l Joint Research
[Presentation] Multi-paraphrase Augmentation to Leverage Neural Caption Translation (2018)
- Author(s)
  Johanes Effendi, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  IWSLT 2018
- Int'l Joint Research
[Presentation] Machine Speech Chain with Deep Learning (2018)
- Author(s)
  Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Acoustical Society of Japan 2018 Autumn Meeting
[Presentation] Multimodal Database of Negative Emotion Recovery in Dyadic Interactions: Construction and Analysis (2018)
- Author(s)
  Nurul Lubis, Michael Heck, Sakriani Sakti, Koichiro Yoshino, Satoshi Nakamura
- Organizer
  Acoustical Society of Japan 2018 Autumn Meeting
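[Presentation] Construction of Japanese-English Code-Switching Speech Data (2018)
- Author(s)
  Sahoko Nakayama, Quoc Truong Do, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Acoustical Society of Japan 2018 Autumn Meeting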
[Presentation] Visual Description Paraphrase Corpus Creation with Various Elementary Operations (2018)
- Author(s)
  Johanes Effendi, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Acoustical Society of Japan 2018 Autumn Meeting
