2019 Fiscal Year Annual Research Report

Research for unsupervised acoustic pattern discovery with zero resources

Research Project

Project/Area Number	17K00237
Research Institution	Nara Institute of Science and Technology
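Principal Investigator	Sakriani Sakti, Nara Institute of Science and Technology, Graduate School of Science and Technology, Specially Appointed Associate Professor (00395005)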
Co-Investigator(Kenkyū-buntansha)	Satoshi Nakamura, Nara Institute of Science and Technology, Data Science Center, Professor (30263429)
Project Period (FY)	2017-04-01 – 2020-03-31
Keywords	speech recognition / zero-resource speech technology / EEG / speech translation
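Outline of Annual Research Achievements	As the Tokyo Olympic and Paralympic Games approach, the language barrier with visitors from overseas is becoming an increasingly serious problem. Current speech recognition and speech translation technologies are already readily available for high-resource languages, so this project targets zero-resource speech processing, where neither language-specific knowledge nor transcribed data is available. In FY2018, we succeeded in building zero-resource modeling for the Indonesian language. This time, instead of using a Dirichlet process Gaussian mixture model, we built the system on deep learning. The system was able to (1) discover subword units and (2) synthesize speech, both in an unsupervised manner. We also participated in the Zero Resource Speech Challenge 2019, where the proposed method ranked among the top results. In our brain-analysis research in FY2018, we investigated the synchronization between EEG oscillations during imagined speech and the envelope of the corresponding overt speech. In FY2019, we went on to participate in the Zero Resource Speech Challenge 2020 and improved the system's performance. We also developed unsupervised speech-to-speech translation for unknown languages without text transcription and presented it at the IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop. Furthermore, we established a cooperative relationship with UNESCO for a world language consortium that supports language technology for all languages, all people, and all countries. This project is planned to continue over the ten years from 2022 to 2032 as part of the United Nations International Decade of Indigenous Languages.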

Research Products
(26 results)

All Journal Article (7 results) (of which Int'l Joint Research: 5 results, Peer Reviewed: 7 results, Open Access: 3 results) Presentation (18 results) (of which Int'l Joint Research: 15 results) Patent(Industrial Property Rights) (1 result)

[Journal Article] Machine Speech Chain (2020)
- Author(s)
  Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  IEEE/ACM Transactions on Audio, Speech and Language Processing
  Volume: 28 Pages: 976-989
- DOI
  10.1109/TASLP.2020.2977776
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Leveraging Neural Caption Translation with Visually Grounded Paraphrase Augmentation (2020)
- Author(s)
  Johanes Effendi, Katsuhito Sudoh, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E103.D, Issue: 3 Pages: 674-683
- DOI
  10.1587/transinf.2019EDP7065
- Peer Reviewed
[Journal Article] Recurrent Neural Network Compression based on Low-Rank Tensor Representation (2020)
- Author(s)
  Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E103.D, Issue: 2 Pages: 435-449
- DOI
  10.1587/transinf.2019EDP7040
- Peer Reviewed
[Journal Article] End-to-End Speech Recognition Sequence Training with Reinforcement Learning (2019)
- Author(s)
  Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  IEEE Access
  Volume: 7 Pages: 79758-79769
- DOI
  10.1109/ACCESS.2019.2922617
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Positive Emotion Elicitation in Chat-Based Dialogue Systems (2019)
- Author(s)
  Nurul Lubis, Sakriani Sakti, Koichiro Yoshino, Satoshi Nakamura
- Journal Title
  IEEE/ACM Transactions on Audio, Speech and Language Processing
  Volume: 27, Issue: 4 Pages: 866-877
- DOI
  10.1109/TASLP.2019.2900910
- Peer Reviewed / Int'l Joint Research
[Journal Article] Synchronization between overt speech envelope and EEG oscillations during imagined speech (2019)
- Author(s)
  Hiroki Watanabe, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  Neuroscience Research
  Volume: 153 Pages: 48-55
- DOI
  10.1016/j.neures.2019.04.004
- Peer Reviewed / Int'l Joint Research
[Journal Article] Neural Oscillation-Based Classification of Japanese Spoken Sentences During Speech Perception (2019)
- Author(s)
  Hiroki Watanabe, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E102.D, Issue: 2 Pages: 383-391
- DOI
  10.1587/transinf.2018EDP7293
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Neural Incremental Speech Recognition Through Attention Transfer (2020)
- Author(s)
  Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  ANLP
[Presentation] From Speech Chain to Multimodal Chain: Leveraging Cross-modal Data Augmentation for Semi-supervised Learning (2020)
- Author(s)
  Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  ANLP
[Presentation] Speech-to-Speech Translation without Text (2020)
- Author(s)
  Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  ANLP
[Presentation] Neural Machine Translation with Acoustic Embedding (2019)
- Author(s)
  Takatomo Kano, Sakriani Sakti, Satoshi Nakamura
- Organizer
  IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop
- Int'l Joint Research
[Presentation] Zero-shot Code-switching ASR and TTS with Multilingual Machine Speech Chain (2019)
- Author(s)
  Sahoko Nakayama, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop
- Int'l Joint Research
[Presentation] Listening while Speaking: Improving ASR through Multimodal Chain (2019)
- Author(s)
  Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop
- Int'l Joint Research
[Presentation] Speech-to-speech Translation between Untranscribed Unknown Languages (2019)
- Author(s)
  Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop
- Int'l Joint Research
[Presentation] Dialogue Model and Response Generation for Emotion Improvement Elicitation (2019)
- Author(s)
  Nurul Lubis, Sakriani Sakti, Koichiro Yoshino, Satoshi Nakamura
- Organizer
  The 3rd Conversational AI Workshop, NeurIPS 2019
- Int'l Joint Research
[Presentation] Recognition and Translation of Code-switching Speech Utterances (2019)
- Author(s)
  Sahoko Nakayama, Takatomo Kano, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Oriental COCOSDA 2019
- Int'l Joint Research
[Presentation] Phoneme Level Speaking Rate Variation on Waveform Generation using GAN-TTS (2019)
- Author(s)
  Mayuko Okamoto, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Oriental COCOSDA 2019
- Int'l Joint Research
[Presentation] Sequence-to-sequence Learning via Attention Transfer for Incremental Speech Recognition (2019)
- Author(s)
  Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Interspeech 2019
- Int'l Joint Research
[Presentation] VQVAE Unsupervised Unit Discovery and Multi-Scale Code2Spec Inverter for Zerospeech Challenge 2019 (2019)
- Author(s)
  Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura
- Organizer
  Interspeech 2019
- Int'l Joint Research
[Presentation] Neural iTTS: Toward Synthesizing Speech in Real-time with End-to-end Neural Text-to-Speech Framework (2019)
- Author(s)
  Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
- Organizer
  SSW
- Int'l Joint Research
[Presentation] Speech Quality Evaluation of Synthesized Japanese Speech Using EEG (2019)
- Author(s)
  Ivan Halim Parmonangan, Hiroki Tanaka, Sakriani Sakti, Shinnosuke Takamichi, Satoshi Nakamura
- Organizer
  Interspeech 2019
- Int'l Joint Research
[Presentation] EEG Analysis towards Evaluating Synthesized Speech Quality (2019)
- Author(s)
  Ivan Halim Parmonangan, Hiroki Tanaka, Sakriani Sakti, Shinnosuke Takamichi, Satoshi Nakamura
- Organizer
  IEEE Engineering in Medicine and Biology Society
- Int'l Joint Research
[Presentation] Cross-lingual speech-based ToBI label generation using bidirectional LSTM (2019)
- Author(s)
  Marco Vetter, Sakriani Sakti, Satoshi Nakamura
- Organizer
  IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] End-to-end feedback loss in speech chain framework via straight-through estimator (2019)
- Author(s)
  Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Organizer
  IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] Speech Artifact Removal from EEG Recordings of Spoken Word Production with Tensor Decomposition (2019)
- Author(s)
  Holy Lovenia, Hiroki Tanaka, Sakriani Sakti, Ayu Purwarianti, Satoshi Nakamura
- Organizer
  IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP)
- Int'l Joint Research
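[Patent(Industrial Property Rights)] Speech chain apparatus, computer program, and DNN speech recognition/synthesis mutual learning method (2018)
- Inventor(s)
  Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Industrial Property Rights Holder
  Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
- Industrial Property Rights Type
  Patent
- Patent Publication Number
  Japanese Unexamined Patent Application Publication (Tokkai) No. 2019-120841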
