2017 Fiscal Year Annual Research Report

Next generation speech translation research

Research Project

Project/Area Number	17H06101
Research Institution	Nara Institute of Science and Technology
Principal Investigator	Satoshi Nakamura (中村哲), Nara Institute of Science and Technology, Data Science Center, Professor (30263429)
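Co-Investigator(Kenkyū-buntansha)	Tatsuya Kawahara (河原達也), Kyoto University, Graduate School of Informatics, Professor (00234104); Hiroshi Saruwatari (猿渡洋), The University of Tokyo, Graduate School of Information Science and Technology, Professor (30324974); Shigeo Morishima (森島繁生), Waseda University, Faculty of Science and Engineering, Professor (10200411); Tomoki Toda (戸田智基), Nagoya University, Information Technology Center, Professor (90403328); Yuji Matsumoto (松本裕治), Nara Institute of Science and Technology, Graduate School of Science and Technology, Professor (10211575); Katsuhito Sudoh (須藤克仁), Nara Institute of Science and Technology, Graduate School of Science and Technology, Associate Professor (00396152); Sakriani Sakti (サクリアニサクティ), Nara Institute of Science and Technology, Graduate School of Science and Technology, Specially Appointed Associate Professor (00395005); Koichiro Yoshino (吉野幸一郎), Nara Institute of Science and Technology, Graduate School of Science and Technology, Assistant Professor (70760148); Hiroki Tanaka (田中宏季), Nara Institute of Science and Technology, Graduate School of Science and Technology, Assistant Professor (10757834); Shinnosuke Takamichi (高道慎之介), The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (90784330)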
Project Period (FY)	2017-05-31 – 2022-03-31
Keywords	speech translation
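Outline of Annual Research Achievements	① A) Extended independent low-rank matrix analysis, an unsupervised method, to source models whose sparsity can be controlled (Student's t distribution and generalized Gaussian distribution), and demonstrated that imposing sparsity improves noise suppression performance. B) Realized a word-level end-to-end (acoustic-to-word) model that integrates the acoustic model and the language model within a single neural network. Studied a noise suppression method based on MNMF. C) Built an end-to-end attention-based simultaneous interpretation system for Japanese and English, a language pair with a large syntactic distance. Developed a machine speech chain that supports semi-supervised learning. D) Proposed a method that fills in missing source-language data in multilingual machine translation with a special symbol, and improved machine translation accuracy. E) Built a dialogue translation dataset and released it at IWSLT 2017. Analyzed phenomena specific to dialogue translation. ② A) Proposed a deep-learning-based speech translation method that handles paralinguistic information such as emphasis, and obtained translation results that substantially outperform conventional methods. B) As statistical voice conversion technology based on speech waveform modeling, advanced research on signal-processing-based waveform modification and deep-learning-based waveform generation, greatly improving the underlying technology. ③ To eliminate the unnatural appearance of the mouth interior, developed a photorealistic facial expression synthesis system using patch-based Poisson image editing; this involved generating a teeth model, building a database of video sequences of the mouth interior during speech, and synthesizing mouth-interior imagery synchronized with the lip-sync mouth shape. Realized the VoiceAnimator system, which produces limited-animation-style lip sync from speech signals. ④ Aiming at real-time estimation of the user's state, detected a sense of incongruity from EEG. In experiments in which participants listened to sentences with deliberately created semantic and syntactic violations, significant event-related potentials were observed, and a machine learning model detected semantic violations with an accuracy of about 60%. ⑤ Built a simultaneous interpretation corpus of about 5.5 hours of Japanese and English lectures at academic symposia.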
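Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: Some parts advanced beyond the original plan, for example the devising of a recognition scheme for always-on speech recognition that is entirely different from conventional approaches; the rest of the research is proceeding largely as planned.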
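Strategy for Future Research Activity	① For noise suppression, in order to exploit source data prepared in advance, replace the source model parameter estimation part of independent low-rank matrix analysis with DNN-based inference, thereby integrating the method with DNNs. Improve the speech chain. Verify performance on English-Japanese machine translation. Further improve the method for compensating for missing data. Examine basic machine translation architectures. Examine translation methods that account for phenomena specific to dialogue translation. Build a parallel dataset annotated with dialogue states so that dialogue states can be used in translation evaluation. ② Introduce new capabilities such as emotion translation into the system. Improve deep-learning-based speech waveform generation and incorporate it into voice conversion, aiming to build a high-accuracy voice conversion system. Participate in the Voice Conversion Challenge 2018 to establish the performance of the resulting system. ③ Work on deep-learning-based methods for speech-to-image translation. ④ Further refine the incongruity detection model and improve its detection accuracy. Begin experiments measuring listening load during simultaneous interpretation. ⑤ Set up a framework for building a large-scale simultaneous interpretation corpus and build a corpus of more than 100 hours.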

Research Products
(51 results)


Journal Article (4 results; of which Int'l Joint Research: 2 results, Peer Reviewed: 4 results, Open Access: 4 results), Presentation (47 results; of which Int'l Joint Research: 21 results, Invited: 3 results)

[Journal Article] Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks (2018)
- Author(s)
  Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 26 Pages: 84-96
- DOI
  10.1109/TASLP.2017.2761547
- Peer Reviewed / Open Access
[Journal Article] Preserving Word-Level Emphasis in Speech-to-Speech Translation (2017)
- Author(s)
  Quoc Truong Do, Tomoki Toda, Graham Neubig, Sakriani Sakti, and Satoshi Nakamura
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 25 Pages: 544-556
- DOI
  10.1109/TASLP.2016.2643280
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Detecting Dementia through Interactive Computer Avatars (2017)
- Author(s)
  Hiroki Tanaka, Hiroyoshi Adachi, Norimichi Ukita, Manabu Ikeda, Hiroaki Kazui, Takashi Kudo, Satoshi Nakamura
- Journal Title
  IEEE Journal of Translational Engineering in Health and Medicine
  Volume: 5 Pages: 1-11
- DOI
  10.1109/JTEHM.2017.2752152
- Peer Reviewed / Open Access
[Journal Article] Articulatory modeling for pronunciation error detection without non-native training data based on DNN transfer learning (2017)
- Author(s)
  R. Duan, T. Kawahara, M. Dantsuji, and J. Zhang
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E100-D Pages: 2174-2182
- DOI
  10.1587/transinf.2017EDP7019
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] 独立深層学習行列分析に基づく多チャネル音源分離 (2018)
- Author(s)
  角野隼斗, 北村大地, 高宗典玄, 高道慎之介, 猿渡洋, 小野順貴
- Organizer
  日本音響学会 2018年春季研究発表会
[Presentation] Detecting Suppression of Negative Emotion by Time Series Change of Cerebral Blood Flow using fNIRS (2018)
- Author(s)
  Masahiro Honda, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura
- Organizer
  IEEE International Conference on Biomedical and Health Informatics (BHI)
- Int'l Joint Research
[Presentation] Distilling Knowledge from a Multi-scale deep CNN Ensemble for Robust and Light-weight Acoustic Modeling (2018)
- Author(s)
  Michael Heck, Masayuki Suzuki, Takashi Fukuda, Gakuto Kurata, Satoshi Nakamura
- Organizer
  第120回音声言語情報処理研究会 (IPSJ SIG-SLP)
[Presentation] 音声認識単語仮説の曖昧性を考慮するニューラル機械翻訳 (2018)
- Author(s)
  長村佳歩, 叶高朋, Sakriani Sakti, 須藤克仁, 中村哲
- Organizer
  言語処理学会第24回年次大会(NLP2018)
[Presentation] 原言語側の欠落を考慮したMulti-Source NMT (2018)
- Author(s)
  西村優汰, 須藤克仁, 中村哲
- Organizer
  言語処理学会第24回年次大会(NLP2018)
[Presentation] エージェントによる非定型質問への応答からの認知症検出 (2018)
- Author(s)
  宇城毅犠，田中宏季，足立浩祥，數井裕光，池田学，工藤喬，中村哲
- Organizer
  IPSJ SIG
[Presentation] EEGを用いた合成音声に対する体感品質予想 (2018)
- Author(s)
  真木勇人, Sakriani Sakti, 田中宏季, 中村哲
- Organizer
  電子情報通信学会MEとバイオサイバネティックス研究会（MBE）
[Presentation] 電極配置のグラフ構造を利用したテンソル分解による単一試行EEG解析 (2018)
- Author(s)
  真木勇人, 田中宏季, Sakriani Sakti, 中村哲
- Organizer
  電子情報通信学会MEとバイオサイバネティックス研究会（MBE）
[Presentation] 生体信号からの感情コンピューティングと自閉症支援 (2018)
- Author(s)
  田中宏季, 寺澤直人, 本田将大, 真木勇人, サクリアニサクティ
- Organizer
  第13回日本感性工学会春季大会
[Presentation] マルチチャネル非負値行列因子分解に基づくビームフォーミングを用いた雑音環境下音声認識 (2018)
- Author(s)
  島田一希, 坂東宜昭, 三村正人, 糸山克寿, 吉井和佳, 河原達也
- Organizer
  電子情報通信学会SP
[Presentation] CTCによる文字単位のモデルを併用したattentionによる単語単位のend-to-end音声認識 (2018)
- Author(s)
  上乃聖, 稲熊寛文, 三村正人, 河原達也
- Organizer
  情報処理学会SIG-SLP
[Presentation] 独立深層学習行列分析に基づく多チャネル音源分離の実験的評価 (2018)
- Author(s)
  北村大地, 角野隼斗, 高宗典玄, 高道慎之介, 猿渡洋, 小野順貴
- Organizer
  電子情報通信学会技術研究報告音声研究会 (SP)
[Presentation] Development of NU voice conversion system 2018 (2018)
- Author(s)
  P.L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, T. Toda
- Organizer
  電子情報通信学会技術研究報告音声研究会 (SP)
[Presentation] Development of NU non-parallel voice conversion system 2018 (2018)
- Author(s)
  Y.-C. Wu, P.L. Tobing, T. Hayashi, K. Kobayashi, T. Toda
- Organizer
  電子情報通信学会技術研究報告音声研究会 (SP)
[Presentation] Structured-Based Curriculum Learning for End-to-End English-Japanese Speech Translation (2017)
- Author(s)
  Takatomo Kano, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Interspeech 2017
- Int'l Joint Research
[Presentation] Toward Expressive Speech Translation: A Unified Seq-to-Seq LSTMs Approach for Translating Words and Emphasis (2017)
- Author(s)
  Quoc Truong Do, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Interspeech 2017
- Int'l Joint Research
[Presentation] Subject-independent Classification of Japanese Spoken Sentences by Multiple Frequency Bands Phase Pattern of EEG Response during Speech Perception (2017)
- Author(s)
  Hiroki Watanabe, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Interspeech 2017
- Int'l Joint Research
[Presentation] Ensembles of Multi-scale VGG Acoustic Models (2017)
- Author(s)
  Michael Heck, Masayuki Suzuki, Takashi Fukuda, Gakuto Kurata, Satoshi Nakamura
- Organizer
  Interspeech 2017
- Int'l Joint Research
[Presentation] Recognizing Emotionally Coloured Dialogue Speech using Speaker-Adapted DNN-CNN Bottleneck Features (2017)
- Author(s)
  K. Mukaihara, S. Sakti, S. Nakamura
- Organizer
  SPECOM 2017
- Int'l Joint Research
[Presentation] Feature Optimized DPGMM Clustering for Unsupervised Subword Modeling: A Contribution to ZEROSPEECH 2017 (2017)
- Author(s)
  M. Heck, S. Sakti, S. Nakamura
- Organizer
  ASRU 2017
- Int'l Joint Research
[Presentation] Attention-based Wav2Text with Feature Transfer (2017)
- Author(s)
  A. Tjandra, S. Sakti, S. Nakamura
- Organizer
  ASRU 2017
- Int'l Joint Research
[Presentation] Listening while Speaking: Speech Chain by Deep Learning (2017)
- Author(s)
  A. Tjandra, S. Sakti, S. Nakamura
- Organizer
  ASRU 2017
- Int'l Joint Research
[Presentation] Local Monotonic Attention Mechanism for End-to-end Speech and Language Processing (2017)
- Author(s)
  A. Tjandra, S. Sakti, S. Nakamura
- Organizer
  IJCNLP 2017
- Int'l Joint Research
[Presentation] Neural Machine Translation via Binary Code Prediction (2017)
- Author(s)
  Yusuke Oda, Philip Arthur, Graham Neubig, Koichiro Yoshino, Satoshi Nakamura
- Organizer
  55th Annual Meeting of the Association for Computational Linguistics (ACL) (Long Papers)
- Int'l Joint Research
[Presentation] End-to-end Speech Recognition with Local Monotonic Attention (2017)
- Author(s)
  A. Tjandra, S. Sakti, S. Nakamura
- Organizer
  NIPS Workshop
- Int'l Joint Research
[Presentation] 日本語インクリメンタル音声合成システム実装のための言語特徴の検討 (2017)
- Author(s)
  柳田智也, S. Sakti, 中村哲
- Organizer
  情報処理学会音声言語情報処理研究会
[Presentation] テンソルトレイン分解によるEnd-to-End自動音声認識モデルの圧縮 (2017)
- Author(s)
  森巧磨, Andros Tjandra, Sakriani Sakti, 中村哲
- Organizer
  情報処理学会自然言語処理研究会
[Presentation] Tracking Liking State in Brain Activity while Watching Multiple Movies (2017)
- Author(s)
  Naoto Terasawa, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura
- Organizer
  19th ACM International Conference on Multimodal Interaction (ICMI'17)
- Int'l Joint Research
[Presentation] EEG-based Emotional State Tracking during Watching Movie considering Self-Assessment Manikin (2017)
- Author(s)
  Naoto Terasawa, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura
- Organizer
  39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC2017)
- Int'l Joint Research
[Presentation] Creation of a Multi-paraphrase Corpus based on Various Elementary Operations (2017)
- Author(s)
  Johanes Effendi, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Oriental COCOSDA 2017
- Int'l Joint Research
[Presentation] 音声文聴取時における意味違反が生じた際の脳波自動判別 (2017)
- Author(s)
  田中宏季, 渡部宏樹, 真木勇人, Sakriani Sakti, 中村哲
- Organizer
  電子情報通信学会技術研究報告ヒューマン情報処理研究会 (HIP)
[Presentation] Knowledge Distillation from a Multi-scale VGG Ensemble for Acoustic Modeling (2017)
- Author(s)
  Michael Heck, Masayuki Suzuki, Takashi Fukuda, Gakuto Kurata and Satoshi Nakamura
- Organizer
  日本音響学会2017年秋季研究発表会
[Presentation] Dialogue Modeling for Eliciting Positive Emotion (2017)
- Author(s)
  Nurul Lubis, Sakriani Sakti, Koichiro Yoshino and Satoshi Nakamura
- Organizer
  日本音響学会2017年秋季研究発表会
[Presentation] Joint Translation of Words and Emphasis in Speech-to-Speech Translation using Sequence-to-Sequence Models (2017)
- Author(s)
  Quoc Truong Do, Sakriani Sakti and Satoshi Nakamura
- Organizer
  日本音響学会2017年秋季研究発表会
[Presentation] カリキュラムラーニングを用いた日英直接翻訳システムの提案 (2017)
- Author(s)
  叶高朋, Sakriani Sakti, 中村哲
- Organizer
  日本音響学会2017年秋季研究発表会
[Presentation] Tensor Train based RNN Compression for Polyphonic Music Modelling (2017)
- Author(s)
  Andros Tjandra, Sakriani Sakti and Satoshi Nakamura
- Organizer
  日本音響学会2017年秋季研究発表会
[Presentation] 音声翻訳研究のこれから (2017)
- Author(s)
  中村哲
- Organizer
  日本音響学会2017年秋季研究発表会
- Invited
[Presentation] Cross-domain speech recognition using nonparallel corpora with cycle-consistent adversarial networks (2017)
- Author(s)
  M. Mimura, S. Sakai, and T. Kawahara
- Organizer
  IEEE Workshop Automatic Speech Recognition & Understanding (ASRU)
- Int'l Joint Research
[Presentation] Automatic meeting transcription system for the Japanese Parliament (Diet) (2017)
- Author(s)
  T. Kawahara
- Organizer
  APSIPA ASC
- Int'l Joint Research / Invited
[Presentation] Modeling difficulties of second language learners using speech technology (2017)
- Author(s)
  T. Kawahara
- Organizer
  Seoul International Conference on Speech Sciences (SICSS)
- Int'l Joint Research / Invited
[Presentation] Semi-blind speech enhancement based on recurrent neural network for source separation and dereverberation (2017)
- Author(s)
  M. Wake, Y. Bando, M. Mimura, K. Itoyama, K. Yoshii, and T. Kawahara
- Organizer
  IEEE Machine Learning for Signal Processing Workshop (MLSP)
- Int'l Joint Research
[Presentation] Combined multi-channel NMF-based robust beamforming for noisy speech recognition (2017)
- Author(s)
  M. Mimura, Y. Bando, K. Shimada, S. Sakai, K. Yoshii, and T. Kawahara
- Organizer
  INTERSPEECH
- Int'l Joint Research
[Presentation] Social signal detection in spontaneous dialogue using bidirectional LSTM-CTC (2017)
- Author(s)
  H. Inaguma, K. Inoue, M. Mimura, and T. Kawahara
- Organizer
  INTERSPEECH
- Int'l Joint Research
[Presentation] 雑音環境下音声認識のための多チャネル非負値行列因子分解に基づく教師なしビームフォーマ (2017)
- Author(s)
  島田一希, 坂東宜昭, 三村正人, 糸山克寿, 吉井和佳, 河原達也
- Organizer
  電子情報通信学会SP
[Presentation] 再帰型ニューラルネットワークを用いたセミブラインド音声分離・強調 (2017)
- Author(s)
  和気雅弥, 坂東宜昭, 三村正人, 糸山克寿, 吉井和佳, 河原達也
- Organizer
  電子情報通信学会SP
[Presentation] 深層生成モデルを事前分布に用いた教師なし音声強調 (2017)
- Author(s)
  坂東宜昭, 三村正人, 糸山克寿, 吉井和佳, 河原達也
- Organizer
  電子情報通信学会SP
[Presentation] End-to-endモデルによるsocial signals検出および音声認識との統合 (2017)
- Author(s)
  稲熊寛文, 井上昂治, 三村正人, 河原達也
- Organizer
  情報処理学会SIG-SLP
