Noise-robust speech recognition and spoken dialog system for service robots

Research Project

Project/Area Number	19K24343
Research Category	Grant-in-Aid for Research Activity Start-up
Allocation Type	Multi-year Fund
Review Section	1001:Information science, computer engineering, and related fields
Research Institution	Kobe University
Principal Investigator	Takashima Ryoichi, Kobe University, Research Center for Urban Safety and Security, Associate Professor (50846102)
Project Period (FY)	2019-08-30 – 2021-03-31
Project Status	Completed (Fiscal Year 2020)
Budget Amount	¥2,860,000 (Direct Cost: ¥2,200,000, Indirect Cost: ¥660,000); Fiscal Year 2020: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000); Fiscal Year 2019: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Keywords	Speech recognition / Spoken dialogue / Neural network / Machine learning
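Outline of Research at the Start	Demand for service robots is growing as a way to address the recent labor shortage. This research studies spoken dialogue aimed at smooth interaction with service robots. In practical settings the input speech is often very noisy, so the system must produce appropriate responses under conditions in which speech recognition is prone to errors. Conventionally, noise suppression, speech recognition, and dialogue technology have been studied separately for this problem, and they are not necessarily jointly optimized for the goal of a successful dialogue. This research aims to improve performance by jointly optimizing all modules, from speech input to dialogue, with respect to dialogue success, and also to understand the mechanism from human hearing to dialogue from a machine-learning perspective.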
Outline of Final Research Achievements	The final goal of this research is to construct an end-to-end model that handles both speech recognition and dialogue. Conventional spoken dialogue systems optimize the speech recognition module and the dialogue module independently. Training an end-to-end model, however, requires a huge amount of training data, so techniques for training models on limited data are important. We therefore propose training techniques that use multi-step transfer learning, self-supervised learning, and external knowledge, and confirm that the proposed methods can train models that outperform conventional methods.
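Academic Significance and Societal Importance of the Research Achievements	Demand for service robots is growing as a way to address the recent labor shortage. Talking to a robot by voice is familiar and easy for users, but the expected dialogue performance is not obtained in conditions where speech recognition is difficult, such as high-noise environments. Conventionally, speech recognition and dialogue technology have been studied and optimized separately for this problem, and they were not necessarily optimized for the final goal of a successful spoken dialogue. Unifying these modules and optimizing them jointly is expected to improve performance further, but this requires a huge amount of training data. The outcome of this research is a method for training models stably on limited training data, which we expect to be applicable to the joint optimization described above.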

Report

(3 results)

2020 Annual Research Report Final Research Report ( PDF )
2019 Research-status Report

Research Products
(22 results)

Journal Article (1 result; of which Int'l Joint Research: 1, Peer Reviewed: 1, Open Access: 1), Presentation (20 results; of which Int'l Joint Research: 5), Remarks (1 result)

[Journal Article] Knowledge transferability between the speech data of persons with dysarthria speaking different languages for dysarthric speech recognition (2019)
- Author(s)
  Yuki Takashima, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki
- Journal Title
  IEEE Access
  Volume: 7 Pages: 164320-164326
- DOI
  10.1109/access.2019.2951856
- NAID
  120006818768
- Related Report
  2019 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
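[Presentation] A Study of Data Augmentation Methods for Speech Recognition of Persons with Cleft Lip and Palate (2021)
- Author(s)
  冨士原健斗，高島遼一，杉山千尋，田中信和，野原幹司，野崎一徳，滝口哲也
- Organizer
  Proceedings of the 2021 Spring Meeting of the Acoustical Society of Japan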
- Related Report
  2020 Annual Research Report
[Presentation] Dysarthric Speech Conversion by Learning Disentangled Representations with Non-parallel Data (2021)
- Author(s)
  陳訓泉，陳金輝，高島遼一，滝口哲也
- Organizer
  Proceedings of the 2021 Spring Meeting of the Acoustical Society of Japan
- Related Report
  2020 Annual Research Report
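[Presentation] Dysarthric Speech Recognition Using Unlabeled Spontaneous Speech via Self-Supervised Learning (2021)
- Author(s)
  澤佑哉，冨士原健斗，相原龍，高島遼一，滝口哲也，本山信明
- Organizer
  Proceedings of the 2021 Spring Meeting of the Acoustical Society of Japan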
- Related Report
  2020 Annual Research Report
[Presentation] Dysarthric Speech Recognition Based on Deep Metric Learning (2020)
- Author(s)
  Yuki Takashima, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki
- Organizer
  Interspeech
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Convolutional neural networks Memory Optimization Inference with Splitting Image (2020)
- Author(s)
  Weihao Zhuang, Tristan Hascoet, Ryoichi Takashima, Tetsuya Takiguchi and Yasuo Ariki
- Organizer
  IEEE Global Conference on Consumer Electronics (GCCE)
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] An Investigation of End-to-End Speech Recognition Using Model Adaptation for Dysarthric Speakers (2020)
- Author(s)
  Yuya Sawa, Ryoichi Takashima, Tetsuya Takiguchi
- Organizer
  IEEE Global Conference on Consumer Electronics (GCCE)
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
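[Presentation] A Study of Pronunciation Dictionary Adaptation for Dysarthric Speech Recognition (2020)
- Author(s)
  澤佑哉, 高島遼一, 滝口哲也, 有木康雄
- Organizer
  Proceedings of the 2020 Autumn Meeting of the Acoustical Society of Japan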
- Related Report
  2020 Annual Research Report
[Presentation] Two-step acoustic model adaptation for dysarthric speech recognition (2020)
- Author(s)
  Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki
- Organizer
  2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] Optimizing the Computational Efficiency of 3D Segmentation Models for Connectomics (2020)
- Author(s)
  Weihao Zhuang, Tristan Hascoet, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki
- Organizer
  The 26th International Workshop on Frontiers of Computer Vision (IW-FCV 2020)
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] A Study of Dysarthric Speech Recognition Using a Hybrid CTC/Attention Model (2020)
- Author(s)
  澤佑哉, 高島遼一, 滝口哲也, 有木康雄
- Organizer
  Proceedings of the 2020 Spring Meeting of the Acoustical Society of Japan
- Related Report
  2019 Research-status Report
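[Presentation] Improving the Intelligibility of Dysarthric Speech Synthesis from a Small Amount of Data Using a Model of Non-disabled Speakers (2020)
- Author(s)
  南坂竜翔, 高島遼一, 滝口哲也
- Organizer
  Proceedings of the 2020 Spring Meeting of the Acoustical Society of Japan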
- Related Report
  2019 Research-status Report
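[Presentation] Optimization of Reinforcement Learning Using Differentiable Programming (2020)
- Author(s)
  黄伊莎, Tristan Hascoet, 高島遼一, 滝口哲也, 有木康雄
- Organizer
  Proceedings of the 82nd National Convention of the Information Processing Society of Japan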
- Related Report
  2019 Research-status Report
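[Presentation] Improving Generalization Performance in Neuron Segmentation through Multi-Domain Learning (2020)
- Author(s)
  長谷川貴大, Tristan Hascoet, 高島遼一, 滝口哲也, 有木康雄
- Organizer
  Proceedings of the 82nd National Convention of the Information Processing Society of Japan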
- Related Report
  2019 Research-status Report
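[Presentation] Transfer Learning Using Speech of Non-disabled Speakers and Dysarthric Speech in Other Languages for Dysarthric Speech Recognition (2019)
- Author(s)
  高島悠樹, 高島遼一, 滝口哲也, 有木康雄
- Organizer
  IEICE Technical Report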
- Related Report
  2019 Research-status Report
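[Presentation] A Study on Improving the Generalization Performance of a Chit-Chat Dialogue System Using External Knowledge (2019)
- Author(s)
  麻生大聖, 高島遼一, 滝口哲也, 有木康雄
- Organizer
  Proceedings of the 2019 Autumn Meeting of the Acoustical Society of Japan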
- Related Report
  2019 Research-status Report
[Presentation] Speech-to-Speech Translation using Dual Learning and Prosody Conversion (2019)
- Author(s)
  Zhaojie Luo, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki
- Organizer
  Proceedings of the 2019 Autumn Meeting of the Acoustical Society of Japan
- Related Report
  2019 Research-status Report
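[Presentation] A Study of Deep-Learning-Based Speech Synthesis Using a Small Amount of Data from Persons with Dysarthria (2019)
- Author(s)
  南坂竜翔, 高島遼一, 滝口哲也, 有木康雄
- Organizer
  Proceedings of the 2019 Autumn Meeting of the Acoustical Society of Japan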
- Related Report
  2019 Research-status Report
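[Presentation] A Study of Japanese Large-Vocabulary Continuous Speech Recognition for Persons with Dysarthria (2019)
- Author(s)
  高島遼一, 滝口哲也, 有木康雄
- Organizer
  Proceedings of the 2019 Autumn Meeting of the Acoustical Society of Japan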
- Related Report
  2019 Research-status Report
[Presentation] Improving the Generalization Performance of a Chit-Chat Dialogue System Using WordNet (2019)
- Author(s)
  麻生大聖, 高島遼一, 滝口哲也, 有木康雄
- Organizer
  IEICE Technical Report
- Related Report
  2019 Research-status Report
[Presentation] Reduce GPU Memory Usage of Training Neural Network by CPU Offloading (2019)
- Author(s)
  Weihao Zhuang, Tristan Hascoet, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki
- Organizer
  The 22nd Meeting on Image Recognition and Understanding (MIRU)
- Related Report
  2019 Research-status Report
[Remarks] Researcher's web page
- URL
  http://www.me.cs.scitec.kobe-u.ac.jp/~rtakashima/
- Related Report
  2020 Annual Research Report 2019 Research-status Report
