2021 Fiscal Year Annual Research Report

A Study on Multi-modal Automatic Simultaneous Interpretation System and Evaluation Method

Research Project

Project/Area Number	21H05054
Research Institution	Nara Institute of Science and Technology
Principal Investigator	Satoshi Nakamura (中村哲), Nara Institute of Science and Technology, Data Science Center, Professor (30263429)
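Co-Investigator(Kenkyū-buntansha)
  Tatsuya Kawahara (河原達也), Kyoto University, Graduate School of Informatics, Professor (00234104)
  Tomoki Toda (戸田智基), Nagoya University, Information Technology Center, Professor (90403328)
  Shigeo Morishima (森島繁生), Waseda University, Faculty of Science and Engineering, Professor (10200411)
  Hiroshi Saruwatari (猿渡洋), The University of Tokyo, Graduate School of Information Science and Technology, Professor (30324974)
  Taro Watanabe (渡辺太郎), Nara Institute of Science and Technology, Graduate School of Science and Technology, Professor (90395038)
  Kayo Matsushita (松下佳世), Rikkyo University, College of Intercultural Communication, Associate Professor (90746679)
  Masaru Yamada (山田優), Rikkyo University, College of Intercultural Communication, Professor (70645001)
  Katsuhito Sudoh (須藤克仁), Nara Institute of Science and Technology, Graduate School of Science and Technology, Associate Professor (00396152)
  Sakriani Sakti (SAKTI Sakriani), Nara Institute of Science and Technology, Graduate School of Science and Technology, Specially Appointed Associate Professor (00395005)
  Shinnosuke Takamichi (高道慎之介), The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (90784330)
  Hiroki Tanaka (田中宏季), Nara Institute of Science and Technology, Graduate School of Science and Technology, Assistant Professor (10757834)
  Seitaro Shinagawa (品川政太朗), Nara Institute of Science and Technology, Graduate School of Science and Technology, Assistant Professor (70897454)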
Project Period (FY)	2021-07-05 – 2026-03-31
Keywords	speech translation
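Outline of Annual Research Achievements	Task 1-A) We studied a method that converts source speech containing emphasis into the target language end-to-end and demonstrated its effectiveness. We began investigating how to extract and translate emphasis and focus on parts of speech other than adjectives. We improved the integrated system for physical speaker-identity conversion of speech and video. We proposed a deep-network-integrated voice conversion method that can handle arbitrary speakers and languages, and demonstrated its effectiveness. To realize photorealistic synthesis of arbitrary speaking facial expressions, we began work on a NeRF-based super-photorealistic face image synthesis method that learns a radiance field and, unlike conventional approaches based on 3D face models, requires neither an intermediate 3D model nor lengthy rendering.
B) We surveyed methods for prior adaptation in which information such as the domain and keywords is given explicitly. A preliminary study using multimodal pre-trained models revealed that training efficiency when optimizing for this task is an open issue.
C) We gave priority to improving the quality and latency of incremental speech synthesis. To handle simultaneous speech translation between languages whose word order differs greatly, we proposed a method that uses syntactic information and a method that uses prefixes, and confirmed their effectiveness.
Task 2-A) We identified and analyzed existing interpreter-support systems deployed in remote simultaneous interpretation.
B) For the interpretation quality evaluation methods currently in use, we reviewed prior research and interviewed relevant organizations, and began work toward establishing an evaluation method based on objectively measurable data rather than subjective assessment. We carried out a small-scale evaluation annotation of simultaneous interpretation using MQM, a framework used in translation evaluation.
C) We measured cognitive load during simultaneous interpretation from sentence-level ASSR responses and analyzed its relationship to cognitive load indices.
Task 3-A) We built a prototype tool for aligning source utterances with interpreted utterances and used it for evaluation annotation.
C) We updated the integrated system by rebuilding its modules and participated in the simultaneous translation shared task of IWSLT 2022.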
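Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned.
Reason	In FY2022, the previous Grant-in-Aid for Scientific Research (S) project overlapped with the new one, and the required transition caused some confusion. Nevertheless, we evaluated the performance and latency of the simultaneous speech interpretation system that integrates the results of the previous project, improved it, and were able to participate in a shared task at an international conference. For Tasks 1, 2, and 3, in addition to work that further improves on our earlier research, we began research on simultaneous speech interpretation that uses paralinguistic information and prior information, and, together with interpreters and interpreting researchers, research on evaluation methods for simultaneous interpretation.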
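Strategy for Future Research Activity	Task 1-A) We will further study methods for end-to-end conversion into the target language of source speech that carries emphasis and focus on various parts of speech. We will continue improving the integrated system for physical speaker-identity conversion of speech and video. We will further advance research on the deep-network-integrated voice conversion method that can handle arbitrary speakers and languages, and on models for synthesizing speaking facial expressions.
B) For prior adaptation in which information such as the domain and keywords is given explicitly, we will annotate prior information and proceed with studies using multimodal pre-trained models.
C) We will continue improving the quality and latency of incremental speech synthesis. To reduce latency, we will also study approaches such as summarizing the synthesized speech.
Task 2-A) We will continue analyzing interpreting strategies using data from existing interpreter-support systems deployed in remote simultaneous interpretation.
B) We will continue working toward an interpretation quality evaluation method based on objectively measurable data. We will carry out evaluation annotation of simultaneous interpretation using MQM, a framework used in translation evaluation.
C) We will measure cognitive load during simultaneous interpretation from chunk-level ASSR responses and analyze its relationship to cognitive load indices.
Task 3-A) We will build a prototype tool for aligning source utterances with interpreted utterances and construct an alignment corpus.
C) We will continue improving the integrated system by rebuilding its modules.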

Research Products
(48 results)

Journal Article (8 results) (of which Int'l Joint Research: 1 result, Peer Reviewed: 8 results, Open Access: 8 results)
Presentation (39 results) (of which Int'l Joint Research: 23 results, Invited: 3 results)
Patent(Industrial Property Rights) (1 result)

[Journal Article] Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-resource ASR (2022)
- Author(s)
  Bin Wu, Sakriani Sakti, Jinsong Zhang, and Satoshi Nakamura
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 30 Pages: 901-916
- DOI
  10.1109/TASLP.2022.3150220
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] On Knowledge Distillation for Translating Erroneous Speech Transcriptions (2022)
- Author(s)
  Ryo Fukuda, Katsuhito Sudoh, and Satoshi Nakamura
- Journal Title
  Journal of Natural Language Processing (自然言語処理)
  Volume: 2-29 Pages: -
- Peer Reviewed / Open Access
[Journal Article] Alignment knowledge distillation for online streaming attention-based speech recognition (2021)
- Author(s)
  H. Inaguma and T. Kawahara
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 29 Pages: -
- DOI
  10.1109/TASLP.2021.3133217
- Peer Reviewed / Open Access
[Journal Article] Neural Incremental Speech Recognition Toward Real-time Machine Speech Translation (2021)
- Author(s)
  Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E104-D, No. 12 Pages: -
- DOI
  10.1587/transinf.2021EDP7014
- Peer Reviewed / Open Access
[Journal Article] Audio-Oriented Video Interpolation Using Key Pose (2021)
- Author(s)
  Takayuki Nakatsuka, Yukitaka Tsuchiya, Masatoshi Hamanaka and Shigeo Morishima
- Journal Title
  International Journal of Pattern Recognition and Artificial Intelligence
  Volume: 35, No. 16 Pages: -
- DOI
  10.1142/S0218001421600168
- Peer Reviewed / Open Access
[Journal Article] Synthesizing waveform sequence-to-sequence to augment training data for sequence-to-sequence speech recognition (2021)
- Author(s)
  S. Ueno, M. Mimura, S. Sakai, and T. Kawahara
- Journal Title
  Acoustical Science & Technology
  Volume: 42 Pages: 333-343
- DOI
  10.1250/ast.42.333
- Peer Reviewed / Open Access
[Journal Article] Length-constrained Neural Machine Translation using Length Prediction and Perturbation into Length-aware Positional Encoding (2021)
- Author(s)
  Yui Oka, Katsuhito Sudoh, Satoshi Nakamura
- Journal Title
  Journal of Natural Language Processing (自然言語処理)
  Volume: 28, No. 3 Pages: 778-801
- DOI
  10.5715/jnlp.28.778
- Peer Reviewed / Open Access
[Journal Article] End-to-End Image-to-Speech Generation for Untranscribed Unknown Languages (2021)
- Author(s)
  Johanes Effendi, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  IEEE Access
  Volume: - Pages: 55144-55154
- DOI
  10.1109/ACCESS.2021.3071541
- Peer Reviewed / Open Access
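[Presentation] Speech Synthesis from Text Rendered as Images (画像文字からの音声合成) (2022)
- Author(s)
  Yoshifumi Nakano, Takaaki Saeki, Shinnosuke Takamichi, Katsuhito Sudoh, Hiroshi Saruwatari
- Organizer
  2022 Annual Meeting of the Association for Natural Language Processing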
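[Presentation] JTubeSpeech: A Japanese Speech Corpus Built from YouTube for Speech Recognition and Speaker Verification (JTubeSpeech: 音声認識と話者照合のためにYouTubeから構築される日本語音声コーパス) (2022)
- Author(s)
  Shinnosuke Takamichi, Ludwig Kurzinger, Takaaki Saeki, Sayaka Shiota, Shinji Watanabe
- Organizer
  2022 Annual Meeting of the Association for Natural Language Processing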
[Presentation] IWSLT Evaluation Campaign: Simultaneous Speech Translation (2022)
- Author(s)
  Katsuhito Sudoh
- Organizer
  141st Meeting of the IPSJ Special Interest Group on Spoken Language Processing
- Int'l Joint Research / Invited
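[Presentation] An Attempt to Model a Speech Listening and Production System with the Machine Speech Chain (Machine Speech Chain による音声聴取生成システムのモデル化の試み) (2022)
- Author(s)
  Satoshi Nakamura
- Organizer
  Acoustical Society of Japan 2022 Spring Meeting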
- Invited
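[Presentation] Utterance Segmentation Based on a Speech Translation Corpus for Speech Machine Translation (音声機械翻訳のための音声翻訳コーパスに基づく発話分割) (2022)
- Author(s)
  Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  The 28th Annual Meeting of the Association for Natural Language Processing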
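[Presentation] Simultaneous Neural Machine Translation with Syntactic Label Prediction (構文ラベル予測による同時ニューラル機械翻訳) (2022)
- Author(s)
  Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  The 28th Annual Meeting of the Association for Natural Language Processing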
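[Presentation] Grammatical Error Detection Based on Sequence Probabilities from a Masked Language Model (Masked Language Model による系列確率に基づく文法誤り検出) (2022)
- Author(s)
  Kosuke Doi, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  The 28th Annual Meeting of the Association for Natural Language Processing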
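[Presentation] Performance Comparison by Speech Recognition Accuracy for Speech Translation Robust to Ambiguity in Speech Recognition Output (音声認識出力の曖昧性に頑健な音声翻訳のための音声認識の精度ごとの性能比較) (2022)
- Author(s)
  Yuka Ko, Katsuhito Sudoh, Sakriani Sakti, Satoshi Nakamura
- Organizer
  The 28th Annual Meeting of the Association for Natural Language Processing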
[Presentation] Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network (2021)
- Author(s)
  Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  Proc. ASRU
- Int'l Joint Research
[Presentation] An end-to-end model from speech to clean transcript for parliamentary meetings (2021)
- Author(s)
  M. Mimura, S. Sakai, and T. Kawahara
- Organizer
  In Proc. APSIPA ASC
- Int'l Joint Research
[Presentation] VAD-free streaming hybrid CTC/Attention ASR for unsegmented recording (2021)
- Author(s)
  H. Inaguma, M. Mimura, and T. Kawahara
- Organizer
  In Proc. INTERSPEECH
- Int'l Joint Research
[Presentation] StableEmit: Selection probability discount for reducing emission latency of streaming monotonic attention ASR (2021)
- Author(s)
  H. Inaguma, M. Mimura, and T. Kawahara
- Organizer
  In Proc. INTERSPEECH
- Int'l Joint Research
[Presentation] Using Local Phrase Dependency Structure Information in Neural Sequence-to-Sequence Speech Synthesis (2021)
- Author(s)
  Nobuyoshi Kaiki, Sakriani Sakti and Satoshi Nakamura
- Organizer
  O-COCOSDA 2021
- Int'l Joint Research
[Presentation] Unsupervised Neural-Based Graph Clustering for Variable-Length Speech Representation Discovery of Zero-Resource Languages (2021)
- Author(s)
  Shun Takahashi, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Proc. Interspeech 2021
- Int'l Joint Research
[Presentation] Dynamically Adaptive Machine Speech Chain Inference for TTS in Noisy Environment: Listen and Speak Louder (2021)
- Author(s)
  Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Proc. Interspeech 2021
- Int'l Joint Research
[Presentation] Weakly-supervised Speech-to-text Mapping with Visually Connected Non-parallel Speech-text Data using Cyclic Partially-aligned Transformer (2021)
- Author(s)
  Johanes Effendi, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Proc. Interspeech 2021
- Int'l Joint Research
[Presentation] Transcribing Paralinguistic Acoustic Cues to Target Language Text in Transformer-Based Speech-to-Text Translation (2021)
- Author(s)
  Hirotaka Tokuyama, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  Proc. Interspeech 2021
- Int'l Joint Research
[Presentation] Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data (2021)
- Author(s)
  Kosuke Doi, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  Proc. IWSLT
- Int'l Joint Research
[Presentation] Simultaneous Speech-to-speech Translation System with Transformer-based Incremental ASR, MT, and TTS (2021)
- Author(s)
  Ryo Fukuda, Sashi Novitasari, Yui Oka, Yasumasa Kano, Yuki Yano, Yuka Ko, Hirotaka Tokuyama, Kosuke Doi, Tomoya Yanagita, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  Proc. Oriental COCOSDA, 2021
- Int'l Joint Research
[Presentation] ASR Posterior-Based Loss for Multi-Task End-to-End Speech Translation (2021)
- Author(s)
  Yuka Ko, Katsuhito Sudoh, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Proc. Interspeech
- Int'l Joint Research
[Presentation] Multichannel Audio Source Separation with Independent Deeply Learned Matrix Analysis Using Product of Source Models (2021)
- Author(s)
  Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2021 (APSIPA ASC 2021)
- Int'l Joint Research
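[Presentation] Estimating the Basis of the Noise Rank-Deficient Subspace in Rank-Constrained Spatial Covariance Matrix Estimation Based on the Multivariate Generalized Gaussian Distribution (多変量一般化Gauss分布に基づくランク制約付き空間共分散行列推定法における雑音欠落ランク空間基底推定) (2021)
- Author(s)
  Yuto Kondo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari
- Organizer
  Acoustical Society of Japan 2021 Autumn Meeting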
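[Presentation] Multichannel Audio Source Separation by Independent Deeply Learned Matrix Analysis Based on a Source Model with a Product-of-Priors Probability Distribution (Product of Priors型確率分布を導入した音源モデルに基づく独立深層学習行列分析による多チャネル音源分離) (2021)
- Author(s)
  Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo
- Organizer
  Acoustical Society of Japan 2021 Autumn Meeting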
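[Presentation] Independent Deeply Learned Tensor Analysis Based on a Heavy-Tailed Generative Model (ヘビーテイル生成モデルに基づく独立深層学習テンソル分析) (2021)
- Author(s)
  Naoki Narisawa, Rintaro Ikeshita, Norihiro Takamune, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Tomohiro Nakatani
- Organizer
  Acoustical Society of Japan 2021 Autumn Meeting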
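[Presentation] Speech Enhancement by Rank-Constrained Spatial Covariance Matrix Estimation Using Independent Deeply Learned Matrix Analysis (独立深層学習行列分析を用いたランク制約付き空間共分散行列推定による音声強調) (2021)
- Author(s)
  Sota Misawa, Tomohiko Nakamura, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari
- Organizer
  Acoustical Society of Japan 2021 Autumn Meeting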
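[Presentation] Cross-Lingual Speech Synthesis via Speaker Adaptation Using Domain Adaptation and a Speaker Consistency Loss (ドメイン適応と話者一致損失を用いた話者適応によるクロスリンガル音声合成) (2021)
- Author(s)
  Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari
- Organizer
  Acoustical Society of Japan 2021 Autumn Meeting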
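[Presentation] Low-Latency Incremental Speech Synthesis Using a Context Estimation Model Obtained by Knowledge Distillation from a Large-Scale Language Model (大規模言語モデルの知識蒸留によるコンテキスト推定モデルを用いた低遅延逐次音声合成) (2021)
- Author(s)
  Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  Acoustical Society of Japan 2021 Autumn Meeting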
[Presentation] ASR rescoring and confidence estimation with ELECTRA (2021)
- Author(s)
  H. Futami, H. Inaguma, M. Mimura, S. Sakai, and T. Kawahara
- Organizer
  IEEE Workshop Automatic Speech Recognition & Understanding (ASRU)
- Int'l Joint Research
[Presentation] Data augmentation for ASR using TTS via a discrete representation (2021)
- Author(s)
  S. Ueno, M. Mimura, S. Sakai, and T. Kawahara
- Organizer
  IEEE Workshop Automatic Speech Recognition & Understanding (ASRU)
- Int'l Joint Research
[Presentation] Light Source Selection in Primary Sample Space Neural Photon Sampling (2021)
- Author(s)
  Yuta Tsuji, Tatsuya Yatagawa, Shigeo Morishima
- Organizer
  The 14th ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia 2021
- Int'l Joint Research
[Presentation] Low-latency real-time non-parallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear prediction (2021)
- Author(s)
  Patrick Lumban Tobing, Tomoki Toda
- Organizer
  11th ISCA Speech Synthesis Workshop (SSW)
- Int'l Joint Research
[Presentation] High-fidelity and low-latency universal neural vocoder based on multiband WaveRNN with data-driven linear prediction for discrete waveform modeling (2021)
- Author(s)
  Patrick Lumban Tobing, Tomoki Toda
- Organizer
  INTERSPEECH
- Int'l Joint Research
[Presentation] Relational data selection for data augmentation of speaker-dependent multi-band MelGAN vocoder (2021)
- Author(s)
  Yi-Chiao Wu, Cheng-Hung Hu, Hung-Shin Lee, Yu-Huai Peng, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda
- Organizer
  INTERSPEECH
- Int'l Joint Research
[Presentation] NAIST English-to-Japanese Simultaneous Translation System for IWSLT 2021 Simultaneous Text-to-text Task (2021)
- Author(s)
  Ryo Fukuda, Yui Oka, Yasumasa Kano, Yuki Yano, Yuka Ko, Hirotaka Tokuyama, Kosuke Doi, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  the 18th International Conference on Spoken Language Translation (IWSLT 2021)
- Int'l Joint Research
[Presentation] On Knowledge Distillation for Translating Erroneous Speech Transcriptions (2021)
- Author(s)
  Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  the 18th International Conference on Spoken Language Translation
- Int'l Joint Research
[Presentation] Recent Advances in Speech Translation (2021)
- Author(s)
  Satoshi Nakamura, with Katsuhito Sudoh, Sakriani Sakti, Ryo Fukuda, Sashi Novitasari, Tomoya Yanagita, Kosuke Doi, Yasumasa Kano, Yuki Yano, Hirotaka Tokuyama, Yui Oka
- Organizer
  AI Innovation Summit 2021
- Int'l Joint Research / Invited
[Presentation] Improving Intelligibility of Synthesized Speech in Noisy Condition with Dynamically Adaptive Machine Speech Chain (2021)
- Author(s)
  Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura
- Organizer
  SIG-SLP 2021
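[Presentation] Neural Speech Synthesis Using Local Phrase Structure Information (局所的な句構造の情報を用いたニューラル音声合成) (2021)
- Author(s)
  Nobuyoshi Kaiki, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Ongaku Symposium 2021 (音学シンポジウム2021)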
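[Presentation] A Graph Neural Network-Based Method Toward Acquiring Subword Units in Zero-Resource Settings (ゼロ資源状況におけるサブワード単位の獲得にむけてグラフニューラルネットワークを用いた手法) (2021)
- Author(s)
  Shun Takahashi, Sakriani Sakti, Satoshi Nakamura
- Organizer
  The 35th Annual Conference of the Japanese Society for Artificial Intelligence (2021)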
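[Patent(Industrial Property Rights)] Speech Synthesis Device, Speech Synthesis Method, and Speech Synthesis Program (音声合成装置、音声合成方法及び音声合成プログラム) (2022)
- Inventor(s)
  Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari
- Industrial Property Rights Holder
  Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari
- Industrial Property Rights Type
  Patent
- Industrial Property Number
  Japanese Patent Application No. 2022-020534 (特願2022-020534)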
