2023 Fiscal Year Annual Research Report

A Study on Multi-modal Automatic Simultaneous Interpretation System and Evaluation Method

Research Project

Project/Area Number 21H05054
Research Institution Nara Institute of Science and Technology

Principal Investigator

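Satoshi Nakamura (中村 哲), Nara Institute of Science and Technology, Graduate School of Science and Technology, Professor (30263429)

Co-Investigator (Kenkyū-buntansha) Tatsuya Kawahara (河原 達也), Kyoto University, Graduate School of Informatics, Professor (00234104)
Tomoki Toda (戸田 智基), Nagoya University, Information Technology Center, Professor (90403328)
Shigeo Morishima (森島 繁生), Waseda University, Faculty of Science and Engineering, Professor (10200411)
Hiroshi Saruwatari (猿渡 洋), The University of Tokyo, Graduate School of Information Science and Technology, Professor (30324974)
SAKTI Sakriani, Japan Advanced Institute of Science and Technology, Graduate School of Advanced Science and Technology, Associate Professor (00395005)
Kayo Matsushita (松下 佳世), Rikkyo University, College of Intercultural Communication, Professor (90746679)
Masaru Yamada (山田 優), Rikkyo University, College of Intercultural Communication, Professor (70645001) [Withdrawn]
Shinnosuke Takamichi (高道 慎之介), The University of Tokyo, Graduate School of Information Science and Technology, Lecturer (90784330)
Taro Watanabe (渡辺 太郎), Nara Institute of Science and Technology, Graduate School of Science and Technology, Professor (90395038)
Katsuhito Sudoh (須藤 克仁), Nara Institute of Science and Technology, Graduate School of Science and Technology, Associate Professor (00396152)
Hiroki Tanaka (田中 宏季), Nara Institute of Science and Technology, Graduate School of Science and Technology, Assistant Professor (10757834)
Seitaro Shinagawa (品川 政太朗), Nara Institute of Science and Technology, Graduate School of Science and Technology, Visiting Assistant Professor (70897454)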
Project Period (FY) 2021-07-05 – 2026-03-31
Keywords Speech translation
Outline of Annual Research Achievements

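[Task 1] Multi-dimensional simultaneous interpretation: A) On emphasis, we addressed focus by working on output that optimally combines speech prosody with linguistic expression. We carried out a basic study of voice conversion and speech synthesis techniques equipped with control over paralinguistic information. For expressive translation of speech and facial expression, we studied synchronization with the speaker's prosody and the expression of individuality in facial video during emotional expression, and examined methods for keeping identity synchronized when interpolating keyframes during video generation. B) Taking subtitle translation as an example, we tried prior adaptation in which information such as the domain and the character is given explicitly. C) For optimization of the interpretation output, we studied interpreting strategies based on the Local Agreement method and the AlignAtt method, and made the linguistic processing part of speech synthesis operate incrementally.
[Task 2] Evaluation methods for interpretation quality and real-time evaluation technology: A) We further advanced the analysis of phenomena such as "junokuri" (rendering in source-language order) and omission. In coordination with incremental translation technology, we also turned the findings into applied technology and examined which components could be singled out as useful aids for interpreters. B) We studied automatic interpretation quality metrics that take into account the aspects interpreters consider important and the degree of source-order rendering. C) Research progressed on EEG-based analysis of syntactic constructions with high cognitive load, on the relationship between word-order position within a sentence and cognitive load, and on analyzing cognitive load with phase-amplitude coupling (PAC).
[Task 3] Corpus construction and system: A) We examined augmenting the interpretation parallel corpus through automatic alignment and using it in the simultaneous interpretation system, as well as applying it to interpretation quality evaluation. B) For the 50-hour corpus with multi-dimensional paralinguistic annotation and the 50 hours of prior information, we worked out the approach. C) We integrated and evaluated the modules; for the design and implementation of the ecosystem, we will continue to participate in the IWSLT evaluation tasks and improve system performance.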

Current Status of Research Progress

2: On the whole, research has progressed more than originally planned.

Reason

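Prototyping of a simultaneous interpretation system aimed at the IWSLT evaluation tasks, and the accompanying research and development of each module, are progressing smoothly. In FY2022 we built a system by connecting incremental speech recognition, machine translation, and speech synthesis. In FY2023 we improved it on the basis of multilingual pre-trained models (a speech model and a translation model) and built a system that converts source-language speech directly into target-language text and then synthesizes speech from that text incrementally. On the evaluation side, an automatic evaluation system applicable to both human interpreters and simultaneous interpretation systems is taking shape.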

Strategy for Future Research Activity

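We will continue to participate in the IWSLT evaluation tasks and work on making the system faster and more accurate, while developing the research prototype further into a system that can be used in demonstration experiments. In parallel, we will establish a multimodal translation system centered on focus, voice quality, and expressiveness of utterances, together with an automatic quality evaluation method for interpretation.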

  • Research Products

    (59 results)

Journal Article (7 results) (of which Peer Reviewed: 7 results, Open Access: 6 results) / Presentation (51 results) (of which Int'l Joint Research: 37 results, Invited: 2 results) / Book (1 result)

  • [Journal Article] Emotion-controllable Speech Synthesis using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence (2024)

    • Author(s)
      Xuan Luo, Shinnosuke Takamichi, Yuki Saito, Tomoki Koriyama, Hiroshi Saruwatari
    • Journal Title

      APSIPA Transactions on Signal and Information Processing

      Volume: 13 Pages: 1-30

    • DOI

      10.1561/116.00000242

    • Peer Reviewed / Open Access
  • [Journal Article] Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis (2024)

    • Author(s)
      Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 32 Pages: 1829-1844

    • DOI

      10.1109/TASLP.2024.3369537

    • Peer Reviewed / Open Access
  • [Journal Article] Prefix Alignment for Training Simultaneous Machine Translation (2024)

    • Author(s)
      Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
    • Journal Title

      Journal of Natural Language Processing

      Volume: 31 Pages: 79-104

    • DOI

      10.5715/jnlp.31.79

    • Peer Reviewed / Open Access
  • [Journal Article] High-fidelity and pitch-controllable neural vocoder based on unified source-filter networks (2023)

    • Author(s)
      Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 31 Pages: 3717-3729

    • DOI

      10.1109/TASLP.2023.3282098

    • Peer Reviewed / Open Access
  • [Journal Article] PoP-IDLMA: Product-of-Prior Independent Deeply Learned Matrix Analysis for Multichannel Music Source Separation (2023)

    • Author(s)
      Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, and Kazunobu Kondo
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 31 Pages: 2680-2694

    • DOI

      10.1109/TASLP.2023.3293044

    • Peer Reviewed / Open Access
  • [Journal Article] Content Order-Controllable MR-to-Text (2023)

    • Author(s)
      Keisuke Toyama, Katsuhito Sudoh, Satoshi Nakamura
    • Journal Title

      IEEE Access

      Volume: 11 Pages: 129353-129365

    • DOI

      10.1109/ACCESS.2023.3334139

    • Peer Reviewed / Open Access
  • [Journal Article] Improving Speech Translation Accuracy and Time Efficiency With Fine-Tuned wav2vec 2.0-Based Speech Segmentation (2023)

    • Author(s)
      Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 32 Pages: 906-914

    • DOI

      10.1109/TASLP.2023.3343614

    • Peer Reviewed
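  • [Presentation] Analysis of Indirect Positive Evidence in Evaluating the Grammatical Knowledge of Language Models (2024)

    • Author(s)
      大羽未悠, 大関洋平, 深津聡世, 芳賀あかり, 大内啓樹, 渡辺太郎, 菅原朔
    • Organizer
      The 30th Annual Meeting of the Association for Natural Language Processing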
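  • [Presentation] Modeling Children's Overgeneralization with Small-Scale Language Models (2024)

    • Author(s)
      芳賀あかり, 菅原朔, 深津聡世, 大羽未悠, 大内啓樹, 渡辺太郎, 大関洋平
    • Organizer
      The 30th Annual Meeting of the Association for Natural Language Processing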
  • [Presentation] Spoken-Style Speech Synthesis Using Text Style Transfer (2024)

    • Author(s)
      吉岡 大貴, 安田 裕介, 戸田 智基
    • Organizer
      Acoustical Society of Japan Spring Meeting
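  • [Presentation] Research Case Studies in Information Processing Technology for Speech Generation (2024)

    • Author(s)
      戸田 智基
    • Organizer
      The 76th Artificial Intelligence Seminar, Artificial Intelligence Research Center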
    • Invited
  • [Presentation] Cocktail Machine Speech Chain: Unified Training of Speech Recognition and Speech Synthesis Models Using Overlapped Speech (2024)

    • Author(s)
      松永 裕太
    • Organizer
      Acoustical Society of Japan 2024 Spring Meeting
  • [Presentation] Automatic Evaluation of Speech Generation Based on Automatic Evaluation Metrics for Text Generation (2024)

    • Author(s)
      佐伯 高明
    • Organizer
      IEICE Technical Committee on Speech
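  • [Presentation] Creating Evaluation Data for Source-Order Translation toward English-to-Japanese Simultaneous Machine Translation Faithful to the Source Utterance (2024)

    • Author(s)
      福田りょう, 土肥康輔, 須藤克仁, 中村哲
    • Organizer
      The 259th Meeting of the IPSJ Special Interest Group on Natural Language Processing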
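  • [Presentation] Divide-and-Conquer Neural Machine Translation Using Intra-Sentence Context (2024)

    • Author(s)
      石川隆太, 加納保昌, 須藤克仁, 中村哲
    • Organizer
      The 30th Annual Meeting of the Association for Natural Language Processing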
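  • [Presentation] End-to-End Simultaneous Speech Translation Using Simultaneous Interpretation Data with Tagged Mixed-Data Training and Self-Supervised Learning (2024)

    • Author(s)
      胡尤佳, 福田りょう, 西川勇太, 加納保昌, 須藤克仁, 中村哲
    • Organizer
      The 30th Annual Meeting of the Association for Natural Language Processing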
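  • [Presentation] Automated Essay Scoring Using the Diversity of Grammatical Items and Error Information (2024)

    • Author(s)
      土肥康輔, 須藤克仁, 中村哲
    • Organizer
      The 30th Annual Meeting of the Association for Natural Language Processing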
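  • [Presentation] Evaluating Word-Order Synchrony for Simultaneous Interpretation and Simultaneous Translation (2024)

    • Author(s)
      蒔苗茉那, 須藤克仁, 中村哲
    • Organizer
      The 30th Annual Meeting of the Association for Natural Language Processing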
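  • [Presentation] Streaming Simultaneous Speech Translation Using Incremental Speech Segmentation (2024)

    • Author(s)
      福田りょう, 須藤克仁, 中村哲
    • Organizer
      The 30th Annual Meeting of the Association for Natural Language Processing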
  • [Presentation] Model-based Subsampling for Knowledge Graph Completion (2023)

    • Author(s)
      Xincan Feng, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe
    • Organizer
      13th International Joint Conference on Natural Language
    • Int'l Joint Research
  • [Presentation] Generating Diverse Translation with Perturbed kNN-MT (2023)

    • Author(s)
      Yuto Nishida, Makoto Morishita, Hidetaka Kamigaito, Taro Watanabe
    • Organizer
      18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
    • Int'l Joint Research
  • [Presentation] A comparative study of ethical norms of professional and non-professional interpreters in the media (2023)

    • Author(s)
      Kayo Matsushita
    • Organizer
      6th International Conference on Non-Professional Interpreting and Translation
    • Int'l Joint Research
  • [Presentation] Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder (2023)

    • Author(s)
      Yusuke Yasuda, Tomoki Toda
    • Organizer
      IEEE ICASSP 2023
    • Int'l Joint Research
  • [Presentation] Source-Filter HiFiGAN: fast and pitch controllable high-fidelity neural vocoder (2023)

    • Author(s)
      Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda
    • Organizer
      IEEE ICASSP 2023
    • Int'l Joint Research
  • [Presentation] Emotion awareness in multi-utterance turn for improving emotion prediction in multi-speaker conversation (2023)

    • Author(s)
      Xiaohan Shi, Xingfeng Li, Tomoki Toda
    • Organizer
      INTERSPEECH 2023
    • Int'l Joint Research
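  • [Presentation] Leveraging a Small Amount of Parallel Data in Text Speaking-Style Transfer Using a VAE with an Attention Mechanism (2023)

    • Author(s)
      吉岡 大貴, 安田 裕介, 戸田 智基
    • Organizer
      Acoustical Society of Japan Autumn Meeting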
  • [Presentation] A comparative study of voice conversion models with large-scale speech and singing data: the T13 systems for the Singing Voice Conversion Challenge 2023 (2023)

    • Author(s)
      Ryuichi Yamamoto, Reo Yoneyama, Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda
    • Organizer
      IEEE ASRU 2023
    • Int'l Joint Research
  • [Presentation] Leveraging the Multilingual Indonesian Ethnic Languages Dataset in Self-supervised Model for Low-resource ASR Task (2023)

    • Author(s)
      Sakriani Sakti, Benita Angela Titalim
    • Organizer
      IEEE ASRU
    • Int'l Joint Research
  • [Presentation] Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in Indonesian (2023)

    • Author(s)
      Ruhiyah Widiaputri, Ayu Purwarianti, Dessi Lestari, Kurniawati Azizah, Dipta Tanaya, Sakriani Sakti
    • Organizer
      EMNLP
    • Int'l Joint Research
  • [Presentation] Generating Speech with Prosodic Prominence based on SSL-Visually Grounded Models (2023)

    • Author(s)
      Ika Hartanti Bella Septina, Dipta Tanaya, Kurniawati Azizah, Dessi Lestari, Ayu Purwarianti, Sakriani Sakti
    • Organizer
      Oriental COCOSDA
    • Int'l Joint Research
  • [Presentation] Exploring Difficulties Encountered by Professional Interpreters in Japanese-to-English and English-to-Japanese Simultaneous Translation (2023)

    • Author(s)
      Hang Xi, Sakriani Sakti
    • Organizer
      Oriental COCOSDA
    • Int'l Joint Research
  • [Presentation] STEN-TTS: Improving Zero-shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework (2023)

    • Author(s)
      Chung Tran, Chi Mai Luong, Sakriani Sakti
    • Organizer
      INTERSPEECH
    • Int'l Joint Research
  • [Presentation] Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams (2023)

    • Author(s)
      Shun Takahashi, Sakriani Sakti
    • Organizer
      INTERSPEECH
    • Int'l Joint Research
  • [Presentation] Low-Resource Japanese-English Speech-to-Text Translation Leveraging Speech-Text Unified-model Representation Learning (2023)

    • Author(s)
      Tu Dinh Tran, Sakti Sakriani
    • Organizer
      INTERSPEECH Satellite Workshop - the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL)
    • Int'l Joint Research
  • [Presentation] VGSAlign: Bilingual Speech Alignment of Unpaired and Untranscribed Languages using Self-Supervised Visually Grounded Speech Models (2023)

    • Author(s)
      Luan Thanh Nguyen, Sakriani Sakti
    • Organizer
      INTERSPEECH Satellite Workshop - the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL)
    • Int'l Joint Research
  • [Presentation] An Isotropy Analysis for Self-supervised Acoustic Unit Embeddings on the Zero Resource Speech Challenge 2021 Framework (2023)

    • Author(s)
      Jianan Chen, Sakriani Sakti
    • Organizer
      IEEE ICASSP
    • Int'l Joint Research
  • [Presentation] Self-adaptive Incremental Machine Speech Chain for Lombard TTS with High-granularity ASR Feedback in Dynamic Noise Condition (2023)

    • Author(s)
      Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura
    • Organizer
      IEEE ICASSP
    • Int'l Joint Research
  • [Presentation] Language Technology for All: From the technology and indigenous community perspectives (2023)

    • Author(s)
      Sakriani Sakti
    • Organizer
      Oriental COCOSDA
    • Int'l Joint Research / Invited
  • [Presentation] E2E Refined Dataset (2023)

    • Author(s)
      Keisuke Toyama, Katsuhito Sudoh, Satoshi Nakamura
    • Organizer
      26th International Conference of Oriental-COCOSDA 2023
    • Int'l Joint Research
  • [Presentation] Investigation of Validity of Paradigmatic Diagnosis for Downstep in Japanese (2023)

    • Author(s)
      Kei Furukawa, Satoshi Nakamura
    • Organizer
      26th International Conference of Oriental-COCOSDA 2023
    • Int'l Joint Research
  • [Presentation] Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation (2023)

    • Author(s)
      Yuta Nishikawa, Satoshi Nakamura
    • Organizer
      INTERSPEECH2023
    • Int'l Joint Research
  • [Presentation] Boundary-Driven Account for Downstep in Japanese (2023)

    • Author(s)
      Kei Furukawa, Satoshi Nakamura
    • Organizer
      20th International Congress of Phonetic Sciences
    • Int'l Joint Research
  • [Presentation] Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining (2023)

    • Author(s)
      Takaaki Saeki
    • Organizer
      The 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023) Main Track
    • Int'l Joint Research
  • [Presentation] NAIST Simultaneous Speech Translation System for IWSLT 2023 (2023)

    • Author(s)
      Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Yuka Ko, Tomoya Yanagita, Kosuke Doi, Mana Makinae, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
    • Organizer
      the 20th International Conference on Spoken Language Translation (IWSLT 2023)
    • Int'l Joint Research
  • [Presentation] Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data (2023)

    • Author(s)
      Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
    • Organizer
      the 20th International Conference on Spoken Language Translation (IWSLT 2023)
    • Int'l Joint Research
  • [Presentation] Average Token Delay: A Latency Metric for Simultaneous Translation (2023)

    • Author(s)
      Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
    • Organizer
      Interspeech 2023
    • Int'l Joint Research
  • [Presentation] E2E Refined Dataset (2023)

    • Author(s)
      Keisuke Toyama, Katsuhito Sudoh, Satoshi Nakamura
    • Organizer
      the 26th International Conference of Oriental-COCOSDA
    • Int'l Joint Research
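  • [Presentation] Average Token Delay: A Latency Metric for Simultaneous Interpretation (2023)

    • Author(s)
      加納保昌, 須藤克仁, 中村哲
    • Organizer
      The 24th Annual Conference of the Japan Association for Interpreting and Translation Studies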
  • [Presentation] Embedding articulatory constraints for low-resource speech recognition based on large pre-trained model (2023)

    • Author(s)
      J.Lee, M.Mimura, and T.Kawahara.
    • Organizer
      INTERSPEECH
    • Int'l Joint Research
  • [Presentation] Time-domain speech enhancement assisted by multi-resolution frequency encoder and decoder (2023)

    • Author(s)
      H.Shi, M.Mimura, L.Wang, J.Dang, and T.Kawahara.
    • Organizer
      IEEE-ICASSP
    • Int'l Joint Research
  • [Presentation] Domain and language adaptation using heterogeneous datasets for wav2vec2.0-based speech recognition of low-resource language (2023)

    • Author(s)
      K.Soky, S.Li, C.Chu, and T.Kawahara.
    • Organizer
      IEEE-ICASSP
    • Int'l Joint Research
  • [Presentation] Keep Eyes on the Sentence: An Interactive Sentence Simplification System for English Learners Based on Eye Tracking and Large Language Models (2023)

    • Author(s)
      Taichi Higasa, Keitaro Tanaka, Qi Feng, Shigeo Morishima
    • Organizer
      ACM CHI Conference on Human Factors in Computing Systems, CHI 2024 (Late-Breaking Work)
    • Int'l Joint Research
  • [Presentation] Gaze-Driven Sentence Simplification for Language Learners: Enhancing Comprehension and Readability (2023)

    • Author(s)
      Taichi Higasa, Keitaro Tanaka, Qi Feng, Shigeo Morishima
    • Organizer
      The 25th International Conference on Multimodal Interaction, ICMI 2023
    • Int'l Joint Research
  • [Presentation] Audio-Visual Speech Enhancement With Selective Off-Screen Speech Extraction (2023)

    • Author(s)
      Tomoya Yoshinaga, Keitaro Tanaka, Shigeo Morishima
    • Organizer
      The 31st European Signal Processing Conference, EUSIPCO2023, Best Student Paper Contest Finalist
    • Int'l Joint Research
  • [Presentation] Efficient 3D Reconstruction of NeRF using Camera Pose Interpolation and Photometric Bundle Adjustment (2023)

    • Author(s)
      Tsukasa Takeda, Shugo Yamaguchi, Kazuhito Sato, Kosuke Fukazawa, Shigeo Morishima
    • Organizer
      ACM Special Interest Group on Computer Graphics and Interactive Techniques Conference, SIGGRAPH 2023
    • Int'l Joint Research
  • [Presentation] Deformable Neural Radiance Fields for Object Motion Blur Removal (2023)

    • Author(s)
      Kazuhito Sato, Shugo Yamaguchi, Tsukasa Takeda, and Shigeo Morishima
    • Organizer
      ACM Special Interest Group on Computer Graphics and Interactive Techniques Conference Posters, SIGGRAPH 2023 Posters
    • Int'l Joint Research
  • [Presentation] Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning (2023)

    • Author(s)
      Sara Kashiwagi, Keitaro Tanaka, Qi Feng, Shigeo Morishima
    • Organizer
      INTERSPEECH2023
    • Int'l Joint Research
  • [Presentation] Memory Efficient Diffusion Probabilistic Models via Patch-based Generation (2023)

    • Author(s)
      Shinei Arakawa, Hideki Tsunashima, Daichi Horita, Keitaro Tanaka, Shigeo Morishima
    • Organizer
      The IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop 2023, CVPR workshop 2023
    • Int'l Joint Research
  • [Book] Utilizing remote simultaneous interpreting data for interpreting quality assessment: A corpus-based study (2023)

    • Author(s)
      Masaru Yamada, Kayo Matsushita, Hiroyuki Ishizuka
    • Total Pages
      17
    • Publisher
      Routledge

Published: 2024-12-25  
