A Study on Multi-modal Automatic Simultaneous Interpretation System and Evaluation Method

Research Project

Project/Area Number	21H05054
Research Category	Grant-in-Aid for Scientific Research (S)
Allocation Type	Single-year Grants
Review Section	Broad Section J
Research Institution	Nara Institute of Science and Technology
Principal Investigator	中村哲, Nara Institute of Science and Technology, Institute for Research Initiatives, Specially Appointed Professor (30263429)
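Co-Investigator (Kenkyū-buntansha)	河原達也, Kyoto University, Graduate School of Informatics, Professor (00234104); 戸田智基, Nagoya University, Information Technology Center, Professor (90403328); 森島繁生, Waseda University, Faculty of Science and Engineering, Professor (10200411); 猿渡洋, The University of Tokyo, Graduate School of Information Science and Technology, Professor (30324974); 松下佳世, Rikkyo University, College of Intercultural Communication, Professor (90746679); 須藤克仁, Nara Women's University, Faculty Division of Human Life and Environmental Sciences, Professor (00396152); 高道慎之介, Keio University, Faculty of Science and Technology (Yagami), Associate Professor (90784330); 渡辺太郎, Nara Institute of Science and Technology, Graduate School of Science and Technology, Professor (90395038); SAKTI Sakriani, Nara Institute of Science and Technology, Graduate School of Science and Technology, Professor (00395005); 山田優, Rikkyo University, College of Intercultural Communication, Professor (70645001); 田中宏季, Nara Institute of Science and Technology, Graduate School of Science and Technology, Assistant Professor (10757834); 品川政太朗, Nara Institute of Science and Technology, Graduate School of Science and Technology, Visiting Assistant Professor (70897454)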
Project Period (FY)	2021-07-05 – 2026-03-31
Project Status	Granted (Fiscal Year 2024)
Budget Amount	¥189,280,000 (Direct Cost: ¥145,600,000, Indirect Cost: ¥43,680,000); Fiscal Year 2025: ¥36,790,000 (Direct Cost: ¥28,300,000, Indirect Cost: ¥8,490,000); Fiscal Year 2024: ¥36,790,000 (Direct Cost: ¥28,300,000, Indirect Cost: ¥8,490,000); Fiscal Year 2023: ¥36,790,000 (Direct Cost: ¥28,300,000, Indirect Cost: ¥8,490,000); Fiscal Year 2022: ¥36,790,000 (Direct Cost: ¥28,300,000, Indirect Cost: ¥8,490,000); Fiscal Year 2021: ¥42,120,000 (Direct Cost: ¥32,400,000, Indirect Cost: ¥9,720,000)
Keywords	Speech translation
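Outline of Research at the Start	This project addresses three tasks. Task 1, multi-modal simultaneous interpretation: paralinguistic speech translation; multi-modal simultaneous interpretation that uses video, prior, and external knowledge; optimization of interpretation output; and more advanced incremental speech interpretation. Task 2, interpretation quality evaluation methods and real-time evaluation technology: analysis of the interpreting process; interpreter-support technology; a quality evaluation method common to human interpreters and automatic interpretation systems; and an objective, automatic method for evaluating interpretation quality from sensing that includes brain activity. Task 3, corpus construction and systems: time alignment and quality annotation of interpretation data; corpus augmentation; construction of a production system together with an ecosystem for data collection and improvement; and the establishment of active-learning and lifelong-learning methods.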
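Outline of Annual Research Achievements	[Task 1] Multi-modal simultaneous interpretation. A) For emphasis, specifically focus, we worked on output that optimally combines speech prosody with linguistic expression, and carried out a basic study of voice conversion and synthesis with paralinguistic control. For translation of rich vocal and facial expression, we studied how to express a speaker's individuality in face video that is synchronized with the speaker's prosody or that shows emotion, and how to keep identity consistent during keyframe interpolation in video generation. B) Taking subtitle translation as an example, we tried prior adaptation in which information such as domain and character is given explicitly. C) For interpretation output optimization, we examined interpreting strategies based on the Local Agreement and AlignAtt methods and made the linguistic front end of speech synthesis operate incrementally. [Task 2] Interpretation quality evaluation methods and real-time evaluation technology. A) We further analyzed phenomena such as monotonic translation ("順送り") and omission. We also linked this analysis with incremental translation technology to turn it into applications, and examined which technologies would be useful as interpreter aids. B) We studied automatic interpretation quality metrics that take into account the aspects interpreters value and the degree of monotonic translation. C) EEG-based research progressed on syntactic constructions with high cognitive load, on the relation between cognitive load and differing word-order positions within a sentence, and on analyzing cognitive load with phase-amplitude coupling (PAC). [Task 3] Corpus construction and systems. A) We examined augmenting the interpretation parallel corpus through automatic alignment, using it in simultaneous interpretation systems, and applying it to interpretation quality evaluation. B) For the 50-hour corpus with multi-modal paralinguistic annotation and the 50 hours of prior information, we considered the approach to take. C) We integrated and evaluated the modules; for the design and implementation of the ecosystem, we will continue to take part in the IWSLT evaluation tasks and improve system performance.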
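The report names the Local Agreement method without describing it. As a rough illustration only, and not the project's implementation, the sketch below shows the usual idea behind a local-agreement output policy: the system re-translates as each new audio chunk arrives and emits only the tokens on which two consecutive hypotheses agree. The class and function names are hypothetical, and the sketch assumes that each new hypothesis continues from the tokens already emitted (for example, through forced decoding of the committed prefix).

```python
from typing import List, Sequence


def common_prefix(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return the longest common prefix of two token sequences."""
    prefix: List[str] = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


class LocalAgreement:
    """Hypothetical helper: commit tokens shared by two consecutive hypotheses."""

    def __init__(self) -> None:
        self.committed: List[str] = []  # tokens already emitted to the listener
        self.previous: List[str] = []   # hypothesis produced for the previous chunk

    def step(self, hypothesis: Sequence[str]) -> List[str]:
        """Take the hypothesis for all audio so far; return newly emitted tokens."""
        agreed = common_prefix(self.previous, hypothesis)
        self.previous = list(hypothesis)
        # Emitted output is never retracted: only the part of the agreed
        # prefix that extends beyond the committed tokens is released.
        new_tokens = agreed[len(self.committed):]
        self.committed.extend(new_tokens)
        return new_tokens


if __name__ == "__main__":
    policy = LocalAgreement()
    for hyp in (["I", "am"], ["I", "am", "going"], ["I", "am", "going", "home"]):
        print(hyp, "->", policy.step(hyp))
```

Waiting for agreement trades latency for stability: a token is delayed by at least one chunk, but it is emitted only after it has survived one re-translation.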
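Current Status of Research Progress	2: Overall, research has progressed further than originally planned. Reason: Prototyping of a simultaneous interpretation system aimed at the IWSLT evaluation tasks, and the accompanying research and development of each module, are progressing well. In FY2022 we built a system by connecting incremental speech recognition, machine translation, and speech synthesis. In FY2023 we improved it on the basis of multilingual pre-trained models (speech and translation models), building a system that converts source-language speech directly into target-language text and synthesizes speech from that text incrementally. For evaluation, an automatic evaluation system applicable to both human interpreters and simultaneous interpretation systems is taking shape.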
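Strategy for Future Research Activity	We will continue to take part in the IWSLT evaluation tasks, make the system faster and more accurate, and develop the research prototype further into a system that can be used in field trials. In parallel, we will establish a multimodal translation system centered on focus, voice quality, and speech expression, together with an automatic quality evaluation method for interpretation.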
Assessment Rating	Interim Assessment Comments (Rating) A: In light of the aim of introducing the research area into the research categories, the expected progress has been made in research.

Report

(7 results)

2023: Abstract (Interim Assessment) (PDF), Annual Research Report, Interim Assessment (Comments) (PDF)
2022: Annual Research Report
2021: Abstract (PDF), Comments on the Screening Results (PDF), Annual Research Report

Research Products
(162 results)

Journal Article (26 results) (of which Int'l Joint Research: 3 results, Peer Reviewed: 25 results, Open Access: 22 results) Presentation (134 results) (of which Int'l Joint Research: 79 results, Invited: 5 results) Book (1 result) Patent (Industrial Property Rights) (1 result)

[Journal Article] Emotion-controllable Speech Synthesis using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence (2024)
- Author(s)
  Xuan Luo, Shinnosuke Takamichi, Yuki Saito, Tomoki Koriyama, Hiroshi Saruwatari
- Journal Title
  
  APSIPA Transactions on Signal and Information Processing
  
  Volume: 13 Issue: 1 Pages: 1-30
- DOI
  10.1561/116.00000242
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis (2024)
- Author(s)
  Saeki Takaaki, Maiti Soumi, Li Xinjian, Watanabe Shinji, Takamichi Shinnosuke, Saruwatari Hiroshi
- Journal Title
  
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  
  Volume: 32 Pages: 1829-1844
- DOI
  10.1109/taslp.2024.3369537
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Improving Speech Translation Accuracy and Time Efficiency With Fine-Tuned wav2vec 2.0-Based Speech Segmentation (2024)
- Author(s)
  Fukuda Ryo, Sudoh Katsuhito, Nakamura Satoshi
- Journal Title
  
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  
  Volume: 32 Pages: 906-916
- DOI
  10.1109/taslp.2023.3343614
- Related Report
  2023 Annual Research Report
- Peer Reviewed
[Journal Article] Prefix Alignment for Training Simultaneous Machine Translation (2024)
- Author(s)
  Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
- Journal Title
  
  Journal of Natural Language Processing
  
  Volume: 31 Issue: 1 Pages: 79-104
- DOI
  10.5715/jnlp.31.79
- ISSN
  1340-7619, 2185-8314
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Sound Field Interpolation for Rotation-Invariant Multichannel Array Signal Processing (2023)
- Author(s)
  Wakabayashi Yukoh, Yamaoka Kouei, Ono Nobutaka
- Journal Title
  
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  
  Volume: 31 Pages: 2286-2298
- DOI
  10.1109/taslp.2023.3282098
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] PoP-IDLMA: Product-of-Prior Independent Deeply Learned Matrix Analysis for Multichannel Music Source Separation (2023)
- Author(s)
  Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, and Kazunobu Kondo
- Journal Title
  
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  
  Volume: 31 Pages: 2680-2694
- DOI
  10.1109/taslp.2023.3293044
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Content Order-Controllable MR-to-Text (2023)
- Author(s)
  Keisuke Toyama, Katsuhito Sudoh, Satoshi Nakamura
- Journal Title
  
  IEEE Access
  
  Volume: 11 Pages: 129353-129365
- DOI
  10.1109/access.2023.3334139
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] End-to-End Generation of Written-style Transcript of Speech from Parliamentary Meetings (2023)
- Author(s)
  Mimura Masato, Kawahara Tatsuya
- Journal Title
  
  Journal of Natural Language Processing
  
  Volume: 30 Issue: 1 Pages: 88-124
- DOI
  10.5715/jnlp.30.88
- ISSN
  1340-7619, 2185-8314
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Japanese Neural Incremental Text-to-Speech Synthesis Framework With an Accent Phrase Input (2023)
- Author(s)
  Yanagita Tomoya, Sakti Sakriani, Nakamura Satoshi
- Journal Title
  
  IEEE Access
  
  Volume: 11 Pages: 22355-22363
- DOI
  10.1109/access.2023.3251657
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Synthesis Unit for Japanese Incremental Text-to-Speech (2022)
- Author(s)
  柳田智也、サクテイサクリアニ、中村哲
- Journal Title
  
  IPSJ Journal
  
  Volume: 63 Issue: 4 Pages: 1149-1158
- DOI
  10.20729/00217617
- Year and Date
  2022-04-15
- Related Report
  2022 Annual Research Report
- Peer Reviewed
[Journal Article] Deficient-basis-complementary rank-constrained spatial covariance matrix estimation based on multivariate generalized Gaussian distribution for blind speech extraction (2022)
- Author(s)
  Yuto Kondo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, and Hiroshi Saruwatari
- Journal Title
  
  EURASIP Journal on Advances in Signal Processing
  
  Volume: 88(2022) Issue: 1
- DOI
  10.1186/s13634-022-00905-z
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Neural Machine Translation with Synchronous Latent Phrase Structure (2022)
- Author(s)
  Shintaro Harada, Taro Watanabe
- Journal Title
  
  Journal of Natural Language Processing
  
  Volume: 29 Issue: 2 Pages: 587-610
- DOI
  10.5715/jnlp.29.587
- ISSN
  1340-7619, 2185-8314
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] TriECCC: Trilingual Corpus of the Extraordinary Chambers in the Courts of Cambodia for Speech Recognition and Translation Studies (2022)
- Author(s)
  Soky Kak, Mimura Masato, Kawahara Tatsuya, Chu Chenhui, Li Sheng, Ding Chenchen, Sam Sethserey
- Journal Title
  
  International Journal of Asian Language Processing
  
  Volume: 31 Issue: 03n04 Pages: 1-21
- DOI
  10.1142/s2717554522500072
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] A cyclical approach to synthetic and natural speech mismatch refinement of neural post-filter for low-cost text-to-speech system (2022)
- Author(s)
  Y.-C. Wu, P.L. Tobing, K. Yasuhara, N. Matsunaga, Y. Ohtani, T. Toda
- Journal Title
  
  APSIPA Transactions on Signal and Information Processing
  
  Volume: 11 Issue: 1 Pages: e30
- DOI
  10.1561/116.00000020
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] A Machine Speech Chain Approach for Dynamically Adaptive Lombard TTS in Static and Dynamic Noise Environments (2022)
- Author(s)
  Novitasari Sashi, Sakti Sakriani, Nakamura Satoshi
- Journal Title
  
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  
  Volume: 30 Pages: 2673-2688
- DOI
  10.1109/taslp.2022.3196879
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Tackling multiple object tracking with complicated motions - Re-designing the integration of motion and appearance (2022)
- Author(s)
  Yang Fan, Wang Zheng, Wu Yang, Sakti Sakriani, Nakamura Satoshi
- Journal Title
  
  Image and Vision Computing
  
  Volume: 124 Pages: 104514-104514
- DOI
  10.1016/j.imavis.2022.104514
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-resource ASR (2022)
- Author(s)
  Bin Wu, Sakriani Sakti, Jinsong Zhang, and Satoshi Nakamura
- Journal Title
  
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  
  Volume: 30 Pages: 901-916
- DOI
  10.1109/taslp.2022.3150220
- Related Report
  2022 Annual Research Report 2021 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Knowledge Distillation for Translating Erroneous Speech Transcriptions (2022)
- Author(s)
  Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura
- Journal Title
  
  Journal of Natural Language Processing
  
  Volume: 29 Issue: 2 Pages: 344-366
- DOI
  10.5715/jnlp.29.344
- ISSN
  1340-7619, 2185-8314
- Related Report
  2022 Annual Research Report
- Peer Reviewed
[Journal Article] How Remote Interpreting Changed the Japanese Interpreting Industry: Findings from an online survey conducted during the COVID-19 pandemic (2022)
- Author(s)
  Kayo Matsushita
- Journal Title
  
  INContext: Studies in Translation and Interculturalism
  
  Volume: 2 Issue: 2 Pages: 167-185
- DOI
  10.54754/incontext.v2i2.22
- Related Report
  2022 Annual Research Report
[Journal Article] On Knowledge Distillation for Translating Erroneous Speech Transcriptions (2022)
- Author(s)
  Ryo Fukuda, Katsuhito Sudoh, and Satoshi Nakamura
- Journal Title
  
  Journal of Natural Language Processing
  
  Volume: 2-29
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Neural Incremental Speech Recognition Toward Real-Time Machine Speech Translation (2021)
- Author(s)
  Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  
  IEICE Transactions on Information and Systems
  
  Volume: E104.D Issue: 12 Pages: 2195-2208
- DOI
  10.1587/transinf.2021EDP7014
- NAID
  130008123347
- ISSN
  0916-8532, 1745-1361
- Year and Date
  2021-12-01
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Synthesizing waveform sequence-to-sequence to augment training data for sequence-to-sequence speech recognition (2021)
- Author(s)
  S.Ueno, M.Mimura, S.Sakai, and T.Kawahara
- Journal Title
  
  Acoustical Science and Technology
  
  Volume: 42 Issue: 6 Pages: 333-343
- DOI
  10.1250/ast.42.333
- NAID
  130008110355
- ISSN
  0369-4232, 1346-3969, 1347-5177
- Year and Date
  2021-11-01
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Alignment knowledge distillation for online streaming attention-based speech recognition (2021)
- Author(s)
  H.Inaguma and T.Kawahara
- Journal Title
  
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  
  Volume: 29 Pages: 1-15
- DOI
  10.1109/taslp.2021.3133217
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Audio-Oriented Video Interpolation Using Key Pose (2021)
- Author(s)
  Takayuki Nakatsuka, Yukitaka Tsuchiya, Masatoshi Hamanaka and Shigeo Morishima
- Journal Title
  
  International Journal of Pattern Recognition and Artificial Intelligence
  
  Volume: 35 Issue: 16
- DOI
  10.1142/s0218001421600168
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Length-constrained Neural Machine Translation using Length Prediction and Perturbation into Length-aware Positional Encoding (2021)
- Author(s)
  Yui Oka, Katsuhito Sudoh, Satoshi Nakamura
- Journal Title
  
  Journal of Natural Language Processing
  
  Volume: 28 Issue: 3 Pages: 778-801
- DOI
  10.5715/jnlp.28.778
- NAID
  130008088116
- ISSN
  1340-7619, 2185-8314
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] End-to-End Image-to-Speech Generation for Untranscribed Unknown Languages (2021)
- Author(s)
  Johanes Effendi, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  
  IEEE Access
  
  Volume: 9 Pages: 55144-55154
- DOI
  10.1109/access.2021.3071541
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
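[Presentation] Analysis of Indirect Positive Evidence in Evaluating the Grammatical Knowledge of Language Models (2024)
- Author(s)
  大羽未悠, 大関洋平, 深津聡世, 芳賀あかり, 大内啓樹, 渡辺太郎, 菅原朔
- Organizer
  The 30th Annual Meeting of the Association for Natural Language Processing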
- Related Report
  2023 Annual Research Report
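[Presentation] Modeling Children's Overgeneralization with Small Language Models (2024)
- Author(s)
  芳賀あかり, 菅原朔, 深津聡世, 大羽未悠, 大内啓樹, 渡辺太郎, 大関洋平
- Organizer
  The 30th Annual Meeting of the Association for Natural Language Processing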
- Related Report
  2023 Annual Research Report
[Presentation] Spoken-Style Speech Synthesis Using Text Style Transfer (2024)
- Author(s)
  吉岡大貴, 安田裕介, 戸田智基
- Organizer
  Acoustical Society of Japan Spring Meeting
- Related Report
  2023 Annual Research Report
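[Presentation] Research Case Studies of Information Processing Technology for Speech Generation (2024)
- Author(s)
  戸田智基
- Organizer
  The 76th Artificial Intelligence Seminar, Artificial Intelligence Research Center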
- Related Report
  2023 Annual Research Report
- Invited
[Presentation] Cocktail Machine Speech Chain: Unified Training of Speech Recognition and Speech Synthesis Models Using Overlapped Speech (2024)
- Author(s)
  松永裕太
- Organizer
  Acoustical Society of Japan 2024 Spring Meeting
- Related Report
  2023 Annual Research Report
[Presentation] Automatic Evaluation of Speech Generation Based on Automatic Evaluation Metrics for Text Generation (2024)
- Author(s)
  佐伯高明
- Organizer
  IEICE Technical Committee on Speech
- Related Report
  2023 Annual Research Report
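[Presentation] Building Evaluation Data of Monotonic Translations Toward English-Japanese Simultaneous Machine Translation Faithful to the Source Speech (2024)
- Author(s)
  福田りょう, 土肥康輔, 須藤克仁, 中村哲
- Organizer
  The 259th Meeting of the IPSJ Special Interest Group on Natural Language Processing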
- Related Report
  2023 Annual Research Report
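[Presentation] Divide-and-Conquer Neural Machine Translation Using Intra-Sentence Context (2024)
- Author(s)
  石川隆太, 加納保昌, 須藤克仁, 中村哲
- Organizer
  The 30th Annual Meeting of the Association for Natural Language Processing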
- Related Report
  2023 Annual Research Report
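[Presentation] End-to-End Simultaneous Speech Translation Using Simultaneous Interpretation Data with Tagged Mixed-Data Training and Self-Supervised Learning (2024)
- Author(s)
  胡尤佳, 福田りょう, 西川勇太, 加納保昌, 須藤克仁, 中村哲
- Organizer
  The 30th Annual Meeting of the Association for Natural Language Processing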
- Related Report
  2023 Annual Research Report
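[Presentation] Automated Essay Scoring Using Grammatical Item Diversity and Error Information (2024)
- Author(s)
  土肥康輔, 須藤克仁, 中村哲
- Organizer
  The 30th Annual Meeting of the Association for Natural Language Processing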
- Related Report
  2023 Annual Research Report
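[Presentation] Evaluating Word-Order Synchronicity for Simultaneous Interpretation and Simultaneous Translation (2024)
- Author(s)
  蒔苗茉那, 須藤克仁, 中村哲
- Organizer
  The 30th Annual Meeting of the Association for Natural Language Processing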
- Related Report
  2023 Annual Research Report
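[Presentation] Streaming Simultaneous Speech Translation Using Incremental Speech Segmentation (2024)
- Author(s)
  福田りょう, 須藤克仁, 中村哲
- Organizer
  The 30th Annual Meeting of the Association for Natural Language Processing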
- Related Report
  2023 Annual Research Report
[Presentation] Model-based Subsampling for Knowledge Graph Completion (2023)
- Author(s)
  Xincan Feng, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe
- Organizer
  13th International Joint Conference on Natural Language
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Generating Diverse Translation with Perturbed kNN-MT (2023)
- Author(s)
  Yuto Nishida, Makoto Morishita, Hidetaka Kamigaito, Taro Watanabe
- Organizer
  18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] A comparative study of ethical norms of professional and non-professional interpreters in the media (2023)
- Author(s)
  Kayo Matsushita
- Organizer
  6th International Conference on Non-Professional Interpreting and Translation
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder (2023)
- Author(s)
  Yusuke Yasuda, Tomoki Toda
- Organizer
  IEEE ICASSP 2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Source-Filter HiFiGAN: fast and pitch controllable high-fidelity neural vocoder (2023)
- Author(s)
  Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda
- Organizer
  IEEE ICASSP 2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Emotion awareness in multi-utterance turn for improving emotion prediction in multi-speaker conversation (2023)
- Author(s)
  Xiaohan Shi, Xingfeng Li, Tomoki Toda
- Organizer
  INTERSPEECH 2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Using a Small Amount of Parallel Data in Text Speaking-Style Conversion with an Attention-Based VAE (2023)
- Author(s)
  吉岡大貴, 安田裕介, 戸田智基
- Organizer
  Acoustical Society of Japan Autumn Meeting
- Related Report
  2023 Annual Research Report
[Presentation] A comparative study of voice conversion models with large-scale speech and singing data: the T13 systems for the Singing Voice Conversion Challenge 2023 (2023)
- Author(s)
  Ryuichi Yamamoto, Reo Yoneyama, Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda
- Organizer
  IEEE ASRU 2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Leveraging the Multilingual Indonesian Ethnic Languages Dataset in Self-supervised Model for Low-resource ASR Task (2023)
- Author(s)
  Sakriani Sakti, Benita Angela Titalim
- Organizer
  IEEE ASRU
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in Indonesian (2023)
- Author(s)
  Ruhiyah Widiaputri, Ayu Purwarianti, Dessi Lestari, Kurniawati Azizah, Dipta Tanaya, Sakriani Sakti
- Organizer
  EMNLP
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Generating Speech with Prosodic Prominence based on SSL-Visually Grounded Models (2023)
- Author(s)
  Ika Hartanti Bella Septina, Dipta Tanaya, Kurniawati Azizah, Dessi Lestari, Ayu Purwarianti, Sakriani Sakti
- Organizer
  Oriental COCOSDA
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Exploring Difficulties Encountered by Professional Interpreters in Japanese-to-English and English-to-Japanese Simultaneous Translation (2023)
- Author(s)
  Hang Xi, Sakriani Sakti
- Organizer
  Oriental COCOSDA
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] STEN-TTS: Improving Zero-shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework (2023)
- Author(s)
  Chung Tran, Chi Mai Luong, Sakriani Sakti
- Organizer
  INTERSPEECH
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams (2023)
- Author(s)
  Shun Takahashi, Sakriani Sakti
- Organizer
  INTERSPEECH
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Low-Resource Japanese-English Speech-to-Text Translation Leveraging Speech-Text Unified-model Representation Learning (2023)
- Author(s)
  Tu Dinh Tran, Sakti Sakriani
- Organizer
  INTERSPEECH Satellite Workshop - the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] VGSAlign: Bilingual Speech Alignment of Unpaired and Untranscribed Languages using Self-Supervised Visually Grounded Speech Models (2023)
- Author(s)
  Luan Thanh Nguyen, Sakriani Sakti
- Organizer
  INTERSPEECH Satellite Workshop - the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] An Isotropy Analysis for Self-supervised Acoustic Unit Embeddings on the Zero Resource Speech Challenge 2021 Framework (2023)
- Author(s)
  Jianan Chen, Sakriani Sakti
- Organizer
  IEEE ICASSP
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Self-adaptive Incremental Machine Speech Chain for Lombard TTS with High-granularity ASR Feedback in Dynamic Noise Condition (2023)
- Author(s)
  Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura
- Organizer
  IEEE ICASSP
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Language Technology for All: From the technology and indigenous community perspectives (2023)
- Author(s)
  Sakriani Sakti
- Organizer
  Oriental COCOSDA
- Related Report
  2023 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] E2E Refined Dataset (2023)
- Author(s)
  Keisuke Toyama, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  26th International Conference of Oriental-COCOSDA 2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Investigation of Validity of Paradigmatic Diagnosis for Downstep in Japanese (2023)
- Author(s)
  Kei Furukawa, Satoshi Nakamura
- Organizer
  26th International Conference of Oriental-COCOSDA 2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation (2023)
- Author(s)
  Yuta Nishikawa, Satoshi Nakamura
- Organizer
  INTERSPEECH2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Boundary-Driven Account for Downstep in Japanese (2023)
- Author(s)
  Kei Furukawa, Satoshi Nakamura
- Organizer
  20th International Congress of Phonetic Sciences
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining (2023)
- Author(s)
  Takaaki Saeki
- Organizer
  The 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023) Main Track
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] NAIST Simultaneous Speech Translation System for IWSLT 2023 (2023)
- Author(s)
  Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Yuka Ko, Tomoya Yanagita, Kosuke Doi, Mana Makinae, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  the 20th International Conference on Spoken Language Translation (IWSLT 2023)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data (2023)
- Author(s)
  Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  the 20th International Conference on Spoken Language Translation (IWSLT 2023)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Average Token Delay: A Latency Metric for Simultaneous Translation (2023)
- Author(s)
  Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  Interspeech 2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
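[Presentation] Average Token Delay: A Latency Metric for Simultaneous Interpretation (2023)
- Author(s)
  加納保昌, 須藤克仁, 中村哲
- Organizer
  The 24th Annual Conference of the Japan Association for Interpreting and Translation Studies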
- Related Report
  2023 Annual Research Report
[Presentation] Embedding articulatory constraints for low-resource speech recognition based on large pre-trained model (2023)
- Author(s)
  J. Lee, M. Mimura, and T. Kawahara
- Organizer
  INTERSPEECH
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Time-domain speech enhancement assisted by multi-resolution frequency encoder and decoder (2023)
- Author(s)
  H. Shi, M. Mimura, L. Wang, J. Dang, and T. Kawahara
- Organizer
  IEEE-ICASSP
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Domain and language adaptation using heterogeneous datasets for wav2vec2.0-based speech recognition of low-resource language (2023)
- Author(s)
  K. Soky, S. Li, C. Chu, and T. Kawahara
- Organizer
  IEEE-ICASSP
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Keep Eyes on the Sentence: An Interactive Sentence Simplification System for English Learners Based on Eye Tracking and Large Language Models (2023)
- Author(s)
  Taichi Higasa, Keitaro Tanaka, Qi Feng, Shigeo Morishima
- Organizer
  ACM CHI Conference on Human Factors in Computing Systems, CHI 2024 (Late-Breaking Work)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Gaze-Driven Sentence Simplification for Language Learners: Enhancing Comprehension and Readability (2023)
- Author(s)
  Taichi Higasa, Keitaro Tanaka, Qi Feng, Shigeo Morishima
- Organizer
  The 25th International Conference on Multimodal Interaction, ICMI 2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Audio-Visual Speech Enhancement With Selective Off-Screen Speech Extraction (2023)
- Author(s)
  Tomoya Yoshinaga, Keitaro Tanaka, Shigeo Morishima
- Organizer
  The 31st European Signal Processing Conference, EUSIPCO2023, Best Student Paper Contest Finalist
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Efficient 3D Reconstruction of NeRF using Camera Pose Interpolation and Photometric Bundle Adjustment (2023)
- Author(s)
  Tsukasa Takeda, Shugo Yamaguchi, Kazuhito Sato, Kosuke Fukazawa, Shigeo Morishima
- Organizer
  ACM Special Interest Group on Computer Graphics and Interactive Techniques Conference, SIGGRAPH 2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Deformable Neural Radiance Fields for Object Motion Blur Removal (2023)
- Author(s)
  Kazuhito Sato, Shugo Yamaguchi, Tsukasa Takeda, and Shigeo Morishima
- Organizer
  ACM Special Interest Group on Computer Graphics and Interactive Techniques Conference Posters, SIGGRAPH 2023 Posters
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning (2023)
- Author(s)
  Sara Kashiwagi, Keitaro Tanaka, Qi Feng, Shigeo Morishima
- Organizer
  INTERSPEECH2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Memory Efficient Diffusion Probabilistic Models via Patch-based Generation (2023)
- Author(s)
  Shinei Arakawa, Hideki Tsunashima, Daichi Horita, Keitaro Tanaka, Shigeo Morishima
- Organizer
  The IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop 2023, CVPR workshop 2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] vTTS: visual-text to speech (2023)
- Author(s)
  Yoshifumi Nakano
- Organizer
  IEEE SLT 2022
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Representation and Prediction of Accent-Phrase Prosodic Features in Japanese Speech Synthesis (2023)
- Author(s)
  佐藤匡紀
- Organizer
  Technical Committee on Speech (SP)
- Related Report
  2022 Annual Research Report
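[Presentation] A Study of Subtitle Translation Using a Video Captioning Model (2023)
- Author(s)
  成浦拓音
- Organizer
  The 29th Annual Meeting of the Association for Natural Language Processing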
- Related Report
  2022 Annual Research Report
[Presentation] Cyclic Partially-aligned Transformer for Visually Connected Speech-to-text Mapping (2023)
- Author(s)
  J. Effendi, S. Sakti, S. Nakamura
- Organizer
  The 2023 Spring meeting of the Acoustical Society of Japan (ASJ)
- Related Report
  2022 Annual Research Report
[Presentation] A Study of Incremental Reading and Accent Estimation for Incremental Speech Synthesis (2023)
- Author(s)
  柳田智也, 中村哲
- Organizer
  Acoustical Society of Japan 2023 Spring Meeting
- Related Report
  2022 Annual Research Report
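[Presentation] Divide-and-Conquer Neural Machine Translation with Pre-trained Models (2023)
- Author(s)
  石川隆太, 加納保昌, 須藤克仁, 中村哲
- Organizer
  The 29th Annual Meeting of the Association for Natural Language Processing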
- Related Report
  2022 Annual Research Report
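[Presentation] Continuous Speech Segmentation for Improving the Time Efficiency and Accuracy of Speech Translation (2023)
- Author(s)
  福田りょう, 須藤克仁, 中村哲
- Organizer
  The 29th Annual Meeting of the Association for Natural Language Processing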
- Related Report
  2022 Annual Research Report
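[Presentation] The Relationship Between Grammatical Features and Learner Level in Automated Essay Scoring (2023)
- Author(s)
  土肥康輔, 須藤克仁, 中村哲
- Organizer
  The 29th Annual Meeting of the Association for Natural Language Processing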
- Related Report
  2022 Annual Research Report
[Presentation] Average Token Delay: A Latency Metric for Simultaneous Translation (2023)
- Author(s)
  加納保昌, 須藤克仁, 中村哲
- Organizer
  The 29th Annual Meeting of the Association for Natural Language Processing
- Related Report
  2022 Annual Research Report
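[Presentation] A Comparative Analysis of Evaluations by Simultaneous Interpreters and Translators Toward a Quality Evaluation Method for Simultaneous Interpretation (2023)
- Author(s)
  蒔苗茉那, 須藤克仁, 中村哲, 松下佳世, 山田優
- Organizer
  The 29th Annual Meeting of the Association for Natural Language Processing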
- Related Report
  2022 Annual Research Report
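[Presentation] Speech Translation of Spontaneous Speech with Target-Language Text Using Disfluency Tags (2023)
- Author(s)
  胡尤佳, 須藤克仁, 中村哲
- Organizer
  The 29th Annual Meeting of the Association for Natural Language Processing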
- Related Report
  2022 Annual Research Report
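[Presentation] A Proposed Evaluation Metric for Idiomatic Expressions in Japanese-English Translation (2023)
- Author(s)
  廣瀬惟歩, 渡辺太郎
- Organizer
  The 29th Annual Meeting of the Association for Natural Language Processing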
- Related Report
  2022 Annual Research Report
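[Presentation] Generating Diverse Translation Candidates with Perturbed kNN Machine Translation (2023)
- Author(s)
  西田悠人, 森下睦, 上垣外英剛, 渡辺太郎
- Organizer
  The 29th Annual Meeting of the Association for Natural Language Processing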
- Related Report
  2022 Annual Research Report
[Presentation] Detection of Selective Attention processing during Simultaneous Interpretation by EEG Auditory Steady-State response-related Phase-Amplitude Coupling (2022)
- Author(s)
  Haruko Yagura, Hiroki Tanaka, Katsuhito Sudoh, and Satoshi Nakamura
- Organizer
  NEURO2022
- Related Report
  2022 Annual Research Report
[Presentation] Adapting to Non-Centered Languages for Zero-shot Multilingual Translation (2022)
- Author(s)
  Zhi Qu
- Organizer
  the 29th International Conference on Computational Linguistics
- Related Report
  2022 Annual Research Report
[Presentation] Sharing Parameter by Conjugation for Knowledge Graph Embeddings in Complex Space (2022)
- Author(s)
  Xincan Feng
- Organizer
  TextGraphs-16: Graph-based Methods for Natural Language Processing
- Related Report
  2022 Annual Research Report
[Presentation] Phone-informed refinement of synthesized mel spectrogram for data augmentation in speech recognition (2022)
- Author(s)
  S.Ueno
- Organizer
  IEEE-ICASSP
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Leveraging simultaneous translation for enhancing transcription of low-resource language via cross attention mechanism (2022)
- Author(s)
  K.Soky
- Organizer
  INTERSPEECH
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] End-to-end speech-to-punctuated-text recognition (2022)
- Author(s)
  J.Nozaki
- Organizer
  INTERSPEECH
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Non-autoregressive error correction for CTC-based ASR with phone-conditioned masked LM (2022)
- Author(s)
  H.Futami
- Organizer
  INTERSPEECH
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Towards the establishment of a quality assessment framework for interpreting performance (2022)
- Author(s)
  Kayo Matsushita, Masaru Yamada
- Organizer
  Translation in Transition 6 Conference
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Syntactic Cross and Reading Effort in English to Japanese Translation (2022)
- Author(s)
  Takanori Mizowaki, Haruka Ogawa, Masaru Yamada
- Organizer
  The proceedings of Workshop on Empirical Translation Process Research, The 15th Conference of the Association for Machine Translation in the Americas (AMTA)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] 注意機構付きVAEを用いたテキスト発話スタイル変換の改良 (2022)
- Author(s)
  吉岡大貴, 安田裕介, 松永悟行, 大谷大和, 戸田智基
- Organizer
  日本音響学会秋季研究発表会
- Related Report
  2022 Annual Research Report
[Presentation] 拡散確率モデルとアライメントモデルを用いた潜在特徴系列変換に基づくテキスト音声合成 (2022)
- Author(s)
  安田裕介, 戸田智基
- Organizer
  日本音響学会秋季研究発表会
- Related Report
  2022 Annual Research Report
[Presentation] Interpretable emotional control for text-to-speech system toward development of sympathetic educational-support robots (2022)
- Author(s)
  J. Feng, T. Yoshikawa, T. Toda
- Organizer
  日本音響学会秋季研究発表会
- Related Report
  2022 Annual Research Report
[Presentation] Unified source-filter GAN with harmonic-plus-noise source excitation generation (2022)
- Author(s)
  R. Yoneyama, Y.-C. Wu, T. Toda
- Organizer
  INTERSPEECH
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Interpretable control for emotional text-to-speech system toward development of sympathetic educational-support robots (2022)
- Author(s)
  J. Feng, T. Yoshikawa, T. Toda
- Organizer
  APSIPA ASC
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] 内容語保存機構を備えた変分自己符号化器に基づくテキスト発話スタイル変換 (2022)
- Author(s)
  吉岡大貴, 安田裕介, 松永悟行, 大谷大和, 戸田智基
- Organizer
  情報処理学会音声言語情報処理研究会
- Related Report
  2022 Annual Research Report
[Presentation] 合成音声の主観評価結果の統計的解析 (2022)
- Author(s)
  安田裕介, 戸田智基
- Organizer
  日本音響学会春季研究発表会
- Related Report
  2022 Annual Research Report
[Presentation] SiFi-GAN：音源フィルタ構造に基づくHiFi-GAN (2022)
- Author(s)
  米山怜於, Y.-C. Wu, 戸田智基
- Organizer
  日本音響学会春季研究発表会
- Related Report
  2022 Annual Research Report
[Presentation] サイクル学習を用いた注意機構付きVAEによるテキスト発話スタイル変換 (2022)
- Author(s)
  吉岡大貴, 安田裕介, 松永悟行, 大谷大和, 戸田智基
- Organizer
  日本音響学会春季研究発表会
- Related Report
  2022 Annual Research Report
[Presentation] Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing (2022)
- Author(s)
  Heli Qi, Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura
- Organizer
  INTERSPEECH
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Representing ‘how you say’ with ‘what you say’: English corpus of focused speech and text reflecting corresponding implications (2022)
- Author(s)
  Naoaki Suzuki, Satoshi Nakamura
- Organizer
  INTERSPEECH
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation (2022)
- Author(s)
  Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  INTERSPEECH
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis (2022)
- Author(s)
  Kei Furukawa, Takeshi Kishiyama, Satoshi Nakamura
- Organizer
  INTERSPEECH
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Simultaneous Neural Machine Translation with Prefix Alignment (2022)
- Author(s)
  Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  19th International Conference on Spoken Language Translation (IWSLT 2022)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] NAIST Simultaneous Speech-to-Text Translation System for IWSLT 2022 (2022)
- Author(s)
  Ryo Fukuda, Yuka Ko, Yasumasa Kano, Kosuke Doi, Hirotaka Tokuyama, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  19th International Conference on Spoken Language Translation (IWSLT 2022)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] どう言ったかを何を言ったかで表す～フォーカスを含んだ発話及びその含意を反映したテキストを含む英語コーパス～ (2022)
- Author(s)
  鱸尚晃, 中村哲
- Organizer
  第24回音声言語シンポジウム（SP/SLP）兼第9回自然言語処理シンポジウム
- Related Report
  2022 Annual Research Report
[Presentation] NAIST同時通訳コーパスの構築：翻訳字幕との比較と通訳経験年数に基づく分析 (2022)
- Author(s)
  土肥康輔，須藤克仁，中村哲
- Organizer
  日本通訳翻訳学会第23回年次大会
- Related Report
  2022 Annual Research Report
[Presentation] 視線情報と比喩度に基づく英語フレーズの理解度推定 (2022)
- Author(s)
  樋笠泰祐，平田明日香，田中啓太郎，森島繁生
- Organizer
  第30回インタラクティブシステムとソフトウェアに関するワークショップ, WISS 2022
- Related Report
  2022 Annual Research Report
[Presentation] Unsupervised Disentanglement of Timbral, Pitch, and Variation Features From Musical Instrument Sounds With Random Perturbation (2022)
- Author(s)
  Keitaro Tanaka, Yoshiaki Bando, Kazuyoshi Yoshii, and Shigeo Morishima
- Organizer
  APSIPA ASC 2022
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] The Sound of Bounding-Boxes (2022)
- Author(s)
  Takashi Oya, Shohei Iwase, Shigeo Morishima
- Organizer
  International Conference on Pattern Recognition 2022, ICPR 2022
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Audio-Driven Violin Performance Animation with Clear Fingering and Bowing (2022)
- Author(s)
  Asuka Hirata, Keitaro Tanaka, Masatoshi Hamanaka, Shigeo Morishima
- Organizer
  The Premier Conference & Exhibition on Computer Graphics & Interactive Techniques, SIGGRAPH 2022
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] 画像文字からの音声合成 (2022)
- Author(s)
  中野嘉文，佐伯高明，高道慎之介，須藤克仁，猿渡洋
- Organizer
  言語処理学会2022年年次大会
- Related Report
  2021 Annual Research Report
[Presentation] JTubeSpeech: 音声認識と話者照合のためにYouTubeから構築される日本語音声コーパス (2022)
- Author(s)
  高道慎之介，Kurzinger Ludwig，佐伯高明，塩田さやか，渡部晋治
- Organizer
  言語処理学会2022年年次大会
- Related Report
  2021 Annual Research Report
[Presentation] IWSLT Evaluation Campaign: Simultaneous Speech Translation (2022)
- Author(s)
  須藤克仁
- Organizer
  情報処理学会第141回音声言語情報処理研究会
- Related Report
  2021 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Machine Speech Chain による音声聴取生成システムのモデル化の試み (2022)
- Author(s)
  中村哲
- Organizer
  日本音響学会2022年春季研究発表会
- Related Report
  2021 Annual Research Report
- Invited
[Presentation] 音声機械翻訳のための音声翻訳コーパスに基づく発話分割 (2022)
- Author(s)
  福田りょう, 須藤克仁, 中村哲
- Organizer
  言語処理学会第28回年次大会
- Related Report
  2021 Annual Research Report
[Presentation] 構文ラベル予測による同時ニューラル機械翻訳 (2022)
- Author(s)
  Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  言語処理学会第28回年次大会
- Related Report
  2021 Annual Research Report
[Presentation] Masked Language Model による系列確率に基づく文法誤り検出 (2022)
- Author(s)
  土肥康輔，須藤克仁，中村哲
- Organizer
  言語処理学会第28回年次大会
- Related Report
  2021 Annual Research Report
[Presentation] 音声認識出力の曖昧性に頑健な音声翻訳のための音声認識の精度ごとの性能比較 (2022)
- Author(s)
  胡尤佳，須藤克仁，Sakriani Sakti，中村哲
- Organizer
  言語処理学会第28回年次大会
- Related Report
  2021 Annual Research Report
[Presentation] Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network (2021)
- Author(s)
  Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  Proc. ASRU
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] An end-to-end model from speech to clean transcript for parliamentary meetings (2021)
- Author(s)
  M.Mimura, S.Sakai, and T.Kawahara
- Organizer
  APSIPA ASC
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] VAD-free streaming hybrid CTC/Attention ASR for unsegmented recording (2021)
- Author(s)
  H.Inaguma, M.Mimura, and T.Kawahara
- Organizer
  INTERSPEECH
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] StableEmit: Selection probability discount for reducing emission latency of streaming monotonic attention ASR (2021)
- Author(s)
  H.Inaguma, M.Mimura, and T.Kawahara
- Organizer
  INTERSPEECH
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Using Local Phrase Dependency Structure Information in Neural Sequence-to-Sequence Speech Synthesis (2021)
- Author(s)
  Nobuyoshi Kaiki, Sakriani Sakti and Satoshi Nakamura
- Organizer
  O-COCOSDA 2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Unsupervised Neural-Based Graph Clustering for Variable-Length Speech Representation Discovery of Zero-Resource Languages (2021)
- Author(s)
  Shun Takahashi, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Proc. Interspeech 2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Dynamically Adaptive Machine Speech Chain Inference for TTS in Noisy Environment: Listen and Speak Louder (2021)
- Author(s)
  Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Proc. Interspeech 2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Weakly-supervised Speech-to-text Mapping with Visually Connected Non-parallel Speech-text Data using Cyclic Partially-aligned Transformer (2021)
- Author(s)
  Johanes Effendi, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Proc. Interspeech 2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Transcribing Paralinguistic Acoustic Cues to Target Language Text in Transformer-Based Speech-to-Text Translation (2021)
- Author(s)
  Hirotaka Tokuyama, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  Proc. Interspeech 2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data (2021)
- Author(s)
  Kosuke Doi, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  Proc. IWSLT
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Simultaneous Speech-to-speech Translation System with Transformer-based Incremental ASR, MT, and TTS (2021)
- Author(s)
  Ryo Fukuda, Sashi Novitasari, Yui Oka, Yasumasa Kano, Yuki Yano, Yuka Ko, Hirotaka Tokuyama, Kosuke Doi, Tomoya Yanagita, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  Proc. Oriental COCOSDA, 2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] ASR Posterior-Based Loss for Multi-Task End-to-End Speech Translation (2021)
- Author(s)
  Yuka Ko, Katsuhito Sudoh, Sakriani Sakti, Satoshi Nakamura
- Organizer
  Proc. Interspeech
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Multichannel Audio Source Separation with Independent Deeply Learned Matrix Analysis Using Product of Source Models (2021)
- Author(s)
  Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2021 (APSIPA ASC 2021)
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] 多変量一般化Gauss分布に基づくランク制約付き空間共分散行列推定法における雑音欠落ランク空間基底推定 (2021)
- Author(s)
  近藤祐斗，久保優騎，高宗典玄，北村大地，猿渡洋
- Organizer
  日本音響学会2021秋季研究発表会
- Related Report
  2021 Annual Research Report
[Presentation] Product of Priors型確率分布を導入した音源モデルに基づく独立深層学習行列分析による多チャネル音源分離 (2021)
- Author(s)
  蓮実拓也，中村友彦，高宗典玄，猿渡洋，北村大地，高橋祐，近藤多伸
- Organizer
  日本音響学会2021秋季研究発表会
- Related Report
  2021 Annual Research Report
[Presentation] ヘビーテイル生成モデルに基づく独立深層学習テンソル分析 (2021)
- Author(s)
  成澤直輝，池下林太郎，高宗典玄，北村大地，中村友彦，猿渡洋，中谷智広
- Organizer
  日本音響学会2021秋季研究発表会
- Related Report
  2021 Annual Research Report
[Presentation] 独立深層学習行列分析を用いたランク制約付き空間共分散行列推定による音声強調 (2021)
- Author(s)
  三澤颯大，中村友彦，高宗典玄，北村大地，猿渡洋
- Organizer
  日本音響学会2021秋季研究発表会
- Related Report
  2021 Annual Research Report
[Presentation] ドメイン適応と話者一致損失を用いた話者適応によるクロスリンガル音声合成 (2021)
- Author(s)
  辛徳泰，齋藤佑樹，高道慎之介，郡山知樹，猿渡洋
- Organizer
  日本音響学会2021秋季研究発表会
- Related Report
  2021 Annual Research Report
[Presentation] 大規模言語モデルの知識蒸留によるコンテキスト推定モデルを用いた低遅延逐次音声合成 (2021)
- Author(s)
  佐伯高明，高道慎之介，猿渡洋
- Organizer
  日本音響学会2021秋季研究発表会
- Related Report
  2021 Annual Research Report
[Presentation] ASR rescoring and confidence estimation with ELECTRA (2021)
- Author(s)
  H.Futami, H.Inaguma, M.Mimura, S.Sakai, and T.Kawahara
- Organizer
  IEEE Workshop Automatic Speech Recognition & Understanding (ASRU)
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Data augmentation for ASR using TTS via a discrete representation (2021)
- Author(s)
  S.Ueno, M.Mimura, S.Sakai, and T.Kawahara
- Organizer
  IEEE Workshop Automatic Speech Recognition & Understanding (ASRU)
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Light Source Selection in Primary Sample Space Neural Photon Sampling (2021)
- Author(s)
  Yuta Tsuji, Tatsuya Yatagawa, Shigeo Morishima
- Organizer
  The 14th ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia 2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Low-latency real-time non-parallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear prediction (2021)
- Author(s)
  Patrick Lumban Tobing, Tomoki Toda
- Organizer
  11th ISCA Speech Synthesis Workshop (SSW)
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] High-fidelity and low-latency universal neural vocoder based on multiband WaveRNN with data-driven linear prediction for discrete waveform modeling (2021)
- Author(s)
  Patrick Lumban Tobing, Tomoki Toda
- Organizer
  INTERSPEECH
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Relational data selection for data augmentation of speaker-dependent multi-band MelGAN vocoder (2021)
- Author(s)
  Yi-Chiao Wu, Cheng-Hung Hu, Hung-Shin Lee, Yu-Huai Peng, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda
- Organizer
  INTERSPEECH
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] NAIST English-to-Japanese Simultaneous Translation System for IWSLT 2021 Simultaneous Text-to-text Task (2021)
- Author(s)
  Ryo Fukuda, Yui Oka, Yasumasa Kano, Yuki Yano, Yuka Ko, Hirotaka Tokuyama, Kosuke Doi, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  the 18th International Conference on Spoken Language Translation (IWSLT 2021)
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] On Knowledge Distillation for Translating Erroneous Speech Transcriptions (2021)
- Author(s)
  Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura
- Organizer
  the 18th International Conference on Spoken Language Translation
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Recent Advances in Speech Translation (2021)
- Author(s)
  Satoshi Nakamura, with Katsuhito Sudoh, Sakriani Sakti, Ryo Fukuda, Sashi Novitasari, Tomoya Yanagita, Kosuke Doi, Yasumasa Kano, Yuki Yano, Hirotaka Tokuyama, Yui Oka
- Organizer
  AI Innovation Summit 2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Improving Intelligibility of Synthesized Speech in Noisy Condition with Dynamically Adaptive Machine Speech Chain (2021)
- Author(s)
  Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura
- Organizer
  SIG-SLP 2021
- Related Report
  2021 Annual Research Report
[Presentation] 局所的な句構造の情報を用いたニューラル音声合成 (2021)
- Author(s)
  海木延佳, サクティサクリアニ, 中村哲
- Organizer
  音学シンポジウム2021
- Related Report
  2021 Annual Research Report
[Presentation] ゼロ資源状況におけるサブワード単位の獲得にむけてグラフニューラルネットワークを用いた手法 (2021)
- Author(s)
  高橋舜、サクリアニサクティ、中村哲
- Organizer
  2021年度人工知能学会全国大会 (第35回)
- Related Report
  2021 Annual Research Report
[Book] Utilizing remote simultaneous interpreting data for interpreting quality assessment: A corpus-based study (2023)
- Author(s)
  Masaru Yamada, Kayo Matsushita, Hiroyuki Ishizuka
- Total Pages
  17
- Publisher
  Routledge
- Related Report
  2023 Annual Research Report
[Patent(Industrial Property Rights)] 音声合成装置、音声合成方法及び音声合成プログラム (2022)
- Inventor(s)
  高道慎之介, 佐伯高明, 猿渡洋
- Industrial Property Rights Holder
  高道慎之介, 佐伯高明, 猿渡洋
- Industrial Property Rights Type
  特許
- Industrial Property Number
  2022-020534
- Filing Date
  2022
- Related Report
  2021 Annual Research Report

[Journal Article] Emotion-controllable Speech Synthesis using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence (2024)
[Journal Article] Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis (2024)
[Journal Article] Improving Speech Translation Accuracy and Time Efficiency With Fine-Tuned wav2vec 2.0-Based Speech Segmentation (2024)
[Journal Article] Prefix Alignment for Training Simultaneous Machine Translation (2024)
[Journal Article] Sound Field Interpolation for Rotation-Invariant Multichannel Array Signal Processing (2023)
[Journal Article] PoP-IDLMA: Product-of-Prior Independent Deeply Learned Matrix Analysis for Multichannel Music Source Separation (2023)
[Journal Article] Content Order-Controllable MR-to-Text (2023)
[Journal Article] End-to-End Generation of Written-style Transcript of Speech from Parliamentary Meetings (2023)
[Journal Article] Japanese Neural Incremental Text-to-Speech Synthesis Framework With an Accent Phrase Input (2023)
[Journal Article] Synthesis Unit for Japanese Incremental Text-to-Speech (2022)
[Journal Article] Deficient-basis-complementary rank-constrained spatial covariance matrix estimation based on multivariate generalized Gaussian distribution for blind speech extraction (2022)
[Journal Article] Neural Machine Translation with Synchronous Latent Phrase Structure (2022)
[Journal Article] TriECCC: Trilingual Corpus of the Extraordinary Chambers in the Courts of Cambodia for Speech Recognition and Translation Studies (2022)
[Journal Article] A cyclical approach to synthetic and natural speech mismatch refinement of neural post-filter for low-cost text-to-speech system (2022)