Modality Crossing Based on Latent Structural Understanding in Multimodal Dialogue Translation

Planned Research

Project Area Embodied Semiotics: Understanding Gesture and Sign Language in Language Interaction
Project/Area Number 22H05015
Research Category

Grant-in-Aid for Transformative Research Areas (B)

Allocation Type Single-year Grants
Review Section Transformative Research Areas, Section (I)
Research Institution The University of Tokyo

Principal Investigator

Hideki Nakayama (中山 英樹)  The University of Tokyo, Graduate School of Information Science and Technology, Associate Professor (00643305)

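Co-Investigator (Kenkyū-buntansha) Shin'ichi Satoh (佐藤 真一)  National Institute of Informatics, Digital Content and Media Sciences Research Division, Professor (90249938)
Noriki Nishida (西田 典起)  RIKEN, Center for Advanced Intelligence Project, Researcher (50890589)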
Project Period (FY) 2022-05-20 – 2025-03-31
Project Status Granted (Fiscal Year 2024)
Budget Amount
¥35,880,000 (Direct Cost: ¥27,600,000, Indirect Cost: ¥8,280,000)
Fiscal Year 2024: ¥11,700,000 (Direct Cost: ¥9,000,000, Indirect Cost: ¥2,700,000)
Fiscal Year 2023: ¥11,700,000 (Direct Cost: ¥9,000,000, Indirect Cost: ¥2,700,000)
Fiscal Year 2022: ¥12,480,000 (Direct Cost: ¥9,600,000, Indirect Cost: ¥2,880,000)
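Keywords Multimodal / Sign Language Recognition / Interaction / Deep Learning / Large Language Models / Sign Language Translation / Transfer Learning / Dialogue Interaction / Video Recognition / Retrieval Augmentation / Dialogue Understanding / Image Generation / Image Recognition / Natural Language Processing / Cross-modal / Machine Translation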
Outline of Research at the Start

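In everyday conversation and interaction, we communicate by drawing together a variety of sensory expressions, such as gestures and facial expressions, in addition to speech. Mechanisms that integrate multiple senses in this way are called multimodal, but how they work is still not well understood. This project aims to develop AI capable of multimodal dialogue translation through interdisciplinary collaboration with humanities researchers. In doing so, it aims to realize advanced applications such as simultaneous sign language interpretation and to contribute to the development of an inclusive society.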

Outline of Annual Research Achievements

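This fiscal year, research focused on sign language translation. First, continuing from the previous year, we re-implemented and tuned baseline deep-learning methods for sign language translation and succeeded in reproducing good accuracy on existing Western sign language translation datasets. At the same time, the findings from these experiments showed that the current Japanese Sign Language (JSL) colloquial corpus is too small for deep learning. To address this problem, we pursued new research in two directions.
First, we began building a larger JSL corpus with Japanese parallel translations. Specifically, we collect sign language videos from YouTube and extract the subtitle information attached to each video as the parallel text. This fiscal year, we obtained approximately 110,000 pairs of sign language videos and Japanese translations, and completed a basic analysis of the corpus and the construction of a sign language translation model. A corpus built automatically in this way is not necessarily of high quality, but its size is expected to support deep learning.
Second, we also advanced research on transfer learning, in which a model built in a data-rich domain is used in a data-poor domain. Here we took Irish Sign Language as the low-resource domain and investigated transfer performance from a variety of large-scale datasets [Holmes+, ICCVW'23]. We expect the resulting findings to be useful for Japanese Sign Language as well, for which little well-curated data is available.
In addition, we made notable progress on more general, foundational technologies that underlie sign language translation. Accepted papers include work on introducing external knowledge into image captioning through retrieval augmentation [Vo+, CVPR'23][Li+, CVPR'24] and a survey of personalization in dialogue agents [Chen+, LREC-COLING'24].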
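
The video-and-subtitle pairing step in the corpus construction above can be illustrated with a minimal sketch. The report does not state the subtitle format or the tooling used, so everything here is an assumption for illustration: subtitles are taken to be SRT-style text that has already been downloaded, and the names `Pair` and `subtitle_pairs` are hypothetical.

```python
import re
from dataclasses import dataclass

# One cue timing line, e.g. "00:00:01,000 --> 00:00:03,500".
# SRT-style timing is an assumption, not something the report specifies.
_CUE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Pair:
    video_id: str
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    text: str     # subtitle text used as the parallel translation


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def subtitle_pairs(video_id: str, srt_text: str) -> list[Pair]:
    """Turn SRT-style subtitles into (video segment, text) pairs."""
    pairs = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            cue = _CUE.search(line)
            if cue is None:
                continue
            g = cue.groups()
            start, end = _seconds(*g[:4]), _seconds(*g[4:])
            # Japanese text has no inter-word spaces, so join lines directly.
            text = "".join(lines[i + 1:])
            if text and end > start:  # drop empty or zero-length cues
                pairs.append(Pair(video_id, start, end, text))
            break  # only the first timing line in a block is used
    return pairs
```

Each resulting pair names a time span in the video and the sentence shown during it. As the report notes, pairs obtained automatically like this are noisy, so a real pipeline would likely need further filtering.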

Current Status of Research Progress

2: Research has progressed on the whole more than it was originally planned.

Reason

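For Japanese Sign Language translation, we have not yet reached the software implementation that was the original goal. However, we consider it a major achievement that, for the fundamental problem of data scarcity, we settled on two directions, large-scale dataset construction and transfer learning, and made progress on both. The latter in particular has already led to an accepted paper, so a solid technical foundation can be said to be in place.
There was also considerable progress on the deep-learning foundations that support sign language translation AI, with notable results such as papers accepted at several top international conferences and top journals. In particular, the rise of large language models dramatically changed the environment surrounding AI research this fiscal year; it is noteworthy that we incorporated large language models into our research early and found a path toward applying them to sign language translation.
Taken together, this fiscal year produced sufficient results toward the project goals, and we consider the research to be progressing smoothly.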

Strategy for Future Research Activity

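As the next fiscal year is the final year, we will consolidate the many findings and technical assets accumulated so far, bring the research to completion, and present and release the results. First, we will further improve, in both quality and quantity, the Web-based large-scale sign language video dataset developed as an initial version this fiscal year, and complete it as a final version. In particular, the dataset does not currently distinguish Japanese Sign Language from Signed Japanese (日本語対応手話), so we aim to add annotations by Deaf annotators and make the dataset more reflective of how Deaf people actually communicate. Using the completed dataset, we will build a multimodal foundation model for sign language recognition. We will then apply the transfer learning method developed this fiscal year [Holmes+, ICCVW'23] to this foundation model, using detailed, high-quality data provided within the research area, such as the "Next-Generation Sign Language Corpus" and the "Next-Generation Gesture Corpus", to complete high-accuracy multimodal dialogue translation models optimized for each task. Beyond this, we will also examine and comparatively evaluate other promising techniques for exploiting small-scale data that have emerged from our results so far, such as in-context learning and retrieval-augmented generation [Li+, CVPR'24].
We will complete the large-scale sign language video dataset and the multimodal dialogue translation system described above, submit papers, and release the software to the public.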

Report

(2 results)
  • 2023 Annual Research Report
  • 2022 Annual Research Report
  • Research Products

    (45 results)

Int'l Joint Research (10 results) Journal Article (18 results) (of which Int'l Joint Research: 13 results, Peer Reviewed: 18 results, Open Access: 14 results) Presentation (15 results) (of which Int'l Joint Research: 15 results, Invited: 1 result) Book (1 result) Remarks (1 result)

  • [Int'l Joint Research] Trinity College Dublin (Ireland)

    • Related Report
      2023 Annual Research Report
  • [Int'l Joint Research] Ghent University (Belgium)

    • Related Report
      2023 Annual Research Report
  • [Int'l Joint Research] University of Trento (Italy)

    • Related Report
      2023 Annual Research Report
  • [Int'l Joint Research] Wuhan University (China)

    • Related Report
      2023 Annual Research Report
  • [Int'l Joint Research] Nanyang Technological University (Singapore)

    • Related Report
      2023 Annual Research Report
  • [Int'l Joint Research]

    • Related Report
      2023 Annual Research Report
  • [Int'l Joint Research] Wuhan University/Wuhan University of Sci. Tech./Sichuan University (China)

    • Related Report
      2022 Annual Research Report
  • [Int'l Joint Research] Nanyang Technological University (Singapore)

    • Related Report
      2022 Annual Research Report
  • [Int'l Joint Research] University of California, Los Angeles (USA)

    • Related Report
      2022 Annual Research Report
  • [Int'l Joint Research] National Yang Ming Chiao Tung Univ. (Taiwan)/National Taiwan University (Taiwan)/National Tsing Hua University (Taiwan) (Other countries/regions)

    • Related Report
      2022 Annual Research Report
  • [Journal Article] Label Augmentation as Inter-class Data Augmentation for Conditional Image Synthesis with Imbalanced Data (2024)

    • Author(s)
      Katsumata Kai, Vo Duc Minh, Nakayama Hideki
    • Journal Title

      2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

      Volume: - Pages: 4932-4941

    • DOI

      10.1109/wacv57701.2024.00487

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data (2024)

    • Author(s)
      Katsumata Kai, Vo Duc Minh, Harada Tatsuya, Nakayama Hideki
    • Journal Title

      2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

      Volume: - Pages: 5311-5320

    • DOI

      10.1109/wacv57701.2024.00524

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations (2024)

    • Author(s)
      Yi-Pei Chen, Noriki Nishida, Hideki Nakayama, Yuji Matsumoto
    • Journal Title

      Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)

      Volume: -

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension (2024)

    • Author(s)
      Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto, Hideki Nakayama
    • Journal Title

      Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

      Volume: -

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access
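  • [Journal Article] Weakly Supervised Named Entity Recognition Using the Hierarchical Structure of a Thesaurus (シソーラスの階層的構造を利用した弱教師あり固有表現抽出) (2024)

    • Author(s)
      芝原隆善, 山田育矢, 西田典起, 寺西裕紀, 古崎昇司, 松本裕治
    • Journal Title

      自然言語処理 (Journal of Natural Language Processing)

      Volume: 31(3)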

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Improving Deep Metric learning via Self-distillation and Online Batch Diffusion Process (2024)

    • Author(s)
      Zelong Zeng, Fan Yang, Hong Liu, Shin'ichi Satoh
    • Journal Title

      Visual Intelligence

      Volume: -

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed
  • [Journal Article] LED: A Dataset for Life Event Extraction from Dialogs (2023)

    • Author(s)
      Yi-Pei Chen, An-Zi Yen, Hen-Hsen Huang, Hideki Nakayama, Hsin-Hsi Chen
    • Journal Title

      Findings of the Association for Computational Linguistics: EACL 2023

      Volume: - Pages: 384-398

    • DOI

      10.18653/v1/2023.findings-eacl.29

    • Related Report
      2023 Annual Research Report 2022 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] A-CAP: Anticipation Captioning with Commonsense Knowledge (2023)

    • Author(s)
      Vo Duc Minh, Luong Quoc-An, Sugimoto Akihiro, Nakayama Hideki
    • Journal Title

      2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

      Volume: 1 Pages: 10824-10833

    • DOI

      10.1109/cvpr52729.2023.01042

    • Related Report
      2023 Annual Research Report 2022 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] From Scarcity to Understanding: Transfer Learning for the Extremely Low Resource Irish Sign Language (2023)

    • Author(s)
      Ruth Holmes, Ellen Rushe, Mathieu De Coster, Maxim Bonnaerens, Shin'ichi Satoh, Akihiro Sugimoto, Anthony Ventresque
    • Journal Title

      Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

      Volume: - Pages: 2000-2009

    • DOI

      10.1109/iccvw60793.2023.00215

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Referring Image Segmentation via Joint Mask Contextual Embedding Learning and Progressive Alignment Network (2023)

    • Author(s)
      Ziling Huang, Shin'ichi Satoh
    • Journal Title

      Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

      Volume: - Pages: 7753-7762

    • DOI

      10.18653/v1/2023.emnlp-main.481

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Beyond Domain Gap: Exploiting Subjectivity in Sketch-Based Person Retrieval (2023)

    • Author(s)
      Lin Kejun, Zhixiang Wang, Zheng Wang, Yinqiang Zheng, Shin'ichi Satoh
    • Journal Title

      Proceedings of the 31st ACM International Conference on Multimedia

      Volume: - Pages: 2078-2089

    • DOI

      10.1145/3581783.3611732

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Mitigating robust overfitting via self-residual-calibration regularization (2023)

    • Author(s)
      Liu Hong, Zhong Zhun, Sebe Nicu, Satoh Shin'ichi
    • Journal Title

      Artificial Intelligence

      Volume: 317 Pages: 103877-103877

    • DOI

      10.1016/j.artint.2023.103877

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Progressive Motion Boosting for Video Frame Interpolation (2023)

    • Author(s)
      Jing Xiao, Kangmin Xu, Mengshun Hu, Liang Liao, Zheng Wang, Chia-Wen Lin, Mi Wang, Shin'ichi Satoh
    • Journal Title

      IEEE Transactions on Multimedia

      Volume: 25 Pages: 8076-8090

    • DOI

      10.1109/tmm.2022.3233310

    • Related Report
      2023 Annual Research Report 2022 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Indirect Adversarial Losses via an Intermediate Distribution for Training GANs (2023)

    • Author(s)
      Rui Yang, Duc Minh Vo, Hideki Nakayama
    • Journal Title

      Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

      Volume: - Pages: 4641-4650

    • DOI

      10.1109/wacv56688.2023.00463

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Towards Robust Person Re-identification by Defending Against Universal Attackers (2023)

    • Author(s)
      Fengxiang Yang, Juanjuan Weng, Zhun Zhong, Hong Liu, Zheng Wang, Zhiming Luo, Donglin Cao, Shaozi Li, Shin'ichi Satoh, Nicu Sebe
    • Journal Title

      IEEE Transactions on Pattern Analysis and Machine Intelligence

      Volume: 45 Pages: 5218-5235

    • DOI

      10.1109/tpami.2022.3199013

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Only a Few Classes Confusing: Pixel-wise Candidate Labels Disambiguation for Foggy Scene Understanding (2023)

    • Author(s)
      Liang Liao, Chen Wenyi, Zhen Zhang, Jing Xiao, Yan Yang, Chia-Wen Lin, Shin'ichi Satoh
    • Journal Title

      Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI)

      Volume: -

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Character-Centric Story Visualization via Visual Planning and Token Alignment (2022)

    • Author(s)
      Hong Chen, Rujun Han, Te-Lin Wu, Hideki Nakayama, Nanyun Peng
    • Journal Title

      Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)

      Volume: - Pages: 8259-8272

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Towards Causality Inference for Very Important Person Localization (2022)

    • Author(s)
      Xiao Wang, Zheng Wang, Wu Liu, Xin Xu, Qijun Zhao, Shin'ichi Satoh
    • Journal Title

      Proceedings of the 30th ACM International Conference on Multimedia (ACMMM)

      Volume: - Pages: 6618-6626

    • DOI

      10.1145/3503161.3548014

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Presentation] Label Augmentation as Inter-class Data Augmentation for Conditional Image Synthesis with Imbalanced Data (2024)

    • Author(s)
      Kai Katsumata, Duc Minh Vo, Hideki Nakayama
    • Organizer
      The 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data (2024)

    • Author(s)
      Kai Katsumata, Duc Minh Vo, Tatsuya Harada, Hideki Nakayama
    • Organizer
      The 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations (2024)

    • Author(s)
      Yi-Pei Chen, Noriki Nishida, Hideki Nakayama, Yuji Matsumoto
    • Organizer
      The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension (2024)

    • Author(s)
      Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto, Hideki Nakayama
    • Organizer
      The 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] LLM as a Scorer: The Impact of Output Order on Dialogue Evaluation (2024)

    • Author(s)
      Kuanchao Chu, Yi-Pei Chen, Hideki Nakayama
    • Organizer
      AAAI 2024 Spring Symposium on User-Aligned Assessment of Adaptive AI Systems
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] A-CAP: Anticipation Captioning with Commonsense Knowledge (2023)

    • Author(s)
      Duc Minh Vo, Quoc-An Luong, Akihiro Sugimoto, Hideki Nakayama
    • Organizer
      The 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
    • Related Report
      2023 Annual Research Report 2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] From Scarcity to Understanding: Transfer Learning for the Extremely Low Resource Irish Sign Language (2023)

    • Author(s)
      Ruth Holmes, Ellen Rushe, Mathieu De Coster, Maxim Bonnaerens, Shin'ichi Satoh, Akihiro Sugimoto, Anthony Ventresque
    • Organizer
      The 11th Workshop on Assistive Computer Vision and Robotics
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Referring Image Segmentation via Joint Mask Contextual Embedding Learning and Progressive Alignment Network (2023)

    • Author(s)
      Ziling Huang, Shin'ichi Satoh
    • Organizer
      The 2023 Conference on Empirical Methods in Natural Language Processing
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Beyond Domain Gap: Exploiting Subjectivity in Sketch-Based Person Retrieval (2023)

    • Author(s)
      Lin Kejun, Zhixiang Wang, Zheng Wang, Yinqiang Zheng, Shin'ichi Satoh
    • Organizer
      The 31st ACM International Conference on Multimedia
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] WalkingDynamicsH36M: a Benchmarking Dataset for Long-term Motion and Trajectory Forecasting (2023)

    • Author(s)
      Cecilia Curreli, Andreu Girbau, Shin'ichi Satoh
    • Organizer
      The 5th IEEE/CVF CVPR Precognition Workshop
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Indirect Adversarial Losses via an Intermediate Distribution for Training GANs (2023)

    • Author(s)
      Rui Yang, Duc Minh Vo, Hideki Nakayama
    • Organizer
      The 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Only a Few Classes Confusing: Pixel-wise Candidate Labels Disambiguation for Foggy Scene Understanding (2023)

    • Author(s)
      Liang Liao, Chen Wenyi, Zhen Zhang, Jing Xiao, Yan Yang, Chia-Wen Lin, Shin'ichi Satoh
    • Organizer
      The 37th AAAI Conference on Artificial Intelligence (AAAI)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Character-Centric Story Visualization via Visual Planning and Token Alignment (2022)

    • Author(s)
      Hong Chen, Rujun Han, Te-Lin Wu, Hideki Nakayama, Nanyun Peng
    • Organizer
      The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Towards Causality Inference for Very Important Person Localization (2022)

    • Author(s)
      Xiao Wang, Zheng Wang, Wu Liu, Xin Xu, Qijun Zhao, Shin'ichi Satoh
    • Organizer
      The 30th ACM International Conference on Multimedia (ACMMM)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Incorporating External Knowledge for Vision and Language Systems (2022)

    • Author(s)
      Hideki Nakayama
    • Organizer
      2nd Workshop on Trends and Advances in Machine Learning and Automated Reasoning for Intelligent Robots and Systems (in conjunction with IROS 2022)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research / Invited
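  • [Book] From Deep Learning to Multimodal Information Processing (深層学習からマルチモーダル情報処理へ) (2022)

    • Author(s)
      Hideki Nakayama, Atsushi Nitanda, Akihiro Tamura, Nakamasa Inoue, Yoshitaka Ushiku
    • Total Pages
      248
    • Publisher
      Saiensu-sha (サイエンス社)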
    • ISBN
      9784781915548
    • Related Report
      2022 Annual Research Report
  • [Remarks] Embodied Semiotics research area website

    • URL

      https://research.nii.ac.jp/EmSemi/index.html

    • Related Report
      2023 Annual Research Report

Published: 2022-05-25   Modified: 2024-12-25  
