Discourse parsing for videos and its application to summarization

Research Project

Project/Area Number	21H03505
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	NTT Communication Science Laboratories
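Principal Investigator	Hirao Tsutomu, NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Innovative Communication Laboratory, Senior Research Scientist (40396148)
Co-Investigator (Kenkyū-buntansha)	Kimura Akisato, NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Media Information Laboratory, Principal Researcher (10396202); Okumura Manabu, Tokyo Institute of Technology, Institute of Innovative Research, Professor (60214079)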
Project Period (FY)	2021-04-01 – 2024-03-31
Project Status	Completed (Fiscal Year 2023)
Budget Amount	¥17,290,000 (Direct Cost: ¥13,300,000, Indirect Cost: ¥3,990,000); Fiscal Year 2023: ¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000); Fiscal Year 2022: ¥5,460,000 (Direct Cost: ¥4,200,000, Indirect Cost: ¥1,260,000); Fiscal Year 2021: ¥7,800,000 (Direct Cost: ¥6,000,000, Indirect Cost: ¥1,800,000)
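Keywords	Natural language processing / Vision and language / Rhetorical structure parsing / Discourse structure parsing / Multimodal summarization / Multimodal / Video captioning / Automatic summarization / Captioning
Outline of Research at the Start	In today's world, where vast numbers of videos are created and stored every day, demand is growing for technology that lets users efficiently access the videos they want. This research aims to establish discourse structure parsing for videos in order to reveal the relationships between the events in a video. Revealing a video's discourse structure is expected to enable applications such as search focused on the relationships between events and generation of video thumbnails that make the story of a video easy to grasp. Specifically, we work on (1) techniques for segmenting a video into events and generating captions, (2) techniques for parsing discourse structure using features obtained from both images and their corresponding captions, and (3) techniques for generating video and text summaries based on the discourse structure.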
Outline of Final Research Achievements	Videos that convey a story contain several events, and the relationships between these events contribute to the overall story of the video. Analyzing the relationships between such events helps improve video understanding and the performance of downstream tasks such as summarization and Video QA. In this research, we represent the underlying story structure of videos as trees based on Rhetorical Structure Theory, construct a dataset for training and evaluating parsers, and investigate the performance of baseline parsers. The results showed that transferring textual knowledge to the parser's encoder is effective. Furthermore, we demonstrated that the rhetorical structure of videos is beneficial for multimodal summarization.
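Academic Significance and Societal Importance of the Research Achievements	With the growth of social networking services, the number of videos posted on the Internet keeps increasing. Unlike text, however, videos are difficult to search with natural language and their gist is difficult to grasp quickly, so mechanisms that support human information access are needed. Research results that reveal the rhetorical structure of videos are highly significant in that they contribute to solving these problems. Academically, discourse structure parsing based on the fusion of vision and language is also a new research problem, and the results obtained toward achieving it are of high significance.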

Report

(4 results)

2023 Annual Research Report
Final Research Report (PDF)
2022 Annual Research Report
2021 Annual Research Report

Research Products
(10 results)

Journal Article (1 result) (of which Peer Reviewed: 1 result) Presentation (9 results) (of which Int'l Joint Research: 3 results)

[Journal Article] Neural RST-Style Discourse Parsing Exploiting Agreement Sub-trees as Silver Data (2022)
- Author(s)
  Naoki Kobayashi, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura, Masaaki Nagata
- Journal Title
  Journal of Natural Language Processing
  Volume: 29 Issue: 3 Pages: 875-900
- DOI
  10.5715/jnlp.29.875
- ISSN
  1340-7619, 2185-8314
- Related Report
  2022 Annual Research Report
- Peer Reviewed
[Presentation] Can we obtain significant success in RST discourse parsing by using Large Language Models? (2024)
- Author(s)
  Aru Maekawa, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura
- Organizer
  Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
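[Presentation] Imitating Shift-Reduce Rhetorical Structure Parsing with Large Language Models (大規模言語モデルによるシフト還元修辞構造解析の模倣) (2024)
- Author(s)
  Aru Maekawa, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura
- Organizer
  The 30th Annual Meeting of the Association for Natural Language Processing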
- Related Report
  2023 Annual Research Report
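[Presentation] Video Discourse Structure Parsing: A Baseline Parser and Its Analysis (動画談話構造解析：ベースライン解析器とその分析) (2023)
- Author(s)
  Tsutomu Hirao, Naoki Kobayashi, Hidetaka Kamigaito, Manabu Okumura, Akisato Kimura
- Organizer
  The 26th Meeting on Image Recognition and Understanding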
- Related Report
  2023 Annual Research Report
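[Presentation] Improving Inter-Sentence Rhetorical Structure Parsing through Data Augmentation Using Back-Translation (逆翻訳を利用したデータ拡張による文間の修辞構造解析の改善) (2023)
- Author(s)
  Aru Maekawa, Naoki Kobayashi, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura
- Organizer
  The 29th Annual Meeting of the Association for Natural Language Processing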
- Related Report
  2022 Annual Research Report
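[Presentation] Constructing a Dataset for Video Discourse Structure Parsing (動画談話構造解析へ向けたデータセット構築) (2022)
- Author(s)
  Tsutomu Hirao, Naoki Kobayashi, Hidetaka Kamigaito, Manabu Okumura, Akisato Kimura
- Organizer
  The 25th Meeting on Image Recognition and Understanding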
- Related Report
  2022 Annual Research Report
[Presentation] A Simple and Strong Baseline for End-to-End Neural RST-style Discourse Parsing (2022)
- Author(s)
  Naoki Kobayashi, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura, Masaaki Nagata
- Organizer
  Findings of the Association for Computational Linguistics: EMNLP 2022
- Related Report
  2022 Annual Research Report
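[Presentation] A Comparison of Rhetorical Structure Parsers from the Perspectives of Language Models and Parsing Strategies (言語モデルと解析戦略の観点からの修辞構造解析器の比較) (2022)
- Author(s)
  Naoki Kobayashi, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura, Masaaki Nagata
- Organizer
  The 28th Annual Meeting of the Association for Natural Language Processing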
- Related Report
  2021 Annual Research Report
[Presentation] Improving Neural RST Parsing Model with Silver Agreement Subtrees (2021)
- Author(s)
  Naoki Kobayashi, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura, Masaaki Nagata
- Organizer
  Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] A Language Model-based Generative Classifier for Sentence-level Discourse Parsing (2021)
- Author(s)
  Ying Zhang, Hidetaka Kamigaito, Manabu Okumura
- Organizer
  Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Related Report
  2021 Annual Research Report
- Int'l Joint Research