2023 Fiscal Year Final Research Report

Discourse parsing for videos and its application to summarization

Research Project

Project/Area Number 21H03505
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section General
Review Section Basic Section 61030: Intelligent informatics-related
Research Institution NTT Communication Science Laboratories

Principal Investigator

Hirao Tsutomu  Nippon Telegraph and Telephone Corporation, NTT Communication Science Laboratories, Innovative Communication Laboratory, Senior Researcher (40396148)

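Co-Investigator (Kenkyū-buntansha) Kimura Akisato  Nippon Telegraph and Telephone Corporation, NTT Communication Science Laboratories, Media Information Laboratory, Principal Researcher (10396202)
Okumura Manabu  Tokyo Institute of Technology, Institute of Innovative Research, Professor (60214079)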
Project Period (FY) 2021-04-01 – 2024-03-31
Keywords Natural language processing / Vision and language / Rhetorical structure parsing
Outline of Final Research Achievements

Videos that convey a story contain several events, and the relationships between these events contribute to the overall story of the video. Analyzing the relationships between such events helps improve video understanding and the performance of downstream tasks such as summarization and Video QA. In this research, we represent the underlying story structure of videos as trees based on Rhetorical Structure Theory, construct a dataset for training and evaluating parsers, and investigate the performance of baseline parsers. The results showed that transferring textual knowledge to the parser's encoder is effective. Furthermore, we demonstrated that the rhetorical structure of videos is beneficial for multimodal summarization.

Free Research Field

Natural language processing

Academic Significance and Societal Importance of the Research Achievements

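With the growth of social networking services, the number of videos posted on the Internet keeps increasing. Unlike text, however, videos are difficult to search with natural language queries, and their content is hard to grasp at a glance, so mechanisms that support human information access are needed. Research results that clarify the rhetorical structure of videos are highly significant because they contribute to solving these problems. Academically, discourse structure parsing based on the fusion of vision and language is a new research problem, and results that advance it are also of high significance.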


Published: 2025-01-30  
