2014 Fiscal Year Final Research Report

Construction and Summarization of Lecture Contents Using Both Slides and Lecture Speech

Research Project

Project/Area Number	23700115
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Multi-year Fund
Research Field	Media informatics/Database
Research Institution	Toyohashi University of Technology
Principal Investigator	TSUCHIYA Masatoshi Toyohashi University of Technology, Information and Media Center, Associate Professor (70378256)
Project Period (FY)	2011-04-28 – 2015-03-31
Keywords	automatic summarization
Outline of Final Research Achievements	Lecture speech contains spoken-language phenomena such as filled pauses and silent pauses, so automatic summarization of lecture speech requires a robust automatic speech recognition method. Our method consists of two steps. The first step uses a filler insertion model and a pause insertion model, both learned from precisely transcribed corpora that include filler and pause information, to predict filler and pause insertion locations in loosely transcribed corpora that contain no pause information. The second step constructs a language model from both the loosely transcribed corpora and the predicted information. In addition, a method for detecting lecture-specific named entities was developed, and a human annotation scheme for mapping lecture slides to lecture speech transcriptions was established.
Free Research Field	natural language processing
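
The two-step method in the outline of achievements can be illustrated with a minimal Python sketch. The report does not specify the form of the insertion models or of the language model, so everything here is an assumption made for illustration: the insertion model is a simple count-based estimate conditioned on the preceding word, the language model is reduced to raw bigram counts, and the marker tokens `<filler>` and `<pause>`, the function names, and the `threshold` parameter are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical marker tokens; the report does not name its annotation symbols.
FILLER = "<filler>"
PAUSE = "<pause>"
MARKERS = (FILLER, PAUSE)


def train_insertion_model(precise_corpus):
    """Estimate P(marker follows word) from transcripts that keep fillers and pauses.

    Assumption: a count-based model conditioned only on the preceding word.
    """
    marker_counts = defaultdict(Counter)  # word -> marker -> count
    word_counts = Counter()               # word -> number of occurrences
    for sentence in precise_corpus:
        for i, token in enumerate(sentence):
            if token in MARKERS:
                continue
            word_counts[token] += 1
            following = sentence[i + 1] if i + 1 < len(sentence) else None
            if following in MARKERS:
                marker_counts[token][following] += 1
    return {
        word: {m: c / word_counts[word] for m, c in counts.items()}
        for word, counts in marker_counts.items()
    }


def insert_markers(sentence, model, threshold=0.5):
    """Step 1: predict filler and pause locations in a loosely transcribed sentence."""
    augmented = []
    for token in sentence:
        augmented.append(token)
        probabilities = model.get(token, {})
        if probabilities:
            marker, p = max(probabilities.items(), key=lambda kv: kv[1])
            if p >= threshold:
                augmented.append(marker)
    return augmented


def bigram_counts(corpus):
    """Step 2: collect bigram counts, the raw material for a bigram language model."""
    counts = Counter()
    for sentence in corpus:
        padded = ["<s>", *sentence, "</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


if __name__ == "__main__":
    precise = [
        ["so", FILLER, "we", "define", PAUSE, "the", "model"],
        ["so", FILLER, "the", "model", "works"],
    ]
    loose = [["so", "we", "define", "the", "model"]]

    model = train_insertion_model(precise)
    augmented = [insert_markers(s, model) for s in loose]
    print(augmented[0])
    print(bigram_counts(augmented).most_common(3))
```

In this sketch the language model sees the loosely transcribed text together with the predicted markers, which mirrors the idea of training on text that resembles actual lecture speech. A practical system would presumably use richer context and smoothing than this toy example does.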