A Text Organization Method Based on Maximal Analogy
Project/Area Number |
16300039
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Hokkaido University |
Principal Investigator |
HARAGUCHI Makoto Hokkaido Univ., Graduate School of Inf. Sci. and Tech., Prof. (40128450)
|
Co-Investigator(Kenkyū-buntansha) |
TANAKA Yuzuru Hokkaido Univ., Graduate School of Inf. Sci. and Tech., Prof. (60002309)
KAKUTA Tokuyasu Nagoya Univ., Graduate School of Law, Assoc. Prof. (Japanese entry: Graduate School of Information Science, Associate Professor) (80292001)
YOSHIOKA Masaharu Hokkaido Univ., Graduate School of Inf. Sci. and Tech., Assoc. Prof. (40290879)
OKUBO Yoshiaki Hokkaido Univ., Graduate School of Inf. Sci. and Tech., Inst. (40271639)
|
Project Period (FY) |
2004 – 2005
|
Project Status |
Completed (Fiscal Year 2005)
|
Budget Amount |
¥8,600,000 (Direct Cost: ¥8,600,000)
Fiscal Year 2005: ¥5,800,000 (Direct Cost: ¥5,800,000)
Fiscal Year 2004: ¥2,800,000 (Direct Cost: ¥2,800,000)
|
Keywords | Maximal Analogy / Text Summarization / Similarity / Text Segmentation / Event Sequence / Document Structure / Structural Analysis of Narratives / Corpus / Singular Value Decomposition / Topic and Context Analysis / Story |
Research Abstract |
This research project presents an algorithm that, given two or more documents, extracts the abstract event sequences they have in common. To avoid a combinatorial explosion when matching more than two documents, the algorithm works in two phases. The first phase is essentially a text summarization system that balances two kinds of sentence importance: the importance of a sentence within its segment of the document, and the importance of a sentence in connecting several segments. The latter is used to extract contextual sentences involving contextual words. To separate these two notions of importance, we first compute a chunk of core sentences in each segment with a clique-finding algorithm, and then calculate each sentence's connecting importance from those chunks in the same way as KeyGraph. Finally, the overall importance is determined by a scheme very similar to topic-sensitive PageRank. We ran experiments on newspaper articles and verified its effectiveness. In the second phase, the summarized document obtained in the first phase serves as a kind of source document. The summary preserves the structure of the original document at a more abstract level. Therefore, for each event in the source, it suffices to find a similar event in a given target document. This drastically reduces the computational cost of building the correspondence between the two documents. As a result, a set of several documents can be summarized from the viewpoint of the source document by means of analogy.
|
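The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the project's implementation: the bag-of-words cosine similarity, the function names (`rank_sentences`, `align`), the damping factor and the threshold are all assumptions made for illustration, and the core sentences are supplied by hand rather than found by clique detection and KeyGraph-style scoring. Phase one is approximated by a PageRank whose teleport distribution is biased toward the core sentences, in the spirit of topic-sensitive PageRank; phase two maps each summary sentence to its most similar sentence in a target document.

```python
import math
from collections import Counter


def bag(sentence):
    """Lower-cased bag of words with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(sentences, core, damping=0.85, iters=100):
    """PageRank over a sentence-similarity graph, teleporting only to `core`.

    `core` is a non-empty set of sentence indices (assumed given here).
    """
    n = len(sentences)
    bags = [bag(s) for s in sentences]
    # Symmetric similarity matrix with a zero diagonal.
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out = [sum(row) for row in sim]
    teleport = [1.0 / len(core) if i in core else 0.0 for i in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) * t for t in teleport]
        for j in range(n):
            for i in range(n):
                # Sentences with no similar neighbour pass their mass
                # back to the core via the teleport distribution.
                share = sim[j][i] / out[j] if out[j] else teleport[i]
                new[i] += damping * rank[j] * share
        rank = new
    return rank


def align(source, target, threshold=0.1):
    """Pair each source sentence with its most similar target sentence."""
    pairs = []
    target_bags = [bag(t) for t in target]
    for i, s in enumerate(source):
        sb = bag(s)
        scores = [cosine(sb, tb) for tb in target_bags]
        best = max(range(len(target)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((i, best))
    return pairs
```

Because each source sentence is compared only against the sentences of one target document, the matching cost in this sketch grows with the product of the summary length and the target length, rather than with the number of joint alignments across all documents.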
Report
(3 results)
Research Products
(27 results)