A Text Organization Method Based on Maximal Analogy

Research Project

Project/Area Number	16300039
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	Hokkaido University
Principal Investigator	HARAGUCHI Makoto Hokkaido Univ., Graduate School of Inf. Sci. and Tech., Prof. (40128450)
Co-Investigator(Kenkyū-buntansha)	TANAKA Yuzuru Hokkaido Univ., Graduate School of Inf. Sci. and Tech., Prof. (60002309); KAKUTA Tokuyasu Nagoya Univ., Graduate School of Law, Assoc. Prof. (80292001); YOSHIOKA Masaharu Hokkaido Univ., Graduate School of Inf. Sci. and Tech., Assoc. Prof. (40290879); OKUBO Yoshiaki Hokkaido Univ., Graduate School of Inf. Sci. and Tech., Inst. (40271639)
Project Period (FY)	2004 – 2005
Project Status	Completed (Fiscal Year 2005)
Budget Amount	¥8,600,000 (Direct Cost: ¥8,600,000) Fiscal Year 2005: ¥5,800,000 (Direct Cost: ¥5,800,000) Fiscal Year 2004: ¥2,800,000 (Direct Cost: ¥2,800,000)
Keywords	Maximal Analogy / Text Summarization / Similarity / Text Segmentation / Event Sequence / Document Structure / Structural Analysis of Stories / Corpus / Singular Value Decomposition / Topic and Context Analysis / Story
Research Abstract	This project presents an algorithm that, given two or more documents, extracts the abstract event sequences they have in common. To avoid a combinatorial explosion when matching more than two documents, the algorithm works in two phases. The first phase is essentially a text summarization system that balances two kinds of sentence importance: the importance of a sentence within its own segment of the document, and the importance of a sentence in connecting several segments. The latter is used to extract contextual sentences, that is, sentences involving contextual words. To separate these two notions of importance, we first compute a chunk of core sentences in each segment with a clique-finding algorithm, and then calculate each sentence's connecting importance from those chunks in the same way as KeyGraph. Finally, the overall importance is determined by a scheme very similar to topic-sensitive PageRank. Experiments on newspaper articles verified the effectiveness of this phase. In the second phase, the summary obtained in the first phase serves as a kind of source document. Because the summary preserves the structure of the original document at a more abstract level, it suffices to find, for each event in the source, a similar event in a given target document. This drastically reduces the computational cost of building a correspondence between the two documents. As a result, a set of several documents can be summarized by analogy from the viewpoint of the source document.
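The abstract does not give the scoring formula for the final step of the first phase, so the following is only a minimal sketch of the general idea of topic-sensitive PageRank applied to sentences. It assumes a weighted sentence-similarity graph and a set of core sentence ids (standing in for the chunks found by the clique step), and restricts teleportation to those core sentences. The names `biased_pagerank`, `graph`, `core`, and the toy data are hypothetical and are not taken from the project.

```python
from typing import Dict, Set

# Assumed representation: sentence id -> {neighbor id: similarity weight}.
# Every neighbor id is expected to appear as a key of the graph as well.
Graph = Dict[int, Dict[int, float]]


def biased_pagerank(graph: Graph, core: Set[int],
                    damping: float = 0.85, iters: int = 100) -> Dict[int, float]:
    """Power iteration for PageRank with teleportation restricted to `core`."""
    members = core & set(graph)
    if not members:
        raise ValueError("core must contain at least one sentence id of the graph")
    nodes = sorted(graph)
    # Teleport distribution: uniform over core sentences, zero elsewhere.
    teleport = {v: (1.0 / len(members) if v in members else 0.0) for v in nodes}
    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) * teleport[v] for v in nodes}
        for u in nodes:
            out = sum(graph[u].values())
            if out == 0.0:
                # A sentence with no links hands its mass back to the core set.
                for v in nodes:
                    new[v] += damping * rank[u] * teleport[v]
            else:
                # Spread rank to neighbors in proportion to similarity.
                for v, w in graph[u].items():
                    new[v] += damping * rank[u] * w / out
        rank = new
    return rank


# Toy example: five sentences linked in a chain, with sentence 0 as the only core sentence.
chain = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0},
         3: {2: 1.0, 4: 1.0}, 4: {3: 1.0}}
scores = biased_pagerank(chain, core={0})
ranking = sorted(scores, key=scores.get, reverse=True)  # sentence ids, most important first
```

In this toy chain, sentences close to the core sentence receive higher scores than distant ones, which is the bias toward core-related context that the abstract attributes to this step.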

Report

(3 results)

2005 Annual Research Report Final Research Report Summary
2004 Annual Research Report

Research Products

(27 results)

All Journal Article (25 results) Book (2 results)

[Journal Article] An Extended Branch-and-Bound Search Algorithm for Finding Top-$N$ Formal Concepts of Documents (2006)
- Author(s)
  M.Haraguchi
- Journal Title
  
  Proceedings of the 4th Workshop on Learning with Logics and Logics for Learning - LLLL'06 (in press)
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  2005 Final Research Report Summary
[Journal Article] A Method for Pinpoint Clustering of Web Pages with Pseudo-Clique Search (2006)
- Author(s)
  M.Haraguchi
- Journal Title
  
  Springer LNAI, Federation over the Web, International Workshop 3847
  
  Pages: 59-78
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  2005 Final Research Report Summary
[Journal Article] A Method for Pinpoint Clustering of Web Pages with Pseudo-Clique Search (2006)
- Author(s)
  Makoto Haraguchi, Yoshiaki Okubo
- Journal Title
  
  Federation over the Web, Int'l Workshop, Dagstuhl Castle, Germany, May 1 - 6, 2005, Revised Selected Papers, Lecture Notes in Artificial Intelligence(Springer) 3847
  
  Pages: 59-78
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  2005 Final Research Report Summary
[Journal Article] An Extended Branch-and-Bound Search Algorithm for Finding Top-$N$ Formal Concepts of Documents (2006)
- Author(s)
  Makoto Haraguchi, Yoshiaki Okubo
- Journal Title
  
  Proceedings of the 4th Workshop on Learning with Logics and Logics for Learning - LLLL'06 (to appear)
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  2005 Final Research Report Summary
[Journal Article] A Method for Pinpoint Clustering of Web Pages with Pseudo-Clique Search (2006)
- Author(s)
  M.Haraguchi, Y.Okubo
- Journal Title
  
  Federation over the Web, International Workshop(Springer-LNAI) 3847
  
  Pages: 59-78
- Related Report
  2005 Annual Research Report
[Journal Article] An Algorithm for Mining Implicit Itemset Pairs Based on Differences of Correlations (2005)
- Author(s)
  T.Taniguchi
- Journal Title
  
  Proceedings of the 8th International Conference on Discovery Science - DS'05, Springer-LNAI 3735
  
  Pages: 227-240
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  2005 Final Research Report Summary
[Journal Article] Towards Constructing Story Databases Using Maximal Analogies Between Stories (2005)
- Author(s)
  M.Yoshioka
- Journal Title
  
  Springer LNAI, In Intuitive Human Interfaces for Organizing and Accessing Intellectual Assets 3359
  
  Pages: 243-255
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  2005 Final Research Report Summary
[Journal Article] On a Combination of Probabilistic and Boolean IR Models for WWW Document Retrieval (2005)
- Author(s)
  M.Yoshioka
- Journal Title
  
  ACM Transactions on Asian Language Information Processing 4
  
  Pages: 340-356
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  2005 Final Research Report Summary
[Journal Article] Finding Significant Web Pages with Lower Ranks by Pseudo-Clique Search (2005)
- Author(s)
  Y.Okubo
- Journal Title
  
  Proceedings of the 8th International Conference on Discovery Science - DS'05, Springer-LNAI 3735
  
  Pages: 346-353
- NAID
  120000954272
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  2005 Final Research Report Summary
[Journal Article] Discovery of Hidden Correlations in a Local Transaction Database Based on Differences of Correlations (2005)
- Author(s)
  Tsuyoshi Taniguchi, Makoto Haraguchi, Yoshiaki Okubo
- Journal Title
  
  Proceedings of the 4th International Conference on Machine Learning and Data Mining in Pattern Recognition - MLDM'05(Springer-LNAI) 3587
  
  Pages: 537-548
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  2005 Final Research Report Summary
[Journal Article] An Algorithm for Mining Implicit Itemset Pairs Based on Differences of Correlations (2005)
- Author(s)
  Tsuyoshi Taniguchi, Makoto Haraguchi
- Journal Title
  
  Proceedings of the 8th International Conference on Discovery Science -DS'05(Springer-LNAI) 3735
  
  Pages: 227-240
- NAID
  120000956717
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  2005 Final Research Report Summary
[Journal Article] Towards Constructing Story Databases Using Maximal Analogies Between Stories (2005)
- Author(s)
  Masaharu Yoshioka, Makoto Haraguchi, Akihito Mizoe
- Journal Title
  
  Intuitive Human Interfaces for Organizing and Accessing Intellectual Assets: International Workshop, Dagstuhl Castle, Germany, March 1-5, 2004, Revised Selected Papers, Gunter Grieser, Yuzuru Tanaka (eds.) (Springer-Verlag GmbH, LNAI) 3359
  
  Pages: 243-255
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  2005 Final Research Report Summary
[Journal Article] On a Combination of Probabilistic and Boolean IR Models for WWW Document Retrieval (2005)
- Author(s)
  Masaharu Yoshioka, Makoto Haraguchi
- Journal Title
  
  ACM Transactions on Asian Language Information Processing(TALIP) Vol.4
  
  Pages: 340-356
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  2005 Final Research Report Summary
[Journal Article] Finding Significant Web Pages with Lower Ranks by Pseudo-Clique Search (2005)
- Author(s)
  Yoshiaki Okubo, Makoto Haraguchi
- Journal Title
  
  Proceedings of the 8th International Conference on Discovery Science - DS'05(Springer-LNAI) 3735
  
  Pages: 346-353
- NAID
  120000954272
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  2005 Final Research Report Summary
[Journal Article] Finding Significant Web Pages with Lower Ranks by Pseudo-Clique Search (2005)
- Author(s)
  Y.Okubo, M.Haraguchi
- Journal Title
  
  Proceedings of the 8th International Conference on Discovery Science(Springer-LNAI) 3735
  
  Pages: 346-353
- NAID
  120000954272
- Related Report
  2005 Annual Research Report
[Journal Article] An Algorithm for Mining Implicit Itemset Pairs Based on Differences of Correlations (2005)
- Author(s)
  T.Taniguchi, M.Haraguchi
- Journal Title
  
  Proceedings of the 8th International Conference on Discovery Science(Springer-LNAI) 3735
  
  Pages: 227-240
- NAID
  120000956717
- Related Report
  2005 Annual Research Report
[Journal Article] Discovery of Hidden Correlations in a Local Transaction Database Based on Differences of Correlations (2005)
- Author(s)
  T.Taniguchi, M.Haraguchi, Y.Okubo
- Journal Title
  
  4th International Conference on Machine Learning and Data Mining in Pattern Recognition(Springer-LNAI) 3587
  
  Pages: 537-548
- Related Report
  2005 Annual Research Report
[Journal Article] Towards Constructing Story Databases Using Maximal Analogies Between Stories (2005)
- Author(s)
  M.Yoshioka, M.Haraguchi, A.Mizoe
- Journal Title
  
  In Intuitive Human Interfaces for Organizing and Accessing Intellectual Assets(Springer-LNAI) 3359
  
  Pages: 243-255
- Related Report
  2005 Annual Research Report
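[Journal Article] A Study of an Information Retrieval System That Supports Query Term Selection Using Generalized Concepts Focused on Query Term Coverage [検索語の網羅性に注目した汎化概念により検索語選択支援を行う情報検索システムの研究] (2005)
- Author(s)
  Masaharu Yoshioka, Makoto Haraguchi
- Journal Title
  
  Transactions of the Japanese Society for Artificial Intelligence 20(4)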
  
  Pages: 270-280
- NAID
  10022005347
- Related Report
  2005 Annual Research Report
[Journal Article] Towards Constructing Story Databases Using Maximal Analogies Between Stories (2005)
- Author(s)
  M.Yoshioka, M.Haraguchi, A.Mizoe
- Journal Title
  
  Intuitive Human Interfaces for Organizing and Accessing Intellectual Assets (Springer-LNCS) LNCS3359
  
  Pages: 243-255
- Related Report
  2004 Annual Research Report
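[Journal Article] A Study of an Information Retrieval System That Supports Query Term Selection Using Generalized Concepts Focused on Query Term Coverage [検索語の網羅性に注目した汎化概念により検索語選択支援を行う情報検索システムの研究] (2005)
- Author(s)
  Masaharu Yoshioka, Makoto Haraguchi
- Journal Title
  
  Journal of the Japanese Society for Artificial Intelligence Vol.20 No.4 (in press)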
- Related Report
  2004 Annual Research Report
[Journal Article] Appropriate Boolean Query Reformulation Interface for Information Retrieval based on Adaptive Generalization (2005)
- Author(s)
  M.Yoshioka, M.Haraguchi
- Journal Title
  
  Proc. of the International Workshop on Challenges in Web Information Retrieval and Integration (to be presented)
- Related Report
  2004 Annual Research Report
[Journal Article] Multiple News Articles Summarization based on Event Reference Information (2004)
- Author(s)
  M.Yoshioka, M.Haraguchi
- Journal Title
  
  In Working Notes of the Fourth NTCIR Workshop Meeting
  
  Pages: 467-473
- Related Report
  2004 Annual Research Report
[Journal Article] Study on the Combination of Probabilistic and Boolean IR Models for WWW Documents Retrieval (2004)
- Author(s)
  M.Yoshioka, M.Haraguchi
- Journal Title
  
  In Working Notes of the Fourth NTCIR Workshop Meeting Supplement Vol.
  
  Pages: 9-16
- Related Report
  2004 Annual Research Report
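[Journal Article] A Method for Generating Assimilated Motions for Characters with Various Features [様々な特徴のキャラクタに対する同化動作生成手法] (2004)
- Author(s)
  本林正裕, Makoto Haraguchi
- Journal Title
  
  IEICE Transactions on Information and Systems (Japanese Edition) Vol.J87-D-II, No.7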
  
  Pages: 1473-1486
- NAID
  110003171136
- Related Report
  2004 Annual Research Report
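[Book] Encyclopedia of Artificial Intelligence [人工知能学辞典] (wrote the entry "Learning by Analogy") (2005)
- Author(s)
  Makoto Haraguchi (contributing author)
- Total Pages
  976
- Publisher
  Kyoritsu Shuppan
- Description
  From the "Final Research Report Summary (Japanese)"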
- Related Report
  2005 Final Research Report Summary
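[Book] Encyclopedia of Artificial Intelligence [人工知能学辞典] (wrote the entry "Learning by Analogy"; 2 pages contributed) (2005)
- Author(s)
  Makoto Haraguchi (contributing author)
- Total Pages
  972
- Publisher
  Kyoritsu Shuppan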
- Related Report
  2005 Annual Research Report
