Research on the Disambiguation of Entities on the Web

Research Project

Project/Area Number	07J01864
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Single-year Grants
Section	Domestic
Research Field	Intelligent informatics
Research Institution	The University of Tokyo
Principal Investigator	Danushka Bollegala, The University of Tokyo, Graduate School of Information Science and Technology, Research Fellow (PD)
Project Period (FY)	2007 – 2010
Project Status	Completed (Fiscal Year 2009)
Budget Amount	¥2,800,000 (Direct Cost: ¥2,800,000); Fiscal Year 2009: ¥900,000 (Direct Cost: ¥900,000); Fiscal Year 2008: ¥900,000 (Direct Cost: ¥900,000); Fiscal Year 2007: ¥1,000,000 (Direct Cost: ¥1,000,000)
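Keywords	relation extraction / web mining / clustering / co-clustering / entity / extensional definition / intensional definition / algorithm / similarity / similarity measure / relational similarity / analogy / disambiguation / machine learning / similarity computation / alias problem / referential ambiguity / polysemy / information extraction / web search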
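Research Abstract	There are two ways to define a relation R between two entities. One is to enumerate the entity pairs that stand in that relation (extensional definition). The other is to express R through lexical patterns (intensional definition). This research proposes a clustering method based on these dual definitions of a relation and uses it to perform relation extraction. One characteristic of the proposed method is that it clusters lexical patterns and entity pairs simultaneously; clustering algorithms that simultaneously cluster two quantities satisfying some mutual constraint are collectively called co-clustering algorithms. The proposed method is one such co-clustering algorithm, distinguished by the fact that its constraint is the duality between the two definitions of a relation. Because it relies on clustering, which is a form of unsupervised learning, it requires no training data. Because co-clustering also clusters the lexical patterns that serve as features for grouping entity pairs by relation type, it compresses the feature dimensionality and makes the clustering stable. When extracting relations between entities from a huge text corpus such as the Web, an enormous number of entity pairs and lexical patterns must be co-clustered at once, so an algorithm with low computational cost is important. This research proposed and evaluated a sequential co-clustering algorithm that performs co-clustering in O(n log n) time.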
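
The co-clustering idea in the abstract can be illustrated with a small Python sketch. This is not the project's algorithm: the greedy one-pass routine `sequential_cluster`, the cosine similarity, the 0.5 threshold, and the toy co-occurrence counts are all assumptions made for illustration, and the sketch makes no attempt to reach the O(n log n) bound reported in the abstract. It only shows how lexical patterns (intensional view) and entity pairs (extensional view) can each be clustered using the other as features.

```python
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def sequential_cluster(vectors, threshold):
    """Greedy one-pass clustering (hypothetical helper).

    Items are visited from most to least frequent; each joins the most
    similar existing cluster if the similarity exceeds `threshold`,
    otherwise it starts a new cluster.
    """
    order = sorted(vectors, key=lambda n: (-sum(vectors[n].values()), n))
    clusters = []
    for name in order:
        vec = vectors[name]
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["sum"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"members": [name], "sum": dict(vec)})
        else:
            best["members"].append(name)
            for k, w in vec.items():
                best["sum"][k] = best["sum"].get(k, 0.0) + w
    return [c["members"] for c in clusters]


def co_cluster(counts, threshold=0.5):
    """counts maps (entity_pair, pattern) to a co-occurrence count."""
    # Intensional view: each pattern is a vector over entity pairs.
    pattern_vecs = defaultdict(dict)
    for (pair, pattern), c in counts.items():
        pattern_vecs[pattern][pair] = float(c)
    pattern_clusters = sequential_cluster(pattern_vecs, threshold)
    cluster_of = {p: i for i, ms in enumerate(pattern_clusters) for p in ms}

    # Extensional view: each entity pair is a vector over pattern clusters,
    # so the feature space is compressed before the pairs are clustered.
    pair_vecs = defaultdict(dict)
    for (pair, pattern), c in counts.items():
        cid = cluster_of[pattern]
        pair_vecs[pair][cid] = pair_vecs[pair].get(cid, 0.0) + c
    pair_clusters = sequential_cluster(pair_vecs, threshold)
    return pattern_clusters, pair_clusters


# Invented toy counts, for illustration only (not data from the project).
toy_counts = {
    (("Google", "YouTube"), "X acquired Y"): 3,
    (("Google", "YouTube"), "X bought Y"): 2,
    (("Microsoft", "Powerset"), "X acquired Y"): 2,
    (("Microsoft", "Powerset"), "X bought Y"): 1,
    (("Jobs", "Apple"), "X , CEO of Y"): 4,
    (("Ballmer", "Microsoft"), "X , CEO of Y"): 4,
    (("Ballmer", "Microsoft"), "X heads Y"): 1,
}

pattern_clusters, pair_clusters = co_cluster(toy_counts)
print(pattern_clusters)
print(pair_clusters)
```

On the toy counts, the acquisition patterns and the CEO patterns fall into separate pattern clusters, and the entity pairs are then grouped by which pattern cluster they occur with. No labeled training data is involved, which matches the unsupervised setting described in the abstract.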

Report

(3 results)

Research Products
(17 results)

Journal Article (1 result, of which Peer Reviewed: 1) / Presentation (14 results) / Remarks (2 results)

[Journal Article] A bottom up approach to Sentence Ordering for Multi-document Summarization (2010)
- Author(s)
  Danushka Bollegala, Naoaki Okazaki, Mitsuru Ishizuka
- Journal Title
  Information Processing and Management 46
  Pages: 89-109
- Related Report
  2009 Annual Research Report
- Peer Reviewed
[Presentation] A Sequential Model for Discourse Segmentation (2010)
- Author(s)
  Hugo Hernault, Danushka Bollegala, Mitsuru Ishizuka
- Organizer
  International Conference on Intelligent Text Processing and Computational Linguistics (CICLing)
- Place of Presentation
  Romania, Iasi
- Year and Date
  2010-03-21
- Related Report
  2009 Annual Research Report
[Presentation] A Relational Model of Semantic Similarity between Words using Automatically Extracted Lexical Pattern Clusters from the Web (2009)
- Author(s)
  Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka
- Organizer
  Empirical Methods in Natural Language Processing
- Place of Presentation
  Singapore, Singapore
- Year and Date
  2009-08-06
- Related Report
  2009 Annual Research Report
[Presentation] Measuring the Similarity between Implicit Semantic Relations from the Web (2009)
- Author(s)
  Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka
- Organizer
  International World Wide Web Conference
- Place of Presentation
  Spain, Madrid
- Year and Date
  2009-04-21
- Related Report
  2009 Annual Research Report
[Presentation] Measuring the Similarity between Implicit Semantic Relations using Web Search En (2009)
- Author(s)
  D. Bollegala, Y. Matsuo, M. Ishizuka
- Organizer
  2nd ACM Int'l Conf. on Web Search and Data Mining (WSDM)
- Place of Presentation
  Barcelona, Spain
- Year and Date
  2009-02-11
- Related Report
  2008 Annual Research Report
[Presentation] Social Network Mining from the Web (2008)
- Author(s)
  Y. Matsuo, D. Bollegala, H. Tomob
- Organizer
  NSF Sponsored Symposium on Semantic Knowledge Discovery (poster)
- Place of Presentation
  New York, U.S.A.
- Year and Date
  2008-11-14
- Related Report
  2008 Annual Research Report
[Presentation] Automatically Extracting Personal Name Aliases from the Web (2008)
- Author(s)
  D. Bollegala, T. Honma, Y. Matsuo, M. Ishizuka
- Organizer
  6th International Conference on Natural Language Processing (GoTA)
- Place of Presentation
  Gothenburg, Sweden
- Year and Date
  2008-08-25
- Related Report
  2008 Annual Research Report
[Presentation] WWW sits the SAT: Measuring Relational Similarity from the Web (2008)
- Author(s)
  D. Bollegala, Y. Matsuo, M. Ishizuka
- Organizer
  18th European Conference on Artificial Intelligence (ECAI)
- Place of Presentation
  Patras, Greece
- Year and Date
  2008-07-21
- Related Report
  2008 Annual Research Report
[Presentation] Mining for Personal Name Aliases on the Web (2008)
- Author(s)
  D. Bollegala, T. Honma, Y. Matsuo, M. Ishizuka
- Organizer
  International World Wide Web Conference
- Place of Presentation
  Beijing, China
- Year and Date
  2008-04-23
- Related Report
  2007 Annual Research Report
[Presentation] Identification of Personal Name Aliases on the Web (2008)
- Author(s)
  D. Bollegala, T. Honma, Y. Matsuo, M. Ishizuka
- Organizer
  Workshop on Social Web Search and Mining, Intl. World Wide Web Conference
- Place of Presentation
  Beijing, China
- Year and Date
  2008-04-22
- Related Report
  2007 Annual Research Report
[Presentation] Mining for Personal Name Aliases on the Web (2008)
- Author(s)
  D. Bollegala, T. Honma, Y. Matsuo, M. Ishizuka
- Organizer
  17th Int'l World Wide Web Conference (WWW) [poster]
- Place of Presentation
  Beijing, China
- Year and Date
  2008-04-21
- Related Report
  2008 Annual Research Report
[Presentation] A Co-occurrence Graph-based Approach for Personal Name Alias Extraction from Anchor Texts (2008)
- Author(s)
  D. Bollegala, Y. Matsuo, M. Ishizuka
- Organizer
  International Joint Conferences on Natural Language Processing (IJCNLP)
- Place of Presentation
  Hyderabad, India
- Year and Date
  2008-01-07
- Related Report
  2007 Annual Research Report
[Presentation] WebSim: A Web-based Semantic Similarity Measure (2007)
- Author(s)
  D. Bollegala, Y. Matsuo, M. Ishizuka
- Organizer
  Annual Conference of the Japanese Society for Artificial Intelligence
- Place of Presentation
  Miyazaki Prefecture, Japan
- Year and Date
  2007-06-20
- Related Report
  2007 Annual Research Report
[Presentation] Measuring Semantic Similarity between Words Using Web Search Engines (2007)
- Author(s)
  D. Bollegala, Y. Matsuo, M. Ishizuka
- Organizer
  International World Wide Web Conference
- Place of Presentation
  Banff, Canada
- Year and Date
  2007-05-11
- Related Report
  2007 Annual Research Report
[Presentation] An Integrated Approach to Measuring Semantic Similarity between Words Using Information Available on the Web (2007)
- Author(s)
  D. Bollegala, Y. Matsuo, M. Ishizuka
- Organizer
  Human Language Technologies: Annual Conference of the North American Chapter of the Association for Computational Linguistics
- Place of Presentation
  Rochester, NY, U.S.A.
- Year and Date
  2007-04-24
- Related Report
  2007 Annual Research Report
[Remarks]
- URL
  http://www.iba.t.u-tokyo.ac.jp/~danushka/publications.html
- Related Report
  2009 Annual Research Report
[Remarks]
- URL
  http://www.miv.t.u-tokyo.ac.jp/danushka/publications.html
- Related Report
  2008 Annual Research Report
