2008 Fiscal Year Annual Research Report

Research on Disambiguation of Entities on the Web

Research Project

Project/Area Number	07J01864
Research Institution	The University of Tokyo
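Principal Investigator	Danushka Bollegala The University of Tokyo, Graduate School of Information Science and Technology, Research Fellow (DC1)
Keywords	similarity / similarity measure / relational similarity / analogy / disambiguation / entity / machine learning / clustering
Research Abstract	Similarity can be broadly divided into two types: attributional similarity and relational similarity. Attributional similarity is the correspondence between the attributes of two concepts. For example, a jaguar and a cat are both mammals, both have four legs, and both are carnivores. Because they share many attributes in this way, they have high attributional similarity. Relational similarity, by contrast, is defined between two word pairs: it measures how close the relation that holds between the two words of one pair is to the relation that holds between the two words of the other pair. An example is the word pairs (ostrich, bird) and (lion, cat). The ostrich is the largest bird on Earth, and the lion is likewise the largest cat. The relation "X is the largest Y" holds in both pairs, so the two word pairs have high relational similarity. Measuring relational similarity accurately is very important for many tasks, including entity disambiguation, which is the subject of this research project. In this research we proposed a method that computes the relational similarity between word pairs using a Web search engine. First, each given word pair is submitted to a Web search engine, and lexical patterns that express relations are generated from the returned snippets. For this purpose we proposed a fast pattern generation method based on PrefixSpan, a subsequence generation algorithm. Because a single relation can be expressed by several patterns (e.g., paraphrase patterns), the extracted patterns are first clustered so that semantically similar patterns can be recognized. For this we proposed an algorithm that can efficiently cluster a very large number of patterns. The resulting clusters do not necessarily represent mutually independent relations, so the relations among the clusters must be taken into account when measuring relational similarity. We applied the Mahalanobis distance for this purpose. Training data were used to determine how much each cluster contributes to the relational similarity measurement. For learning, we used the information-theoretic metric learning approach (Davis 2007). We evaluated the proposed method on the SAT analogy question benchmark and on word pairs consisting of proper nouns (the ENT dataset). On both datasets the method achieved high accuracy and outperformed prior work and a range of baselines. These results were accepted as full papers at Web Search and Data Mining (WSDM 2009) and the World Wide Web Conference (WWW 2009), top international conferences in the Web field, and were highly regarded internationally.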
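
The distance step described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the patterns, their counts, the (lion, zebra) pair, the cluster assignment, and the matrix A are all hypothetical values chosen for illustration, and A is set by hand here rather than learned with information-theoretic metric learning. The sketch sums pattern frequencies per cluster to obtain one feature per cluster for each word pair, then compares word pairs with a squared Mahalanobis distance whose off-diagonal entries let related clusters interact.

```python
import math

# Hypothetical pattern clusters; cluster 0 groups two paraphrases of one relation.
CLUSTERS = [
    {"X is the largest Y", "X , the biggest Y"},
    {"X is a Y"},
    {"X eats Y"},
]

# Hypothetical pattern frequencies per word pair (as if counted from snippets).
COUNTS = {
    ("ostrich", "bird"): {"X is the largest Y": 8, "X is a Y": 5},
    ("lion", "cat"): {"X , the biggest Y": 6, "X is a Y": 4},
    ("lion", "zebra"): {"X eats Y": 9},
}

def cluster_features(pattern_counts, clusters):
    """One feature per cluster: total frequency of the patterns in that cluster."""
    return [sum(pattern_counts.get(p, 0) for p in cluster) for cluster in clusters]

def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def mahalanobis_sq(x, y, A):
    """Squared Mahalanobis distance (x - y)^T A (x - y) for a PSD matrix A."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))

# Hand-set positive-definite matrix (assumption); the 0.3 entries model
# a dependency between clusters 0 and 1. The identity matrix would reduce
# the measure to squared Euclidean distance.
A = [
    [1.0, 0.3, 0.0],
    [0.3, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

vectors = {
    pair: l2_normalize(cluster_features(counts, CLUSTERS))
    for pair, counts in COUNTS.items()
}

d_close = mahalanobis_sq(vectors[("ostrich", "bird")], vectors[("lion", "cat")], A)
d_far = mahalanobis_sq(vectors[("ostrich", "bird")], vectors[("lion", "zebra")], A)
print(f"(ostrich, bird) vs (lion, cat):   {d_close:.4f}")
print(f"(ostrich, bird) vs (lion, zebra): {d_far:.4f}")
```

A smaller distance corresponds to higher relational similarity. Without the clustering step, the two paraphrase patterns would occupy separate dimensions and (ostrich, bird) and (lion, cat) would share only the "X is a Y" feature.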

Research Products
(6 results)

All Presentation (5 results) Remarks (1 results)

[Presentation] Measuring the Similarity between Implicit Semantic Relations using Web Search En (2009)
- Author(s)
  D. Bollegala, Y. Matsuo, M. Ishizu
- Organizer
  2nd ACM Int'l Conf. on Web Search and Data Mining (WSDM)
- Place of Presentation
  Barcelona, Spain
- Year and Date
  2009-02-11
[Presentation] Social Network Mining from the Web (2008)
- Author(s)
  Y. Matsuo, D. Bollegala, H. Tomob
- Organizer
  NSF Sponsored Symposium on Semantic Knowledge Discovery (poster)
- Place of Presentation
  New York, U.S.A.
- Year and Date
  2008-11-14
[Presentation] Automatically Extracting Personal Name Aliases from the Web (2008)
- Author(s)
  D. Bollegala, T. Honma, Y. Matsuo, M. Ishiz
- Organizer
  6th International Conference on Natural Language Processing (GoTA)
- Place of Presentation
  Gothenburg, Sweden
- Year and Date
  2008-08-25
[Presentation] WWW sits the SAT: Measuring Relational Similarity from the Web (2008)
- Author(s)
  D. Bollegala, Y. Matsuo, M. Ishizu
- Organizer
  18th European Conference on Artificial Intelligence (ECAI)
- Place of Presentation
  Patras, Greece
- Year and Date
  2008-07-21
[Presentation] Mining for Personal Name Aliases on the Web (2008)
- Author(s)
  D. Bollegala, T. Honma, Y. Matsuo, M. Ishiz
- Organizer
  17th Int'l World Wide Web Conference (WWW) [poster]
- Place of Presentation
  Beijing, China
- Year and Date
  2008-04-21
[Remarks]
- URL
  http://www.miv.t.u-tokyo.ac.jp/danushka/publications.html
