2009 Fiscal Year Annual Research Report

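Improving the Accuracy of Language Analysis Based on Knowledge Automatically Acquired from Large-scale Text (大規模テキストから自動獲得した知識に基づく言語解析の精度向上)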

Research Project

Project/Area Number	21700163
Research Institution	Kyoto University
Principal Investigator	Tomohide Shibata, Kyoto University, Graduate School of Informatics, Assistant Professor (70452315)
Keywords	Natural language processing / Large-scale text / Knowledge acquisition / Synonyms / Distributional similarity
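Research Abstract	We conducted research mainly on the following three points.
1. Distributional similarity calculation: For each noun, we extracted the verbs it co-occurs with from a large-scale corpus and computed distributional similarity. For example, because 「医者」 (doctor) and 「医師」 (physician) both co-occur with verb expressions such as 「～が診察する」 ("~ examines") and 「～に診てもらう」 ("to be examined by ~"), the two words are judged to be similar. In the same way, we extracted the nouns that co-occur with each verb and computed the distributional similarity between 「購入する」 (to purchase) and 「買う」 (to buy). Using an evaluation set, we confirmed that accuracy improves as the corpus size increases.
2. Improving the accuracy of named entity analysis: We built a named entity analyzer that uses two-stage machine learning (with SVMs): one stage estimates a named entity interpretation for arbitrary noun phrases, and the other selects the optimal named entity interpretation in a bottom-up manner. In experiments on the CRL corpus, which is widely used for evaluating Japanese named entity recognition, the analyzer achieved accuracy exceeding that of existing studies.
3. Hypernym acquisition from Wikipedia: From the first sentence of the description of each Wikipedia entry, we acquired the hypernym of the entry using sentence-final patterns. Examples of the acquired knowledge include 「テーマパーク」 (theme park) as the hypernym of 「東京ディズニーランド」 (Tokyo Disneyland) and 「日本人メジャーリーガー」 (Japanese major leaguer) as the hypernym of 「松井秀喜」 (Hideki Matsui). We plan to use this knowledge to improve the accuracy of language analysis.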
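The distributional similarity idea in item 1 of the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure or weighting the project used, so cosine similarity over raw co-occurrence counts is an assumption made here purely for illustration. The names `build_vectors`, `cosine`, and `TOY_PAIRS`, as well as the toy data, are hypothetical and are not taken from the project.

```python
import math
from collections import Counter


def build_vectors(pairs):
    """Turn (word, context) co-occurrence pairs into count vectors per word."""
    vectors = {}
    for word, context in pairs:
        vectors.setdefault(word, Counter())[context] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(count * b[ctx] for ctx, count in a.items() if ctx in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy data: each pair is (noun, case marker + verb) as it might
# be extracted from parsed text. A real corpus would be far larger.
TOY_PAIRS = [
    ("医者", "が診察する"),      # doctor: "~ examines"
    ("医者", "に診てもらう"),    # doctor: "to be examined by ~"
    ("医師", "が診察する"),      # physician: "~ examines"
    ("医師", "に診てもらう"),    # physician: "to be examined by ~"
    ("医師", "が処方する"),      # physician: "~ prescribes"
    ("テーマパーク", "に行く"),  # theme park: "go to ~"
    ("テーマパーク", "が開園する"),  # theme park: "~ opens"
]

vectors = build_vectors(TOY_PAIRS)
sim_doctor = cosine(vectors["医者"], vectors["医師"])
sim_unrelated = cosine(vectors["医者"], vectors["テーマパーク"])
print(sim_doctor, sim_unrelated)
```

In this toy data, 「医者」 and 「医師」 share most of their verb contexts and therefore score higher than the unrelated pair, which shares none. The same functions apply unchanged to the verb case by passing (verb, noun context) pairs instead.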

Research Products
(5 results)

All Journal Article (3 results) (of which Peer Reviewed: 3 results) Presentation (2 results)

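[Journal Article] Detection and Classification of Similar Pages Based on Identical Sentence Extraction (同一文抽出に基づく類似ページの検出と分類), 2010
- Author(s)
  Tomohide Shibata, 姜ナウン, Sadao Kurohashi
- Journal Title
  Transactions of the Japanese Society for Artificial Intelligence (人工知能学会論文誌) Vol.25
  Pages: 224-232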
- Peer Reviewed
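[Journal Article] An Overview of Large-scale Web Information by Keyword Distillation Based Clustering (キーワード蒸留型クラスタリングによる大規模ウェブ情報の俯瞰), 2009
- Author(s)
  Yasuo Banba, Keiji Shinzato, Tomohide Shibata, Sadao Kurohashi
- Journal Title
  IPSJ Journal (情報処理学会論文誌) Vol.50
  Pages: 90-103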
- Peer Reviewed
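[Journal Article] Bottom-up Named Entity Recognition Using Two-stage Machine Learning (二段階の機械学習を用いたボトムアップ型の固有表現認識), 2009
- Author(s)
  Hirotaka Funayama, Tomohide Shibata, Sadao Kurohashi
- Journal Title
  The 8th Forum on Information Technology (第8回情報科学技術フォーラム, FIT2009), Volume 2
  Pages: 19-26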
- Peer Reviewed
[Presentation] Web Information Organization using Keyword Distillation Based Clustering, 2009
- Author(s)
  Tomohide Shibata, Yasuo Banba, Keiji Shinzato, Sadao Kurohashi
- Organizer
  In Proceedings of the 2009 IEEE/WIC/ACM International Conference on Web Intelligence (WI-09, short paper)
- Place of Presentation
  Milano, Italy
- Year and Date
  2009-09-18
[Presentation] Bottom-up Named Entity Recognition using a Two-stage Machine Learning Method, 2009
- Author(s)
  Hirotaka Funayama, Tomohide Shibata, Sadao Kurohashi
- Organizer
  In Proceedings of the Association for Computational Linguistics/International Joint Conference on Natural Language Processing (ACL/IJCNLP 2009): Workshop on Multiword Expressions
- Place of Presentation
  Singapore
- Year and Date
  2009-08-06
