A study on adaptive indexing method for dedicated portals

Research Project

Project/Area Number	18500093
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Media informatics/Database
Research Institution	National Institute of Informatics
Principal Investigator	AIZAWA Akiko National Institute of Informatics, Digital Content and Media Sciences Research Division, Professor (90222447)
Project Period (FY)	2006 – 2007
Project Status	Completed (Fiscal Year 2007)
Budget Amount	¥4,010,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥510,000) Fiscal Year 2007: ¥2,210,000 (Direct Cost: ¥1,700,000, Indirect Cost: ¥510,000) Fiscal Year 2006: ¥1,800,000 (Direct Cost: ¥1,800,000)
Keywords	natural language processing / compound extraction / dictionary construction / information retrieval / lexicon / dedicated portal sites / indexing tools / CRF / dedicated portal / vocabulary extraction / specialized portal / EM algorithm
Research Abstract	In recent years, building dedicated web portals has become common practice among academics. These portals are valuable information sources that maintain the diversity of web content and disseminate academic or specialized knowledge to the public. Dedicated portals with specialized content require a good term extraction tool to identify multi-word expressions that are not found in general dictionaries. However, existing segmentation tools are not satisfactory for this purpose. This study therefore focuses on a keyword extraction method that enhances the search capability of dedicated portal servers. During the two-year research period, we addressed the following: 1. A framework for the automatic extraction of multi-word expressions (compounds) in which two modules are applied sequentially but independently: (A) a segmentation module that identifies the longest multi-word regions in a given input text, and (B) a parsing module that analyzes the cost of word connections within a single multi-word region. 2. A new method for (B) in which the tree structure of a multi-word expression is determined using a statistical cost function. The parameters of the function are obtained by applying a CRF (conditional random field) to technical terms extracted from the handbooks of academic societies. Future issues include (i) implementing a lightweight tool for automatic keyword extraction using the proposed method, and (ii) using the extracted terms for search navigation or text categorization.
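
The two-module framework in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the project: the functions `longest_regions` and `build_tree`, the "TERM" tag, the cost table `TOY_COSTS`, and the greedy bottom-up merge used to build the tree. In the study the cost parameters are learned with a CRF; the numbers below are invented.

```python
from typing import Dict, List, Tuple, Union

# A tree is either a single word or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

# Hypothetical connection costs between adjacent words (lower = tighter bond).
# The study learns such parameters with a CRF; these numbers are invented.
TOY_COSTS: Dict[Tuple[str, str], float] = {
    ("natural", "language"): 0.2,
    ("language", "processing"): 0.5,
    ("processing", "tool"): 0.9,
}
DEFAULT_COST = 1.0


def longest_regions(tagged: List[Tuple[str, str]]) -> List[List[str]]:
    """Module A (assumed form): maximal runs of tokens tagged "TERM"."""
    regions: List[List[str]] = []
    current: List[str] = []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the last run
        if tag == "TERM":
            current.append(word)
        else:
            if len(current) > 1:  # keep multi-word regions only
                regions.append(current)
            current = []
    return regions


def edge_word(tree: Tree, side: int) -> str:
    """Return the leftmost (side=0) or rightmost (side=1) word of a tree."""
    while not isinstance(tree, str):
        tree = tree[side]
    return tree


def connection_cost(left: Tree, right: Tree) -> float:
    """Cost of joining two adjacent subtrees, looked up on their boundary words."""
    key = (edge_word(left, 1), edge_word(right, 0))
    return TOY_COSTS.get(key, DEFAULT_COST)


def build_tree(words: List[str]) -> Tree:
    """Module B (assumed form): greedily merge the cheapest adjacent pair."""
    if not words:
        raise ValueError("a multi-word region must contain at least one word")
    nodes: List[Tree] = list(words)
    while len(nodes) > 1:
        costs = [connection_cost(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        i = costs.index(min(costs))
        nodes[i:i + 2] = [(nodes[i], nodes[i + 1])]
    return nodes[0]
```

Given a token list in which "natural", "language", "processing", and "tool" are tagged "TERM", `longest_regions` returns that four-word region, and `build_tree` on it yields the left-branching tree `((("natural", "language"), "processing"), "tool")` under the toy costs above.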

Report

(3 results)

2007 Annual Research Report Final Research Report Summary
2006 Annual Research Report

Research Products
(25 results)

Journal Article: 14 results (of which Peer Reviewed: 4 results) / Presentation: 11 results

[Journal Article] On calculating word similarity using large text corpora (2008)
- Author(s)
  Akiko Aizawa
- Journal Title

  IPSJ Journal 49-3

  Pages: 1426-1436
- NAID
  110006644536
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  2007 Final Research Report Summary
- Peer Reviewed
[Journal Article] On calculating word similarity using large text corpora (2008)
- Author(s)
  Akiko Aizawa
- Journal Title

  IPSJ Journal 49-3

  Pages: 1426-1436
- NAID
  110006644536
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  2007 Final Research Report Summary
[Journal Article] Effect of corpus size expansion on the synonym relation extraction task (2008)
- Author(s)
  Akiko Aizawa
- Journal Title

  IPSJ Journal 49-3
  
  Pages: 1426-1436
- NAID
  110004824217
- Related Report
  2007 Annual Research Report
- Peer Reviewed
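[Journal Article] Discovering IS-A relationships from text: a method based on dependencies between nouns and verbs (2007)
- Author(s)
  Hidekazu Nakawatase, Akiko Aizawa
- Journal Title

  Transactions of the Japanese Society for Artificial Intelligence 22-6

  Pages: 585-594
- NAID
  10022008204
- Description
  From the "Final Research Report Summary (Japanese)"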
- Related Report
  2007 Final Research Report Summary
- Peer Reviewed
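[Journal Article] Co-occurrence based similarity measures (2007)
- Author(s)
  Akiko Aizawa
- Journal Title

  Communications of the Operations Research Society of Japan 52-11

  Pages: 706-712
- NAID
  110006440287
- Description
  From the "Final Research Report Summary (Japanese)"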
- Related Report
  2007 Final Research Report Summary
[Journal Article] Discovering IS-A relationships from Text: a method based on Dependencies between Nouns and Verbs (2007)
- Author(s)
  Hidekazu Nakawatase, Akiko Aizawa
- Journal Title

  Transactions of the Japanese Society for Artificial Intelligence Vol. 22, No. 6

  Pages: 585-594
- NAID
  10022008204
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  2007 Final Research Report Summary
[Journal Article] Co-occurrence based similarity measures (2007)
- Author(s)
  Akiko Aizawa
- Journal Title

  Communications of the Operations Research Society of Japan Vol. 52, No. 11
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  2007 Final Research Report Summary
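[Journal Article] Discovering IS-A relationships from text: a method based on dependencies between nouns and verbs (2007)
- Author(s)
  Hidekazu Nakawatase, Akiko Aizawa
- Journal Title

  Transactions of the Japanese Society for Artificial Intelligence 22-6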
  
  Pages: 585-594
- NAID
  10022008204
- Related Report
  2007 Annual Research Report
- Peer Reviewed
[Journal Article] Co-occurrence based similarity measures (2007)
- Author(s)
  Akiko Aizawa
- Journal Title

  Communications of the Operations Research Society of Japan 52-11
  
  Pages: 706-712
- NAID
  110006440287
- Related Report
  2007 Annual Research Report
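[Journal Article] On the transmission of information through the medium of text (2007)
- Author(s)
  Akiko Aizawa
- Journal Title

  Journal of the Japanese Society for Artificial Intelligence 22, 1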
  
  Pages: 14-14
- Related Report
  2006 Annual Research Report
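[Journal Article] A study of large-scale corpus processing methods for detecting differences in word sense (2006)
- Author(s)
  Akiko Aizawa
- Journal Title

  IEICE Technical Committee on Artificial Intelligence and Knowledge-Based Processing, Technical Report 106, AI-38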
  
  Pages: 57-62
- NAID
  110004744920
- Related Report
  2006 Annual Research Report
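[Journal Article] A method for constructing a synonym and example-sentence dictionary using dependency relations and its application to large-scale corpora (2006)
- Author(s)
  Akiko Aizawa, Hidekazu Nakawatase
- Journal Title

  Proceedings of the 20th Annual Conference of JSAI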
- NAID
  130005023209
- Related Report
  2006 Annual Research Report
[Journal Article] Effect of corpus size expansion on the synonym relation extraction task (2006)
- Author(s)
  Akiko Aizawa
- Journal Title

  IPSJ, 175th SIG on Natural Language Processing, Technical Report NL-94
  
  Pages: 91-98
- NAID
  110004824217
- Related Report
  2006 Annual Research Report
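[Journal Article] A prototype linkage system for bibliographic record identification (2006)
- Author(s)
  Akiko Aizawa
- Journal Title

  Proceedings of Large-Scale Data Linkage, Data Mining and Statistical Methods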
  
  Pages: 87-87
- Related Report
  2006 Annual Research Report
[Presentation] Multi-class named entity recognition via bootstrapping with dependency tree-based patterns (2008)
- Author(s)
  Van B. Dang, Akiko Aizawa
- Organizer
  the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2008)
- Place of Presentation
  Osaka, Japan
- Year and Date
  2008-05-23
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  2007 Final Research Report Summary
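[Presentation] A study on the analysis and extraction of key phrases for search (2008)
- Author(s)
  長谷川新, 相澤彰子, 浜本隆之
- Organizer
  Proceedings of the 70th IPSJ National Convention
- Place of Presentation
  Tokyo
- Year and Date
  2008-03-14
- Description
  From the "Final Research Report Summary (Japanese)"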
- Related Report
  2007 Annual Research Report 2007 Final Research Report Summary
[Presentation] Multi-class named entity recognition via bootstrapping with dependency tree-based patterns (2008)
- Author(s)
  Van B. Dang, Akiko Aizawa
- Organizer
  the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  2007 Final Research Report Summary
[Presentation] On calculating word similarity using a Web corpus (2007)
- Author(s)
  Akiko Aizawa
- Organizer
  JSAI SIG on Knowledge-Based Systems
- Place of Presentation
  Tokyo
- Year and Date
  2007-07-14
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  2007 Annual Research Report 2007 Final Research Report Summary
[Presentation] On calculating word similarity using Web as corpus (2007)
- Author(s)
  Akiko Aizawa
- Organizer
  JSAI SIG Technical Reports
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  2007 Final Research Report Summary
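[Presentation] Effect of corpus size expansion on the synonym relation extraction task (2006)
- Author(s)
  Akiko Aizawa
- Organizer
  175th SIG on Natural Language Processing / 84th SIG on Information Fundamentals, NL-94
- Place of Presentation
  Tokyo
- Year and Date
  2006-09-12
- Description
  From the "Final Research Report Summary (Japanese)"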
- Related Report
  2007 Final Research Report Summary
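[Presentation] A method for constructing a synonym and example-sentence dictionary using dependency relations and its application to large-scale corpora (2006)
- Author(s)
  Akiko Aizawa, Hidekazu Nakawatase
- Organizer
  The 20th Annual Conference of JSAI
- Place of Presentation
  Tokyo
- Year and Date
  2006-06-08
- Description
  From the "Final Research Report Summary (Japanese)"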
- Related Report
  2007 Final Research Report Summary
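[Presentation] A study of large-scale corpus processing methods for detecting differences in word sense (2006)
- Author(s)
  Akiko Aizawa
- Organizer
  IEICE Technical Committee on Artificial Intelligence and Knowledge-Based Processing
- Place of Presentation
  Tokyo
- Year and Date
  2006-05-18
- Description
  From the "Final Research Report Summary (Japanese)"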
- Related Report
  2007 Final Research Report Summary
[Presentation] Detecting Semantic Diversity of Words in Large Scale Corpora (2006)
- Author(s)
  Akiko Aizawa
- Organizer
  IEICE Tech Reports, AI2006-11
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  2007 Final Research Report Summary
[Presentation] Automatic Extraction of Synonyms with Sample Phrases using Dependency Analysis of Text and Its Application to Large-scale Corpora (2006)
- Author(s)
  Akiko Aizawa, Hidekazu Nakawatase
- Organizer
  The 20th Annual Conference of JSAI
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  2007 Final Research Report Summary
[Presentation] On the Effect of Corpus Size in Words Similarity Calculation (2006)
- Author(s)
  Akiko Aizawa
- Organizer
  SIG-report of IPSJ
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  2007 Final Research Report Summary
