Research on the Development and Use of a Part-Focused XML Information Retrieval System

Research Project

Project/Area Number	14780325
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Single-year Grants
Research Field	Information systems (including information and library science)
Research Institution	Nara Institute of Science and Technology
Principal Investigator	Kenji Hatano, Nara Institute of Science and Technology, Graduate School of Information Science, Research Associate (80314532)
Project Period (FY)	2002 – 2003
Project Status	Completed (Fiscal Year 2003)
Budget Amount	¥3,400,000 (Direct Cost: ¥3,400,000); Fiscal Year 2003: ¥1,700,000 (Direct Cost: ¥1,700,000); Fiscal Year 2002: ¥1,700,000 (Direct Cost: ¥1,700,000)
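Keywords	part-focused retrieval system / XML / statistical analysis / partial-document granularity determination / performance / retrieval accuracy / optimal granularity determination / retrieval speed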
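Research Abstract	Web search engines take Web pages as their retrieval targets: the system builds an inverted file from the terms it extracts from each page and uses that file to answer queries. Because these engines weight terms with the link structure that characterizes Web pages in mind, they are useful for finding pages. However, as long as the retrieval unit is the whole page, they essentially look only at the terms that occur in it: if a page contains the keywords the user supplied as a query, its similarity to the query rises and it is returned as a result. Consequently, when browsing the results, the user finds it very hard to tell which part of a page satisfies the query and must search again within the page to locate that part. To address this problem, over two years this project proposed an algorithm that uses the term statistics of XML documents to determine which partial XML documents are appropriate answers to return to the user, and built a part-focused XML information retrieval system that implements the algorithm. It also improved on conventional TF-IDF term weighting and proposed a new weighting method suited to structured documents. In this fiscal year's results, indexing only those partial XML documents extracted from the XML documents that are statistically stable (not outliers) reduced the number of indexed partial documents to about 12% of the number indexed without the proposed method. As a result, index construction became about 5 times faster and retrieval about 3 times faster, and average precision improved by 3%. These results show that the proposed method improves both the processing speed and the retrieval accuracy of the retrieval system.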
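The idea of indexing only statistically stable partial documents can be illustrated with a minimal Python sketch. The abstract does not say which statistic the project used, so everything specific here is an assumption made for illustration: each element subtree is treated as one partial document, "stable" is approximated by a z-score test on the log of the token count, and the index uses plain TF-IDF rather than the improved weighting the project proposed. The function names `partial_documents`, `stable_only`, `build_index`, and `search` are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def partial_documents(xml_text):
    """Return (path, tokens) for every element subtree, in document order."""
    root = ET.fromstring(xml_text)

    def walk(elem, path):
        # All text below this element forms the partial document's content.
        tokens = " ".join(elem.itertext()).lower().split()
        yield path, tokens
        for i, child in enumerate(elem):
            yield from walk(child, f"{path}/{child.tag}[{i}]")

    return list(walk(root, f"/{root.tag}"))


def stable_only(parts, k=1.0):
    """Keep parts whose log token count is within k standard deviations of the mean.

    ASSUMPTION: this z-score test stands in for the project's unspecified statistic.
    """
    parts = [(p, t) for p, t in parts if t]  # drop empty elements
    if not parts:
        return []
    sizes = [math.log(len(t)) for _, t in parts]
    mean = sum(sizes) / len(sizes)
    std = math.sqrt(sum((s - mean) ** 2 for s in sizes) / len(sizes))
    if std == 0:
        return parts
    return [pt for pt, s in zip(parts, sizes) if abs(s - mean) <= k * std]


def build_index(parts):
    """Build an inverted index: term -> {path: TF-IDF weight}."""
    n = len(parts)
    df = Counter(term for _, tokens in parts for term in set(tokens))
    index = defaultdict(dict)
    for path, tokens in parts:
        for term, tf in Counter(tokens).items():
            index[term][path] = tf * math.log(1 + n / df[term])
    return index


def search(index, query):
    """Rank indexed partial documents by summed weights of the query terms."""
    scores = Counter()
    for term in query.lower().split():
        for path, weight in index.get(term, {}).items():
            scores[path] += weight
    return [path for path, _ in scores.most_common()]
```

Under these assumptions, a call such as `search(build_index(stable_only(partial_documents(xml_text))), "xml retrieval")` returns element paths rather than whole documents, and the filter step shrinks the index by discarding subtrees that are unusually large or small relative to the rest. The reduction figures reported in the abstract come from the project's own method and data, not from this sketch.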

Report

(2 results)

2003 Annual Research Report
2002 Annual Research Report

Research Products

(7 results)

All Publications (7 results)

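[Publications] Kenji Hatano, Hiroko Kinutani, Masatoshi Yoshikawa, Shunsuke Uemura: "A Method for Determining Result Granularity in Keyword-Based XML Document Retrieval" [キーワードを利用したXML文書検索のための検索結果粒度決定法] DBSJ Letters (日本データベース学会Letters). Vol.2, No.1. 123-126 (2003)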
- Related Report
  2003 Annual Research Report
[Publications] Kazunari Sugiyama et al.: "Refinement of TF-IDF Schemes for Web Pages using their Hyperlinked Neighboring Pages" Proceedings of the 14th Conference on Hypertext and Hypermedia (HT'03). 198-207 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Kenji Hatano et al.: "An Evaluation of INEX 2003 Relevance Assessments" INEX 2003 Workshop Proceedings. 25-32 (2003)
- Related Report
  2003 Annual Research Report
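[Publications] Kazunari Sugiyama, Kenji Hatano, Masatoshi Yoshikawa, Shunsuke Uemura: "Improvement of the TF-IDF Method for Web Pages Based on the Contents of Hyperlinked Neighboring Pages" [ハイパーリンクで結ばれた隣接ページの内容に基づくWebページのためのTF-IDF法の改良] IEICE Transactions (電子情報通信学会論文誌). Vol.J87-D-I No.2. 113-125 (2004)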
- Related Report
  2003 Annual Research Report
[Publications] K. Hatano, H. Kinutani, M. Yoshikawa, S. Uemura: "Extraction of Partial XML Documents Using IR-based Structure and Content Analysis" Conceptual Modeling for New Information Systems Technologies. LNCS Vol.2465. 334-347 (2002)
- Related Report
  2002 Annual Research Report
[Publications] K. Hatano, H. Kinutani, M. Yoshikawa, S. Uemura: "Information Retrieval System for XML Documents" Proceedings of the 13th International Conference on Database and Expert Systems Applications (DEXA 2002). LNCS Vol.2453. 758-767 (2002)
- Related Report
  2002 Annual Research Report
[Publications] K. Hatano, H. Kinutani, M. Yoshikawa, S. Uemura: "Determining the Unit of Retrieval Results for XML Documents" Proceedings of the First Workshop of the Initiative for the Evaluation of XML Retrieval (INEX). (in press). (2003)
- Related Report
  2002 Annual Research Report
