2001 Fiscal Year Annual Research Report

Indexing of Table Structures in Net Search Engines and Its Application to Semantic Ambiguity Resolution

Project/Area Number	13780336
Research Category	Grant-in-Aid for Encouragement of Young Scientists (A)
Research Institution	The University of Tokushima
Principal Investigator	Masami Shishibori (獅々堀正幹), The University of Tokushima, Faculty of Engineering, Associate Professor (50274262)
Keywords	WWW / table structure / indexing / HTML / knowledge acquisition / information retrieval / information extraction
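Research Abstract	This research aims to acquire linguistic knowledge automatically from HTML table structures found on the WWW. Conventional full-text search techniques, typified by net search engines that target WWW data, do not take HTML tag information into account, so the relationships between the items of a table have been ignored for words that appear inside table structures. Many items in table structures, however, stand in an attribute / attribute-value relationship, and we expect that linguistic knowledge can be extracted once table structures are collected on a large scale. To achieve this aim, the plan for the first year was "to establish a table structure analysis algorithm that generates position information for each item of a table structure, and to devise and evaluate an efficient indexing method." For the representation of position information, we applied a segmentation method for region partitioning and devised a method that represents position information as a compact bit string. With this method the position information is not only compact; items located in the same row or column of a table structure can also be retrieved at high speed. The method handles not only simple table structures but also complex ones that use COLSPAN and ROWSPAN. In addition, because of the way browsers display them, table structures on the WWW tend to be short horizontally but long vertically. For large table structures such as organization tables in particular, the bit string that represents position information becomes very long; the method resolves this problem by partitioning and managing the table structure hierarchically. Storing the position information produced by this method in an index called the PaCB-tree, which the principal investigator devised previously, made retrieval faster still. The results have already been presented orally at the Special Interest Group on Natural Language Processing and the Special Interest Group on Database Systems of the Information Processing Society of Japan (IPSJ); we plan to consolidate the results and submit them to the IPSJ Journal.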
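
As an illustration of the table analysis step described in the abstract, the following is a minimal Python sketch, not the project's actual algorithm. It resolves the grid position of every item in a flat HTML table that uses COLSPAN and ROWSPAN, and then encodes a position as a bit string. The report does not give the details of its segmentation-based encoding, so the sketch assumes a generic stand-in: interleaving the row and column bits, which corresponds to repeatedly halving a square region. The names TableCells, place_cells and position_bits are hypothetical, and a plain dictionary stands in for the PaCB-tree index. Nested tables and the hierarchical partitioning of long tables are not covered.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect (text, rowspan, colspan) tuples for each row of one flat table."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell tuples per <tr>
        self._cell = None   # [text_parts, rowspan, colspan] while inside td/th

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            # A missing or empty span attribute is treated as 1.
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def place_cells(rows):
    """Map every occupied grid slot (row, col) to the text of the cell covering it."""
    grid = {}
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:   # slot already taken by a ROWSPAN from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    return grid


def position_bits(row, col, depth):
    """Encode a slot of a 2**depth x 2**depth region by interleaving row/col bits.

    Assumed stand-in encoding: slots in the same row share all even-indexed
    bits, and slots in the same column share all odd-indexed bits.
    """
    bits = []
    for i in reversed(range(depth)):
        bits.append(str((row >> i) & 1))
        bits.append(str((col >> i) & 1))
    return "".join(bits)


# Hypothetical sample table with one COLSPAN and one ROWSPAN.
SAMPLE = """
<table>
<tr><th>Dept</th><th colspan="2">Contact</th></tr>
<tr><td rowspan="2">Sales</td><td>Tel</td><td>1234</td></tr>
<tr><td>Fax</td><td>5678</td></tr>
</table>
"""

parser = TableCells()
parser.feed(SAMPLE)
grid = place_cells(parser.rows)

# Attribute / attribute-value pairs: header of the column, item below it.
for (r, c), text in sorted(grid.items()):
    if r > 0:
        print(position_bits(r, c, 2), grid[(0, c)], "->", text)
```

In this sketch the item "5678" ends up in the third column under "Contact" and in the row covered by "Sales", which is the kind of row and column relationship that a search ignoring HTML tags would lose.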

Research Products
(7 results)

All Publications (7 results)

[Publications] Masami Shishibori, Kazuaki Ando, Jun-ichi Aoe: "A Filtering Method for E-mail Documents based on Personal Profiles" Proceedings of the 19th Int'l Conf. on Computer Processing of Oriental Languages. 69-72 (2001)
[Publications] Masami Shishibori, Minsoo Jung, Satoru Tsuge, Jun-ichi Aoe: "Improvement of the Hierarchical Compact Patricia Trie for a Dynamic Large Key Set" Proceedings of 5th International Conference on Knowledge-Based Intelligent Information Engineering Systems & Allied Technologies. 7. 581-585 (2001)
[Publications] Masami Shishibori, Kazuaki Ando, Jun-ichi Aoe: "A E-mail Filtering System Based on Personal Profiles" Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium. 609-616 (2001)
[Publications] El-Sayed Atlam, Makoto Okada, Masami Shishibori, Jun-ichi Aoe: "An Evaluation Method of Words Tendency Depending on Time-series Variation and Its Improvements" Journal of Information Processing & Management. Vol. 38, No. 2. 157-171 (2002)
[Publications] Sangkon Lee, Masami Shishibori, Toru Sumitomo, Jun-ichi Aoe: "Extraction of Field-coherent Passages" Journal of Information Processing & Management. Vol. 38, No. 2. 173-207 (2002)
[Publications] Minsoo Jung, Masami Shishibori, Akihiro Tanaka, Jun-ichi Aoe: "A Dynamic Construction Algorithm for the Compact Patricia Trie using the Hierarchical Structure" Journal of Information Processing & Management. Vol. 38, No. 2. 221-236 (2002)
[Publications] Kenji Kita (北研二), Kazuhiko Tsuda (津田和彦), Masami Shishibori (獅々堀正幹): "Information Retrieval Algorithms (情報検索アルゴリズム)" Kyoritsu Shuppan Co., Ltd. 212 (2002)
