Development of an Intelligent Information Acquisition System for Large-Scale Semi-Structured Data Based on Optimal Pattern Discovery

Research Project

Project/Area Number	13224073
Research Category	Grant-in-Aid for Scientific Research on Priority Areas (C)
Allocation Type	Single-year Grants
Review Section	Science and Engineering
Research Institution	Kyushu University
Principal Investigator	Hiroki Arimura, Kyushu University, Graduate School of Information Science and Electrical Engineering, Associate Professor (20222763)
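Co-Investigator (Kenkyū-buntansha)	Ayumi Shinohara, Kyushu University, Graduate School of Information Science and Electrical Engineering, Associate Professor (00226151); Masayuki Takeda, Kyushu University, Graduate School of Information Science and Electrical Engineering, Associate Professor (50216909); Hiroshi Sakamoto, Kyushu University, Graduate School of Information Science and Electrical Engineering, Research Associate (50315123); Shinichi Shimozono, Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Associate Professor (70243988)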
Project Period (FY)	2001
Project Status	Completed (Fiscal Year 2001)
Keywords	Web mining / optimal pattern discovery / episode patterns / information extraction from the Web / semi-structured data / knowledge acquisition / data mining
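Research Abstract	With the rapid growth of semi-structured data, such as Web pages and XML documents distributed over networks, developing efficient methods for accessing this content directly has become an urgent problem. Building on data mining from large-scale semi-structured data (Web mining), this project aims to develop a new information access system that goes beyond conventional information retrieval systems: an efficient tool that interactively supports the analysis of large amounts of data.

As the key technology, we extend optimal pattern discovery to tree and graph structures and develop robust, fast optimal pattern discovery algorithms for semi-structured data. We regard Web mining as consisting of three processes: (a) discovery of useful information sources, (b) discovery of characteristic patterns, and (c) information extraction. Our goal is to integrate these processes and clarify how a knowledge acquisition system for semi-structured data can be realized efficiently. A further feature of the project is that it draws on recent results in computational complexity theory and computational learning theory to develop fast algorithms designed with close attention to computational cost.

In FY2001 we obtained the following results.

(a) Discovery of useful information sources: we developed efficient optimized mining algorithms for two classes of combinatorial patterns, subsequence patterns and episode patterns, and incorporated them into BONSAI, a decision-tree learning algorithm for string classification.

(b) Discovery of characteristic patterns: we modeled semi-structured data as the basic class of labeled ordered trees and developed fast algorithms for discovering frequent common substructures in such data. Pattern discovery problems on trees generally have high computational complexity. We therefore introduced an efficient discovery technique called rightmost expansion (extension along the rightmost branch) and combined it with several optimization techniques to obtain fast mining algorithms for semi-structured data.

(c) Information extraction: we studied the problem of extracting information from the Web and developed the Tree-Wrapper algorithm, which uses the tree structure of HTML data to efficiently extract the required information.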
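The abstract names rightmost expansion without describing it. The Python sketch below illustrates the general idea for labeled ordered trees under stated assumptions: trees and patterns are encoded in preorder as (depth, label) pairs, support is the number of data trees containing the pattern, and a naive embedding check stands in for the occurrence-list bookkeeping a practical miner would use. The names `build`, `embeds_at`, `frequent_subtrees` and the `max_size` cap are hypothetical; this is not the project's implementation.

```python
from typing import Dict, List, Sequence, Tuple

# A labeled ordered tree (or pattern) in preorder: one (depth, label) pair per node,
# root at depth 0. This encoding is an illustrative assumption, not the project's.
Code = Tuple[Tuple[int, str], ...]
Built = Tuple[List[str], List[List[int]]]


def build(code: Code) -> Built:
    """Convert a preorder (depth, label) code into label and child-index lists."""
    labels = [lab for _, lab in code]
    children: List[List[int]] = [[] for _ in code]
    path: List[int] = []  # node indices on the path from the root to the current node
    for i, (depth, _) in enumerate(code):
        del path[depth:]  # pop back up to the parent of node i
        if path:
            children[path[-1]].append(i)
        path.append(i)
    return labels, children


def embeds_at(p: Built, t: Built, pi: int, ti: int) -> bool:
    """Can pattern node pi (with its subtree) be mapped onto tree node ti so that
    labels, parent-child edges and left-to-right sibling order are preserved?"""
    if p[0][pi] != t[0][ti]:
        return False
    kids, j = t[1][ti], 0
    for pc in p[1][pi]:
        # Greedy: map each pattern child to the leftmost remaining tree child that fits.
        while j < len(kids) and not embeds_at(p, t, pc, kids[j]):
            j += 1
        if j == len(kids):
            return False
        j += 1
    return True


def frequent_subtrees(trees: Sequence[Code], minsup: int, max_size: int = 4) -> Dict[Code, int]:
    """Enumerate frequent ordered subtrees by rightmost expansion.
    Support here = number of data trees that contain the pattern."""
    built = [build(t) for t in trees]
    labels = sorted({lab for t in trees for _, lab in t})

    def support(code: Code) -> int:
        p = build(code)
        return sum(any(embeds_at(p, t, 0, ti) for ti in range(len(t[0]))) for t in built)

    frequent: Dict[Code, int] = {}

    def grow(code: Code, sup: int) -> None:
        frequent[code] = sup
        if len(code) >= max_size:
            return
        # Rightmost expansion: attach a new last child to a node on the rightmost branch,
        # i.e. append (d, lab) with 1 <= d <= depth of the last node + 1.
        for d in range(1, code[-1][0] + 2):
            for lab in labels:
                child = code + ((d, lab),)
                s = support(child)
                if s >= minsup:  # support is anti-monotone, so infrequent codes are never grown
                    grow(child, s)

    for lab in labels:
        root: Code = ((0, lab),)
        s = support(root)
        if s >= minsup:
            grow(root, s)
    return frequent


if __name__ == "__main__":
    data = [
        ((0, "a"), (1, "b"), (1, "c"), (2, "b")),  # a(b, c(b))
        ((0, "a"), (1, "c"), (2, "b"), (1, "b")),  # a(c(b), b)
        ((0, "a"), (1, "b")),                      # a(b)
    ]
    for pattern, sup in frequent_subtrees(data, minsup=2).items():
        print(sup, pattern)
```

On these three toy trees with `minsup=2`, the sketch reports seven patterns, among them a(c(b)) with support 2, while a(b, c) is pruned because only the first tree contains it in that sibling order. Each ordered tree has exactly one parent in this enumeration (the same tree with its last preorder node removed), so every pattern is generated once, and because removing a node can only raise support, pruning infrequent candidates loses no frequent pattern.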

Report

(1 result)

2001 Annual Research Report

Research Products
(6 results)


[Publications] T. Asai, et al. (4th author): "Efficient Substructure Discovery from Large Semi-structured Data". Proc. of the Second SIAM International Conference on Data Mining (SDM'02) (to appear). (2002)
- Related Report
  2001 Annual Research Report
[Publications] H. Arimura, et al.: "Efficient Learning of Semi-structured Data from Queries". Proc. of the 12th International Conference on Algorithmic Learning Theory (ALT'01). LNAI 2225. 315-331 (2001)
- Related Report
  2001 Annual Research Report
[Publications] K. Taniguchi, et al. (3rd author): "Mining Semi-Structured Data by Path Expressions". Proc. of the 4th International Conference on Discovery Science. LNAI 2226. 378-388 (2001)
- Related Report
  2001 Annual Research Report
[Publications] A. Yamamoto, et al. (4th author): "Deductive and inductive reasoning on semi-structured documents modelled with hedges". Proc. of the 11th International Conference on Inductive Logic Programming (ILP'01). LNAI 2157. 140-147 (2001)
- Related Report
  2001 Annual Research Report
[Publications] M. Hirao, et al.: "A Practical Algorithm to Find the Best Episode Patterns". Proc. of the 4th International Conference on Discovery Science. LNCS 2226. 435-440 (2001)
- Related Report
  2001 Annual Research Report
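[Publications] Murakami, et al. (3rd author): "Algorithms and Implementation for Automatic Text Extraction from HTML" (in Japanese). IPSJ Transactions on Mathematical Modeling and Its Applications. Vol. 42, No. SIG14 (TOM5). 39-49 (2001)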
- Related Report
  2001 Annual Research Report