Development of an Intelligent Information Acquisition System for Large-Scale Semi-structured Data Based on Optimized Pattern Discovery

Research Project

Project/Area Number	14019070
Research Category	Grant-in-Aid for Scientific Research on Priority Areas
Allocation Type	Single-year Grants
Review Section	Science and Engineering
Research Institution	Kyushu University
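Principal Investigator	Hiroki Arimura (有村博紀), Kyushu University, Graduate School / Faculty of Information Science and Electrical Engineering, Associate Professor (20222763)
Co-Investigator(Kenkyū-buntansha)	Shinichi Shimozono (下薗真一), Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Associate Professor (70243988); Ayumi Shinohara (篠原歩), Kyushu University, Graduate School / Faculty of Information Science and Electrical Engineering, Associate Professor (00226151); Masayuki Takeda (竹田正幸), Kyushu University, Graduate School / Faculty of Information Science and Electrical Engineering, Associate Professor (50216909)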
Project Period (FY)	2002
Project Status	Completed (Fiscal Year 2002)
Budget Amount	¥6,200,000 (Direct Cost: ¥6,200,000); Fiscal Year 2002: ¥6,200,000 (Direct Cost: ¥6,200,000)
Keywords	Web mining / optimized pattern discovery / XML / information extraction / data stream mining / knowledge acquisition / data mining
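Research Abstract	The volume of semi-structured data distributed across networks, such as Web pages and XML, is growing rapidly, and efficient methods for accessing this content directly are urgently needed. This project aims to develop a new information access system that goes beyond conventional information retrieval systems: an efficient tool, based on data mining from large-scale semi-structured data (Web mining), that interactively supports the analysis of large volumes of data.

The key technology is to extend optimized pattern discovery to tree and graph structures, yielding robust and fast optimized pattern discovery algorithms for semi-structured data. The project further views Web mining as three processes, (a) discovery of useful information sources, (b) discovery of characteristic patterns, and (c) information extraction, and aims to clarify how to combine them organically into an efficient implementation of a knowledge acquisition system for semi-structured data. A distinctive feature is the development of fast algorithms designed with rigorous attention to computational complexity, drawing on recent results in computational complexity theory and computational learning theory.

The following results were obtained in fiscal year 2002 (Heisei 14).
(a) Discovery of useful information sources: a new text index structure was developed for variable-length don't-care patterns (VLDC patterns), a pattern class more expressive than those used previously, and an efficient optimized mining algorithm was built on it.
(b) Discovery of characteristic patterns: with applications to XML-Messaging and SOAP in view, a fast semi-structured data stream mining algorithm, STREAMT, was developed on the basis of FREQT, the semi-structured data mining method developed in the previous year. STREAMT monitors a data stream using very few resources and reports useful patterns incrementally. FREQT was also extended to optimized mining, and a theoretical performance analysis showed the optimality of this approach.
(c) Information extraction: pattern-matching machine techniques over codeword sequences, a basic technology for XML data processing, were developed, and their application to XML pattern search was examined.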
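To make the VLDC patterns in result (a) more concrete, here is a minimal sketch, not the project's algorithm. It assumes a pattern is an ordered list of constant segments separated by gaps that match any string (including the empty string), with implicit gaps at both ends. It also assumes, as one common reading of optimized pattern discovery that the abstract itself does not spell out, that the goal is to pick the pattern with the smallest classification error on labeled positive and negative texts. The function names are hypothetical, and the naive scan below stands in for the index structure the project developed.

```python
from typing import List, Sequence


def vldc_match(segments: Sequence[str], text: str) -> bool:
    """Return True if the constant segments occur in text in order, without overlap.

    A pattern such as  *ab*cd*  is passed as ["ab", "cd"]; each * is a
    variable-length don't-care gap that matches any (possibly empty) string.
    """
    pos = 0
    for seg in segments:
        hit = text.find(seg, pos)  # leftmost occurrence at or after pos
        if hit < 0:
            return False
        pos = hit + len(seg)       # the next segment must start after this one
    return True


def classification_error(segments: Sequence[str],
                         positives: List[str],
                         negatives: List[str]) -> int:
    """Count positives the pattern misses plus negatives it wrongly matches."""
    missed = sum(not vldc_match(segments, t) for t in positives)
    false_hits = sum(vldc_match(segments, t) for t in negatives)
    return missed + false_hits


def best_pattern(candidates: List[Sequence[str]],
                 positives: List[str],
                 negatives: List[str]) -> Sequence[str]:
    """Pick the candidate with the smallest classification error (assumed score)."""
    return min(candidates,
               key=lambda p: classification_error(p, positives, negatives))
```

Taking the leftmost occurrence of each segment is sufficient here because an earlier match never reduces the room left for later segments. A practical miner would enumerate candidates from the data rather than take them as input, which is the part the project's index structure and algorithms address.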

Report

(1 results)

2002 Annual Research Report

Research Products
(6 results)

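[Publications] Masayuki Takeda, Ayumi Shinohara: "Pattern Matching on Compressed Text: New Developments in Data Compression and Pattern Matching" Journal of the Information Processing Society of Japan (情報処理学会学会誌). 43(7). 763-769 (2002)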
- Related Report
  2002 Annual Research Report
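[Publications] Hiroki Arimura, Hiroshi Sakamoto: "Optimized Pattern Discovery in Data Mining" Bulletin of the Japan Society for Industrial and Applied Mathematics (応用数理). 12(4). 32-44 (2002)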
- Related Report
  2002 Annual Research Report
[Publications] T. Asai et al.: "Online Algorithms for Mining Semi-structured Data Stream" Proc. IEEE International Conference on Data Mining (ICDM'02). 27-34 (2002)
- Related Report
  2002 Annual Research Report
[Publications] K. Abe et al.: "Optimized Substructure Discovery for Semi-structured Data" Proc. 6th European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD-2002). LNAI 2431. 1-14 (2002)
- Related Report
  2002 Annual Research Report
[Publications] H. Arimura: "Efficient Text Mining with Optimized Pattern Discovery" Proc. the 13th Annual Symposium on Combinatorial Pattern Matching (CPM'02). LNCS 2373. 17-19 (2002)
- Related Report
  2002 Annual Research Report
[Publications] T. Asai et al.: "Efficient Substructure Discovery from Large Semi-structured Data" Proc. Second SIAM International Conference on Data Mining (SDM'02). 158-174 (2002)
- Related Report
  2002 Annual Research Report
