FY2001 Research Achievement Report Summary

Development of a High-Speed Data Mining System for Large-Scale Semi-Structured Text Data

Research Project

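Project/Area Number	11558040
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Subsidy
Application Category	Developmental Research
Research Field	Intelligent Informatics
Research Institution	Kyushu University
Principal Investigator	Hiroki Arimura, Kyushu University, Graduate School of Information Science and Electrical Engineering, Associate Professor (20222763)
Co-Investigators	Ayumi Shinohara, Kyushu University, Graduate School of Information Science and Electrical Engineering, Associate Professor (00226151); Masayuki Takeda, Kyushu University, Graduate School of Information Science and Electrical Engineering, Associate Professor (50216909); Takayoshi Shoudai, Kyushu University, Graduate School of Information Science and Electrical Engineering, Associate Professor (50226304); Kouichi Hirata, Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Associate Professor (20274558); Akira Ishino, Kyushu University, Graduate School of Information Science and Electrical Engineering, Research Associate (10315129)
Project Period (FY)	1999 – 2001
Keywords	Web mining / semi-structured text / HTML / XML / optimal pattern discovery / suffix array / data compression / pattern matching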
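Research Overview	This project pursued the following three research topics.

1. Data mining methods for semi-structured documents. We studied the problem of text mining from large text collections and formulated it as the inverse problem of information retrieval. To discover patterns robustly in noisy, incomplete data, we adopted the optimal pattern discovery framework, which searches for the patterns that optimize a given statistical measure. For simple text patterns called proximity substring patterns, we developed an optimal pattern discovery algorithm that runs very fast on random texts, and applied it to keyword extraction from the Web and to interactive document browsing. We then treated large-scale semi-structured documents such as Web pages and XML data as "semi-structured document = text + structure + attribute data", and extended the text mining framework from strings to tree and graph structures.

2. Fast pattern matching methods for large text data. Real large-scale text database systems often store their text in compressed form because of its sheer volume. We therefore concentrated on pattern matching algorithms that work on compressed data, that is, algorithms that search for a pattern without explicitly decompressing the data. What is original about this approach is that it aims not only to save storage by compressing the data (the first goal) but also to make pattern matching itself faster because the data is compressed (the second goal). Through a series of studies, we showed that the second goal, and not only the first, can be achieved. We also developed compressed matching algorithms tailored to a range of existing compression methods, and at the same time proposed a framework that treats these diverse compressed matching algorithms uniformly from a higher-level viewpoint.

3. Data mining methods based on machine learning. We gave a theoretical formulation of a family of information extraction problems for semi-structured documents and clarified the learnability and the limits of the problem of discovering patterns from given data. Next, for tree-structured patterns that combine properties of trees and strings, such as Tree Wrappers and hedges, we developed efficient algorithms for extracting information from semi-structured documents. Finally, we ran information acquisition experiments on real Web data, retrieving the information users need from various types of semi-structured documents, and confirmed the effectiveness of the approach.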
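The idea in item 2, searching compressed text without decompressing it, can be illustrated with a deliberately simple scheme. The report does not name this scheme: the sketch below assumes run-length encoding (RLE), chosen only because runs are easy to reason about, and it is not one of the project's algorithms. The names rle_encode and rle_find are hypothetical. For a pattern of k >= 2 runs, an occurrence must place the pattern's first run at the end of some text run, its inner runs exactly on the following text runs, and its last run at the start of the next text run.

```python
from itertools import groupby
from typing import List, Tuple

# Illustrative sketch only: RLE is an assumed stand-in compression scheme,
# not the method used in the project.
Run = Tuple[str, int]  # (character, run length)


def rle_encode(s: str) -> List[Run]:
    """Compress s into maximal runs, e.g. 'aaab' -> [('a', 3), ('b', 1)].
    Maximal runs mean adjacent runs always use different characters."""
    return [(ch, len(list(grp))) for ch, grp in groupby(s)]


def rle_find(text: List[Run], pattern: List[Run]) -> List[int]:
    """Return the start positions (in the uncompressed text) of all
    occurrences of `pattern`, reading only the run lists."""
    if not pattern:
        raise ValueError("empty pattern")
    # starts[i] = offset in the uncompressed text where run i begins.
    starts, pos = [], 0
    for _, length in text:
        starts.append(pos)
        pos += length

    hits: List[int] = []
    k = len(pattern)
    if k == 1:
        # A single-run pattern c^q fits inside any text run c^l with l >= q.
        c, q = pattern[0]
        for (ch, length), s in zip(text, starts):
            if ch == c and length >= q:
                hits.extend(range(s, s + length - q + 1))
        return hits

    (c_first, q_first), (c_last, q_last) = pattern[0], pattern[-1]
    middle = pattern[1:-1]
    for i in range(len(text) - k + 1):
        ch, length = text[i]
        # The first pattern run must be a suffix of text run i ...
        if ch != c_first or length < q_first:
            continue
        # ... the inner pattern runs must equal the next text runs exactly ...
        if text[i + 1:i + k - 1] != middle:
            continue
        # ... and the last pattern run must be a prefix of text run i+k-1.
        ch_last, length_last = text[i + k - 1]
        if ch_last == c_last and length_last >= q_last:
            hits.append(starts[i] + length - q_first)
    return hits


if __name__ == "__main__":
    text = rle_encode("aaabbaaabbbab")
    print(rle_find(text, rle_encode("abb")))  # [2, 7]
```

Each alignment is checked against run lists only, so the work is O(m·k) run comparisons for m text runs and k pattern runs, plus the size of the output, independent of the uncompressed length. Highly repetitive text therefore becomes cheaper to search, a small-scale analogue of the report's second goal. A linear-time variant could run a KMP-style search over the run sequence instead of re-comparing the inner runs at every alignment.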

Research Output (Publications)

- H. Arimura et al.: "Efficient Learning of Semi-Structured Data from Queries", Lecture Notes in Artificial Intelligence, vol. 2225, pp. 315-331 (2001)
- M. Takeda et al.: "Mining from Literary Texts: Pattern Discovery and Similarity Computation", Lecture Notes in Computer Science, vol. 2281, pp. 520-533 (2002)
- T. Shoudai et al.: "Polynomial Time Algorithms for Finding Unordered Tree Patterns with Internal Variables", Lecture Notes in Computer Science, vol. 2138, pp. 335-346 (2001)
- K. Yamamoto et al.: "Discovering Repetitive Expressions and Affinities from Anthologies of Classical Japanese Poems", Lecture Notes in Artificial Intelligence, vol. 2226, pp. 413-425 (2001)
- A. Yamamoto et al.: "Deductive and Inductive Reasoning on Semi-Structured Documents Modeled with Hedges", Lecture Notes in Artificial Intelligence, vol. 2157, pp. 140-147 (2001)
- K. Hirata et al.: "Prediction-Preserving Reducibility with Membership Queries on Formal Languages", Lecture Notes in Computer Science, vol. 2138, pp. 172-183 (2001)
- T. Kadota et al.: "Musical Sequence Comparison for Melodic and Rhythmic Similarities", Proc. 8th International Symposium on String Processing and Information Retrieval (SPIRE 2001), pp. 111-122 (2001)
- S. Inenaga et al.: "On-Line Construction of Symmetric Compact Directed Acyclic Word Graphs", Proc. 8th International Symposium on String Processing and Information Retrieval (SPIRE 2001), pp. 96-110 (2001)
- H. Hori et al.: "Fragmentary Pattern Matching: Complexity, Algorithms and Applications for Analyzing Classic Literary Works", Proc. 12th Annual International Symposium on Algorithms and Computation (ISAAC'01), pp. 719-730 (2001)
- M. Takeda: "String resemblance system: A unifying framework for string similarity with applications to literature and music", Lecture Notes in Computer Science, vol. 2089, pp. 147-151 (2001)
- T. Kida et al.: "Multiple pattern matching algorithms on collage system", Lecture Notes in Computer Science, vol. 2089, pp. 193-206 (2001)
- T. Nasukawa et al.: "Base Technology for Text Mining", Journal of Japanese Society for Artificial Intelligence, 16(2), pp. 201-211 (2001)
- H. Sakamoto et al.: "Web Mining", Journal of Japanese Society for Artificial Intelligence, 16(2), pp. 233-238 (2001)
- H. Sakamoto et al.: "Extracting Partial Structures from HTML Documents", Proc. 14th Florida Artificial Intelligence Research Symposium (FLAIRS 2001), pp. 264-268 (2001)
- H. Arimura et al.: "Efficient Discovery of Proximity Patterns with Suffix Arrays", Lecture Notes in Computer Science, vol. 2089, pp. 152-156 (2001)
- T. Kasai et al.: "Linear-time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications", Lecture Notes in Computer Science, vol. 2089, pp. 181-192 (2001)
- K. Taniguchi et al.: "Mining Semi-Structured Data by Path Expressions", Lecture Notes in Artificial Intelligence, vol. 2226, pp. 378-388 (2001)