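FY2007 Annual Research Report

Development of a High-Speed Knowledge Discovery System for Large-Scale Semi-Structured Data (大規模半構造データからの高速知識発見システムの開発)

Research Project

Project/Area Number	17200011
Research Institution	Fujitsu Laboratories Ltd. (株式会社富士通研究所)
Principal Investigator	Seishi Okamoto (岡本青史), Fujitsu Laboratories Ltd., Knowledge Research Center, Senior Researcher (90399717)
Co-Investigators	Masayuki Takeda (竹田正幸), Kyushu University, Faculty of Information Science and Electrical Engineering, Professor (50216909); Ayumi Shinohara (篠原歩), Tohoku University, Graduate School of Information Sciences, Professor (00226151); Takuya Kida (喜田拓也), Hokkaido University, Graduate School of Information Science and Technology, Associate Professor (70343316); Hiroshi Sakamoto (坂本比呂志), Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Associate Professor (50315123); Kouichi Hirata (平田耕一), Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Associate Professor (20274558)
Keywords	Semi-structured data / XML / Knowledge discovery / Pattern discovery / Pattern matching / Data compression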
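Research Summary	[Semi-structured data processing infrastructure] In our work on XML data streams, we strengthened XPath support and developed an algorithm that evaluates XPath queries at high speed. In our work on accelerating string matching through text compression, we developed an algorithm that further improves both the compression ratio and the matching speed. We also developed algorithms that efficiently find the longest common substring and all palindromes in compressed strings. In our work on efficient index structures for semi-structured data processing, we used the results obtained up to the previous fiscal year to build a keyword extraction mechanism for Web documents that runs at browsing time with high accuracy. For the labeling problem on directed graphs, we proposed an efficient indexing scheme that reduces preprocessing time, query response time, and space complexity, and demonstrated its effectiveness experimentally. Applying this labeling algorithm, we further proposed a method for computing distances on graphs quickly. In our theoretical work on discovering similarity between trees, we designed the two-leaf tree kernel (二葉木カーネル), a fast tree kernel applicable to unordered trees. It can be computed by counting the frequencies of trees that have at most two leaves.
[Pattern discovery from semi-structured data] In our work on time-series data, we advanced our research on episode mining, verified its effectiveness by applying it to bacterial susceptibility test data, and proved theoretically that the episodes constructible solely from information about serial episodes are equivalent to the parallel-free episodes. We also developed an efficient matching algorithm for complex time-series patterns built from combinations of predicates. As an application to knowledge discovery from Web data, we worked on spam detection using our pattern discovery base technologies. We quantified the "alienness" of strings and proposed an efficient method for computing the equivalence classes, under an equivalence relation on strings, that the alienness computation requires; with these we succeeded in developing a spam detection method.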

Research Products
(24 results)

Journal articles: 20 (all 20 peer-reviewed); Conference presentations: 4

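[Journal article] プロパティ接尾辞木のオフライン線形時間構築アルゴリズム [An offline linear-time construction algorithm for property suffix trees] (2008)
- Authors/Presenters
  上村卓史, and 2 others
- Journal
  IEICE Transactions D-1 (電子情報通信学会論文誌D-1), Vol. J91-D, No. 3
  Pages: 595-607
- Peer-reviewed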
[Journal article] Mining Maximal Flexible Patterns in a Sequence (2008)
- Authors/Presenters
  Hiroki Arimura, and 1 other
- Journal
  Lecture Notes in Artificial Intelligence (Post-Proceedings of the 5th Workshop on Learning with Logics and Logics for Learning) 4914 (to appear)
- Peer-reviewed
[Journal article] An Adaptive Algorithm for Splitting Large Sets of Strings and Its Application to Efficient External Sorting (2008)
- Authors/Presenters
  Tatsuya Asai, and 2 others
- Journal
  Working Notes of Workshops on Algorithms for Large-Scale Information Processing in Knowledge Discovery, et al.
  Pages: 17-28
- Peer-reviewed
[Journal article] Improving Named Entity Extraction Accuracy Using Unlabeled Data and Several Extractors (2008)
- Authors/Presenters
  Tomoya Iwakura, and 1 other
- Journal
  Proc. the 8th International Conference on Intelligent Text Processing and Computational Linguistics (to appear)
- Peer-reviewed
[Journal article] A Simple Characterization on Serially Constructible Episodes (2008)
- Authors/Presenters
  Takashi Katoh, and 1 other
- Journal
  Lecture Notes in Artificial Intelligence (Proc. the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining) (to appear)
- Peer-reviewed
[Journal article] An Efficient Unordered Tree Kernel and Its Application to Glycan Classification (2008)
- Authors/Presenters
  Tetsuji Kuboyama, and 2 others
- Journal
  Lecture Notes in Artificial Intelligence (Proc. the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining) (to appear)
- Peer-reviewed
[Journal article] Computing Longest Common Substring and All Palindromes from Compressed Strings (2008)
- Authors/Presenters
  Wataru Matsubara, and 5 others
- Journal
  Lecture Notes in Computer Science (Proc. the 34th International Conference on Current Trends in Theory and Practice of Computer Science) 4910
  Pages: 364-375
- Peer-reviewed
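[Journal article] 漸増的なパストライ構築に基づく高速・軽量XML文書フィルタリング [Fast and lightweight XML document filtering based on incremental path-trie construction] (2007)
- Authors/Presenters
  萩尾一仁, and 3 others
- Journal
  DBSJ Letters 6(2)
  Pages: 5-8
- Note
  From the "Summary of Research Results Report (Japanese)"
- Peer-reviewed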
[Journal article] Efficient Schema Matching Algorithm Based on Pre-Checking (2007)
- Authors/Presenters
  Kengo Kubo, and 3 others
- Journal
  Systems and Computers in Japan 38
  Pages: 143-151
- Note
  From the "Summary of Research Results Report (Japanese)"
- Peer-reviewed
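[Journal article] 高速な到達可能性判定のための規模耐性の高い索引付け [Highly scalable indexing for fast reachability queries] (2007)
- Authors/Presenters
  中村有作, and 2 others
- Journal
  DBSJ Letters 6(1)
  Pages: 77-80
- Note
  From the "Summary of Research Results Report (Japanese)"
- Peer-reviewed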
[Journal article] Time and Space Efficient Discovery of Maximal Geometric Graphs (2007)
- Authors/Presenters
  Hiroki Arimura, and 2 others
- Journal
  Lecture Notes in Artificial Intelligence (Proc. 10th International Conference on Discovery Science) 4755
  Pages: 42-55
- Peer-reviewed
[Journal article] An Assistant Tool for Concealing Personal Information in Text (2007)
- Authors/Presenters
  Tomoya Iwakura, and 1 other
- Journal
  Lecture Notes in Computer Science (Proc. the 12th International Conference on Human-Computer Interaction) 4558
  Pages: 38-46
- Peer-reviewed
[Journal article] Fast Training Methods of Boosting Algorithms for Text Analysis (2007)
- Authors/Presenters
  Tomoya Iwakura, and 1 other
- Journal
  Proc. the International Conference on Recent Advances in Natural Language Processing
  Pages: 274-279
- Peer-reviewed
[Journal article] An Assistant Interface for Finding Query-Related Proper Nouns (2007)
- Authors/Presenters
  Tomoya Iwakura, and 2 others
- Journal
  Lecture Notes in Computer Science (Proc. the 11th International Conference on Knowledge-Based Intelligent Information and Engineering Systems) 4693
  Pages: 1238-1245
- Peer-reviewed
[Journal article] Mining Frequent Elliptic Episodes from Event Sequence (2007)
- Authors/Presenters
  Takashi Katoh, and 1 other
- Journal
  Proc. the 5th Workshop on Learning with Logics and Logics for Learning
  Pages: 46-52
- Peer-reviewed
[Journal article] Mining Frequent Diamond Episodes from Event Sequences (2007)
- Authors/Presenters
  Takashi Katoh, and 2 others
- Journal
  Lecture Notes in Artificial Intelligence (Proc. the 4th International Conference on Modeling Decisions for Artificial Intelligence) 4617
  Pages: 477-488
- Peer-reviewed
[Journal article] Extraction of Sectorial Episodes Representing Changes for Drug Resistance and Replacements of Bacteria (2007)
- Authors/Presenters
  Takashi Katoh, and 4 others
- Journal
  Proc. the IEEE/ICME International Conference on Complex Medical Engineering
  Pages: 304-309
- Peer-reviewed
[Journal article] Reducing Trials by Thinning-Out in Skill Discovery (2007)
- Authors/Presenters
  Hayato Kobayashi, and 3 others
- Journal
  Lecture Notes in Computer Science (Proc. the 10th International Conference on Discovery Science) 4755
  Pages: 127-138
- Peer-reviewed
[Journal article] A Minimal Acyclic Generalization with Tractable Removal of Redundancy (2007)
- Authors/Presenters
  Megumi Kuwabara, and 2 others
- Journal
  Proc. the 5th Workshop on Learning with Logics and Logics for Learning
  Pages: 25-31
- Peer-reviewed
[Journal article] Unsupervised Spam Detection Based on String Alienness Measures (2007)
- Authors/Presenters
  Kazuyuki Narisawa, and 3 others
- Journal
  Lecture Notes in Computer Science (Proc. the 10th International Conference on Discovery Science) 4755
  Pages: 161-172
- Peer-reviewed
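[Conference presentation] 有向グラフ上の最短距離の効率的な計算 [Efficient computation of shortest distances on directed graphs] (2008)
- Authors/Presenters
  原口新平, and 2 others
- Conference
  19th Data Engineering Workshop (DEWS 2008)
- Venue
  Phoenix Seagaia Resort (Miyazaki Prefecture)
- Date
  2008-03-11
- Note
  From the "Summary of Research Results Report (Japanese)"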
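[Conference presentation] ウェブ閲覧における効率的なキーワード抽出とその利用 [Efficient keyword extraction during Web browsing and its use] (2007)
- Authors/Presenters
  上村卓史, and 2 others
- Conference
  Symposium on Databases and Web Information Systems (DBWeb 2007)
- Venue
  Institute of Industrial Science, The University of Tokyo
- Date
  2007-11-28
- Note
  From the "Summary of Research Results Report (Japanese)"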
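[Conference presentation] ビット並列手法に基づく大規模連続ストリームパターン照合 [Large-scale continuous stream pattern matching based on bit-parallel techniques] (2007)
- Authors/Presenters
  斉藤智哉, and 2 others
- Conference
  6th Forum on Information Technology (FIT2007)
- Venue
  Chukyo University, Toyota Campus
- Date
  2007-09-05
- Note
  From the "Summary of Research Results Report (Japanese)"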
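[Conference presentation] 圧縮アルゴリズムLCA法の改良と実験による評価 [Improvement of the LCA compression algorithm and its experimental evaluation] (2007)
- Authors/Presenters
  丸山史郎, and 1 other
- Conference
  IEICE Technical Group on Computation (電子情報通信学会コンピュテーション研究会)
- Venue
  Kyoto University (Katsura Campus)
- Date
  2007-04-26
- Note
  From the "Summary of Research Results Report (Japanese)"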
