Semistructured Data Mining and Information Extraction by Machine Learning Methods

Research Project

Project/Area Number 16016275
Research Category

Grant-in-Aid for Scientific Research on Priority Areas

Allocation Type Single-year Grants
Review Section Science and Engineering
Research Institution Hiroshima City University

Principal Investigator

Tetsuhiro Miyahara  Hiroshima City University, Faculty of Information Sciences, Associate Professor (90209932)

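Co-Investigator(Kenkyū-buntansha) Tomoyuki Uchida  Hiroshima City University, Faculty of Information Sciences, Associate Professor (70264934)
Kouichi Hirata  Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Associate Professor (20274558)
Tetsuji Kuboyama  The University of Tokyo, Center for Collaborative Research, Research Associate (80302660)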
Project Period (FY) 2004 – 2005
Project Status Completed (Fiscal Year 2005)
Budget Amount
¥5,500,000 (Direct Cost: ¥5,500,000)
Fiscal Year 2005: ¥2,300,000 (Direct Cost: ¥2,300,000)
Fiscal Year 2004: ¥3,200,000 (Direct Cost: ¥3,200,000)
Keywords Machine Learning / Semistructured Data Mining / Information Extraction / Tree-Structured Pattern / Tag Tree Pattern
Research Abstract

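In this project we studied semistructured data mining and information extraction using machine learning methods, and obtained the following results this fiscal year.
With the explosive growth of the Web, there is demand for research and development of methods and systems for distributing, providing, and exploiting large-scale distributed content on the Internet. Applying machine learning techniques is effective for developing human-friendly information and communication technologies and user-friendly information processing systems. This project focuses on the fact that Web documents such as HTML/XML files are semistructured. Its aim is to use machine learning techniques to advance data mining from semistructured data and to establish methods for discovering useful content in Web data, that is, information extraction techniques.
For application to information extraction from heterogeneous semistructured documents, we gave an algorithm for discovering maximally frequent tag tree patterns with height-constrained variables, which are tree-structured patterns that represent structural features common to semistructured documents. We gave a learning algorithm that performs polynomial-time inductive inference of TTSP term graphs, which represent structural patterns common to semistructured data that can be modeled by TTSP graphs, such as electrical drawings. We gave a general framework that can uniformly describe various methods for efficiently comparing and matching semistructured data, and clarified previously unknown relationships among classes of approximate tree matching based on edit distance. To integrate multiple pieces of semistructured data, we proposed an efficient algorithm that merges two trees based on an approximate matching. To filter and cluster semistructured data, we devised a distance based on local topological information of semistructured data and developed a method to compute it quickly.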
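As a rough illustration of what a distance based on local topological information can look like, the following Python sketch compares two ordered, unlabeled trees by counting small local shapes. It is a simplified, hypothetical stand-in and not the distance defined in the project's papers: the nested-tuple tree encoding, the choice of "a node's child count pattern" as the local feature, and the names `local_profiles` and `profile_distance` are all assumptions made for this example.

```python
from collections import Counter


def local_profiles(tree):
    """Count local shapes in an ordered, unlabeled tree.

    A tree is a nested tuple of its child subtrees; a leaf is ().
    The local shape of a node (an assumption of this sketch) is the
    ordered tuple of its children's child counts.
    """
    profiles = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        profiles[tuple(len(child) for child in node)] += 1
        stack.extend(node)  # visit every child subtree
    return profiles


def profile_distance(t1, t2):
    """L1 distance between the two bags of local shapes."""
    p1, p2 = local_profiles(t1), local_profiles(t2)
    return sum(abs(p1[k] - p2[k]) for k in p1.keys() | p2.keys())


leaf = ()
wide = (leaf, leaf)   # a root with two leaf children
deep = ((leaf,),)     # a chain of three nodes

print(profile_distance(wide, wide))
print(profile_distance(wide, deep))
```

Each tree is visited once, so the comparison takes time linear in the total number of nodes and edges. As with other bag-of-features measures, two different trees can share the same bag of local shapes, so a value of 0 does not guarantee that the trees are identical.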

Report

(2 results)
  • 2005 Annual Research Report
  • 2004 Annual Research Report
  • Research Products

    (18 results)

  • [Journal Article] On Generalization and Subsumption for Ordered Clauses (2006)

    • Author(s)
      Megumi Kuwabara et al.
    • Journal Title

      Proc.19th Annual Conferences of the Japanese Society for Artificial Intelligence, Lecture Notes in Artificial Intelligence

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Discovery of Maximally Frequent Tag Tree Patterns with Height-Constrained Variables from Semistructured Web Documents (2005)

    • Author(s)
      Yusuke Suzuki et al.
    • Journal Title

      Proc.International Workshop on Challenges in Web Information Retrieval and Integration (WIRI 2005), IEEE Computer Society

      Pages: 104-112

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Polynomial Time Inductive Inference of TTSP Graph Languages from Positive Data (2005)

    • Author(s)
      Ryoji Takami et al.
    • Journal Title

      Proc.ILP 2005, Lecture Notes in Artificial Intelligence (Springer-Verlag) 3625

      Pages: 366-383

    • Related Report
      2005 Annual Research Report
  • [Journal Article] The q-Gram Distance for Ordered Unlabeled Trees (2005)

    • Author(s)
      Nobuhito Ohkura et al.
    • Journal Title

      Proc.DS 2005, Lecture Notes in Artificial Intelligence (Springer-Verlag) 3735

      Pages: 189-202

    • Related Report
      2005 Annual Research Report
  • [Journal Article] On Finding Acyclic Subhypergraphs (2005)

    • Author(s)
      Kouichi Hirata et al.
    • Journal Title

      Proc.FCT 2005, Lecture Notes in Computer Science (Springer-Verlag) 3623

      Pages: 491-503

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Extraction of Frequent Few-Overlapped Monotone DNF Formulas with Depth-First Pruning (2005)

    • Author(s)
      Yoshikazu Shima et al.
    • Journal Title

      Proc.PAKDD 2005, Lecture Notes in Artificial Intelligence (Springer-Verlag) 3518

      Pages: 50-60

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Disjunctive Rules Extracted from MRSA Data with Verification (2005)

    • Author(s)
      Kouichi Hirata et al.
    • Journal Title

      Proc.1st International Conference on Complex Medical Engineering (CME 2005)

      Pages: 326-330

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Extracting Disjunctive Closed Rules from MRSA Data (2005)

    • Author(s)
      Yoshikazu Shima et al.
    • Journal Title

      Proc.1st International Conference on Complex Medical Engineering (CME 2005)

      Pages: 321-325

    • Related Report
      2005 Annual Research Report
  • [Journal Article] A Theoretical Analysis of Tree Edit Distance Measures (2005)

    • Author(s)
      Tetsuji Kuboyama et al.
    • Journal Title

      IPSJ Transactions on Mathematical Modeling and Its Applications (TOM) Vol.46, No.SIG17

      Pages: 31-45

    • NAID

      130000058410

    • Related Report
      2005 Annual Research Report
  • [Journal Article] A Theoretical Analysis of Alignment and Edit Problems for Trees (2005)

    • Author(s)
      Tetsuji Kuboyama et al.
    • Journal Title

      Proc.ICTCS 2005, Lecture Notes in Computer Science (Springer-Verlag) 3701

      Pages: 323-337

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Tractable and Intractable Second-Order Matching Problems (2004)

    • Author(s)
      Kouichi Hirata
    • Journal Title

      Journal of Symbolic Computation Vol.37,No.5

      Pages: 611-628

    • NAID

      120002440590

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Discovery of Maximally Frequent Tag Tree Patterns with Contractible Variables from Semistructured Documents (2004)

    • Author(s)
      Tetsuhiro Miyahara
    • Journal Title

      Proc.PAKDD 2004, Lecture Notes in Artificial Intelligence, Springer-Verlag 3056

      Pages: 133-144

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Polynomial Time Inductive Inference of Ordered Tree Languages with Height-Constrained Variables from Positive Data (2004)

    • Author(s)
      Yusuke Suzuki
    • Journal Title

      Proc.PRICAI 2004, Lecture Notes in Artificial Intelligence, Springer-Verlag 3157

      Pages: 211-220

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Automatic Wrapper Generation for Metasearch using Ordered Tree Structured Patterns (2004)

    • Author(s)
      Kazuhide Aikou
    • Journal Title

      Proc.AI 2004, Lecture Notes in Artificial Intelligence, Springer-Verlag 3339

      Pages: 1030-1035

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Finding Frequent Structural Features among Words in Tree-Structured Documents (2004)

    • Author(s)
      Tomoyuki Uchida
    • Journal Title

      Proc.PAKDD 2004, Lecture Notes in Artificial Intelligence, Springer-Verlag 3056

      Pages: 351-360

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Extracting Multiple Layers from Data Having Graph Structures (2004)

    • Author(s)
      Yuko Itokawa
    • Journal Title

      Proc.2nd Asian Symposium on Geographic Information Systems from Computer Science & Engineering View (ASGIS 2004)

      Pages: 283-291

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Generalization Algorithms for Second-Order Terms (2004)

    • Author(s)
      Kouichi Hirata
    • Journal Title

      Proc.ILP 2004, Lecture Notes in Artificial Intelligence, Springer-Verlag 3194

      Pages: 147-163

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Extracting Minimal and Closed Monotone DNF Formulas (2004)

    • Author(s)
      Yoshikazu Shima
    • Journal Title

      Proc.DS 2004, Lecture Notes in Artificial Intelligence, Springer-Verlag 3245

      Pages: 298-305

    • Related Report
      2004 Annual Research Report

Published: 2004-04-01   Modified: 2018-03-28  
