Speedup of Text Database by Data Compression

Research Project

Project/Area Number 07558159
Research Category

Grant-in-Aid for Scientific Research (A)

Allocation Type Single-year Grants
Section Developmental Research
Research Field Computer Science
Research Institution Kyushu Institute of Technology

Principal Investigator

SHINOHARA Takeshi  Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Department of Artificial Intelligence, Professor (60154225)

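Co-Investigator (Kenkyū-buntansha)

FUKAMACHI Shuichi  Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Department of Artificial Intelligence, Research Associate (30274559)
SHIMOZONO Shinichi  Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Department of Artificial Intelligence, Associate Professor (70243988)
ISHIZAKA Hiroki  Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Department of Artificial Intelligence, Associate Professor (70260726)
SUGIMOTO Noriko  Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Educational Affairs Staff (80271120)
ARIMURA Hiroki  Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Associate Professor (20222763)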
Project Period (FY) 1995 – 1997
Project Status Completed (Fiscal Year 1997)
Budget Amount
¥3,700,000 (Direct Cost: ¥3,700,000)
Fiscal Year 1997: ¥1,300,000 (Direct Cost: ¥1,300,000)
Fiscal Year 1996: ¥2,400,000 (Direct Cost: ¥2,400,000)
Keywords Information Retrieval / Sequential Pattern Matching / Data Compression / Text Database
Research Abstract

The objective of this research is to establish a method for speeding up sequential pattern matching by means of data compression, and to demonstrate its usefulness for text databases.
We design a pattern matching machine that searches data compressed with Huffman codes without decoding it. In experiments with this algorithm on English text, the text size was reduced to 60% of the original and the search response time to 70%, although the effect depends on the characteristics of the data.
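As a rough illustration of searching Huffman-compressed data without reconstructing the text, the following minimal sketch encodes the pattern with the same code, scans the compressed bit stream with a Knuth-Morris-Pratt failure function, and accepts a hit only when it starts on a codeword boundary. It is not the pattern matching machine designed in this project; the function names `huffman_code`, `encode`, and `search_compressed` are hypothetical, bits are held as a string of '0' and '1' for readability, and every pattern symbol is assumed to occur in the code table.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a prefix code {symbol: bit string} from symbol frequencies."""
    freq = Counter(text)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(text, code):
    """Concatenate the codewords of all symbols (assumes each is in `code`)."""
    return "".join(code[ch] for ch in text)

def search_compressed(bits, pattern, code):
    """Return symbol indices of pattern occurrences, scanning only the bits."""
    target = encode(pattern, code)
    if not target:
        return []
    codewords = set(code.values())
    # KMP failure function over the encoded pattern.
    fail = [0] * len(target)
    k = 0
    for i in range(1, len(target)):
        while k and target[i] != target[k]:
            k = fail[k - 1]
        if target[i] == target[k]:
            k += 1
        fail[i] = k
    hits = []
    boundary = {0: 0}  # bit offset of a codeword start -> symbol index
    current, symbols, k = "", 0, 0
    for pos, bit in enumerate(bits):
        # Track codeword boundaries so misaligned bit matches are rejected.
        current += bit
        if current in codewords:
            symbols += 1
            boundary[pos + 1] = symbols
            current = ""
        while k and bit != target[k]:
            k = fail[k - 1]
        if bit == target[k]:
            k += 1
        if k == len(target):
            start = pos + 1 - k
            if start in boundary:
                hits.append(boundary[start])
            k = fail[k - 1]
    return hits

if __name__ == "__main__":
    text = "abracadabra abracadabra"
    code = huffman_code(text)
    bits = encode(text, code)
    print(len(bits), "bits instead of", 8 * len(text))
    print(search_compressed(bits, "abra", code))
```

The boundary table in this sketch grows with the input; a real matching machine would fold the boundary tracking into its states instead of storing offsets.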
We also design a similar technique for a newer compression scheme called Byte-Pair Encoding (BPE for short). This technique compresses English text to around 50% of its original size and reduces search time to 60%. BPE is essentially a fixed-length code, so text compressed by BPE can be distributed efficiently to processors in a parallel environment.
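The fixed-length property of BPE can be illustrated with a small sketch. It repeatedly replaces the most frequent pair of adjacent bytes with a byte value that does not occur in the input, so every compressed symbol is exactly one byte and the compressed text can be cut at any byte offset. This is a simplified illustration and not the project's implementation; the names `bpe_compress` and `bpe_expand` and the `max_rules` limit are assumptions made for the example.

```python
from collections import Counter

def bpe_compress(data: bytes, max_rules: int = 128):
    """Byte pair encoding sketch: returns (compressed bytes, rules)."""
    seq = list(data)
    present = set(data)
    unused = [b for b in range(256) if b not in present]
    rules = {}  # new byte value -> (left byte, right byte)
    while unused and len(rules) < max_rules:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, nothing left to gain
        new = unused.pop()
        rules[new] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(new)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return bytes(seq), rules

def bpe_expand(comp: bytes, rules) -> bytes:
    """Expand every byte independently by following the pair rules."""
    out = []
    stack = list(reversed(comp))
    while stack:
        b = stack.pop()
        if b in rules:
            left, right = rules[b]
            stack.append(right)
            stack.append(left)
        else:
            out.append(b)
    return bytes(out)

if __name__ == "__main__":
    data = b"the cat and the hat and the bat " * 4
    comp, rules = bpe_compress(data)
    print(len(data), "->", len(comp), "bytes with", len(rules), "rules")
    # Any byte offset is a valid split point, e.g. for two processors.
    mid = len(comp) // 2
    halves = bpe_expand(comp[:mid], rules) + bpe_expand(comp[mid:], rules)
    print(halves == data)
```

Because each compressed byte expands on its own, chunks can be handed to separate processors without first locating codeword boundaries, which a variable-length code such as Huffman coding would require.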

Report

(4 results)
  • 1997 Annual Research Report   Final Research Report Summary
  • 1996 Annual Research Report
  • 1995 Annual Research Report
  • Research Products

    (29 results)

All Publications (29 results)

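  • [Publications] Tetsuji Miyazaki: "Design of a pattern matching machine for compressed Japanese text" Proceedings of the 51st National Convention of the Information Processing Society of Japan. 4. 239-240 (1995)

    • Description
      From "Summary of Research Results Report (Japanese)"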
    • Related Report
      1997 Final Research Report Summary
  • [Publications] Shuichi Fukamachi: "Lossy data compression for string pattern matching" IEICE Technical Report. 95. 41-48 (1995)

    • Description
      From "Summary of Research Results Report (Japanese)"
    • Related Report
      1997 Final Research Report Summary
  • [Publications] Michiyo Yamaguchi: "Finding minimal generalization over regular patterns with alphabet indexing" Proc.the 7th Workshop on Genome Informatics. 51-60 (1996)

    • Description
      From "Summary of Research Results Report (Japanese)"
    • Related Report
      1997 Final Research Report Summary
  • [Publications] Takeshi Shinohara: "Inductive inference of unbounded unions of pattern languages from positive data" Proc.the 7th International Workshop on Algorithmic Learning Theory (Lecture Notes in Artificial Intelligence). 1160. 256-271 (1996)

    • Description
      From "Summary of Research Results Report (Japanese)"
    • Related Report
      1997 Final Research Report Summary
  • [Publications] Naoyuki Harada: "A class of elementary formal systems that has an efficient parsing algorithm" The 7th European-Japanese Conference on Information Modelling and Knowledge Bases. 89-101 (1997)

    • Description
      From "Summary of Research Results Report (Japanese)"
    • Related Report
      1997 Final Research Report Summary
  • [Publications] Hiroki Arimura: "Learning unions of tree patterns using queries" Theoretical Computer Science(Netherlands). 185. 47-62 (1997)

    • Description
      From "Summary of Research Results Report (Japanese)"
    • Related Report
      1997 Final Research Report Summary
  • [Publications] K.P.Jantke, T.Shinohara, T.Zeugmann(Eds.): "Algorithmic Learning Theory" Springer-Verlag, 319 (1995)

    • Description
      From "Summary of Research Results Report (Japanese)"
    • Related Report
      1997 Final Research Report Summary
  • [Publications] H.Arimura, T.Shinohara: "Logical generalization for learning with background knowledge" ICLP '95 post-Conference Workshop on Inductive Logic Programming. IA-TR-95-03. 1-11 (1995)

    • Description
      From "Summary of Research Results Report (English)"
    • Related Report
      1997 Final Research Report Summary
  • [Publications] T.Shinohara, H.Arimura: "Inductive inference of unbounded unions of pattern languages from positive data" Proc.the 7th International Workshop on Algorithmic Learning Theory (Lecture Notes in Artificial Intelligence 1160). 256-271 (1996)

    • Description
      From "Summary of Research Results Report (English)"
    • Related Report
      1997 Final Research Report Summary
  • [Publications] M.Yamaguchi, S.Shimozono, T.Shinohara: "Finding minimal generalization over regular patterns with alphabet indexing" Proc.the 7th Workshop on Genome Informatics. 51-60 (1996)

    • Description
      From "Summary of Research Results Report (English)"
    • Related Report
      1997 Final Research Report Summary
  • [Publications] S.Matsumoto, A.Shinohara, H.Arimura, T.Shinohara: "Learning subsequence languages" Information Modelling and Knowledge Bases, VIII,IOS Press. 335-344 (1997)

    • Description
      From "Summary of Research Results Report (English)"
    • Related Report
      1997 Final Research Report Summary
  • [Publications] N.Harada, S.Arikawa, H.Ishizaka: "A class of elementary formal systems that has an efficient parsing algorithm" Information Modelling and Knowledge Bases, VIII,IOS Press. 89-101 (1997)

    • Description
      From "Summary of Research Results Report (English)"
    • Related Report
      1997 Final Research Report Summary
  • [Publications] H.Arimura, H.Ishizaka, T.Shinohara: "Learning unions of tree patterns using queries" Theoretical Computer Science (Netherlands). 185. 47-62 (1997)

    • Description
      From "Summary of Research Results Report (English)"
    • Related Report
      1997 Final Research Report Summary
  • [Publications] K.P.Jantke, T.Shinohara, T.Zeugmann (Eds.): "Algorithmic Learning Theory" (Lecture Notes in Artificial Intelligence 997), Springer-Verlag, 319 (1995)

    • Description
      From "Summary of Research Results Report (English)"
    • Related Report
      1997 Final Research Report Summary
  • [Publications] H.Arimura,H.Ishizaka,T.Shinohara: "Learning unions of tree patterns using queries" Theoretical Computer Science(Netherlands). 185. 47-62 (1997)

    • Related Report
      1997 Annual Research Report
  • [Publications] Hiroki Arimura, Atsushi Wataki, Shinichi Shimozono: "Maximum agreement problem for word association patterns" IEICE Technical Group on Computation. 92-102 (1997)

    • Related Report
      1997 Annual Research Report
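  • [Publications] Hayashi, Ishizaka, Ayumi Shinohara: "Development of a shogi game-record database with position-based retrieval" FY1997 Joint Conference of Electrical and Electronics Engineers in Kyushu. (1997)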

    • Related Report
      1997 Annual Research Report
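  • [Publications] Tetsuji Miyazaki, Shuichi Fukamachi, Takeshi Shinohara: "String pattern matching for compressed data using a Markov model" Workshop on Foundations of Informatics (LA Symposium). (1996)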

    • Related Report
      1996 Annual Research Report
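  • [Publications] Yukako Tohsato, Hiroki Arimura, Takeshi Shinohara: "Learnability of pattern languages with concept hierarchies" Workshop on Foundations of Informatics (LA Symposium). (1996)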

    • Related Report
      1996 Annual Research Report
  • [Publications] K.Hirata,S.Shimozono,A.Shinohara: "On the hardness of approximating the minimum consistent OBDD problem" Lecture Notes in Computer Science. 1097. 112-123 (1996)

    • Related Report
      1996 Annual Research Report
  • [Publications] N.Sugimoto,K.Hirata,H.Ishizaka: "Constructive learning of translations based on dictionaries" Proc.the 7th International Workshop on Algorithmic Learning Theory (Lecture Notes in Artificial Intelligence). 1160. 177-184 (1996)

    • Related Report
      1996 Annual Research Report
  • [Publications] T.Shinohara,H.Arimura: "Inductive inference of unbounded unions of pattern languages from positive data" Proc.the 7th International Workshop on Algorithmic Learning Theory (Lecture Notes in Artificial Intelligence). 1160. 256-271 (1996)

    • Related Report
      1996 Annual Research Report
  • [Publications] M.Yamaguchi,S.Shimozono,T.Shinohara: "Finding minimal generalization over regular patterns with alphabet indexing" Proc.the 7th Workshop on Genome Informatics. 51-60 (1996)

    • Related Report
      1996 Annual Research Report
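  • [Publications] Tetsuji Miyazaki: "Design of a pattern matching machine for compressed Japanese text" Proceedings of the 51st National Convention of the Information Processing Society of Japan. 4. 239-240 (1995)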

    • Related Report
      1995 Annual Research Report
  • [Publications] Shuichi Fukamachi: "Lossy data compression for string pattern matching" IEICE Technical Report. 95. 41-48 (1995)

    • Related Report
      1995 Annual Research Report
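  • [Publications] Takeshi Shinohara: "Discovery of protein motifs from positive examples using multiple string patterns" Proceedings of the 9th Annual Conference of the Japanese Society for Artificial Intelligence (1995). 93-96 (1995)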

    • Related Report
      1995 Annual Research Report
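  • [Publications] Hiroki Arimura: "Learning unions of tree pattern languages using queries" Proceedings of the 9th Annual Conference of the Japanese Society for Artificial Intelligence (1995). 73-76 (1995)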

    • Related Report
      1995 Annual Research Report
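  • [Publications] Michiyo Yamaguchi: "Discovery of protein motifs from amino acid sequences using multiple string patterns" IPSJ SIG Technical Report, Fundamentals of Informatics. No.38. 33-40 (1995)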

    • Related Report
      1995 Annual Research Report
  • [Publications] Hiroki Arimura: "Learning Unions of Tree Patterns Using Queries" Proc.the 6th International Workshop on Algorithmic Learning Theory (Lecture Notes in Artificial Intelligence 997,Springer-Verlag). 997. 66-79 (1995)

    • Related Report
      1995 Annual Research Report

Published: 1996-04-01   Modified: 2016-04-21  