1996 Fiscal Year Annual Research Report

Speeding Up Text Databases by Data Compression

Research Project

Project/Area Number	07558159
Section	Developmental research (試験)
Research Institution	Kyushu Institute of Technology
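Principal Investigator	Takeshi Shinohara (篠原武), Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Professor (60154225)
Co-Investigator(Kenkyū-buntansha)	Noriko Sugimoto (杉本典子), Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Educational Affairs Staff (80271120); Shuichi Fukamachi (深町修一), Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Research Associate (30274559); Shinichi Shimozono (下薗真一), Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Associate Professor (70243988); Hiroki Ishizaka (石坂裕毅), Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Associate Professor (70260726)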
Keywords	Information retrieval / Sequential pattern matching / Data compression / Text database
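Research Abstract	The aim of this research is to establish techniques that use data compression to speed up sequential pattern matching, and to demonstrate their effectiveness in text databases. A major cause of the slowness of sequential processing is thought to be the cost of data transfer. To reduce this cost, it is effective to apply data compression and to search the compressed data without decoding it. This research deals with three kinds of sample text data: genetic information data, library data, and English text data. In fiscal year 1996 (Heisei 8), we worked mainly on library data containing Japanese text, and carried out experiments in which the algorithm designed in fiscal year 1995 (Heisei 7) was incorporated into an actual information retrieval system. Because Japanese text has a large character repertoire, measures are needed to reduce both the complexity of the code and the amount of memory required by the pattern matching algorithm. Since simplifying the code lowers compression efficiency, we designed a simple yet efficient code that takes into account the occurrence characteristics of character classes such as kanji, kana, and alphanumerics. We further developed an algorithm that reduces the required memory by removing redundancy from the code embedded in the pattern matching machine. When this was incorporated into an actual information retrieval system and tested, a speedup was confirmed in cases where the number of matches was relatively small. However, because in practice the number of pattern occurrences detected is at least the number of records, we also found that the speedup can be smaller than for the algorithm in isolation.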
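
The idea of searching compressed data without decoding it can be illustrated with a minimal sketch. The code below is not the code designed in this project, which the abstract does not specify. It assumes a hypothetical byte-aligned scheme in which the 128 most frequent characters get one-byte codewords (0x00 to 0x7F) and all other characters get two-byte codewords whose first byte is 0x80 or above and whose second byte is below 0x80. The names `build_code`, `encode`, and `search_compressed` are illustrative only.

```python
from collections import Counter


def build_code(sample: str) -> dict[str, bytes]:
    """Assign short codewords to frequent characters (hypothetical scheme).

    Ranks 0..127 get one byte (0x00-0x7F). Later ranks get two bytes:
    a lead byte in 0x80-0xFF followed by a trail byte in 0x00-0x7F.
    """
    code = {}
    for rank, (ch, _count) in enumerate(Counter(sample).most_common()):
        if rank < 128:
            code[ch] = bytes([rank])
        else:
            r = rank - 128
            if r >= 128 * 128:
                raise ValueError("too many distinct characters for this sketch")
            code[ch] = bytes([0x80 | (r >> 7), r & 0x7F])
    return code


def encode(text: str, code: dict[str, bytes]) -> bytes:
    """Concatenate the codewords of all characters in text."""
    return b"".join(code[ch] for ch in text)


def search_compressed(data: bytes, pattern: bytes) -> list[int]:
    """Return byte offsets where the encoded pattern occurs in the encoded text.

    The compressed bytes are scanned directly. A hit is accepted only if it
    starts on a codeword boundary, i.e. the preceding byte is not a lead byte.
    The pattern is assumed to be non-empty.
    """
    hits = []
    pos = data.find(pattern)
    while pos != -1:
        if pos == 0 or data[pos - 1] < 0x80:
            hits.append(pos)
        pos = data.find(pattern, pos + 1)
    return hits
```

A caller would build the code once from representative text, store `encode(record, code)` for each record, and pass `encode(query, code)` to `search_compressed`. The boundary check matters because the trail byte of a two-byte codeword can coincide with a one-byte codeword. The returned offsets are positions in the compressed byte string, not character positions.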

Research Products
(6 results)

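[Publications] 宮崎哲司, 深町修一, 篠原武: "String pattern matching for compressed data using Markov models" (in Japanese) Workshop on Foundations of Informatics (LA Symposium). (1996)
[Publications] 遠里由佳子, 有村博紀, 篠原武: "Learnability of pattern languages with concept hierarchies" (in Japanese) Workshop on Foundations of Informatics (LA Symposium). (1996)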
[Publications] K. Hirata, S. Shimozono, A. Shinohara: "On the hardness of approximating the minimum consistent OBDD problem" Lecture Notes in Computer Science. 1097. 112-123 (1996)
[Publications] N. Sugimoto, K. Hirata, H. Ishizaka: "Constructive learning of translations based on dictionaries" Proc. the 7th International Workshop on Algorithmic Learning Theory (Lecture Notes in Artificial Intelligence). 1160. 177-184 (1996)
[Publications] T. Shinohara, H. Arimura: "Inductive inference of unbounded unions of pattern languages from positive data" Proc. the 7th International Workshop on Algorithmic Learning Theory (Lecture Notes in Artificial Intelligence). 1160. 256-271 (1996)
[Publications] M. Yamaguchi, S. Shimozono, T. Shinohara: "Finding minimal generalization over regular patterns with alphabet indexing" Proc. the 7th Workshop on Genome Informatics. 51-60 (1996)
