Development of Intelligent full text retrieval system based on data compression and fast string pattern matching algorithms

Research Project

Project/Area Number	13558029
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	Developmental Research
Research Field	Computer Science
Research Institution	Kyushu University
Principal Investigator	SHINOHARA Ayumi, Kyushu University, Department of Informatics, Associate Professor (00226151)
Co-Investigator(Kenkyū-buntansha)	KIDA Takuya, Kyushu University Library, Lecturer (70343316); SAKAMOTO Hiroshi, Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Associate Professor (50315123); TAKEDA Masayuki, Kyushu University, Department of Informatics, Associate Professor (50216909); SHIMOZONO Shinichi, Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Associate Professor (70243988)
Project Period (FY)	2001 – 2003
Project Status	Completed (Fiscal Year 2003)
Budget Amount	¥11,200,000 (Direct Cost: ¥11,200,000); Fiscal Year 2003: ¥2,600,000 (Direct Cost: ¥2,600,000); Fiscal Year 2002: ¥4,000,000 (Direct Cost: ¥4,000,000); Fiscal Year 2001: ¥4,600,000 (Direct Cost: ¥4,600,000)
Keywords	Pattern matching algorithm / Data compression / Full-text retrieval system / Knowledge discovery / Optimal pattern discovery / Suffix tree / Indexing structure / Machine learning
Research Abstract	Suffix trees and directed acyclic word graphs (DAWGs) are well-known, efficient indexing structures for strings. We focused on compact directed acyclic word graphs (CDAWGs), which are more compact indexing structures, and gave online construction algorithms for them. We also gave an online construction algorithm for an indexing structure that consists of the DAWGs for all prefixes of given strings, and proved a lower bound on the number of states of subsequence automata accepting all subsequences of a given string. We then introduced a new implementation technique for DAWGs based on ternary trees, which balances space efficiency and search time for large alphabets such as that of Japanese text. We proposed an inverse problem: inferring an original string from a given unlabelled graph that corresponds to an indexing structure of the string. We gave linear-time algorithms for DAWGs, subsequence automata, and suffix arrays in this setting. Moreover, we proved a tight upper bound on the length of solutions of word equations in one variable. Concerning data compression, we gave a space-efficient algorithm that outputs a compact context-free grammar representing a given string, and proved its approximation ratio. We also gave a linear-time compression algorithm using the longest-first replacement heuristic. To find patterns in large databases in reasonable time, we developed several algorithms for classes of generalized patterns. In particular, we proposed an efficient pattern discovery algorithm that allows small mismatches between the pattern and the data, and verified its practicality through a series of computational experiments.
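
As a rough illustration of one structure named in the abstract, the sketch below builds a subsequence automaton for a single string: state i means that the first i characters have been consumed, and a transition on character c jumps just past the next occurrence of c. This is a textbook-style construction written for this note, not the project's own algorithm; the function names `build_subsequence_automaton` and `is_subsequence` are hypothetical, and the table uses O(n x alphabet size) space, which the project's compact structures are meant to avoid.

```python
def build_subsequence_automaton(text):
    """Return a list of transition dicts, one per state 0..len(text).

    automaton[i][c] is the state reached from state i on character c,
    i.e. one past the first occurrence of c at index >= i in text.
    """
    n = len(text)
    automaton = [dict() for _ in range(n + 1)]
    # Fill right to left: state i inherits the transitions of state i + 1,
    # then the character at index i overrides its own entry.
    for i in range(n - 1, -1, -1):
        automaton[i] = dict(automaton[i + 1])
        automaton[i][text[i]] = i + 1
    return automaton


def is_subsequence(automaton, pattern):
    """Return True if pattern is a subsequence of the indexed text."""
    state = 0
    for ch in pattern:
        state = automaton[state].get(ch)
        if state is None:  # no later occurrence of ch: reject
            return False
    return True


if __name__ == "__main__":
    automaton = build_subsequence_automaton("abcab")
    print(is_subsequence(automaton, "acb"))
    print(is_subsequence(automaton, "cc"))
```

Once the automaton is built, each query takes time proportional to the pattern length, independent of the text length.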

Report

(4 results)

2003 Annual Research Report / Final Research Report Summary
2002 Annual Research Report
2001 Annual Research Report

Research Products
(95 results)

All Publications (95 results)

[Publications] Zdenek Tronicek et al.: "The Size of Subsequence Automaton"Lecture Notes in Computer Science. 2857. 304-310 (2003)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga et al.: "Linear-Time Off-Line Text Compression by Longest-First Substitution"Lecture Notes in Computer Science. 2857. 137-152 (2003)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Masayuki Takeda et al.: "Discovering Most Classificatory Patterns for Very Expressive Pattern Classes"Lecture Notes in Computer Science. 2843. 486-493 (2003)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Makoto Toyomasu et al.: "Developing Dynamic Gaits for Four Legged Robots"Proc.International Symposium on Information Science and Electrical Engineering 2003. 577-580 (2003)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Hideo Bannai et al.: "Inferring Strings from Graphs and Arrays"Lecture Notes in Computer Science. 2747. 208-217 (2003)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Kensuke Baba et al.: "On the Length of the Minimum Solution of Word Equations in One Variable"Lecture Notes in Computer Science. 2747. 189-197 (2003)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Satoru Miyamoto et al.: "Ternary Directed Acyclic Word Graphs"Lecture Notes in Computer Science. 2759. 120-130 (2003)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Hiroshi Sakamoto: "A Fully Linear-Time Approximation Algorithm for Grammar-Based Compression"Proc.14th Annual Symposium on Combinatorial Pattern Matching (CPM 2003). 348-360 (2003)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Kensuke Baba et al.: "A Note on Randomized Algorithm for String Matching with Mismatches"Nordic Journal of Computing. Vol.10. 2-10 (2003)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Takuya Kida et al.: "Collage system : A unifying framework for compressed pattern matching"Theoretical Computer Science. Vol.298. 253-272 (2003)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Masahiro Hirao et al.: "A practical algorithm to find the best subsequence patterns"Theoretical Computer Science. Vol.292. 465-479 (2003)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Hideo Bannai et al.: "A String Pattern Regression Algorithm and Its Application to Pattern Discovery in Long Introns"In Genome Informatics (GIW2002). Vol.13. 3-11 (2002)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga et al.: "Discovering Best Variable-Length-Don't-Care Patterns"Lecture Notes in Computer Science. 2534. 86-97 (2002)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Kensuke Baba et al.: "A Note on Randomized Algorithm for String Matching with Mismatches"Proc.The Prague Stringology Conference '02 (PSC'02). 9-17 (2002)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga et al.: "Compact Directed Acyclic Word Graphs for a Sliding Window"Lecture Notes in Computer Science. 2476. 310-324 (2002)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Masayuki Takeda et al.: "Processing Text Files as Is : Pattern Matching over Compressed Texts, Multi-Byte Character Texts, and Semi-Structured Texts"Lecture Notes in Computer Science. 2476. 170-186 (2002)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga et al.: "Space-Economical Construction of Index Structures for All Suffixes of a String"Lecture Notes in Computer Science. 2420. 341-352 (2002)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga et al.: "The Minimum DAWG for All Suffixes of a String and its Applications"Lecture Notes in Computer Science. 2373. 153-167 (2002)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Ayumi Shinohara et al.: "Finding Best Patterns Practically"Lecture Notes in Artificial Intelligence. 2281. 307-317 (2002)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Hideo Bannai et al.: "More Speed and More Pattern Variations for Knowledge Discovery System BONSAI"In Genome Informatics (GIW2001). Vol.12. 454-455 (2001)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Hideaki Hori et al.: "Fragmentary Pattern Matching : Complexity, Algorithms and Applications for Analyzing Classic Literary Works"Lecture Notes in Computer Science. 2223. 719-730 (2001)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Koichiro Yamamoto et al.: "Discovering repetitive expressions and affinities from anthologies of classical Japanese poems"Lecture Notes in Artificial Intelligence. 2226. 416-428 (2001)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Masahiro Hirao et al.: "A practical algorithm to find the best episode patterns"Lecture Notes in Artificial Intelligence. 2226. 435-440 (2001)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] T.Kadota et al.: "Musical Sequence Comparison for Melodic and Rhythmic Similarities"Proc.8th Symposium on String Processing and Information Retrieval (SPIRE2001). 111-122 (2001)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga et al.: "On-Line Construction of Symmetric Compact Directed Acyclic Word Graphs"Proc.8th Symposium on String Processing and Information Retrieval (SPIRE2001). 96-110 (2001)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga et al.: "Construction of the CDAWG for a Trie"Proc.the Prague Stringology Conference '01 (PSC'01). 37-48 (2001)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga et al.: "On-Line Construction of Compact Directed Acyclic Word Graphs"Lecture Notes in Computer Science. 2089. 169-180 (2001)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Takuya Kida et al.: "Multiple Pattern Matching Algorithms on Collage System"Lecture Notes in Computer Science. 2089. 193-206 (2001)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Hiroshi Sakamoto et al.: "Extracting Partial Structures from HTML Documents"Proc.14th International FLAIRS Conference : Knowledge Discovery and Data Mining. 264-268 (2001)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Katsuaki Taniguchi et al.: "Mining Semi-Structured Data by Path Expressions"Lecture Notes in Artificial Intelligence. 2226. 378-388 (2001)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Hiroki Arimura et al.: "Efficient Discovery of Proximity Patterns with Suffix Arrays"Lecture Notes in Computer Science. 2089. 152-156 (2001)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  2003 Final Research Report Summary
[Publications] Zdenek Tronicek, Ayumi Shinohara: "The Size of Subsequence Automaton"Lecture Notes in Computer Science. 2857(SPIRE 2003). 304-310 (2003)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga, Takashi Funamoto, Masayuki Takeda, Ayumi Shinohara: "Linear-Time Off-Line Text Compression by Longest-First Substitution"Lecture Notes in Computer Science. 2857(SPIRE 2003). 137-152 (2003)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Masayuki Takeda, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, Setsuo Arikawa: "Discovering Most Classificatory Patterns for Very Expressive Pattern Classes"Lecture Notes in Computer Science. 2843(DS 2003). 486-493 (2003)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Makoto Toyomasu, Ayumi Shinohara: "Developing Dynamic Gaits for Four Legged Robots"Proc.International Symposium on Information Science and Electrical Engineering. 2003. 577-580 (2003)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Hideo Bannai, Shunsuke Inenaga, Ayumi Shinohara, Masayuki Takeda: "Inferring Strings from Graphs and Arrays"Lecture Notes in Computer Science. 2747(MFCS2003). 208-217 (2003)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Kensuke Baba, Satoshi Tsuruta, Ayumi Shinohara, Masayuki Takeda: "On the Length of the Minimum Solution of Word Equations in One Variable"Lecture Notes in Computer Science. 2747(MFCS2003). 189-197 (2003)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Satoru Miyamoto, Shunsuke Inenaga, Masayuki Takeda, Ayumi Shinohara: "Ternary Directed Acyclic Word Graphs"Lecture Notes in Computer Science. 2759(CIAA2003). 120-130 (2003)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Hiroshi Sakamoto: "A Fully Linear-Time Approximation Algorithm for Grammar-Based Compression"Proc.14th Annual Symposium on Combinatorial Pattern Matching. (CPM 2003). 348-360 (2003)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Kensuke Baba, Ayumi Shinohara, Masayuki Takeda, Shunsuke Inenaga, Setsuo Arikawa: "A Note on Randomized Algorithm for String Matching with Mismatches"Nordic Journal of Computing. Vol.10. 2-10 (2003)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Takuya Kida, Tetsuya Matsumoto, Y.Shibata, Masayuki Takeda, Ayumi Shinohara, Setsuo Arikawa: "Collage system : A unifying framework for compressed pattern matching"Theoretical Computer Science. Vol.298, Issue 1. 253-272 (2003)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Masahiro Hirao, Hiromasa Hoshino, Ayumi Shinohara, Masayuki Takeda, Setsuo Arikawa: "A practical algorithm to find the best subsequence patterns"Theoretical Computer Science. Vol.292, Issue 2. 465-479 (2003)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Hideo Bannai, Shunsuke Inenaga, Ayumi Shinohara, Masayuki Takeda, Satoru Miyano: "A String Pattern Regression Algorithm and Its Application to Pattern Discovery in Long Introns"In Genome Informatics. Vol.13, (GIW2002). 3-11 (2002)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, Masayuki Takeda, Setsuo Arikawa: "Discovering Best Variable-Length-Don't-Care Patterns"Lecture Notes in Computer Science. 2534(DS2002). 86-97 (2002)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Kensuke Baba, Ayumi Shinohara, Masayuki Takeda, Shunsuke Inenaga, Setsuo Arikawa: "A Note on Randomized Algorithm for String Matching with Mismatches"Proc.The Prague Stringology Conference '02. (PSC'02). 9-17 (2002)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga, Ayumi Shinohara, Masayuki Takeda, Setsuo Arikawa: "Compact Directed Acyclic Word Graphs for a Sliding Window"Lecture Notes in Computer Science. 2476(SPIRE2002). 310-324 (2002)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Masayuki Takeda, Satoru Miyamoto, Takuya Kida, Ayumi Shinohara, Shuichi Fukamachi, Takeshi Shinohara, Setsuo Arikawa: "Processing Text Files as Is : Pattern Matching over Compressed Texts, Multi-Byte Character Texts, and Semi-Structured Texts"Lecture Notes in Computer Science. 2476(SPIRE2002). 170-186 (2002)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga, Ayumi Shinohara, Masayuki Takeda, Hideo Bannai, Setsuo Arikawa: "Space-Economical Construction of Index Structures for All Suffixes of a String"Lecture Notes in Computer Science. 2420(MFCS2002). 341-352 (2002)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga, Masayuki Takeda, Ayumi Shinohara, Hiromasa Hoshino, Setsuo Arikawa: "The Minimum DAWG for All Suffixes of a String and its Applications"Lecture Notes in Computer Science. 2373(CPM2002). 153-167 (2002)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Ayumi Shinohara, Masayuki Takeda, Setsuo Arikawa, Masahiro Hirao, Hiromasa Hoshino, Shunsuke Inenaga: "Finding Best Patterns Practically"Lecture Notes in Artificial Intelligence(Final Report of the Japanese Discovery Science Project). 2281. 307-317 (2002)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Hideo Bannai, Keisuke Iida, Ayumi Shinohara, Masayuki Takeda, Satoru Miyano: "More Speed and More Pattern Variations for Knowledge Discovery System BONSAI"In Genome Informatics. Vol.12(GIW2001). 454-455 (2001)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Hideaki Hori, Shinichi Shimozono, Masayuki Takeda, Ayumi Shinohara: "Fragmentary Pattern Matching : Complexity, Algorithms and Applications for Analyzing Classic Literary Works"Lecture Notes in Computer Science. 2223(ISAAC'01). 719-730 (2001)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Koichiro Yamamoto, Masayuki Takeda, Ayumi Shinohara, Tomoko Fukuda, Ichiro Nanri: "Discovering repetitive expressions and affinities from anthologies of classical Japanese poems"Lecture Notes in Artificial Intelligence. 2226(DS2001). 416-428 (2001)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Masahiro Hirao, Shunsuke Inenaga, Ayumi Shinohara, Masayuki Takeda, Setsuo Arikawa: "A practical algorithm to find the best episode patterns"Lecture Notes in Artificial Intelligence. 2226(DS2001). 435-440 (2001)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] T.Kadota, Masahiro Hirao, A.Ishino, Masayuki Takeda, Ayumi Shinohara, F.Matsuo: "Musical Sequence Comparison for Melodic and Rhythmic Similarities"Proc.8th Symposium on String Processing and Information Retrieval. (SPIRE2001). 111-122 (2001)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga, Hiromasa Hoshino, Ayumi Shinohara, Masayuki Takeda, Setsuo Arikawa, Giancarlo Mauri, Giulio Pavesi: "On-Line Construction of Symmetric Compact Directed Acyclic Word Graphs"Proc.8th Symposium on String Processing and Information Retrieval. (SPIRE2001). 96-110 (2001)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga, Hiromasa Hoshino, Ayumi Shinohara, Masayuki Takeda, Setsuo Arikawa: "Construction of the CDAWG for a Trie"Proc.the Prague Stringology Conference '01. (PSC'01). 37-48 (2001)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Shunsuke Inenaga, Hiromasa Hoshino, Ayumi Shinohara, Masayuki Takeda, Setsuo Arikawa: "On-Line Construction of Compact Directed Acyclic Word Graphs"Lecture Notes in Computer Science. 2089(CPM2001). 169-180 (2001)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Takuya Kida, Tetsuya Matsumoto, Masayuki Takeda, Ayumi Shinohara, Setsuo Arikawa: "Multiple Pattern Matching Algorithms on Collage System"Lecture Notes in Computer Science. 2089(CPM2001). 193-206 (2001)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Hiroshi Sakamoto, Hiroki Arimura, Setsuo Arikawa: "Extracting Partial Structures from HTML Documents"Proc.14th International FLAIRS Conference : Knowledge Discovery and Data Mining. 264-268 (2001)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Katsuaki Taniguchi, Hiroshi Sakamoto, Hiroki Arimura, Shinichi Shimozono, Setsuo Arikawa: "Mining Semi-Structured Data by Path Expressions"Lecture Notes in Artificial Intelligence. 2226(DS2001). 378-388 (2001)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Hiroki Arimura, Hiroki Asaka, Hiroshi Sakamoto, Setsuo Arikawa: "Efficient Discovery of Proximity Patterns with Suffix Arrays"Lecture Notes in Computer Science. 2089(CPM2001). 152-156 (2001)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  2003 Final Research Report Summary
[Publications] Kensuke Baba et al.: "A Note on Randomized Algorithm for String Matching with Mismatches"Nordic Journal of Computing. Vol.10. 2-10 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Takuya Kida et al.: "Collage system : A unifying framework for compressed pattern matching"Theoretical Computer Science. Vol.298. 253-272 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Masayuki Takeda et al.: "Discovering Most Classificatory Patterns for Very Expressive Pattern Classes"Lecture Notes in Computer Science. 2843. 486-493 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Masahiro Hirao et al.: "A practical algorithm to find the best subsequence patterns"Theoretical Computer Science. Vol.292. 465-479 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Zdenek Tronicek et al.: "The Size of Subsequence Automaton"Lecture Notes in Computer Science. 2857. 304-310 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Shunsuke Inenaga et al.: "Linear-Time Off-Line Text Compression by Longest-First Substitution"Lecture Notes in Computer Science. 2857. 137-152 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Hideo Bannai et al.: "Inferring Strings from Graphs and Arrays"Lecture Notes in Computer Science. 2747. 208-217 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Kensuke Baba et al.: "On the Length of the Minimum Solution of Word Equations in One Variable"Lecture Notes in Computer Science. 2747. 189-197 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Satoru Miyamoto et al.: "Ternary Directed Acyclic Word Graphs"Lecture Notes in Computer Science. 2759. 120-130 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Takuya Kida: "An Algorithm for Computing the Hamming Distance between a VLDC Pattern and a String" (in Japanese) Forum on Information Technology (FIT) 2003. (A-062). 137-138 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Takuya Kida: "A VLDC Pattern Matching Algorithm Allowing Errors" (in Japanese) Technical Meeting on Computation. (2004)
- Related Report
  2003 Annual Research Report
[Publications] Hiroshi Sakamoto: "A Fully Linear-Time Approximation Algorithm for Grammar-Based Compression"Proc. 14th Annual Symposium on Combinatorial Pattern Matching (CPM 2003). 348-360 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Hiroshi Sakamoto: "A Space-Saving Approximation Algorithm for Optimal Data Compression" (in Japanese) Proc. Forum on Information Technology (FIT). 29-30 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Masayuki Takeda et al.: "On Stream-Oriented Techniques for Fast XML Data Processing" (in Japanese) Symposium on Databases and Web Information Systems (DBWeb2003). 26-27 (2003)
- Related Report
  2003 Annual Research Report
[Publications] 辻寿嗣 et al.: "XML Text Transformation Based on a Fast Regular Hedge Pattern Matching Algorithm" (in Japanese) IEICE Technical Meeting on "Computation". (2003)
- Related Report
  2003 Annual Research Report
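[Publications] 菅智明 et al.: "Approximate Pattern Matching of Geometric Point Sequences for Musical Score Retrieval" (in Japanese) IEICE Technical Meeting on "Computation". (2004)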
- Related Report
  2003 Annual Research Report
[Publications] Takuya Kida et al.: "Collage system : A unifying framework for compressed pattern matching"Theoretical Computer Science. (to appear). (2003)
- Related Report
  2002 Annual Research Report
[Publications] Y.Hayashi et al.: "Uniform characterization of polynomial-query learnabilities"Theoretical Computer Science. 292(2). 377-385 (2003)
- Related Report
  2002 Annual Research Report
[Publications] M.Hirao et al.: "A practical algorithm to find the best subsequence patterns"Theoretical Computer Science. 292(2). 465-479 (2003)
- Related Report
  2002 Annual Research Report
[Publications] H.Bannai et al.: "A String Pattern Regression Algorithm and Its Application to Pattern Discovery in Long Introns"Genome Informatics. 13. 3-11 (2002)
- Related Report
  2002 Annual Research Report
[Publications] S.Inenaga et al.: "Discovering Best Variable-Length-Don't-Care Patterns"Lecture Notes in Artificial Intelligence. 2534. 86-97 (2002)
- Related Report
  2002 Annual Research Report
[Publications] K.Baba et al.: "A Note on Randomized Algorithm for String Matching with Mismatches"Proc. The Prague Stringology Conference'02(PSC'02). 9-17 (2002)
- Related Report
  2002 Annual Research Report
[Publications] S.Inenaga et al.: "Compact Directed Acyclic Word Graphs for a Sliding Window"Lecture Notes in Computer Science. 2476. 310-324 (2002)
- Related Report
  2002 Annual Research Report
[Publications] M.Takeda et al.: "Processing Text Files as Is : Pattern Matching over Compressed Texts, Multi-Byte Character Texts, and Semi-Structured Texts"Lecture Notes in Computer Science. 2476. 170-186 (2002)
- Related Report
  2002 Annual Research Report
[Publications] S.Inenaga et al.: "Space-Economical Construction of Index Structures for All Suffixes of a String"Lecture Notes in Computer Science. 2420. 341-352 (2002)
- Related Report
  2002 Annual Research Report
[Publications] S.Inenaga et al.: "The Minimum DAWG for All Suffixes of a String and Its Applications"Lecture Notes in Computer Science. 2373. 151-165 (2002)
- Related Report
  2002 Annual Research Report
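[Publications] Masayuki Takeda et al.: "Pattern Matching over Compressed Text: New Developments in Data Compression and Pattern Matching" (in Japanese) Journal of the Information Processing Society of Japan. 43-47. 763-769 (2002)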
- Related Report
  2002 Annual Research Report
[Publications] H.Hori et al.: "Fragmentary Pattern Matching : Complexity, Algorithms and Applications for Analyzing Classic Literary Works"Proc. 12th Annual International Symposium on Algorithms and Computation. 719-730 (2001)
- Related Report
  2001 Annual Research Report
[Publications] S.Inenaga et al.: "On-Line Construction of Symmetric Compact Directed Acyclic Word Graphs"Proc. 8th International Symposium on String Processing and Information Retrieval. 96-110 (2001)
- Related Report
  2001 Annual Research Report
[Publications] T.Kida et al.: "Multiple Pattern Matching Algorithms on Collage System"Lecture Notes in Computer Science. 2089. 193-206 (2001)
- Related Report
  2001 Annual Research Report
[Publications] K.Yamamoto et al.: "Discovering Repetitive Expressions and Affinities from Anthologies of Classical Japanese Poems"Lecture Notes in Artificial Intelligence. 2226. 413-425 (2001)
- Related Report
  2001 Annual Research Report
[Publications] H.Arimura et al.: "Efficient Learning of Semi-Structured Data from Queries"Lecture Notes in Artificial Intelligence. 2225. 315-331 (2001)
- Related Report
  2001 Annual Research Report
[Publications] K.Hirata et al.: "Prediction-Preserving Reducibility with Membership Queries on Formal Languages"Lecture Notes in Computer Science. 2138. 172-183 (2001)
- Related Report
  2001 Annual Research Report
