Breaking the Computational Limits of Machine Discovery Based on Pattern Compression (パターン圧縮に基づく機械発見における計算限界の打破)

Research Project

Project/Area Number	09J01104
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Single-year Grants
Section	Domestic
Research Field	Fundamental theory of informatics
Research Institution	Kyushu University
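Principal Investigator	Shirou Maruyama (丸山史郎), Kyushu University, Graduate School, Faculty of Information Science and Electrical Engineering, JSPS Research Fellow (DC2)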
Project Period (FY)	2009 – 2010
Project Status	Completed (Fiscal Year 2010)
Budget Amount	¥1,400,000 (Direct Cost: ¥1,400,000); Fiscal Year 2010: ¥700,000 (Direct Cost: ¥700,000); Fiscal Year 2009: ¥700,000 (Direct Cost: ¥700,000)
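Keywords	text data compression / grammar-transform-based compression / compressed data structures / string pattern search / data compression / string pattern matching / information retrieval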
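Research Abstract	We proposed a lightweight online compression algorithm for highly redundant text data. Because the algorithm runs online, it can compress continuously arriving data incrementally without first accumulating it. It also compresses common substrings using only local integer operations on the data, without any special data structure, so it runs in a sufficiently small amount of main memory when the text is extremely compressible. Experiments confirmed that real data containing many repeated parts can be compressed to between about 1/10 and less than 1/1000 of its original size, and that the algorithm runs fast while using about 1/10 to less than 1/100 of the main memory required by LZMA compression, which relies on a string index.

We studied indexing for fast access to substrings of grammar-compressed text. To treat compressed text like the original text without decompressing it, one must be able to perform random access on the compressed text and retrieve any substring quickly. We proposed an indexing method for grammar-compressed text that supports such operations. The index space itself shrinks according to the compression ratio of the text, so the index remains sufficiently small even for extremely compressed data. The method also guarantees that a substring can be extracted in a fixed amount of time, whatever its position. Experiments on various corpora confirmed that substrings can be accessed at 5 to 7 million characters per second using main memory of about 1.2 to 1.5 times the size of the compressed text.

We studied compressed index structures based on grammar compression. By exploiting the properties of grammar data compressed with a method called Edit Sensitive Parsing, the input pattern can itself be compressed, which enables fast search in the compressed text. In this work we extended the approach so that it reports the number of occurrences of a pattern, the positions of those occurrences, and arbitrary substrings, and we evaluated it experimentally.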
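To make the idea of random access on grammar-compressed text concrete, the following is a minimal sketch, not the index proposed in this project. It assumes a hypothetical toy grammar in which every nonterminal expands to exactly two symbols, and it stores the expansion length of each nonterminal so that a character can be located by descending the derivation tree without decompressing the text. All names (`Rules`, `expansion_lengths`, `char_at`, `substring`, `RULES`) are illustrative assumptions. The lookup time of this sketch grows with the height of the grammar, so it does not reproduce the time or space guarantees described in the abstract.

```python
from typing import Dict, Tuple

# Hypothetical toy grammar (a straight-line program): every nonterminal
# expands to exactly two symbols; any symbol without a rule is a terminal.
Rules = Dict[str, Tuple[str, str]]


def expansion_lengths(rules: Rules) -> Dict[str, int]:
    """Return the length of the text that each nonterminal expands to."""
    lengths: Dict[str, int] = {}

    def length(symbol: str) -> int:
        if symbol not in rules:  # terminal: a single character
            return 1
        if symbol not in lengths:
            left, right = rules[symbol]
            lengths[symbol] = length(left) + length(right)
        return lengths[symbol]

    for symbol in rules:
        length(symbol)
    return lengths


def char_at(rules: Rules, lengths: Dict[str, int], start: str, i: int) -> str:
    """Return character i of the text derived from `start` without decompressing."""
    if not 0 <= i < lengths.get(start, 1):
        raise IndexError("position out of range")
    symbol = start
    while symbol in rules:
        left, right = rules[symbol]
        left_len = lengths.get(left, 1)
        if i < left_len:
            symbol = left       # target lies in the left child
        else:
            i -= left_len       # skip the left child and go right
            symbol = right
    return symbol


def substring(rules: Rules, lengths: Dict[str, int], start: str, i: int, m: int) -> str:
    """Return the m characters starting at position i (one lookup per character)."""
    return "".join(char_at(rules, lengths, start, k) for k in range(i, i + m))


# Assumed example: X4 derives "abcabcab".
RULES: Rules = {
    "X1": ("a", "b"),
    "X2": ("X1", "c"),
    "X3": ("X2", "X2"),
    "X4": ("X3", "X1"),
}
LENGTHS = expansion_lengths(RULES)
print(substring(RULES, LENGTHS, "X4", 3, 4))  # prints "abca"
```

Storing one length per rule keeps the extra space proportional to the grammar size rather than the text size, which is the general reason such indexes can shrink along with the compressed text.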

Report

(2 results)

2010 Annual Research Report
2009 Annual Research Report

Research Products
(5 results)

Journal Article (1 result, of which peer reviewed: 1) / Presentation (4 results)

[Journal Article] Context-Sensitive Grammar Transform: Compression and Pattern Matching (2010)
- Author(s)
  Shirou Maruyama, et al.
- Journal Title
  IEICE TRANSACTIONS on Information and Systems E92-D, No.2
  Pages: 219-226
- Related Report
  2009 Annual Research Report
- Peer Reviewed
[Presentation] An Online Algorithm for Lightweight Compression of Highly Repetitive Text (2011)
- Author(s)
  Shirou Maruyama, Masaya Nakahara
- Organizer
  The 4th Annual Meeting of Asian Association for Algorithms and Computation (AAAC 2011)
- Place of Presentation
  National Tsing Hua University (Taiwan)
- Year and Date
  2011-04-16
- Related Report
  2010 Annual Research Report
[Presentation] Practical Random Access to Grammar-Based Compression (2011)
- Author(s)
  Shirou Maruyama
- Organizer
  The 4th Annual Meeting of Asian Association for Algorithms and Computation (AAAC 2011)
- Place of Presentation
  National Tsing Hua University (Taiwan)
- Year and Date
  2011-04-16
- Related Report
  2010 Annual Research Report
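[Presentation] A Proposal of Encoding and Random Access Methods for Grammar-Based Compression Using a Full Binary Tree Representation (文法型圧縮法の全二分木表現による符号化とランダムアクセス手法の提案) (2011)
- Author(s)
  Shirou Maruyama
- Organizer
  The 134th IPSJ Special Interest Group on Algorithms Meeting (SIG-AL)
- Place of Presentation
  University of the Ryukyus (Okinawa Prefecture)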
- Year and Date
  2011-03-07
- Related Report
  2010 Annual Research Report
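[Presentation] An Efficient Index Structure Based on Grammar Compression Using Edit Sensitive Parsing (Edit Sensitive Parsingを用いた文法圧縮による効率的な索引構造) (2010)
- Author(s)
  Shirou Maruyama
- Organizer
  The 78th JSAI Special Interest Group on Fundamental Problems in Artificial Intelligence Meeting (SIG-FPAI)
- Place of Presentation
  University of Hyogo (Hyogo Prefecture)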
- Year and Date
  2010-08-01
- Related Report
  2010 Annual Research Report
