Development of Knowledge Discovery Algorithms Based on String Compression Techniques

Research Project

Project/Area Number	09J05720
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Single-year Grants
Section	Domestic
Research Field	Intelligent informatics
Research Institution	Tohoku University
Principal Investigator	Wataru Matsubara, Tohoku University, Graduate School of Information Sciences, JSPS Research Fellow (DC1)
Project Period (FY)	2009 – 2011
Project Status	Completed (Fiscal Year 2011)
Budget Amount	¥2,100,000 (Direct Cost: ¥2,100,000); Fiscal Year 2011: ¥700,000 (Direct Cost: ¥700,000); Fiscal Year 2010: ¥700,000 (Direct Cost: ¥700,000); Fiscal Year 2009: ¥700,000 (Direct Cost: ¥700,000)
Keywords	String processing / Repetitive structures / Data compression / Algorithms
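Research Abstract	This research treats data compression not merely as a way to reduce storage space but as a means of making processing more efficient, and develops string algorithms that operate on compressed strings. In particular, we focused on repetitive structures in strings, with the aim of developing algorithms that detect such structures in compressed strings. First, to clarify the constraints encoded in repetitive structures, we tackled the inverse problem of inferring the original string from its repetitive structure. This fiscal year we focused on local periods among the various repetitive structures and obtained partial results. The computational complexity turns out to depend on the alphabet size: we gave algorithms that solve the inverse problem efficiently when the alphabet size is unrestricted or when it is at most 2. Determining the complexity when the alphabet size is a constant of 3 or more remains future work. These results have been submitted to a special issue of the journal Discrete Applied Mathematics and are under review. Second, in the area of compressed string matching, we worked on algorithms that compute the repetitive structures contained in a string. Prior work gives an O(n log N)-time algorithm that decides whether a repetitive structure exists, where n is the length of the compressed data and N is the length of the decompressed string. Building on this result, we extended it to an algorithm that counts squares (substrings consisting of two consecutive copies of a string) and runs (maximal periodic substrings). A submission of this result to an international conference is in preparation.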
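
The two repetitive structures named in the abstract, squares and runs, can be illustrated with a minimal sketch. It assumes a plain, uncompressed Python string and uses brute-force scans; it is not the project's algorithm, which works on compressed input. The function names `count_square_occurrences` and `find_runs` are hypothetical, chosen only for this illustration.

```python
def count_square_occurrences(s):
    """Count occurrences of squares: positions i and half-lengths h
    such that s[i:i+h] == s[i+h:i+2h]. Brute force, for illustration."""
    n = len(s)
    count = 0
    for i in range(n):
        for h in range(1, (n - i) // 2 + 1):
            if s[i:i + h] == s[i + h:i + 2 * h]:
                count += 1
    return count


def find_runs(s):
    """Return runs as (start, end, period) with end exclusive.

    A run is taken here to be a maximal substring whose length is at
    least twice its smallest period. Periods are tried in increasing
    order, so an interval already reported with a smaller period is
    skipped when it reappears with a multiple of that period."""
    n = len(s)
    runs = []
    seen = set()
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            # Extend the stretch of positions k with s[k] == s[k + p].
            while k + p < n and s[k] == s[k + p]:
                k += 1
            end = k + p  # s[start:end] has period p and cannot be extended
            if end - start >= 2 * p and (start, end) not in seen:
                seen.add((start, end))
                runs.append((start, end, p))
    return runs


if __name__ == "__main__":
    print(find_runs("mississippi"))
    print(count_square_occurrences("mississippi"))
```

For "mississippi" the sketch reports the three period-1 runs "ss", "ss", "pp" and the period-3 run "ississi". A string is square-free exactly when `count_square_occurrences` returns 0.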

Report

(3 results)

Research Products
(7 results)

Journal Article: 2 results (of which Peer Reviewed: 2) / Presentation: 4 results / Remarks: 1 result

[Journal Article] An Efficient Algorithm to Test Square-Freeness of String Compressed by Balanced Straight Line Programs. (2010)
- Author(s)
  Wataru Matsubara, Shunsuke Inenaga, Ayumi Shinohara.
- Journal Title
  Chicago Journal of Theoretical Computer Science (volume to be determined) (in press)
- Related Report
  2009 Annual Research Report
- Peer Reviewed
[Journal Article] Average Value of Sum of Exponents of Runs in a String (2009)
- Author(s)
  Kazuhiko Kusano, Wataru Matsubara, Akira Ishino, Ayumi Shinohara.
- Journal Title
  International Journal of Foundations of Computer Science (special issue for Prague Stringology Conference), Vol. 20
  Pages: 1135-1146
- Related Report
  2009 Annual Research Report
- Peer Reviewed
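[Presentation] Run Structures Contained in Strings (2011)
- Author(s)
  Wataru Matsubara
- Organizer
  IEICE (Institute of Electronics, Information and Communication Engineers) General Conference 2011
- Place of Presentation
  Tokyo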
- Year and Date
  2011-03-15
- Related Report
  2010 Annual Research Report
[Presentation] Inferring strings from runs (2010)
- Author(s)
  Wataru Matsubara
- Organizer
  Prague Stringology Conference 2010
- Place of Presentation
  Prague, Czech Republic
- Year and Date
  2010-08-31
- Related Report
  2010 Annual Research Report
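[Presentation] A Compressed String Matching Algorithm Allowing Transposition (2010)
- Author(s)
  Wataru Matsubara
- Organizer
  IEICE Technical Committee on Theoretical Foundations of Computing (COMP)
- Place of Presentation
  Shiga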
- Year and Date
  2010-04-22
- Related Report
  2010 Annual Research Report
[Presentation] Hardness of Inferring Strings from Repetitive Structures (2009)
- Author(s)
  Wataru Matsubara
- Organizer
  Summer LA Symposium
- Place of Presentation
  Matsushima
- Year and Date
  2009-07-22
- Related Report
  2009 Annual Research Report
[Remarks]
- URL
  http://www.shino.ecei.tohoku.ac.jp/runs/
- Related Report
  2009 Annual Research Report
