2015 Fiscal Year Final Research Report
Analysis of Repetition Structure in Huge Sequences
Project/Area Number | 25280079 |
Research Category | Grant-in-Aid for Scientific Research (B) |
Allocation Type | Partial Multi-year Fund |
Section | General |
Research Field | Intelligent informatics |
Research Institution | Hokkaido University |
Principal Investigator | |
Co-Investigator(Kenkyū-buntansha) |
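KUDO Mineichi, Hokkaido University, Graduate School of Information Science and Technology, Professor (60205101)
TAKIGAWA Ichigaku, Hokkaido University, Graduate School of Information Science and Technology, Associate Professor (10374597)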
|
Co-Investigator(Renkei-kenkyūsha) |
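MAMITSUKA Hiroshi, Kyoto University, Institute for Chemical Research, Professor (00346107)
KIDA Takuya, Hokkaido University, Graduate School of Information Science and Technology, Associate Professor (70343316)
OKUBO Yoshiaki, Hokkaido University, Graduate School of Information Science and Technology, Assistant Professor (40271639)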
|
Project Period (FY) | 2013-04-01 – 2016-03-31 |
Keywords | Knowledge discovery and data mining / Sequence mining / Genome information processing / Frequent pattern mining |
Outline of Final Research Achievements |
We developed an algorithm for enumerating frequent approximate string patterns and proposed a method that extracts the occurrence regions of the enumerated patterns in order to find interspersed repetitive elements in huge sequences such as DNA. The occurrences of the patterns found by the proposed method have clear boundaries, so essentially the same region is rarely counted more than once. Furthermore, our enumeration algorithm runs fast and uses little memory. In experiments on human chromosome 21, half of the known Alu regions (Alu elements are well-known interspersed repetitive elements) were extracted as occurrence regions of 100 representative patterns selected from the enumerated frequent approximate patterns.
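The report does not define its notion of an approximate pattern or give its enumeration algorithm. As a minimal sketch only, assuming fixed-length patterns, Hamming distance with at most `k` mismatches, and greedy left-to-right counting of non-overlapping occurrences, the following illustrates how occurrences with fixed boundaries avoid counting the same region twice. The function names are hypothetical, and this brute-force enumeration is not the authors' fast, memory-efficient algorithm.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def approx_occurrences(text: str, pattern: str, k: int) -> list[int]:
    """Start positions of non-overlapping approximate occurrences of
    `pattern` in `text` (Hamming distance <= k), found greedily left to
    right. Assumption: once an occurrence is accepted, the scan jumps past
    its end, so no text position belongs to two counted occurrences."""
    m = len(pattern)
    positions = []
    i = 0
    while i + m <= len(text):
        if hamming(text[i:i + m], pattern) <= k:
            positions.append(i)
            i += m  # skip the whole occurrence: its boundaries are fixed
        else:
            i += 1
    return positions


def frequent_approx_patterns(text: str, m: int, k: int,
                             min_freq: int) -> dict[str, int]:
    """Brute-force enumeration (hypothetical, for illustration): try every
    distinct length-m substring as a candidate pattern and keep those with
    at least `min_freq` non-overlapping approximate occurrences."""
    candidates = {text[i:i + m] for i in range(len(text) - m + 1)}
    result = {}
    for p in sorted(candidates):
        freq = len(approx_occurrences(text, p, k))
        if freq >= min_freq:
            result[p] = freq
    return result
```

For example, `approx_occurrences("AAAA", "AA", 0)` returns `[0, 2]`, whereas counting overlapping matches would report three occurrences of `"AA"` in the same four characters.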
|
Free Research Field | Intelligent informatics |