Developing a fast algorithm for analyzing Giga-sequence data
Project/Area Number |
22700319
|
Research Category |
Grant-in-Aid for Young Scientists (B)
|
Allocation Type | Single-year Grants |
Research Field |
Bioinformatics/Life informatics
|
Research Institution | National Institute of Advanced Industrial Science and Technology |
Principal Investigator |
SHIMIZU Kana National Institute of Advanced Industrial Science and Technology, Computational Biology Research Center, Researcher (60367050)
|
Project Period (FY) |
2010 – 2011
|
Project Status |
Completed (Fiscal Year 2011)
|
Budget Amount |
¥3,250,000 (Direct Cost: ¥2,500,000, Indirect Cost: ¥750,000)
Fiscal Year 2011: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Fiscal Year 2010: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
|
Keywords | Genome / Giga-sequencer / Algorithm / Short read / Similar sequence search / Edit distance / Giga-sequence data / Minimum spanning tree |
Research Abstract |
Next Generation Sequencing (NGS) technology calls for fast and accurate algorithms that can evaluate sequence similarity across huge amounts of data. In this study, we designed and implemented SlideSort, an exact algorithm that finds all pairs of similar sequences in NGS data whose edit distance does not exceed a given threshold. Such pairs support many important analyses, including de novo genome assembly, identification of frequently appearing sequence patterns, and accurate clustering. Compared with state-of-the-art methods, our method is much faster at finding remote matches and scales easily to tens of millions of sequences. Our software also provides single-link clustering, which is useful for summarizing NGS data for further processing.
|
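The problem stated in the abstract can be illustrated with a small sketch. The code below is not SlideSort. It is a brute-force baseline, written under the assumption that "similar pair" means two sequences whose Levenshtein edit distance is at most a threshold d, and that single-link clustering corresponds to the connected components of the resulting pair graph. The function names (edit_distance, similar_pairs, single_link_clusters) and the example reads are hypothetical and do not come from the project's software.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def similar_pairs(seqs, d):
    """All index pairs (i, j), i < j, with edit distance <= d.

    Brute force: compares every pair, so it is quadratic in the number
    of sequences. The length check only skips pairs that cannot match.
    """
    return [(i, j) for i, j in combinations(range(len(seqs)), 2)
            if abs(len(seqs[i]) - len(seqs[j])) <= d
            and edit_distance(seqs[i], seqs[j]) <= d]


def single_link_clusters(n, pairs):
    """Connected components of the similarity graph, using union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)

    clusters = {}
    for x in range(n):
        clusters.setdefault(find(x), []).append(x)
    return sorted(clusters.values())


# Hypothetical short reads, for illustration only.
reads = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TTTTGGGG", "TTTTGGGC"]
pairs = similar_pairs(reads, d=1)
clusters = single_link_clusters(len(reads), pairs)
print(pairs)
print(clusters)
```

Because this baseline compares every pair, it becomes impractical long before the tens of millions of sequences mentioned in the abstract. It only makes the input and output of the task concrete.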
Report
(3 results)
Research Products
(12 results)