Project/Area Number |
20K06612
|
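Research Institution | National Institute of Genetics |
Principal Investigator |
Kirill Kryukov, National Institute of Genetics, Department of Genomics and Evolutionary Biology, Specially Appointed Associate Professor (20806202)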
|
Project Period (FY) |
2020-04-01 – 2023-03-31
|
Keywords | Genome database / DNA compression |
Outline of Annual Research Achievements |
During the last fiscal year (April 2020 - March 2021), I made the following progress on this project: (1) Publications. I published a first-author paper describing a sequence compression benchmark in GigaScience (impact factor 5.999). The benchmark shows the superiority of our NAF compression. I also published 3 papers utilizing a NAF-compressed genome database for medical metagenome analysis (in BMC Microbiology, Scientific Reports, and Infectious Diseases). (2) Database. I maintained and improved the GenomeSync database (http://genomesync.org/), which currently includes 391,841 genomes (5.7 Tbp of sequence data). The database was moved to the National Institute of Genetics and provides open web access to its genome data. Recent additions include history charts and detailed download instructions. (3) NAF compressor. I maintained and improved the NAF compressor (https://github.com/KirillKryukov/naf), adding support for storing multiple files in the same archive. The NAF paper in Bioinformatics currently has 15 Google Scholar citations. (4) Tools using NAF-format sequence data. I made a primer tester tool (http://kirill-kryukov.com/study/tools/primer-tester/). I also added support for Minimap2 search against a NAF-format database to Genome Search Toolkit.
|
Current Status of Research Progress |
Current Status of Research Progress (Category)
2: Progressing more or less smoothly
Reason
The project is going well. In July 2020 I moved to the National Institute of Genetics (as a Specially Appointed Associate Professor in the Population Genetics Laboratory), where I continue to work on this project.
|
Strategy for Future Research Activity |
I plan to continue improving NAF compression and the GenomeSync database. Status of the items I initially proposed: (1) "Genome database in NAF format": The NAF format and the GenomeSync database are available for general use. I plan to publish a paper about GenomeSync in the near future. (2) "Sequence search tool working directly on NAF files": I made a primer tester tool that works directly on a NAF-formatted database. This year I will continue working on this topic. (3) "K-mer analysis tool working directly on NAF files": This part is still under development. (4) "NAF support in Genome Search Toolkit": Support for Minimap2 search using a NAF database is already implemented. This year I plan to add support for BLAST search. (5) "Library for adding NAF support to existing tools": Planned for this year or next.
|
Causes of Carryover |
I am continuing this project and will use the funds remaining from the previous fiscal year to purchase computer equipment needed for the project's computational experiments.
|
Remarks |
The first two links are about the AGTC project by Naruya Saitou, which includes the GenomeSync database. The other link is a blog post explaining NAF compression in Japanese.
|