2020 Fiscal Year Research-status Report
Improving efficiency of sequence databases by applying the NAF format
Project/Area Number | 20K06612 |
Research Institution | National Institute of Genetics |
Principal Investigator | Kirill Kryukov, National Institute of Genetics, Department of Genomics and Evolutionary Biology, Specially Appointed Associate Professor (20806202) |
Project Period (FY) | 2020-04-01 – 2023-03-31 |
Keywords | Genome database / DNA compression |
Outline of Annual Research Achievements |
Within the last fiscal year (April 2020 to March 2021), I made the following progress on this project:
(1) Publications. I published a first-author paper on a sequence compression benchmark in GigaScience (impact factor 5.999). This benchmark shows the superiority of our NAF compression. I also published 3 papers that use a NAF-compressed genome database for medical metagenome analysis (in BMC Microbiology, Scientific Reports, and Infectious Diseases).
(2) Database. I maintained and improved the GenomeSync database (http://genomesync.org/), which currently includes 391,841 genomes (5.7 Tbp of sequence data). The database was moved to the National Institute of Genetics and provides open web access to its genome data. Recent additions include history charts and detailed download instructions.
(3) NAF compressor. I maintained and improved the NAF compressor (https://github.com/KirillKryukov/naf), adding support for storing multiple files in the same archive. The NAF paper in Bioinformatics currently has 15 Google Scholar citations.
(4) Tools using NAF-format sequence data. I made a primer tester tool (http://kirill-kryukov.com/study/tools/primer-tester/). I also added support to Genome Search Toolkit for Minimap2 searches against a NAF-format database.
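The report does not describe how the primer tester works. As a rough illustration only, the hypothetical Python sketch below shows the kind of check a primer-testing tool commonly performs: scanning a sequence on both strands for sites that match a primer, treating IUPAC degenerate bases (R, Y, N, and so on) as sets of allowed nucleotides and tolerating a small number of mismatches. It assumes the sequences have already been decompressed from NAF to plain text (for example with the `unnaf` decompressor from the NAF repository). The function names and the mismatch rule are illustrative assumptions, not the tool's actual algorithm or interface.

```python
# Hypothetical sketch of primer-site matching; NOT the algorithm of the
# primer tester tool described in this report.

# IUPAC nucleotide codes mapped to the set of bases each code accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement table that also maps ambiguity codes (e.g. R <-> Y, K <-> M).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence, including IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_mismatches(primer: str, target: str) -> int:
    """Count positions where the target base is not allowed by the primer code."""
    return sum(1 for p, t in zip(primer, target) if t not in IUPAC.get(p, ""))


def find_primer_sites(primer: str, sequence: str, max_mismatches: int = 1):
    """Return (position, strand, mismatches) for each site within the limit.

    Positions are 0-based start coordinates on the forward strand of `sequence`.
    A "-" strand hit means the primer's reverse complement matches there.
    """
    primer = primer.upper()
    sequence = sequence.upper()
    k = len(primer)
    hits = []
    for strand, probe in (("+", primer), ("-", reverse_complement(primer))):
        # Naive sliding-window scan over every start position.
        for i in range(len(sequence) - k + 1):
            mm = count_mismatches(probe, sequence[i:i + k])
            if mm <= max_mismatches:
                hits.append((i, strand, mm))
    return hits


if __name__ == "__main__":
    # R in the primer accepts either A or G in the target.
    print(find_primer_sites("ACRTAG", "TTACGTAGGCATGCAA", max_mismatches=0))
    # [(2, '+', 0)]
```

The naive scan costs O(n·k) per strand for a sequence of length n and a primer of length k. That is fine for a sketch, but searching a large genome collection would likely call for indexing or a more efficient matching method.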
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
The project is going well. In July 2020 I moved to the National Institute of Genetics (as a Specially Appointed Associate Professor in the Population Genetics Laboratory), where I am continuing to work on this project.
Strategy for Future Research Activity |
I plan to continue improving NAF compression and the GenomeSync database. Status of the items I originally proposed:
(1) "Genome database in NAF format": The NAF format and the GenomeSync database are available for general use. I plan to publish a paper about GenomeSync in the near future.
(2) "Sequence search tool working directly on NAF files": I made a primer tester tool that works directly on a NAF-formatted database. This year I will continue working on this topic.
(3) "K-mer analysis tool working directly on NAF files": This part is still under development.
(4) "NAF support in Genome Search Toolkit": Support for Minimap2 search using a NAF database is already implemented. This year I plan to add support for BLAST search.
(5) "Library for adding NAF support to existing tools": Planned for this year or next year.
Causes of Carryover |
As I am continuing this project, I plan to use the funds remaining from the previous fiscal year to purchase the computer equipment needed for its computational experiments.
Remarks |
The first two links are about the AGTC project by Naruya Saitou, which includes the GenomeSync database. The other link is a blog post (in Japanese) explaining NAF compression.
-
[Journal Article] Full-length 16S rRNA gene amplicon analysis of human gut microbiota using MinION nanopore sequencing confers species-level resolution (2021)
Author(s)
Yoshiyuki Matsuo, Shinnosuke Komiya, Yoshiaki Yasumizu, Yuki Yasuoka, Katsura Mizushima, Tomohisa Takagi, Kirill Kryukov, Tadashi Imanishi, Aisaku Fukuda, Yoshiharu Morimoto, Yuji Naito, Hidetaka Okada, Hidemasa Bono, So Nakagawa, Kiichi Hirota
-
Journal Title
BMC Microbiology
Volume: 21
Pages: 35
DOI
Peer Reviewed / Open Access / Int'l Joint Research