2022 Fiscal Year Final Research Report
Improving efficiency of sequence databases by applying the NAF format
Project/Area Number |
20K06612
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Review Section |
Basic Section 43060: System genome science-related
|
Research Institution | National Institute of Genetics |
Principal Investigator |
Kryukov Kirill, National Institute of Genetics, Biological Networks Laboratory, Project Associate Professor (20806202)
|
Project Period (FY) |
2020-04-01 – 2023-03-31
|
Keywords | Data compression / NAF / GenomeSync |
Outline of Final Research Achievements |
This project achieved the following: (1) Continued development, maintenance, and popularization of the Nucleotide Archival Format (NAF). Additions include improved compression strength, improved customization of the decompressed format, support for storing multiple files, and a Bioconda installation option. (2) Evaluation of the performance of various compressors in the Sequence Compression Benchmark, the most comprehensive benchmark of available compressors for biological sequence data. This benchmark clearly shows that NAF is a superior format for storing and working with sequence data. The benchmark paper has 25 Google Scholar citations. (3) Distribution of NAF-compressed genome sequences via the GenomeSync database, one of the largest genome databases. Thanks to the efficiency of the NAF format, GenomeSync now offers convenient access to over 640,000 genomes. (4) Addition of NAF support to bioinformatics tools such as Genome Search Toolkit and Primer Tester. (5) Publication of nine papers related to this project.
|
Free Research Field |
Bioinformatics
|
Academic Significance and Societal Importance of the Research Achievements |
Genome data is increasingly used across many fields of science. NAF greatly increases the efficiency of working with such data compared to previous formats. This project applied, improved, and advanced NAF toward becoming a fundamental infrastructure tool for the next generation of genome databases.
|