Summary of Research Achievements
(1) During the last fiscal year, I published a first-author paper, "Efficient compression of SARS-CoV-2 genome data using Nucleotide Archival Format (NAF)", in Patterns (impact factor 3.19) (doi:10.1016/j.patter.2022.100562). This paper shows the superiority of NAF compression for distributing SARS-CoV-2 genome data. We found that NAF improves data distribution efficiency by a factor of 3.7 to 52.2 compared with the solutions currently used by GISAID, DDBJ, ENA and NCBI. I also published a first-author book chapter about the GSTK pipeline (doi:10.1007/978-1-0716-2996-3_15).

(2) Over the course of this project as a whole, I published several other papers, including a first-author paper in GigaScience (doi:10.1093/gigascience/giaa072; 25 Google Scholar citations) and related papers in BMC Microbiology (doi:10.1186/s12866-021-02094-5), Scientific Reports (doi:10.1038/s41598-021-82903-z), and Infectious Diseases (doi:10.1080/23744235.2021.1892178).

(3) I developed and maintained the open-source NAF compressor (https://github.com/KirillKryukov/naf). The NAF paper in Bioinformatics (doi:10.1093/bioinformatics/btz144) currently has 32 Google Scholar citations.

(4) I developed and maintained the GenomeSync database (http://genomesync.org/), which distributes NAF-compressed genome data. It currently includes 620,002 genomes (228,161 more than in last year's report), totaling 10.0 Tbp of sequence data.

(5) I developed tools and pipelines that use NAF compression, in particular GSTK (doi:10.1007/978-1-0716-2996-3_15) and Primer Finder (https://kirill-kryukov.com/study/tools/primer-tester/).