
2022 Fiscal Year Annual Research Report

Improving efficiency of sequence databases by applying the NAF format

Research Project

Project/Area Number 20K06612
Research Institution National Institute of Genetics

Principal Investigator

Kirill Kryukov  National Institute of Genetics, Biological Networks Laboratory, Specially Appointed Associate Professor (20806202)

Project Period (FY) 2020-04-01 – 2023-03-31
Keywords Data compression / NAF / GenomeSync
Outline of Annual Research Achievements

(1) During the last fiscal year, I published a first-author paper, "Efficient compression of SARS-CoV-2 genome data using Nucleotide Archival Format (NAF)", in Patterns (impact factor 3.19) (doi:10.1016/j.patter.2022.100562). This paper demonstrates the superiority of NAF compression for distributing SARS-CoV-2 genome data. We found that NAF improves data distribution efficiency by a factor of 3.7 to 52.2 compared with the solutions currently used by GISAID, DDBJ, ENA and NCBI. I also published a first-author book chapter about the GSTK pipeline (doi:10.1007/978-1-0716-2996-3_15).
(2) Over the course of this project, I published several other papers, including a first-author paper in GigaScience (doi:10.1093/gigascience/giaa072, 25 Google Scholar citations) and related papers in BMC Microbiology (doi:10.1186/s12866-021-02094-5), Scientific Reports (doi:10.1038/s41598-021-82903-z), and Infectious Diseases (doi:10.1080/23744235.2021.1892178).
(3) I developed and maintained the open-source NAF compressor (https://github.com/KirillKryukov/naf). The NAF paper in Bioinformatics (doi:10.1093/bioinformatics/btz144) currently has 32 Google Scholar citations.
(4) I developed and maintained the GenomeSync database (http://genomesync.org/), which distributes NAF-compressed genome data. It currently includes 620,002 genomes (+228,161 compared to last year's report), totaling 10.0 Tbp of sequence data.
(5) I developed tools and pipelines that use NAF compression, in particular GSTK (doi:10.1007/978-1-0716-2996-3_15) and Primer Finder (https://kirill-kryukov.com/study/tools/primer-tester/).
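
For context on the GenomeSync figures in item (4), the short sketch below derives a few values the report does not state directly: the genome count in last year's report, the relative growth, and the mean amount of sequence per genome. Only the three input numbers come from the report. The variable names are illustrative, and the mean assumes the 10.0 Tbp total covers exactly the 620,002 listed genomes.

```python
# Figures quoted in item (4) of the report.
genomes_now = 620_002      # genomes currently in GenomeSync
genomes_added = 228_161    # increase since last year's report
total_bp = 10.0e12         # 10.0 Tbp of sequence data

# Derived values (not stated in the report; computed for illustration only).
genomes_last_year = genomes_now - genomes_added
growth_percent = 100 * genomes_added / genomes_last_year
# Assumption: the 10.0 Tbp total corresponds to all 620,002 genomes.
mean_genome_mbp = total_bp / genomes_now / 1e6

print(f"Genomes in last year's report: {genomes_last_year:,}")
print(f"Year-over-year growth: {growth_percent:.1f}%")
print(f"Mean sequence per genome: {mean_genome_mbp:.1f} Mbp")
```

Because the total is given to only three significant figures, the per-genome mean is a rough estimate.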

  • Research Products

    (2 results)


All Journal Article (2 results) (of which Int'l Joint Research: 2 results, Peer Reviewed: 1 result, Open Access: 1 result)

  • [Journal Article] Nanopore sequencing data analysis of 16S rRNA genes using GenomeSync-GSTK system (2023)

    • Author(s)
      Kirill Kryukov, Tadashi Imanishi, So Nakagawa
    • Journal Title

      Methods in Molecular Biology

      Volume: 2632 Pages: 215-226

    • DOI

      10.1007/978-1-0716-2996-3_15

    • Int'l Joint Research
  • [Journal Article] Efficient compression of SARS-CoV-2 genome data using Nucleotide Archival Format (NAF) (2022)

    • Author(s)
      Kirill Kryukov, Lihua Jin, So Nakagawa
    • Journal Title

      Patterns

      Volume: 3 Pages: 100562

    • DOI

      10.1016/j.patter.2022.100562

    • Peer Reviewed / Open Access / Int'l Joint Research


Published: 2023-12-25  
