Improving efficiency of sequence databases by applying the NAF format

Research Project

Project/Area Number 20K06612
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Multi-year Fund
Section General
Review Section Basic Section 43060: System genome science-related
Research Institution National Institute of Genetics

Principal Investigator

Kryukov Kirill  National Institute of Genetics, Biological Networks Laboratory, Specially Appointed Associate Professor (20806202)

Project Period (FY) 2020-04-01 – 2023-03-31
Project Status Completed (Fiscal Year 2022)
Budget Amount
¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2022: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000)
Fiscal Year 2021: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Fiscal Year 2020: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Keywords Data compression / NAF / GenomeSync / Genome database / DNA compression / Sequence analysis
Outline of Research at the Start

Biological and medical research relies on huge databases of genome sequences. Currently, all such databases use outdated compression technology. The NAF compression format that we recently invented makes databases more compact and much faster to access. This project focuses on developing infrastructure that will allow the field to transition to the NAF format. Such infrastructure includes a reference genome database in NAF format and software tools supporting this format. This project will improve the efficiency of biological and medical research, contributing to science and public health.

Outline of Final Research Achievements

The achievements of this project are: (1) Continued development, maintenance, and popularization of the Nucleotide Archival Format (NAF). Additions include improved compression strength, improved customization of the decompressed format, support for storing multiple files, and a Bioconda installation option. (2) Evaluation of the performance of various compressors in the Sequence Compression Benchmark, the most comprehensive benchmark of available compressors for biological sequence data. This benchmark clearly shows that NAF is a superior format for storing and working with sequence data. The benchmark paper has 25 Google Scholar citations. (3) Distribution of NAF-compressed genome sequences via the GenomeSync database, one of the largest genome databases. Thanks to the efficiency of the NAF format, GenomeSync now offers convenient access to over 640,000 genomes. (4) Support for NAF in bioinformatics tools such as Genome Search Toolkit and Primer Tester. (5) Publication of 9 papers related to this project.

Academic Significance and Societal Importance of the Research Achievements

Genome data is increasingly used across many fields of science. NAF greatly increases efficiency of working with such data compared to previous formats. This project applied, improved and advanced NAF towards becoming the fundamental infrastructure tool for the next generation of genome databases.

Report

(4 results)
  • 2022 Annual Research Report   Final Research Report ( PDF )
  • 2021 Research-status Report
  • 2020 Research-status Report
  • Research Products

    (15 results)

Journal Article (9 results) (of which Int'l Joint Research: 7 results, Peer Reviewed: 6 results, Open Access: 5 results) Presentation (3 results) (of which Invited: 3 results) Remarks (3 results)

  • [Journal Article] Nanopore sequencing data analysis of 16S rRNA genes using GenomeSync-GSTK system (2023)

    • Author(s)
      Kirill Kryukov, Tadashi Imanishi, So Nakagawa
    • Journal Title

      Methods in Molecular Biology

      Volume: 2632 Pages: 215-226

    • DOI

      10.1007/978-1-0716-2996-3_15

    • ISBN
      9781071629956, 9781071629963
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Journal Article] Efficient compression of SARS-CoV-2 genome data using Nucleotide Archival Format (NAF) (2022)

    • Author(s)
      Kirill Kryukov, Lihua Jin, So Nakagawa
    • Journal Title

      Patterns

      Volume: 3 Issue: 9 Pages: 100562-100562

    • DOI

      10.1016/j.patter.2022.100562

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] MinION, a portable long-read sequencer, enables rapid vaginal microbiota analysis in a clinical setting (2022)

    • Author(s)
      Komiya Shinnosuke, Matsuo Yoshiyuki, Nakagawa So, Morimoto Yoshiharu, Kryukov Kirill, Okada Hidetaka, Hirota Kiichi
    • Journal Title

      BMC Medical Genomics

      Volume: 15 Issue: 1 Pages: 68-68

    • DOI

      10.1186/s12920-022-01218-8

    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Metagenomic analysis of bacterial 16S rRNA sequences (2021)

    • Author(s)
      Kirill Kryukov, So Nakagawa, Yoshiyuki Matsuo, Kiichi Hirota, Tadashi Imanishi
    • Journal Title

      Experimental Medicine

      Volume: Dec 2021

    • Related Report
      2021 Research-status Report
  • [Journal Article] Full-length 16S rRNA gene amplicon analysis of human gut microbiota using MinION™ nanopore sequencing confers species-level resolution (2021)

    • Author(s)
      Matsuo Yoshiyuki, Komiya Shinnosuke, Yasumizu Yoshiaki, Yasuoka Yuki, Mizushima Katsura, Takagi Tomohisa, Kryukov Kirill, Fukuda Aisaku, Morimoto Yoshiharu, Naito Yuji, Okada Hidetaka, Bono Hidemasa, Nakagawa So, Hirota Kiichi
    • Journal Title

      BMC Microbiology

      Volume: 21 Issue: 1 Pages: 35-35

    • DOI

      10.1186/s12866-021-02094-5

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Rapid profiling of drug-resistant bacteria using DNA-binding dyes and a nanopore-based DNA sequencer (2021)

    • Author(s)
      Ayumu Ohno, Kazuo Umezawa, Satomi Asai, Kirill Kryukov, So Nakagawa, Hayato Miyachi, Tadashi Imanishi
    • Journal Title

      Scientific Reports

      Volume: 11 Issue: 1 Pages: 3436-3436

    • DOI

      10.1038/s41598-021-82903-z

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Diagnosis of pleural empyema/parapneumonic effusion by next-generation sequencing (2021)

    • Author(s)
      Shiraishi Yoshiki, Kryukov Kirill, Tomomatsu Katsuyoshi, Sakamaki Fumio, Inoue Shigeaki, Nakagawa So, Imanishi Tadashi, Asano Koichiro
    • Journal Title

      Infectious Diseases

      Volume: 53 Issue: 6 Pages: 450-459

    • DOI

      10.1080/23744235.2021.1892178

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Sequence Compression Benchmark (SCB) database - a comprehensive evaluation of reference-free compressors for FASTA-formatted sequences (2020)

    • Author(s)
      Kirill Kryukov, Mahoko Takahashi Ueda, So Nakagawa, Tadashi Imanishi
    • Journal Title

      GigaScience

      Volume: 9 Issue: 7

    • DOI

      10.1093/gigascience/giaa072

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
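  • [Journal Article] Rapid bacterial identification method using a nanopore DNA sequencer (2020)

    • Author(s)
      Ayumu Ohno, So Nakagawa, Kirill Kryukov, Tadashi Imanishi
    • Journal Title

      臨床化学 (Japanese Journal of Clinical Chemistry)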

      Volume: 49 Pages: 265-270

    • Related Report
      2020 Research-status Report
  • [Presentation] GenomeSync: streamlining access to current genome data (2021)

    • Author(s)
      Kirill Kryukov
    • Organizer
      Genome Concept Centennial Conference
    • Related Report
      2020 Research-status Report
    • Invited
  • [Presentation] Sequence Data Compression. History, methods, best practices, perspectives. (2021)

    • Author(s)
      Kirill Kryukov
    • Organizer
      EvoGen Reading Club
    • Related Report
      2020 Research-status Report
    • Invited
  • [Presentation] GenomeSync and GSTK: Toolkit for precision analysis of medical metagenome sequence data (2021)

    • Author(s)
      Kirill Kryukov
    • Organizer
      Genome Analysis for Precision Medicine of Infectious Diseases, 2021
    • Related Report
      2020 Research-status Report
    • Invited
  • [Remarks] Asia Genome Tao Center (AGTC)

    • URL

      http://www.saitou-naruya-laboratory.org/AGTC.html

    • Related Report
      2020 Research-status Report
  • [Remarks] Asia Genome Tao Center (AGTC)

    • URL

      http://idarwin.org/docs/iDarwin_1_56-58_SAITOU_AGTC.pdf

    • Related Report
      2020 Research-status Report
  • [Remarks] NAF: fast and highly efficient compression / decompression of sequence data

    • URL

      http://kazumaxneo.hatenablog.com/entry/2019/03/07/073000

    • Related Report
      2020 Research-status Report

Published: 2020-04-28   Modified: 2024-01-30  