Project/Area Number |
20K06612
|
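Research Institution | National Institute of Genetics |
Principal Investigator |
Kirill Kryukov, National Institute of Genetics, Department of Informatics, Specially Appointed Associate Professor (20806202)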
|
Project Period (FY) |
2020-04-01 – 2023-03-31
|
Keywords | Data compression / Genome database / NAF / GenomeSync |
Outline of Research Achievements |
During the last fiscal year (April 2021 to March 2022) I made the following progress on this project. (1) Publications. I co-authored a paper that uses a NAF-compressed genome database for medical metagenome analysis (in BMC Medical Genomics). I also published a first-author Japanese-language paper in Yodosha Experimental Medicine describing a method for metagenomic analysis of bacterial 16S rRNA sequences using GenomeSync and NAF. (2) Database. I maintained and improved the GenomeSync database (https://genomesync.org/), which currently includes 474,453 genomes (7.4 Tbp of sequence data). Recent improvements: a) Added the Genome Selector web tool (http://genomesync.nig.ac.jp/selector/). b) Improved the Statistics page (http://genomesync.nig.ac.jp/statistics/): added the options "Search entire taxonomy", "Show lineage", "NAF size", "FASTA size", "Avg. genome size", and "Avg. GC content". c) Added instructions for selective synchronization to the "Downloading" page. (3) NAF compressor. I maintained and improved the NAF compressor (https://github.com/KirillKryukov/naf). Improvements: a) Added a quality quantizer for stronger lossy compression of FASTQ data. b) Added the "--long" option to ennaf for stronger compression. c) Added Bioconda installation. The NAF paper in Bioinformatics currently has 22 Google Scholar citations.
|
Current Status of Research Progress (Category) |
2: Progressing more or less smoothly
Reason
The project is going well. In October 2021 I moved to the Department of Informatics at the National Institute of Genetics (as a Specially Appointed Associate Professor), where I continue to work on this project as my available time allows.
|
Strategy for Future Research Activity |
I will continue improving NAF compression and the GenomeSync database. Status of the items I initially proposed: (1) "Genome database in NAF format": The NAF format and the GenomeSync database are available for general use. I am working to publish a paper about GenomeSync in the near future, and another paper about using NAF to compress SARS-CoV-2 genomes. (2) "Sequence search tool working directly on NAF files": I made a primer tester tool that works directly on a NAF-formatted database. This year I will work on publishing a paper about this method. (3) "K-mer analysis tool working directly on NAF files": This part is still under development. (4) "NAF support in Genome Search Toolkit": Support for Minimap2 search using a NAF database is already implemented. I am preparing a methodology article about this method. (5) "Library for adding NAF support to existing tools": Work on this item is ongoing.
|
Reason for Incurring Amount to Be Used Next Fiscal Year |
Last fiscal year I spent only part of the budget because of time constraints, which had two causes: 1. Moving to a new lab at NIG. 2. Working on other urgent projects, including analysis of SARS-CoV-2 sequence data related to the coronavirus pandemic. In the current fiscal year I will continue working on this project and will conduct the large-scale computational experiments needed to develop and validate its methods. For this purpose I will purchase additional computer hardware, in particular RAM and a data storage system. I will also use the budget to pay publication fees for several papers.
|