2020 Fiscal Year Annual Research Report
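Elucidating unknown metabolic functions and creating environmental engineering innovation by integrating environmental genomics and machine learning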
Project/Area Number |
20F20346
|
Research Institution | National Institute of Advanced Industrial Science and Technology |
Principal Investigator |
延 優 National Institute of Advanced Industrial Science and Technology, Department of Life Science and Biotechnology, Researcher (40805644)
|
Co-Investigator(Kenkyū-buntansha) |
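LENG LING National Institute of Advanced Industrial Science and Technology, Department of Life Science and Biotechnology, International Research Fellow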
|
Project Period (FY) |
2020-11-13 – 2023-03-31
|
Keywords | Metagenomics / Machine learning / Genome annotation / Metabolism |
Outline of Annual Research Achievements |
A comprehensive genome database was successfully constructed, and a machine-learning-based pipeline has been developed to mine bacterial genomes for novel functions. Preliminary analyses have identified potential metabolic networks that have been overlooked in both cultured and uncultured bacteria. We have further explored what proportion of genes are core and specific to each pathway, and which protein families tend to be non-essential or non-specific. Collectively, these results will serve as foundational criteria for identifying genes that share the same function and contribute to a specific pathway.
|
Current Status of Research Progress |
2: On the whole, research has progressed more than originally planned.
Reason
The project is proceeding as expected.
|
Strategy for Future Research Activity |
The next step is to discover new biological pathways containing the identified essential and specific proteins. With each core gene represented as one dimension, a similarity-based learning approach (i.e., the nearest-neighbor algorithm) will be adopted to group the sequence windows that contain these genes. Subsequently, the category of each potential pathway-associated window will be predicted from its similarity (distance in this feature space) to known aromatic compound degradation pathways. This allows us to identify candidate sequence windows for putative aromatic compound degradation pathways. In preliminary results, we found that xenobiotics-related genes consistently showed distinct phylogenetic behavior (tight clustering and confinement to specific habitats) compared with genes associated with the degradation of natural aromatic compounds. Using this trend, we can further differentiate pathways related to xenobiotics from those related to natural compounds. To strengthen the connection between machine learning and pathway prediction, we will adopt thermodynamics to estimate the feasibility and directionality of predicted biochemical reactions.
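The nearest-neighbor step described above can be illustrated with a minimal sketch. It is not the project's pipeline: the gene names, the presence/absence encoding, the Euclidean distance, the value of k, and the reference labels are all assumptions chosen for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical core-gene dimensions (placeholder names, not from the project).
CORE_GENES = ["geneA", "geneB", "geneC", "geneD"]


def window_to_vector(genes_in_window):
    """Encode a sequence window as presence (1.0) or absence (0.0) of each core gene."""
    present = set(genes_in_window)
    return [1.0 if gene in present else 0.0 for gene in CORE_GENES]


def knn_predict(query, references, k=3):
    """Label a query vector by majority vote among its k nearest references.

    references: list of (vector, label) pairs for windows from known pathways.
    Distance is Euclidean here; the report does not specify the metric.
    """
    ranked = sorted(references, key=lambda ref: dist(query, ref[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy reference windows with made-up labels, for illustration only.
REFERENCES = [
    (window_to_vector(["geneA", "geneB"]), "aromatic_degradation"),
    (window_to_vector(["geneA", "geneB", "geneC"]), "aromatic_degradation"),
    (window_to_vector(["geneA"]), "aromatic_degradation"),
    (window_to_vector(["geneD"]), "other"),
    (window_to_vector(["geneC", "geneD"]), "other"),
]

# A candidate window whose pathway category is unknown.
candidate = window_to_vector(["geneA", "geneC"])
prediction = knn_predict(candidate, REFERENCES, k=3)
print(prediction)
```

In practice the feature vectors would have one dimension per core gene identified in the earlier analysis, and the choice of distance metric and k would need to be validated against windows of known function.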
|
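The thermodynamic feasibility check mentioned in the strategy could, as one possible reading, rest on the standard relation ΔG' = ΔG'° + RT ln Q. The sketch below assumes unit stoichiometric coefficients and concentrations expressed relative to 1 M; the function names and the numerical inputs are hypothetical and are not taken from the report.

```python
from math import log

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol*K)


def reaction_gibbs_energy(dg0_prime_kj, product_concs, substrate_concs, temp_k=298.15):
    """Return dG' = dG'0 + R*T*ln(Q) in kJ/mol.

    Assumes every stoichiometric coefficient is 1 and that concentrations
    are given in mol/L relative to a 1 M standard state.
    """
    q = 1.0
    for conc in product_concs:
        q *= conc
    for conc in substrate_concs:
        q /= conc
    return dg0_prime_kj + R_KJ_PER_MOL_K * temp_k * log(q)


def is_forward_feasible(dg_kj):
    """A reaction can proceed in the written direction only if dG' is negative."""
    return dg_kj < 0.0


# Made-up example: a reaction with a slightly positive standard energy
# becomes favorable when the product is kept at low concentration.
dg = reaction_gibbs_energy(5.0, product_concs=[1e-6], substrate_concs=[1e-3])
print(round(dg, 2), is_forward_feasible(dg))
```

A negative value indicates the reaction is feasible in the written direction under the assumed concentrations, while a positive value suggests the reverse direction is favored.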