2023 Fiscal Year Research-status Report
Fully automated protein NMR assignments and structures from raw time-domain data by deep learning
Project/Area Number | 23K05660 |
Research Institution | Tokyo Metropolitan University |
Principal Investigator | PETER GUENTERT, Tokyo Metropolitan University, Graduate School of Science, Visiting Professor (20392110) |
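Co-Investigator (Kenkyū-buntansha) |
Teppei Ikeya (池谷 鉄兵), Tokyo Metropolitan University, Graduate School of Science, Associate Professor (30457840)
Yutaka Ito (伊藤 隆), Tokyo Metropolitan University, Graduate School of Science, Professor (80261147)
|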
Project Period (FY) | 2023-04-01 – 2026-03-31 |
Keywords | NMR / machine learning / automated assignment / protein structure |
Outline of Annual Research Achievements |
The ARTINA workflow for machine learning-based automated NMR spectra analysis was combined with AlphaFold structure prediction and UCBShift chemical shift prediction in order to drastically reduce the number of NMR spectra required to obtain the chemical shift assignment of a protein. Extensive studies were performed to identify the optimal sets of NMR spectra for assigning either the backbone chemical shifts or all chemical shifts of a protein. These results were published in Klukowski et al., Science Advances 9, eadi9323 (2023).
In addition, ARTINA was generalized to further biomacromolecular systems. Originally, ARTINA was designed exclusively for monomeric proteins composed of standard amino acid residues. This restriction has now been lifted, enabling automated NMR spectra analysis also for protein–protein complexes, protein–small-molecule ligand complexes, RNA, and DNA. This significantly extends the applicability of the method to biologically important systems.
A large-scale data set comprising more than 1,300 multidimensional NMR spectra, from which the chemical shift assignments and three-dimensional structures of 100 proteins can be obtained, has been published as open research data for general use by the NMR research community. This data set was published in Klukowski et al., Nature Scientific Data 11, 30 (2024).
|
Current Status of Research Progress |
2: Research has, on the whole, progressed as originally planned.
Reason
Research progressed as planned. In particular, we began developing new machine learning models specifically for analyzing two-dimensional homonuclear NMR spectra of proteins (see below).
|
Strategy for Future Research Activity |
As a new research direction, we have begun implementing machine learning models specifically for analyzing two-dimensional ¹H–¹H NMR spectra of proteins of up to ~20 kDa. If successful, this will enable efficient NMR studies of proteins without isotope labeling and with much less NMR measurement time.
Training and testing data are crucial for machine learning applications, and their limited availability is a limiting factor for the use of machine learning in biological NMR spectroscopy. To collect and make available a larger and more diverse set of multidimensional NMR spectra, we are developing a new public website and data repository for uploading, storing, and accessing primary biomolecular NMR data, i.e., spectra or time-domain data. The repository is expected to become available for general use by the NMR research community in the near future.
|
Causes of Carryover |
No travel expenses were incurred in FY2023. Instead, travel between ETH Zurich and Tokyo Metropolitan University, as well as participation in the leading biological NMR conference, ICMRBS 2024, to be held in Seoul, South Korea, is planned for FY2024.
|