2019 Fiscal Year Annual Research Report
Advanced machine learning methods for mass spectrometry
Project/Area Number | 19J14714 |
Research Institution | Kyoto University |
Principal Investigator | NGUYEN Dai-Hai, Kyoto University, Graduate School of Pharmaceutical Sciences, JSPS Research Fellow (DC2) |
Project Period (FY) | 2019-04-25 – 2021-03-31 |
Keywords | mass spectrometry / fingerprint prediction / representation learning / sparse learning models |
Outline of Annual Research Achievements |
Machine-learning methods for identifying metabolites from mass spectra usually proceed in two steps: (i) predicting a molecular fingerprint from the spectrum, and (ii) searching a database for chemical compounds whose fingerprints match the predicted one. We addressed two problems in this pipeline.

First, existing approaches to fingerprint prediction (step (i)) rely only on individual peaks in the spectra and do not explicitly consider interactions between peaks. We formulated a sparse interaction model over metabolite peaks, which we call SIMPLE, that is computationally light and interpretable for fingerprint prediction. The formulation of SIMPLE is convex, so its global optimum can be found, and we developed an alternating direction method of multipliers (ADMM) algorithm to solve it. Experiments on the MassBank dataset showed that SIMPLE achieved prediction accuracy comparable to that of the current top-performing kernel method, while clearly revealing the individual peaks and peak interactions that contribute to fingerprint prediction performance.

Second, the fingerprints used in existing methods are often large, so as to cover many substructures or chemical properties, and are therefore redundant: many of the substructures they encode are irrelevant to the task, which limits predictive performance and slows prediction. We proposed ADAPTIVE, which generates representations of metabolites specific to given pairs of spectra and molecular structures. Experiments on a benchmark dataset confirmed the effectiveness of ADAPTIVE in terms of both predictive performance and computational efficiency.
|
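The two-step pipeline and the idea of modelling peak interactions can be illustrated with a small sketch. This is a simplified illustration under our own assumptions, not the published SIMPLE implementation: spectra are assumed to be pre-binned into fixed-length peak vectors, each fingerprint bit is fitted independently with a squared loss plus an L1 penalty over individual peaks and all pairwise peak products, and the ADMM updates are the standard lasso splitting. Ranking candidates by Tanimoto similarity in step (ii) is likewise an illustrative choice. All function names (`expand_interactions`, `admm_lasso`, `fit_fingerprint_model`, `predict_fingerprint`, `tanimoto`, `rank_candidates`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: squared loss + L1 over peaks and peak pairs, solved by
# standard lasso ADMM. This is NOT the published SIMPLE objective or code.


def expand_interactions(X):
    """Append every pairwise product x_i * x_j (i < j) to the individual peak features.

    X: (n_spectra, n_peaks) array of binned peak intensities (assumed preprocessing).
    """
    i, j = np.triu_indices(X.shape[1], k=1)
    return np.hstack([X, X[:, i] * X[:, j]])


def soft_threshold(v, kappa):
    """Proximal operator of kappa * ||v||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def admm_lasso(A, b, lam, rho=1.0, n_iter=500):
    """Minimise 0.5 * ||A w - b||^2 + lam * ||w||_1 by ADMM with the split w = z."""
    n_feat = A.shape[1]
    z = np.zeros(n_feat)
    u = np.zeros(n_feat)  # scaled dual variable for the constraint w = z
    # The w-update solves the same linear system every iteration; inverting once
    # is acceptable for the small feature counts used in this sketch.
    M = np.linalg.inv(A.T @ A + rho * np.eye(n_feat))
    Atb = A.T @ b
    for _ in range(n_iter):
        w = M @ (Atb + rho * (z - u))         # smooth least-squares step
        z = soft_threshold(w + u, lam / rho)  # sparsity-inducing step
        u = u + w - z                         # dual update on w = z
    return z


def fit_fingerprint_model(X, Y, lam):
    """Step (i): one sparse peak + peak-interaction model per fingerprint bit (columns of Y)."""
    A = expand_interactions(X)
    return np.column_stack([admm_lasso(A, Y[:, k], lam) for k in range(Y.shape[1])])


def predict_fingerprint(X, W, threshold=0.5):
    """Binarise the linear scores into a predicted fingerprint (threshold is an assumption)."""
    return expand_interactions(X) @ W > threshold


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two boolean fingerprints; 0.0 if both are empty."""
    union = np.sum(a | b)
    return float(np.sum(a & b) / union) if union else 0.0


def rank_candidates(pred_fp, candidate_fps):
    """Step (ii): order database candidates by similarity to the predicted fingerprint."""
    scores = np.array([tanimoto(pred_fp, fp) for fp in candidate_fps])
    return [int(k) for k in np.argsort(-scores, kind="stable")]
```

Because this objective is convex, ADMM converges to its global minimiser, and nonzero interaction coefficients point directly at the peak pairs that drive a given fingerprint bit. For example, on synthetic data where a bit is set only when peaks 0 and 1 co-occur, the largest coefficient is the one on the (0, 1) interaction feature, which a peaks-only linear model cannot represent.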
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
We have proposed computational frameworks that address the research problems raised in the research proposal, and the results have been published in top conferences and journals. We are also considering other bioinformatics problems and developing machine learning models to address them.
|
Strategy for Future Research Activity |
Although our proposed methods improve on state-of-the-art frameworks, they are still far from perfect. We have identified new problems in incorporating chemical knowledge and characteristics from available data, including substructure relations (modelling the relationships between substructures) and ranking (suggesting a set of closely related candidates for chemists to examine further), and we plan to address them. We will also study the theoretical aspects of the machine learning models used in our research, so that they can be applied to more general biological and chemical problem settings.
|
Research Products
(3 results)