Outline of Research Achievements |
Machine learning methods for metabolite identification from mass spectra usually have two steps: (i) predicting fingerprints from spectra; (ii) searching a database for chemical compounds that match the predicted fingerprints. We addressed two problems. The first is that existing approaches to fingerprint prediction (step (i)) use only individual peaks in the spectra, without explicitly considering peak interactions. We formulated a sparse interaction model over metabolite peaks, called SIMPLE, which is computationally light and interpretable for fingerprint prediction. The formulation of SIMPLE is convex, so a global optimum is guaranteed, and we developed an alternating direction method of multipliers (ADMM) algorithm to solve it. Experiments on the MassBank dataset showed that SIMPLE achieved prediction accuracy comparable to that of the current top-performing kernel method, and clearly revealed the individual peaks and peak interactions that contribute to fingerprint prediction performance. The second is that fingerprints used in existing methods are often large, so as to cover many substructures or chemical properties, and are therefore redundant in the sense that they contain many substructures irrelevant to the task, which limits predictive performance and slows prediction. We proposed ADAPTIVE, which generates representations of metabolites specific to given pairs of spectra and molecular structures. The effectiveness of ADAPTIVE in terms of both predictive performance and computational efficiency was confirmed on benchmark data.
|
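The idea of scoring a fingerprint bit from both individual peaks and peak interactions can be sketched as follows. This is only an illustration under stated assumptions: it assumes a quadratic scoring form (a linear term for individual peaks plus a pairwise term for interactions), and the function names, toy values, and the soft-thresholding helper are hypothetical and not taken from the SIMPLE implementation. Soft-thresholding is shown because it is the standard proximal step for an L1 sparsity penalty in ADMM-style solvers; the actual regularizers used in SIMPLE may differ.

```python
import numpy as np

def interaction_score(x, w, W, b=0.0):
    """Score one fingerprint bit from a binned spectrum x (assumed form).

    w holds per-peak weights (individual peaks); W is a symmetric matrix
    of pairwise peak-interaction weights. Both are assumed to be sparse.
    """
    return b + w @ x + x @ W @ x

def soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_1, a common ADMM building block."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# Toy spectrum with 4 intensity bins (hypothetical values).
x = np.array([1.0, 0.0, 0.5, 0.0])
w = np.array([0.5, 0.0, -1.0, 0.0])
W = np.zeros((4, 4))
W[0, 2] = W[2, 0] = 1.0  # bins 0 and 2 interact

score = interaction_score(x, w, W)   # linear part is 0.0, interaction part is 1.0
predicted_bit = int(score > 0)       # the interaction term decides the sign here
shrunk = soft_threshold(np.array([0.3, -1.5, 0.05]), 0.1)  # small entry becomes 0
```

In this toy case the linear terms cancel, so a model restricted to individual peaks would give a score of zero, while the interaction between bins 0 and 2 pushes the score positive. The nonzero entries of `w` and `W` are what make such a model interpretable: they name the peaks and peak pairs that drive the prediction.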
Strategy for Future Research Activity |
Although our proposed methods improve on state-of-the-art frameworks, they are still far from perfect. We see new problems in incorporating chemical knowledge and characteristics from available data, including substructure relations (modeling the relations between substructures) and the ranking problem (suggesting a set of closely related candidates for chemists to examine further), and we plan to address them. We will also examine the theoretical aspects of the machine learning models used in our research, so that they can handle more general biological and chemical problem settings.
|