2019 Fiscal Year Annual Research Report

Advanced machine learning methods for mass spectrometry

Research Project

Project/Area Number	19J14714
Research Institution	Kyoto University
Principal Investigator	NGUYEN Dai-Hai Kyoto University, Graduate School of Pharmaceutical Sciences, JSPS Research Fellow (DC2)
Project Period (FY)	2019-04-25 – 2021-03-31
Keywords	mass spectrometry / fingerprint prediction / representation learning / sparse learning models
Outline of Annual Research Achievements	Machine learning-based methods for identifying metabolites from mass spectra usually have two steps: (i) predicting molecular fingerprints from spectra; (ii) searching a database for chemical compounds that correspond to the predicted fingerprints. We addressed two problems. First, existing approaches to fingerprint prediction (step (i)) use only individual peaks in the spectra, without explicitly considering interactions between peaks. We formulated a sparse interaction model for metabolite peaks, called SIMPLE, which is computationally light and interpretable for fingerprint prediction. The formulation of SIMPLE is convex, which guarantees a globally optimal solution, and we developed an alternating direction method of multipliers (ADMM) algorithm to solve it. Experiments using the MassBank dataset showed that SIMPLE achieved prediction accuracy comparable to the current top-performing kernel method, and clearly revealed which individual peaks and peak interactions contribute to improving fingerprint prediction. Second, the fingerprints used in existing methods are often large, so as to cover many substructures or chemical properties, and are therefore redundant: many of their substructures are irrelevant to the task, which limits predictive performance and slows prediction. We proposed ADAPTIVE, which generates metabolite representations specific to the given pairs of spectra and molecular structures. Using benchmark data, we confirmed the effectiveness of ADAPTIVE in terms of both predictive performance and computational efficiency.
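The idea of modelling individual peaks together with sparse peak interactions, fitted by ADMM, can be illustrated for a single fingerprint bit. The following is a minimal, simplified sketch and not the SIMPLE formulation itself: it assumes binned peak intensities as input, expands them into individual peaks plus pairwise products, fits a squared-loss model with a plain L1 penalty, and solves it with a textbook ADMM splitting. The helper names (`interaction_features`, `admm_lasso`) and the synthetic data are hypothetical.

```python
import numpy as np


def interaction_features(x):
    """Hypothetical helper: expand binned peak intensities x (length d) into
    the d individual peaks followed by all pairwise products x_i * x_j, i < j."""
    rows, cols = np.triu_indices(x.shape[0], k=1)
    return np.concatenate([x, x[rows] * x[cols]])


def soft_threshold(v, kappa):
    """Proximal operator of kappa * ||v||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def admm_lasso(A, b, lam, rho=1.0, n_iter=1000):
    """Minimise 0.5 * ||A w - b||^2 + lam * ||w||_1 by ADMM.

    Splitting: w (smooth part) = u (L1 part); y is the scaled dual variable.
    This is a simplified stand-in; SIMPLE's actual objective may differ.
    """
    n_features = A.shape[1]
    u = np.zeros(n_features)
    y = np.zeros(n_features)
    # The w-update solves (A^T A + rho I) w = A^T b + rho (u - y);
    # the matrix never changes, so factor it once with Cholesky.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n_features))
    Atb = A.T @ b
    for _ in range(n_iter):
        rhs = Atb + rho * (u - y)
        w = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # two triangular systems
        u = soft_threshold(w + y, lam / rho)                  # sparsifying step
        y = y + w - u                                         # scaled dual update
    return u  # the sparse copy of the coefficients


# Synthetic toy data: 500 "spectra" with 5 binned peaks each. The target
# depends on peak 0 and on the interaction between peaks 1 and 2.
rng = np.random.default_rng(0)
X = rng.random((500, 5))
A = np.array([interaction_features(x) for x in X])  # 5 peaks + 10 pairs = 15 columns
b = 2.0 * X[:, 0] + 3.0 * X[:, 1] * X[:, 2]
w = admm_lasso(A, b, lam=0.05, rho=10.0, n_iter=2000)

# Column 9 is the (1, 2) pair in np.triu_indices order.
print(np.round(w, 2))
```

On this noise-free toy target, the two largest coefficients should fall on column 0 (peak 0) and column 9 (the peak 1 × peak 2 interaction), which shows in miniature how a sparse interaction model can point to the peaks and peak pairs that matter for one fingerprint bit.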
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned.
Reason	We have proposed computational frameworks to address the research problems raised in the research proposal. The research results have been published in top conferences and journals. We are also considering other bioinformatics problems and developing machine learning models to address them.
Strategy for Future Research Activity	Although our proposed methods improve on state-of-the-art frameworks, they are still far from perfect. We see new problems in incorporating chemical knowledge and characteristics of the available data, including substructure relations (considering the relations between substructures) and ranking (suggesting a set of closely related candidates for chemists to examine further), and we plan to address them. We will also study the theoretical aspects of the machine learning models used in our research, in order to handle more general biological and chemical problem settings.
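Step (ii) of the pipeline, and the candidate-ranking setting mentioned above, can be illustrated with a minimal sketch: threshold the predicted fingerprint, score each database candidate by Tanimoto similarity, and return the top k. This scoring choice is an assumption for illustration; the report does not say which similarity or scoring function is used, and the function names, candidate IDs, and fingerprints below are hypothetical toy data.

```python
import numpy as np


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # two all-zero fingerprints are treated as identical
    return float(np.logical_and(a, b).sum() / union)


def rank_candidates(pred_probs, candidates, k=3, threshold=0.5):
    """Rank database candidates against a predicted fingerprint.

    pred_probs: predicted probability for each fingerprint bit (step (i) output).
    candidates: dict mapping a candidate ID to its binary fingerprint.
    Returns the IDs of the k most similar candidates, best first.
    """
    pred_bits = np.asarray(pred_probs) >= threshold
    scores = {cid: tanimoto(pred_bits, fp) for cid, fp in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: a 5-bit predicted fingerprint and four candidates.
pred = np.array([0.9, 0.8, 0.1, 0.7, 0.2])
candidates = {
    "cand_A": [1, 1, 0, 1, 0],
    "cand_B": [1, 1, 0, 0, 0],
    "cand_C": [0, 0, 1, 0, 1],
    "cand_D": [1, 1, 1, 1, 0],
}
print(rank_candidates(pred, candidates, k=3))  # → ['cand_A', 'cand_D', 'cand_B']
```

Returning a short ranked list rather than a single best match reflects the goal of suggesting several closely related candidates for a chemist to inspect.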

Research Products
(3 results)


Journal Article (2 results) (of which Int'l Joint Research: 2 results, Peer Reviewed: 2 results, Open Access: 2 results); Presentation (1 result) (of which Int'l Joint Research: 1 result)

[Journal Article] ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra (2019)
- Author(s)
  Dai Hai Nguyen, Canh Hao Nguyen, Hiroshi Mamitsuka
- Journal Title
  Bioinformatics
  Volume: 35 Pages: 164-172
- DOI
  10.1093/bioinformatics/btz319
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Recent Advances and Prospects of Computational Methods for Metabolite Identification: A Review with Emphasis on Machine Learning Approaches (2019)
- Author(s)
  Dai Hai Nguyen, Canh Hao Nguyen, Hiroshi Mamitsuka
- Journal Title
  Briefings in Bioinformatics
  Volume: 34 Pages: 323-332
- DOI
  10.1093/bib/bby066
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra (2019)
- Author(s)
  Dai Hai Nguyen
- Organizer
  27th International Conference on Intelligent Systems for Molecular Biology (ISMB/ECCB 2019)
- Int'l Joint Research