2015 Fiscal Year Annual Research Report

Big Data and Predictive Analytics for Preventive and Personalized Medicine

Research Project

Project/Area Number	15F15776
Research Institution	Institute of Physical and Chemical Research
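Principal Investigator	TSUNODA Tatsuhiko, RIKEN (National Research and Development Agency), Center for Integrative Medical Sciences, Group Director (10273468)
Co-Investigator(Kenkyū-buntansha)	KAMOLA PIOTR, RIKEN (National Research and Development Agency), Center for Integrative Medical Sciences, International Research Fellow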
Project Period (FY)	2015-11-09 – 2018-03-31
Keywords	Personalized Medicine / Clinical Biomarkers / Multi-Omics / Drug Response / Machine Learning
Outline of Annual Research Achievements	The initial period was spent preparing the hardware and software environment for downstream analysis. To minimize cost, a custom workstation was assembled by hand from separate parts. The work began with participation in a DREAM Challenge organized by AstraZeneca and the Sanger Institute. The aim of the challenge was to predict pairs of anti-cancer drugs that act synergistically. We were responsible for processing the response biomarkers based on multi-omics datasets, calculating monotherapy response from thousands of response experiments, testing data imputation methods, running numerous combinations of machine learning algorithms with different biomarker combinations, creating the final predictions and confidence scores, and preparing scripts and submission documents. Secondly, we have engaged in an international collaboration aimed at building an online repository of aptamers. Aptamers are powerful tools that can be used for drug targeting, in vitro and in vivo screening, clinical diagnostics and even as therapeutics. We have prepared chemical and genomic annotation for many aptamers. We are currently performing additional computational analysis and expanding the database. Lastly, we are working on the analysis of a multi-omics dataset from lung adenocarcinoma patients and GWAS data from rheumatoid arthritis patients. We are currently processing the data, combining it with publicly available annotation, and preparing a computational solution to find biomarkers and classify the diseases.
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: On the hardware and software side, the workstation was successfully built within the available budget. Although the setup was problematic, the software environment was installed and configured to use the graphics processing unit (GPU) for GPU-accelerated algorithms such as deep learning. Our team has also successfully completed the AstraZeneca-Sanger Drug Combination Prediction DREAM Challenge and submitted all the required models and documents to the organizers. We are currently awaiting the final results. We have already gathered a significant amount of data related to aptamers. Current work focuses on expanding the annotation and adding further aptamers from publicly available resources. The database will be published in the future and made freely available to the community. The processing and analysis of clinical data is ongoing. Obtaining permission from the ethics committee to work with the data, and gathering the data and patient information, took some time, but both are now complete.
Strategy for Future Research Activity	Now that the AstraZeneca-Sanger Drug Combination Prediction DREAM Challenge is completed, our team is looking at additional, medically oriented challenges in which we could participate. For the aptamer database, we will continue standardizing the data and adding further annotation related to the structure, chemical properties and function of the target entity. We have already identified a core set of curators who will help us extract additional information from the literature. After the data is prepared, we will build a relational database in PostgreSQL and a website interface that will provide access to the data. This will form the first publication. Once the data is prepared, there is also the potential to analyze it using machine learning approaches to identify valuable design parameters for future experiments. For the analysis of clinical multi-omics datasets, we are currently re-analyzing the data to take advantage of newer, more sensitive algorithms. A set of pathway, network and biomarker classifiers is being created that will increase the search space. As such analysis is very complex, we will focus on computationally intensive methods to detect interactions between interactome elements. This will greatly enhance our ability to detect driver elements that are missed by standard analysis approaches. In parallel, we will perform an analysis focused on topology within protein-protein interaction cascades to account for changes at the network level.