2015 Fiscal Year Annual Research Report

Protein cleavage analysis and discovery of disease associations through data fusion

Research Project

Project/Area Number	15F15788
Research Institution	Kyoto University
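Principal Investigator	Akutsu Tatsuya  Kyoto University, Institute for Chemical Research, Professor (90261859)
Co-Investigator(Kenkyū-buntansha)	MARINI SIMONE  Kyoto University, Institute for Chemical Research, International Research Fellow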
Project Period (FY)	2015-11-09 – 2017-03-31
Keywords	Protein cleavage / Data fusion / Machine learning
Outline of Annual Research Achievements	The aim of the project is to develop a data fusion algorithm for predicting protease targets. The project has matured enough for a first practical application: the data fusion algorithm is working and a minimal data set has been prepared. Initial results are promising. Our goal is to discover new protease-target pairs. We predicted 1787 new protease-target pairs, involving 139 proteases and 716 targets. To validate our results, we used an independent algorithm, CasCleave. CasCleave is based on traditional machine learning and is therefore complementary to our data fusion approach. Although our approach predicted targets for all possible peptidases, we could validate only the subset of 73 Caspase-interacting targets, since CasCleave has a more limited scope than our approach. By comparing the CasCleave cleavage probability distributions of our predicted targets with those over the whole human proteome, we found 6 new targets predicted for Caspase-1 (p-value 1.23E-5); 37 for Caspase-3 (p-value 2.2E-16); 4 for Caspase-6 (p-value 6.8E-4); 5 for Caspase-7 (p-value 9.14E-3); 4 for Caspase-8 (p-value 1.1E-3); and 17 for Granzyme B (p-value 6.84E-3). P-values were computed with the Kolmogorov-Smirnov (KS) test. The average interaction probability that CasCleave assigns to our predicted targets is 0.82. In summary, our findings are highly consistent with those of a state-of-the-art algorithm specialized in Caspases. However, our approach predicts a much wider range of potential protease-target pairs.
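The validation step above compares two score distributions with a KS test. The following is a minimal sketch of a two-sample, two-sided KS test in plain Python, assuming the standard asymptotic approximation for the p-value. The function name `ks_two_sample` and the example scores are hypothetical and are not taken from the project; the report does not state which KS variant or software was used.

```python
import math


def ks_two_sample(sample_a, sample_b):
    """Return (D, approximate p-value) for a two-sided two-sample KS test.

    D is the largest gap between the two empirical distribution functions.
    The p-value uses the asymptotic Kolmogorov series, so it is only an
    approximation for small samples.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest ECDF difference.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Effective sample size and small-sample correction of the statistic.
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        # The series does not converge near zero; the p-value is ~1 here.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(1.0, max(0.0, p))


# Hypothetical scores: 73 predicted targets vs. a proteome-wide background.
predicted_scores = [0.7 + 0.003 * i for i in range(73)]
background_scores = [i / 1000 for i in range(1000)]
statistic, p_value = ks_two_sample(predicted_scores, background_scores)
print(statistic, p_value)
```

A small p-value here only indicates that the predicted targets score differently from the background; it says nothing by itself about the direction of the shift, which is why the average probability is reported alongside the test.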
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: The algorithm has been implemented. A minimal data set was assembled with 3 element types, namely peptidases, targets and genes. In particular, from MEROPS we obtained 657 human peptidases affecting 3460 targets and forming 8931 peptidase-target pairs. From their mapping on UniProt, 3833 genes coding for peptidases or targets were retained. Another important element of the data fusion approach is the presence of constraints. Constraint matrices are populated with the associations relating objects of the same type. In our application we used five constraints: one gene-gene interaction matrix from BioGRID; two target-target and protease-protease interaction matrices from STRING (with 0.7 as combined score threshold); two target-target and protease-protease BLAST similarity matrices (with 10E-10 as e-value threshold).
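The data layout described above can be sketched as follows: a relation matrix links objects of two different types (peptidases and targets), while a constraint matrix links objects of the same type whenever a score passes a threshold. This is a minimal illustration in plain Python. The function names, the identifiers `P1`, `T1` and so on, the scores, and the binary 0/1 encoding are all assumptions; the report does not specify how matrix entries are valued.

```python
def build_relation_matrix(row_objects, col_objects, pairs):
    """0/1 matrix with a 1 where (row, col) is a known pair, e.g. peptidase-target."""
    row_index = {name: k for k, name in enumerate(row_objects)}
    col_index = {name: k for k, name in enumerate(col_objects)}
    matrix = [[0] * len(col_objects) for _ in row_objects]
    for row_name, col_name in pairs:
        if row_name in row_index and col_name in col_index:
            matrix[row_index[row_name]][col_index[col_name]] = 1
    return matrix


def build_constraint_matrix(objects, scored_pairs, threshold, higher_is_better=True):
    """Symmetric 0/1 matrix over one object type.

    A pair is kept when its score is >= threshold (e.g. a combined
    interaction score) or, with higher_is_better=False, <= threshold
    (e.g. an e-value, where smaller means more similar).
    """
    index = {name: k for k, name in enumerate(objects)}
    size = len(objects)
    matrix = [[0] * size for _ in range(size)]
    for a, b, score in scored_pairs:
        if a not in index or b not in index or a == b:
            continue  # skip unknown objects and self-pairs
        passes = score >= threshold if higher_is_better else score <= threshold
        if passes:
            matrix[index[a]][index[b]] = 1
            matrix[index[b]][index[a]] = 1
    return matrix


# Hypothetical toy data, not taken from MEROPS, STRING or BLAST.
peptidases = ["P1", "P2"]
targets = ["T1", "T2", "T3"]
relation = build_relation_matrix(peptidases, targets, [("P1", "T1"), ("P2", "T3")])

interaction_scores = [("T1", "T2", 0.91), ("T2", "T3", 0.45)]
interaction_constraint = build_constraint_matrix(targets, interaction_scores, 0.7)

evalue_cutoff = 10e-10  # written as 10E-10 in the report
evalues = [("T1", "T3", 1e-30), ("T1", "T2", 0.5)]
similarity_constraint = build_constraint_matrix(
    targets, evalues, evalue_cutoff, higher_is_better=False
)
print(relation, interaction_constraint, similarity_constraint)
```

With the real data the relation matrix would be much larger and very sparse (8931 pairs over 657 peptidases and 3460 targets), so a sparse representation would likely be preferable to nested lists.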
Strategy for Future Research Activity	(1) Broadening the scope of the data fusion approach. Integrating drug data will extend the approach to the discovery of protease-disease associations. (2) More data harvesting. A further necessary step is the integration of more element types in the data fusion process, namely pathways, drugs, diseases, domains and PSSMs. Other elements are also to be added to the constraints, namely the Gene Ontology, the Disease Ontology, and the Negatome. (3) Online repository. The results, that is, the putative new protease-target and protease-disease associations, can be uploaded online and made available to the scientific community for further research. (4) Wet lab validation. Wet lab experiments can confirm the predicted protease-target pairs.

Research Products
(1 results)

All Int'l Joint Research (1 results)

[Int'l Joint Research] University of Pavia (Italy)
- Country Name
  ITALY
- Counterpart Institution
  University of Pavia