Project/Area Number |
15F15788
|
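Research Institution | Kyoto University |
Principal Investigator |
AKUTSU Tatsuya (阿久津 達也), Kyoto University, Institute for Chemical Research, Professor (90261859)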
|
Co-Investigator |
MARINI SIMONE, Kyoto University, Institute for Chemical Research, Foreign Research Fellow
|
Project Period (FY) |
2015-11-09 – 2017-03-31
|
Keywords | Protein cleavage / Data fusion / Machine learning |
Outline of Annual Research Achievements |
The aim of the project is to develop a data fusion algorithm for predicting protease targets. The project has now reached sufficient maturity for a first practical application: the data fusion algorithm is working, a minimal data set has been prepared, and initial results are promising. Our goal is to discover new protease-target pairs. We predicted 1787 new protease-target pairs, involving 139 proteases and 716 targets. To validate these results, we used an independent algorithm, CasCleave. CasCleave is based on traditional machine learning and is therefore complementary to our data fusion approach. Although our approach predicts targets for all peptidases, we could validate only the subset of 73 Caspase-interacting targets, because CasCleave has a narrower scope than our approach. By comparing the CasCleave cleavage probability distributions of our predicted targets with those over the whole human proteome, we found 6 new targets predicted for Caspase-1 (p-value 1.23E-5), 37 for Caspase-3 (p-value 2.2E-16), 4 for Caspase-6 (p-value 6.8E-4), 5 for Caspase-7 (p-value 9.14E-3), 4 for Caspase-8 (p-value 1.1E-3), and 17 for Granzyme B (p-value 6.84E-3). P-values were computed with the Kolmogorov-Smirnov (KS) test. The average interaction probability that CasCleave assigns to our predicted targets is 0.82. In summary, our findings are highly consistent with those of a state-of-the-art algorithm specialized in caspases, while our approach predicts a much broader range of potential protease-target pairs.
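The validation step compares two score distributions (CasCleave probabilities for predicted targets vs. the whole proteome) with a KS test. A minimal, dependency-free sketch of a two-sample KS test is given here for illustration only. It assumes a two-sided alternative and uses the asymptotic Kolmogorov distribution with a common small-sample correction; the report does not state which implementation or alternative hypothesis was actually used, and the scores in the example are synthetic, not project data.

```python
import bisect
import math
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic D = max_x |F_a(x) - F_b(x)| over the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # Both ECDFs only jump at observed values, so checking those points suffices.
    for x in a_sorted + b_sorted:
        fa = bisect.bisect_right(a_sorted, x) / n
        fb = bisect.bisect_right(b_sorted, x) / m
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d: float, n: int, m: int, terms: int = 100) -> float:
    """Approximate two-sided p-value from the asymptotic Kolmogorov distribution."""
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d  # small-sample correction
    if lam < 0.2:
        return 1.0  # the series converges slowly here and its value is essentially 1
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return min(max(q, 0.0), 1.0)


# Hypothetical synthetic scores, not CasCleave output.
predicted = [0.5 + i / 200 for i in range(100)]  # skewed toward high probabilities
proteome = [i / 100 for i in range(100)]          # spread over [0, 1)
d = ks_statistic(predicted, proteome)
p = ks_pvalue(d, len(predicted), len(proteome))
print(f"D = {d:.3f}, p = {p:.2e}")
```

In practice a library routine such as SciPy's `scipy.stats.ks_2samp` would normally be preferred; the hand-written version only makes the statistic and its p-value explicit.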
|
Current Status of Research Progress |
2: Progressing generally smoothly
Reason
The algorithm has been implemented. A minimal data set was assembled with three element types: peptidases, targets, and genes. In particular, from MEROPS we obtained 657 human peptidases affecting 3460 targets and forming 8931 peptidase-target pairs. After mapping these onto UniProt, 3833 genes coding for peptidases or targets were retained. Another important element in the data fusion approach is the presence of constraints. Constraint matrices are populated with the associations relating objects of the same type. In our application we used five constraints: one gene-gene interaction matrix from BioGRID; two target-target and protease-protease interaction matrices from STRING (with a combined score threshold of 0.7); and two target-target and protease-protease BLAST similarity matrices (with an e-value threshold of 10E-10).
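The data set description implies two kinds of input: relation matrices between different element types (such as the MEROPS peptidase-target pairs) and constraint matrices among objects of the same type, filtered by a threshold. The following is a minimal sketch of how such binary matrices could be assembled from edge lists. It rests on several assumptions: the IDs and scores are hypothetical toy values, scores are taken to be on a 0 to 1 scale as written in the report (raw STRING files often store combined scores as integers up to 1000), thresholds are assumed inclusive, and the report does not say how the actual matrices were stored or weighted.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Matrix = List[List[int]]

# Thresholds as stated in the report; inclusiveness (>=, <=) is an assumption.
STRING_MIN_SCORE = 0.7
BLAST_EVALUE_MAX = 10e-10  # written "10E-10" in the report


def index_of(ids: Iterable[str]) -> Dict[str, int]:
    """Assign each object ID a row/column index (sorted for reproducibility)."""
    return {name: i for i, name in enumerate(sorted(set(ids)))}


def relation_matrix(pairs: Iterable[Tuple[str, str]],
                    rows: Dict[str, int], cols: Dict[str, int]) -> Matrix:
    """Binary matrix relating two different object types, e.g. peptidase x target."""
    r = [[0] * len(cols) for _ in range(len(rows))]
    for a, b in pairs:
        if a in rows and b in cols:  # skip pairs whose IDs were not retained
            r[rows[a]][cols[b]] = 1
    return r


def constraint_matrix(edges: Iterable[Tuple[str, str, float]],
                      index: Dict[str, int],
                      keep: Callable[[float], bool]) -> Matrix:
    """Symmetric binary matrix among objects of ONE type; keep() filters each edge."""
    n = len(index)
    theta = [[0] * n for _ in range(n)]
    for a, b, value in edges:
        if a in index and b in index and a != b and keep(value):
            i, j = index[a], index[b]
            theta[i][j] = theta[j][i] = 1
    return theta


# Hypothetical toy IDs and scores (not project data).
proteases = index_of(["P1", "P2"])
targets = index_of(["T1", "T2", "T3"])

R = relation_matrix([("P1", "T1"), ("P2", "T3")], proteases, targets)
theta_string = constraint_matrix([("T1", "T2", 0.85), ("T2", "T3", 0.40)],
                                 targets, lambda s: s >= STRING_MIN_SCORE)
theta_blast = constraint_matrix([("T1", "T3", 1e-30), ("T1", "T2", 0.5)],
                                targets, lambda e: e <= BLAST_EVALUE_MAX)
print(R, theta_string, theta_blast, sep="\n")
```

A gene-gene matrix from BioGRID could be built with the same `constraint_matrix` helper and `keep=lambda _: True`, since the report mentions no threshold for it.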
|
Strategy for Future Research Activity |
(1) Broadening the scope of the data fusion approach. Integrating drug data will extend the approach to the discovery of protease-disease associations.
(2) More data harvesting. A further necessary step is to integrate more element types into the data fusion process, namely pathways, drugs, diseases, domains, and PSSMs. Further elements will also be added to the constraints, namely the Gene Ontology, the Disease Ontology, and the Negatome.
(3) Online repository. The results, i.e., the putative new protease-target and protease-disease associations, can be published online and made available to the scientific community for further research.
(4) Wet lab validation. Wet lab experiments can confirm the predicted protease-target pairs.
|