Summary of Research Achievements |
The problem of predicting protease-protein targets has been studied extensively in bioinformatics. However, existing algorithms are either highly specific (i.e., they work only with particular proteins or protein families, such as caspases) or based solely on primary structure, and are therefore prone to false positives (i.e., non-cleaving pairs wrongly labeled as cleaving). Our work consisted of designing a protease-protein target prediction algorithm that encompasses the general protein cleavage machinery through the application of data fusion.
We extracted up to 9,000 cleaving protease-protein target pairs, all validated in wet-lab experiments, from the MEROPS database. Besides protein similarity (BLAST), the model was designed to fuse relevant, though only indirectly related, cleavage information. By harvesting publicly available databases such as KEGG, BioGRID, STRING, DOMINE, and InterPro, we incorporated domain, pathway, gene, and protein knowledge into our model. By assessing the model on test data not involved in the training and tuning phases, we showed that it outperforms state-of-the-art software for protease cleavage target prediction. Unlike state-of-the-art approaches, this algorithm is general and not dedicated to specific proteases; it can therefore be used to explore poorly studied proteases for which, for example, the secondary and tertiary structures are completely unknown.
|