Project/Area Number | 17F17050 |
Research Institution | The University of Tokyo |
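Principal Investigator | 清水 謙多郎 (SHIMIZU Kentaro), The University of Tokyo, Graduate School of Agricultural and Life Sciences (Faculty of Agriculture), Professor (80178970) |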
Co-Investigator | FANG CHUN, The University of Tokyo, Graduate School of Agricultural and Life Sciences, Foreign Research Fellow |
Project Period (FY) | 2017-10-13 – 2020-03-31 |
Keywords | Protein / Intrinsic disorder / MoRF / PSSM / Deep learning |
Outline of Annual Research Achievements |
Molecular recognition features (MoRFs) are key functional regions of intrinsically disordered proteins (IDPs), proteins that play important roles in the molecular interaction networks of cells and are implicated in many serious human diseases. Identifying MoRFs is therefore a key step in both the functional study of IDPs and drug design. We developed a method named en_DCNNMoRF (ensemble Deep Convolutional Neural Network-based MoRF predictor). It combines the outputs of two independent deep convolutional neural network (DCNN) classifiers that exploit different features. DCNNMoRF1 describes protein sequences using the position-specific scoring matrix (PSSM) and 22 types of amino acid-related factors. The second classifier, DCNNMoRF2, describes them using 13 types of amino acid indexes. Both single classifiers use a DCNN with a novel two-dimensional attention mechanism. An averaging strategy was added to further process the output probabilities of each DCNN model. Finally, en_DCNNMoRF combines the two separate models by averaging their final scores.
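The report does not say exactly what the "average strategy" averages over. The sketch below makes an assumption: each model's per-residue MoRF probabilities are smoothed with a sliding-window mean. The final step then averages the two classifiers' scores, which is the combination the paragraph above describes. The DCNN models themselves are not shown; they are represented only by their output probability lists. The function names, window size, and 0.5 threshold are hypothetical.

```python
from typing import List


def window_average(scores: List[float], half_window: int = 2) -> List[float]:
    """Smooth per-residue probabilities with a centered sliding-window mean.

    Hypothetical reading of the "average strategy". The window is truncated
    at the sequence ends, so terminal residues average over fewer neighbors.
    """
    n = len(scores)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def ensemble_scores(probs1: List[float], probs2: List[float],
                    half_window: int = 2) -> List[float]:
    """Average the smoothed outputs of two classifiers residue by residue."""
    if len(probs1) != len(probs2):
        raise ValueError("both classifiers must score the same sequence")
    s1 = window_average(probs1, half_window)
    s2 = window_average(probs2, half_window)
    return [(a + b) / 2 for a, b in zip(s1, s2)]


def call_morf_residues(scores: List[float], threshold: float = 0.5) -> List[bool]:
    """Label a residue as MoRF (True) when its ensemble score reaches the threshold."""
    return [s >= threshold for s in scores]


if __name__ == "__main__":
    # Toy values standing in for DCNNMoRF1 / DCNNMoRF2 outputs (not real results).
    p1 = [0.1, 0.2, 0.8, 0.9, 0.7, 0.2]
    p2 = [0.2, 0.1, 0.6, 0.8, 0.9, 0.1]
    print(call_morf_residues(ensemble_scores(p1, p2, half_window=1)))
```

Averaging keeps the ensemble output a continuous score, unlike majority voting, so the decision threshold can be tuned afterwards on validation data.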
|
Current Status of Research Progress (Category) |
2: Research is progressing generally smoothly.
Reason
We carried out the following work, as described in the research plan.
(1) We tracked the latest literature and database information on intrinsically disordered proteins and collected available, reliable datasets of intrinsically disordered regions (IDRs) and motifs for analysis.
(2) We applied factor analysis to find the smallest number of factors that adequately account for the correlations among the physicochemical properties of residues. We then used the selected factors to refine the representation of physicochemical features.
(3) We dissected the features of IDRs, motifs, and molecular recognition features (MoRFs) in intrinsically disordered proteins (IDPs).
(4) We studied a variety of machine learning methods, especially deep learning for sequence prediction. We compared them with one another in predicting IDRs and motifs and selected the most effective method for prediction.
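The report does not say how the smallest adequate number of factors was chosen. One common heuristic is the Kaiser criterion: keep one factor for each eigenvalue of the property correlation matrix that exceeds 1. The sketch below works in pure Python, so it assumes no numerical library. It computes Pearson correlations among property scales, obtains eigenvalues with the cyclic Jacobi method, and counts those above 1. It only chooses the number of factors. Fitting and rotating the loadings, which a full factor analysis also requires, is not shown. All function names are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix; each column is one physicochemical property.

    Assumes no column is constant (its standard deviation would be zero).
    """
    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)

    m = len(columns)
    return [[1.0 if i == j else pearson(columns[i], columns[j]) for j in range(m)]
            for i in range(m)]


def jacobi_eigenvalues(sym, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, descending."""
    a = [row[:] for row in sym]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Choose the rotation that zeroes the (p, q) entry.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_num_factors(corr):
    """Kaiser criterion: keep one factor per eigenvalue greater than 1."""
    return sum(1 for ev in jacobi_eigenvalues(corr) if ev > 1.0)
```

For example, `kaiser_num_factors(correlation_matrix(property_columns))` would give the suggested factor count for a hypothetical list `property_columns` of property scales. In practice, a library symmetric eigensolver such as `numpy.linalg.eigvalsh` would replace the hand-written Jacobi loop.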
|
Strategy for Future Research Activity |
Our research will focus on the details of “adopting feature fusion and feature compression methods for identifying IDRs and motifs in IDPs”. The main tasks are as follows:
(1) Apply a more effective feature-encoding scheme that combines more predictive features into fewer dimensions. This includes:
   1) removing redundant features and strengthening predictive features to improve prediction accuracy, for example by using scaling techniques to enhance predictive features and suppress noisy ones;
   2) adopting image processing techniques to preprocess the conservation features contained in the PSSM;
   3) modifying the PSSM so that it combines the detailed local conservation patterns of residues with the distribution of scores in the PSSM.
(2) Adopt feature fusion, rather than concatenating all features in series, when designing the algorithms:
   1) first, all physicochemical features will be clustered;
   2) second, factors will be calculated to represent each cluster;
   3) finally, all features (including the revised PSSM and the factors calculated from the physicochemical features) will be fused and compressed by matrix operations to reduce the feature dimensionality.
(3) Analyze MoRFs in IDPs in detail according to their lengths, and design separate algorithms for MoRFs of different lengths.
(4) Develop the related web tools so that the methods can be made publicly available.
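Items (1) 1) and 2) of this plan can be illustrated on a PSSM stored as an L x 20 list of rows, one row per residue. The report does not specify the operations, so this sketch assumes three simple, hypothetical choices:

- per-column weights as the "scaling" step; the weights would normally be learned or tuned;
- the logistic function to map raw PSSM scores into (0, 1);
- a mean filter along the sequence axis as the image-processing step, treating the PSSM as a one-channel image.

```python
import math


def logistic_scale(pssm):
    """Map raw PSSM log-odds scores into (0, 1) with the logistic function."""
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in pssm]


def weight_features(matrix, weights):
    """Scale each column: a weight > 1 strengthens a feature, < 1 damps a noisy one."""
    return [[x * w for x, w in zip(row, weights)] for row in matrix]


def smooth_along_sequence(matrix, half_window=1):
    """Mean filter over neighboring residues (rows), applied to each column.

    Only the sequence axis is filtered, because the order of PSSM columns
    (amino acid types) carries no spatial meaning. Windows are truncated
    at the sequence ends.
    """
    n, m = len(matrix), len(matrix[0])
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append([sum(matrix[r][j] for r in range(lo, hi)) / (hi - lo)
                    for j in range(m)])
    return out
```

The functions are independent and can be chained in any order. The sketch deliberately does not fix a pipeline, since the plan leaves that open.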
|
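The three-step fusion plan in item (2) under Strategy for Future Research Activity (cluster, compute a factor per cluster, then fuse and compress) can be sketched as follows. None of the specific choices come from the report; all are hypothetical:

- each physicochemical feature is a per-residue track over the sequence;
- tracks are grouped greedily by absolute Pearson correlation;
- each cluster's "factor" is the mean z-score of its members, a simple stand-in for a fitted factor score;
- compression is a product with a projection matrix that would in practice be learned or derived.

```python
import math


def zscore(col):
    """Standardize a feature track; a constant track maps to all zeros."""
    n = len(col)
    mu = sum(col) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in col) / n)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in col]


def pearson(x, y):
    """Pearson correlation computed from population z-scores."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def cluster_features(tracks, threshold=0.8):
    """Step 1 (hypothetical): greedy clustering by absolute correlation.

    A track joins the first cluster whose seed (first member) it correlates
    with at |r| >= threshold; otherwise it seeds a new cluster.
    """
    clusters = []
    for j, track in enumerate(tracks):
        for members in clusters:
            if abs(pearson(tracks[members[0]], track)) >= threshold:
                members.append(j)
                break
        else:
            clusters.append([j])
    return clusters


def cluster_factors(tracks, clusters):
    """Step 2 (hypothetical): one factor track per cluster.

    Each factor is the mean z-score of the members; members negatively
    correlated with the seed are sign-flipped first.
    """
    factors = []
    for members in clusters:
        seed = tracks[members[0]]
        aligned = []
        for j in members:
            z = zscore(tracks[j])
            if pearson(seed, tracks[j]) < 0:
                z = [-v for v in z]
            aligned.append(z)
        factors.append([sum(vals) / len(vals) for vals in zip(*aligned)])
    return factors


def fuse_and_compress(pssm_rows, factors, projection):
    """Step 3 (hypothetical): append factor values to each PSSM row, then project (d x k)."""
    fused = [row + [f[i] for f in factors] for i, row in enumerate(pssm_rows)]
    d = len(fused[0])
    if len(projection) != d:
        raise ValueError(f"projection needs {d} rows, got {len(projection)}")
    k = len(projection[0])
    return [[sum(v * projection[r][c] for r, v in enumerate(vec)) for c in range(k)]
            for vec in fused]
```

Clustering before fusion means that strongly correlated properties contribute one factor track rather than several redundant columns. This is the redundancy the plan aims to remove before the final compression.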