Project/Area Number |
17F17051
|
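Research Institution | Kyushu University |
Principal Investigator |
YAMANISHI Yoshihiro Kyushu University, Medical Institute of Bioregulation, Associate Professor (60437267)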
|
Co-Investigator |
BERENGER FRANCOIS Kyushu University, Medical Institute of Bioregulation, International Research Fellow
|
Project Period (FY) |
2017-07-26 – 2019-03-31
|
Keywords | LBVS / QSAR / applicability domain |
Outline of Annual Research Achievements |
I have created a rotation- and translation-invariant encoding of 3D molecular surfaces. It combines four feature spaces: shape, hydrophobicity, hydrogen-bond donors, and hydrogen-bond acceptors. I published two articles this year. I am currently writing an article with my host researcher on a fully automatic applicability domain algorithm. Most previous methods are not fully automatic because they depend on a critical user-chosen parameter. Our algorithm improves the classification performance of Quantitative Structure-Activity Relationship (QSAR) models. I have written several programs and libraries for chemoinformatics research, some of which have been released as open source. I have also built QSAR models for the CFTR protein target (involved in cystic fibrosis) to select molecules for purchase.
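To illustrate why a user-chosen parameter is critical in an applicability domain, here is a minimal sketch of a classic distance-threshold domain. It is not the algorithm described in this report. The function names, the 2D descriptor vectors, and the cutoff values are all hypothetical and chosen only for illustration.

```python
import math

def nearest_distance(query, training):
    """Euclidean distance from a query descriptor to its closest training descriptor."""
    return min(math.dist(query, t) for t in training)

def in_domain(query, training, max_dist):
    """A query is 'in domain' if some training molecule lies within max_dist.

    max_dist is the user-chosen parameter: the answer depends entirely on it.
    """
    return nearest_distance(query, training) <= max_dist

# Hypothetical 2D descriptor vectors (toy values, not real molecular descriptors).
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
queries = [(0.5, 0.5), (3.0, 3.0)]

# The same queries are accepted or rejected depending on the cutoff.
for cutoff in (0.5, 1.0, 5.0):
    print(cutoff, [in_domain(q, training, cutoff) for q in queries])
```

With a tight cutoff every query is rejected, and with a loose one every query is accepted, so the model's reported performance can hinge on a value the user must guess.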
|
Current Status of Research Progress |
2: Progressing rather smoothly
Reason
The research is going well: we have already obtained interesting results (our new applicability domain algorithm) and applied our expertise to a real-world drug-discovery project through QSAR modeling. We have also created several open-source libraries for chemoinformatics: bisec-tree (a bisector tree to index molecules), parany (a library to parallelize stream computations, since many chemoinformatics tasks process a stream of molecules), and cpmlib (our classification performance metrics library, to which more metrics were added). We have wrapped SVMs and gradient boosting for use from our programming language of choice, OCaml. Thanks to our work in OCaml for chemoinformatics, we have been invited to contribute to an upcoming special issue on programming languages in the Journal of Cheminformatics.
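The idea of parallelizing work over a stream of molecules can be sketched as follows. This is a Python illustration of the general pattern only. It does not reproduce the parany API, which is an OCaml library. The names `batches` and `parallel_map_stream` and the toy workload are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def batches(stream, size):
    """Cut a possibly very long stream into lists of at most `size` items."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def parallel_map_stream(work, stream, n_workers=4, batch_size=8):
    """Apply `work` to every item of `stream`, yielding results in input order.

    Only one batch is held in memory at a time, so the stream never has to
    be loaded completely.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for batch in batches(stream, batch_size):
            # pool.map returns results in the order of its inputs.
            yield from pool.map(work, batch)

# Toy workload: string length stands in for a real per-molecule computation.
molecules = iter(["CCO", "c1ccccc1", "CC(=O)O", "N"])
results = list(parallel_map_stream(len, molecules, n_workers=2, batch_size=3))
print(results)
```

Threads keep the sketch short. For CPU-bound descriptors in Python, a process pool would usually be the better choice.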
|
Strategy for Future Research Activity |
Our current encoding of 3D surfaces is too slow by chemoinformatics standards. Also, in ligand-based virtual screening experiments, combining the four spaces outperforms well-established 2D fingerprints such as MACCS or ECFP4 by only a small margin. Moreover, the current encoding does not allow going from proteins to ligands or the reverse. I will therefore refocus my work on 2D. The encoding will work at a higher level of abstraction (with atoms instead of discrete surfaces), and I will use existing software to extract protein-ligand interactions. I will also use existing databases of protein-ligand complexes to access a significant amount of high-quality data. Once the descriptor works satisfactorily in 2D, it may help us understand whether and how to extend it to 3D.
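For context on the 2D baselines mentioned above, ligand-based virtual screening with fingerprints typically ranks a library by Tanimoto similarity to a known active. The sketch below shows that ranking step under simplifying assumptions: fingerprints are represented as sets of on-bit indices, and the bit values and molecule names are made up, not real MACCS or ECFP4 output.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union

def rank_by_similarity(query_fp, library):
    """Return library molecule names, most similar to the query first."""
    return sorted(library,
                  key=lambda name: tanimoto(query_fp, library[name]),
                  reverse=True)

# Hypothetical fingerprints as sets of on-bit indices.
query = {1, 2, 3, 4}
library = {
    "mol_b": {1, 2},
    "mol_c": {7, 8},
    "mol_a": {1, 2, 3, 4, 5},
}

for name in rank_by_similarity(query, library):
    print(name, round(tanimoto(query, library[name]), 2))
```

A real screen would compute the fingerprints with a chemoinformatics toolkit and evaluate the ranking with a metric such as ROC AUC.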
|