2017 Fiscal Year Annual Research Report
A new rotation-translation invariant molecular encoding and its use in Computer-Aided Drug-Discovery
Project/Area Number |
17F17051
|
Research Institution | Kyushu University |
Principal Investigator |
YAMANISHI Yoshihiro (山西 芳裕), Kyushu University, Medical Institute of Bioregulation, Associate Professor (60437267)
|
Co-Investigator (Kenkyū-buntansha) |
BERENGER FRANCOIS, Kyushu University, Medical Institute of Bioregulation, JSPS International Research Fellow
|
Project Period (FY) |
2017-07-26 – 2019-03-31
|
Keywords | LBVS / QSAR / applicability domain |
Outline of Annual Research Achievements |
I have created a rotation-translation invariant encoding of 3D molecular surfaces. It combines four spaces: shape, hydrophobicity, hydrogen-bond donors, and hydrogen-bond acceptors. I published two articles this year. I am currently writing an article with my host researcher on a fully automatic applicability domain algorithm. Most previous methods are not fully automatic because they depend on a critical user-chosen parameter. Our algorithm improves the classification performance of Quantitative Structure-Activity Relationship (QSAR) models. I have written several programs and libraries for chemoinformatics research, some of which were released as open source. I have also done QSAR modeling for the CFTR protein target (involved in cystic fibrosis) to select molecules for purchase.
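To make the notion of rotation-translation invariance concrete, here is a minimal sketch in Python. It is not the encoding developed in this project: it assumes a much simpler stand-in descriptor, a histogram of pairwise distances between 3D points, which is unchanged by any rigid motion because rigid motions preserve distances. The function names, bin width, and sample coordinates are all hypothetical.

```python
import math
from itertools import combinations


def distance_histogram(points, bin_width=1.0, n_bins=10):
    """Histogram of pairwise distances between 3D points.

    Illustrative stand-in descriptor (an assumption, not the report's
    encoding). Distances beyond the last bin are clamped into it.
    """
    hist = [0] * n_bins
    for p, q in combinations(points, 2):
        d = math.dist(p, q)
        hist[min(int(d / bin_width), n_bins - 1)] += 1
    return hist


def rotate_z(point, angle):
    """Rotate a point around the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def translate(point, shift):
    """Shift a point by the vector `shift`."""
    return tuple(a + b for a, b in zip(point, shift))


# Hypothetical point cloud, e.g. sampled surface points of a molecule.
points = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.5, 0.0), (0.0, 0.0, 3.5)]

# Apply an arbitrary rigid motion: a rotation followed by a translation.
moved = [translate(rotate_z(p, 0.7), (4.0, -2.0, 1.0)) for p in points]

# The descriptor is the same before and after the rigid motion.
assert distance_histogram(points) == distance_histogram(moved)
```

With binned distances, a pair lying exactly on a bin edge could change bins through floating-point rounding; the sample coordinates above were chosen to avoid that case.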
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
The research is going well: we have already obtained interesting results (our new applicability domain algorithm), and we have applied our expertise to a real-world drug-discovery project through QSAR modeling. We have also created several open source libraries for chemoinformatics: bisec-tree (a bisector tree to index molecules), parany (a library to parallelize stream computing, since many chemoinformatics tasks work on a stream of molecules), and cpmlib (our classification performance metrics library, to which more metrics were added). We have wrapped SVMs and gradient boosting for use from our programming language of choice (OCaml). Thanks to our work in OCaml for chemoinformatics, we have been invited to contribute to an upcoming special issue on programming languages in the Journal of Cheminformatics.
|
Strategy for Future Research Activity |
Our current encoding of 3D surfaces is too slow by chemoinformatics standards. Also, in ligand-based virtual screening experiments, combining the four spaces outperforms well-established 2D fingerprints such as MACCS or ECFP4 by only a small margin. Moreover, the current encoding does not allow going from proteins to ligands or the reverse. I will therefore refocus my work on 2D. The encoding will work at a higher abstraction level (with atoms instead of discrete surfaces), and I will use existing software to extract protein-ligand interactions. I will also use existing databases of protein-ligand complexes to access a significant amount of high-quality data. Once the descriptor works satisfactorily in 2D, it may help us better understand whether and how to extend it to 3D.
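For readers unfamiliar with how 2D fingerprints are used in ligand-based virtual screening, the following Python sketch ranks a small library by Tanimoto similarity to a query molecule. It is an illustration only: the fingerprints are hypothetical sets of on-bit indices rather than real MACCS or ECFP4 output, and the molecule names and helper functions are invented for the example.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices.

    Returns 0.0 when both fingerprints are empty (a convention chosen
    for this sketch).
    """
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(query, library):
    """Sort library molecules by decreasing similarity to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each set lists the indices of the bits set to 1.
query = {1, 4, 7, 9}
library = {
    "mol_a": {1, 4, 7, 9},  # identical to the query
    "mol_b": {1, 4, 8},     # shares two bits with the query
    "mol_c": {2, 3},        # shares no bit with the query
}

ranking = rank_by_similarity(query, library)
for name, score in ranking:
    print(f"{name}\t{score:.2f}")
```

In a screening experiment, molecules at the top of such a ranking are the candidates considered most likely to share the query's activity.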
|