An approach for eliminating chance correlations and its application to pharmaceutical data.
Project/Area Number |
25460035
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Research Field |
Physical pharmacy
|
Research Institution | Osaka University |
Principal Investigator |
|
Co-Investigator(Kenkyū-buntansha) |
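川下 理日人 Osaka University, Graduate School of Pharmaceutical Sciences, Assistant Professor (00423111)
岡本 晃典 Hokuriku University, Faculty of Pharmaceutical Sciences, Lecturer (70437309)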
|
Project Period (FY) |
2013-04-01 – 2017-03-31
|
Project Status |
Completed (Fiscal Year 2016)
|
Budget Amount |
¥5,070,000 (Direct Cost: ¥3,900,000, Indirect Cost: ¥1,170,000)
Fiscal Year 2015: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Fiscal Year 2014: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Fiscal Year 2013: ¥1,950,000 (Direct Cost: ¥1,500,000, Indirect Cost: ¥450,000)
|
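Keywords | Chance Correlation / L1 Regularization / L2 Regularization / Ridge Regression / Elastic Net / Hydrolyzability / Classification / Data Mining / Regularization / Hydrolysis / Environmental Chemistry / Logistic Regression / Regression Analysis / Discriminant Analysis / Hydrolyzability Prediction / Explanatory Variables / Pharmaceutical Statistics / Principal Component Analysis / Principal Component Regression / PLS / Varimax Rotation / PCLS |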
Outline of Final Research Achievements |
We set out to develop a novel method for eliminating "chance correlation" descriptors, which appear when supervised learning is applied. We found that a method combining data classification with regression gave better results on artificial data. However, we also found that an appropriate combination of L1 and L2 regularization gave better predictability on real data sets, which had simpler data structures. Following Ockham's principle, we therefore adopted elastic net and similar methods to eliminate chance-correlation descriptors. When applied to predicting the hydrolyzability of esters, amides, etc., this latter combined-regularization method showed the best predictability when L2 regularization was carried out after L1 regularization (for esters, the correct classification rate was 89%). We conclude that the former method gives better predictability for complex data, and the latter is better for simpler data.
|
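The idea of carrying out L2 regularization after L1 regularization can be illustrated with a minimal sketch. It is not the project's actual procedure, which this record does not describe in detail. It assumes a two-stage reading: an L1-penalized logistic regression first drops descriptors whose coefficients shrink to zero, then an L2-penalized (ridge) logistic regression is refit on the survivors. The data are synthetic (two informative descriptors plus ten pure-noise descriptors standing in for chance correlations), and the function names, penalty strengths, step size, and iteration count are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, l1=0.0, l2=0.0, lr=0.5, n_iter=300):
    """Proximal gradient descent for logistic regression with the penalty
    l1 * |w|_1 + (l2 / 2) * |w|_2^2 (the intercept is not penalized)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        for j in range(p):
            # Gradient step on the smooth part (log-loss plus L2 term) ...
            step = w[j] - lr * (grad_w[j] / n + l2 * w[j])
            # ... then soft-thresholding, which sets small weights exactly to zero.
            w[j] = math.copysign(max(abs(step) - lr * l1, 0.0), step)
    return w, b


def predict(X, w, b):
    return [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]


# Synthetic data (an assumption, not the project's data): descriptors 0 and 1
# drive the class label; descriptors 2..11 are pure noise.
random.seed(0)
n, p = 200, 12
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [1 if 2.0 * xi[0] - 2.0 * xi[1] + random.gauss(0.0, 0.5) > 0 else 0 for xi in X]

# Stage 1: L1 penalty, keep only descriptors with non-zero weights.
w1, _ = fit_logistic(X, y, l1=0.1)
selected = [j for j, wj in enumerate(w1) if wj != 0.0]

# Stage 2: refit with an L2 (ridge) penalty on the selected descriptors only.
X_sel = [[xi[j] for j in selected] for xi in X]
w2, b2 = fit_logistic(X_sel, y, l2=0.01)

accuracy = sum(int(pi == yi) for pi, yi in zip(predict(X_sel, w2, b2), y)) / n
print("selected descriptors:", selected)
print("training accuracy: %.2f" % accuracy)
```

The accuracy printed here is measured on the training data only. A fair estimate of predictability would need held-out data or cross-validation, which would also be the usual way to choose the two penalty strengths.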
Report
(5 results)
Research Products
(7 results)