Statistical theory for string data analysis and its application to computational biochemistry
Project/Area Number |
26610037
|
Research Category |
Grant-in-Aid for Challenging Exploratory Research
|
Allocation Type | Multi-year Fund |
Research Field |
Foundations of mathematics/Applied mathematics
|
Research Institution | Institute of Physical and Chemical Research (2016), Kyoto University (2014-2015) |
Principal Investigator |
Hitoshi Koyano, RIKEN (National Research and Development Agency), Quantitative Biology Center, Researcher (10570989)
|
Co-Investigator (Kenkyū-buntansha) |
Morihiro Hayashida, Kyoto University, Institute for Chemical Research, Assistant Professor (40402929)
|
Project Period (FY) |
2014-04-01 – 2017-03-31
|
Project Status |
Completed (Fiscal Year 2016)
|
Budget Amount |
¥3,770,000 (Direct Cost: ¥2,900,000, Indirect Cost: ¥870,000)
Fiscal Year 2016: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2015: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2014: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
|
Keywords | string / probability theory / statistics / machine learning / biological sequence / bioinformatics / computational biology |
Outline of Final Research Achievements |
In this research project, we first established limit theorems that extend the probability theory we constructed on a noncommutative topological monoid A* of strings in our previous studies. Using these theorems, we next developed a theory of a learning machine that learns under the maximum margin principle in A*, and applied the machine to the prediction of RNA secondary structures and protein-protein interactions to examine its usefulness in practical data analysis. Furthermore, we derived an unsupervised procedure for string clustering by constructing a theory of a mixture model on A*, and showed the optimality of the procedure based on the above-mentioned theorems. Lastly, we introduced median and center strings for a distribution on A* and constructed an algorithm that searches for them efficiently.
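To make the notions of median and center strings concrete, the following is a minimal sketch and not the project's algorithm. It assumes the Levenshtein (edit) distance as the metric on A*, replaces the distribution with a finite sample, and restricts candidates to the sample itself (the so-called set median and set center). The function names and the example strings are hypothetical.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def set_median(sample: Sequence[str]) -> str:
    """Sample string minimizing the sum of distances to all sample strings."""
    return min(sample, key=lambda s: sum(levenshtein(s, t) for t in sample))


def set_center(sample: Sequence[str]) -> str:
    """Sample string minimizing the maximum distance to any sample string."""
    return min(sample, key=lambda s: max(levenshtein(s, t) for t in sample))


# Hypothetical sample of short RNA-like strings (assumption, not project data).
sample = ["ACGU", "ACGA", "ACCU", "GGGU"]
median = set_median(sample)
center = set_center(sample)
```

Restricting candidates to the sample keeps the search quadratic in the sample size; a median or center over all of A* is a harder problem and is what an efficient search algorithm would need to address.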
|
Report (4 results)
Research Products (18 results)