Statistical theory for string data analysis and its application to computational biochemistry

Research Project

Project/Area Number	26610037
Research Category	Grant-in-Aid for Challenging Exploratory Research
Allocation Type	Multi-year Fund
Research Field	Foundations of mathematics/Applied mathematics
Research Institution	Institute of Physical and Chemical Research (2016) Kyoto University (2014-2015)
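Principal Investigator	Hitoshi Koyano, RIKEN (National Research and Development Agency), Quantitative Biology Center, Researcher (10570989)
Co-Investigator(Kenkyū-buntansha)	Morihiro Hayashida, Kyoto University, Institute for Chemical Research, Assistant Professor (40402929)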
Project Period (FY)	2014-04-01 – 2017-03-31
Project Status	Completed (Fiscal Year 2016)
Budget Amount	¥3,770,000 (Direct Cost: ¥2,900,000, Indirect Cost: ¥870,000); Fiscal Year 2016: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000); Fiscal Year 2015: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000); Fiscal Year 2014: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Keywords	string / probability theory / statistics / machine learning / biological sequence / bioinformatics / computational biology
Outline of Final Research Achievements	In this research project, we first established limit theorems, extending the probability theory that we constructed in our previous studies on a noncommutative topological monoid A* of strings. Using these theorems, we next developed a theory of a learning machine that learns under the maximum margin principle in A*, and then applied the machine to the prediction of RNA secondary structures and protein-protein interactions to examine its usefulness in practical data analysis. Furthermore, we derived an unsupervised procedure for string clustering by constructing a theory of a mixture model on A* and demonstrated the optimality of the procedure based on the above-mentioned theorems. Lastly, we introduced median and center strings for a distribution on A* and constructed an algorithm that searches for them efficiently.
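The notions of median and center strings mentioned in the outline can be illustrated with a minimal Python sketch. It assumes the commonly used definitions (a median string minimizes the expected Levenshtein distance under the distribution, a center string minimizes the maximum Levenshtein distance to any string with positive probability) and searches only a finite candidate set by brute force. The function names, the candidate restriction, and the example distribution are illustrative assumptions, and the search is not the integer linear programming method of the project's publications.

```python
from typing import Dict, Iterable


def levenshtein(a: str, b: str) -> int:
    """Levenshtein (edit) distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def median_string(dist: Dict[str, float], candidates: Iterable[str]) -> str:
    """Candidate minimizing the expected distance (assumed definition)."""
    return min(candidates,
               key=lambda m: sum(p * levenshtein(m, s) for s, p in dist.items()))


def center_string(dist: Dict[str, float], candidates: Iterable[str]) -> str:
    """Candidate minimizing the maximum distance over the support (assumed definition)."""
    support = [s for s, p in dist.items() if p > 0]
    return min(candidates,
               key=lambda c: max(levenshtein(c, s) for s in support))


# Hypothetical distribution on three short DNA-like strings.
example = {"ACGT": 0.6, "ACGA": 0.3, "TCGA": 0.1}
print(median_string(example, list(example)))
print(center_string(example, list(example)))
```

Restricting the search to the observed strings keeps the sketch short; the true minimizers over all of A* may lie outside the candidate set.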

Report

(4 results)

2016 Annual Research Report, Final Research Report (PDF)
2015 Research-status Report
2014 Research-status Report

Research Products
(18 results)

Journal Article (6 results) (of which Int'l Joint Research: 1 result, Peer Reviewed: 6 results, Acknowledgement Compliant: 2 results, Open Access: 1 result) Presentation (12 results) (of which Int'l Joint Research: 3 results)

[Journal Article] Finding median and center strings for a probability distribution on a set of strings under Levenshtein distance based on integer linear programming (2017)
- Author(s)
  Hayashida, M. and Koyano, H.
- Journal Title
  
  Communications in Computer and Information Science
  
  Volume: 690 Pages: 108-121
- DOI
  10.1007/978-3-319-54717-6_7
- ISBN
  9783319547169, 9783319547176
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Maximum margin classifier working in a set of strings (2016)
- Author(s)
  H. Koyano, M. Hayashida, T. Akutsu
- Journal Title
  
  Proceedings of the Royal Society A
  
  Volume: 472 Issue: 2187 Pages: 20150551-20150551
- DOI
  10.1098/rspa.2015.0551
- Related Report
  2016 Annual Research Report 2015 Research-status Report
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Integer linear programming approach to median and center strings for a probability distribution on a set of strings (2016)
- Author(s)
  Hayashida, M. and Koyano, H.
- Journal Title
  
  Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies
  
  Volume: 3 Pages: 35-41
- DOI
  10.5220/0005666400350041
- NAID
  120005947093
- Related Report
  2016 Annual Research Report
- Peer Reviewed
[Journal Article] Integer linear programming approach to center and median strings for a probability distribution on a set of strings (2016)
- Author(s)
  Koyano, H. and Hayashida, M.
- Journal Title
  
  Communications in Computer and Information Science
  
  Volume: To be determined
- Related Report
  2015 Research-status Report
- Peer Reviewed
[Journal Article] Archaeal β diversity patterns under the seafloor along geochemical gradients (2014)
- Author(s)
  Koyano, H., Tsubouchi, T., Kishino, H., and Akutsu, T.
- Journal Title
  
  Journal of Geophysical Research G.
  
  Volume: 119 Issue: 9 Pages: 1770-1788
- DOI
  10.1002/2014jg002676
- NAID
  120005623238
- Related Report
  2014 Research-status Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] Measuring the similarity of protein structures using image local feature descriptors SIFT and SURF (2014)
- Author(s)
  Hayashida, M., Koyano, H., and Akutsu, T.
- Journal Title
  
  2014 8th International Conference on Systems Biology (ISB)
  
  Pages: 167-171
- Related Report
  2014 Research-status Report
- Peer Reviewed
[Presentation] Optimal string clustering based on a statistical theory on a topological monoid of strings (2017)
- Author(s)
  Koyano, H., Hayashida, M., and Akutsu, T.
- Organizer
  13th Workshop on Stochastic Models, Statistics and Their Applications
- Place of Presentation
  Berlin, Germany
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Optimal string clustering based on a Laplace-like mixture and EM algorithm on a topological monoid of strings (2016)
- Author(s)
  Hitoshi Koyano
- Organizer
  1st IMA Conference on Theoretical and Computational Discrete Mathematics
- Place of Presentation
  Derby, UK
- Year and Date
  2016-03-22
- Related Report
  2015 Research-status Report
- Int'l Joint Research
[Presentation] Integer linear programming approach to center and median strings for a probability distribution on a set of strings (2016)
- Author(s)
  Morihiro Hayashida
- Organizer
  7th International Conference on Bioinformatics Models, Methods, and Algorithms
- Place of Presentation
  Rome, Italy
- Year and Date
  2016-02-21
- Related Report
  2015 Research-status Report
- Int'l Joint Research
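[Presentation] Integer programming problems for median and center strings of a probability distribution on a set of strings (2016)
- Author(s)
  Morihiro Hayashida, Hitoshi Koyano
- Organizer
  Joint workshop of the Information Processing Society of Japan SIGs on Mathematical Modeling and Problem Solving and on Bioinformatics, and the Institute of Electronics, Information and Communication Engineers technical committees on Neurocomputing and on Information-Based Induction Sciences and Machine Learning
- Place of Presentation
  Okinawa, Japan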
- Related Report
  2016 Annual Research Report
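[Presentation] Theory of a Laplace-like mixture model and the EM algorithm for statistical clustering of string data (2015)
- Author(s)
  Hitoshi Koyano
- Organizer
  Japan Society for Industrial and Applied Mathematics
- Place of Presentation
  Kanazawa University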
- Year and Date
  2015-09-09
- Related Report
  2015 Research-status Report
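[Presentation] String clustering based on a Laplace-like mixture model on a set of strings and the EM algorithm (2015)
- Author(s)
  Hitoshi Koyano
- Organizer
  Information Processing Society of Japan
- Place of Presentation
  Okinawa Institute of Science and Technology Graduate University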
- Year and Date
  2015-06-23
- Related Report
  2015 Research-status Report
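[Presentation] EM algorithm for a Laplace-like mixture model for string clustering (2015)
- Author(s)
  Hitoshi Koyano, Morihiro Hayashida
- Organizer
  77th National Convention of the Information Processing Society of Japan
- Place of Presentation
  Kyoto University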
- Year and Date
  2015-03-17 – 2015-03-19
- Related Report
  2014 Research-status Report
[Presentation] Probability theory on a topological monoid of strings and its application to statistical machine learning (2014)
- Author(s)
  Koyano, H. and Hayashida, M.
- Organizer
  International Conference on Recent Advances in Pure and Applied Mathematics
- Place of Presentation
  Antalya, Turkey
- Year and Date
  2014-11-06 – 2014-11-09
- Related Report
  2014 Research-status Report
[Presentation] Measuring the similarity of protein structures using image local feature descriptors SIFT and SURF (2014)
- Author(s)
  Hayashida, M., Koyano, H., and Akutsu, T.
- Organizer
  The 8th International Conference on Systems Biology and the 4th Translational Bioinformatics Conference
- Place of Presentation
  Qingdao, China
- Year and Date
  2014-10-24 – 2014-10-27
- Related Report
  2014 Research-status Report
[Presentation] Probability theory on a topological monoid of strings and its application to machine learning (2014)
- Author(s)
  Koyano, H.
- Organizer
  Sweden-Kyoto Symposium co-organized by Uppsala University, Stockholm University, Royal Institute of Technology, Karolinska Institute, and Kyoto University
- Place of Presentation
  Stockholm, Sweden
- Year and Date
  2014-09-11 – 2014-09-12
- Related Report
  2014 Research-status Report
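[Presentation] Probability theory on a metric space of strings and its application to machine learning (2014)
- Author(s)
  Hitoshi Koyano, Morihiro Hayashida, Tatsuya Akutsu
- Organizer
  Japan Society for Industrial and Applied Mathematics, Annual Meeting 2014
- Place of Presentation
  National Graduate Institute for Policy Studies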
- Year and Date
  2014-09-03 – 2014-09-05
- Related Report
  2014 Research-status Report
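[Presentation] Maximum margin classifier on a metric space of strings and its application to protein science (2014)
- Author(s)
  Hitoshi Koyano, Morihiro Hayashida, Tatsuya Akutsu
- Organizer
  Joint workshop of the Information Processing Society of Japan SIGs on Mathematical Modeling and Problem Solving and on Bioinformatics, and the Institute of Electronics, Information and Communication Engineers technical committees on Neurocomputing and on Information-Based Induction Sciences and Machine Learning
- Place of Presentation
  Okinawa Institute of Science and Technology Graduate University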
- Year and Date
  2014-06-25 – 2014-06-27
- Related Report
  2014 Research-status Report
