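Statistical theory for string analysis and its application to computational biochemistry

Research Project

Project/Area Number	26610037
Research Category	Grant-in-Aid for Challenging Exploratory Research
Allocation Type	Multi-year Fund
Research Field	Foundations of mathematics / Applied mathematics
Research Institution	RIKEN (2016); Kyoto University (2014-2015)
Principal Investigator	Hitoshi Koyano (小谷野仁), RIKEN, Quantitative Biology Center, Researcher (10570989)
Co-Investigator	Morihiro Hayashida (林田守広), Kyoto University, Institute for Chemical Research, Assistant Professor (40402929)
Project Period (FY)	2014-04-01 – 2017-03-31
Project Status	Completed (FY2016)
Budget Amount *Note	Total: ¥3,770,000 (direct costs: ¥2,900,000; indirect costs: ¥870,000). FY2016: ¥1,040,000 (direct costs: ¥800,000; indirect costs: ¥240,000). FY2015: ¥1,300,000 (direct costs: ¥1,000,000; indirect costs: ¥300,000). FY2014: ¥1,430,000 (direct costs: ¥1,100,000; indirect costs: ¥330,000).
Keywords	strings / probability theory / statistics / machine learning / biological sequences / bioinformatics / computational biology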
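Outline of Final Research Achievements	In this research project, we first extended the probability theory that we had developed in our earlier work on the noncommutative topological semigroup A* of strings, and proved several limit theorems. Next, using these theorems, we constructed a theory of learning machines that learn in A* under the margin-maximization principle, and applied it to the prediction of RNA secondary structure and of protein-protein interactions, demonstrating its usefulness in the analysis of real data. Furthermore, we constructed a theory of mixture models on A*, derived an unsupervised clustering method for string data, and used the above theorems to prove its optimality. Finally, we defined median and center strings for distributions on A* and constructed algorithms that search for them efficiently.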
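
The notions of median and center strings mentioned in the outline can be illustrated with a small brute-force sketch. It is not the integer-linear-programming method of the listed papers. It assumes the common definitions (a median string minimizes the expected Levenshtein distance under the distribution, and a center string minimizes the maximum distance to any string with positive probability), and these may differ in detail from the definitions used in the project. The names `levenshtein`, `candidates`, and `median_and_center`, and the length bound `max_len`, are hypothetical and introduced only for this illustration.

```python
from itertools import product


def levenshtein(a, b):
    """Levenshtein (edit) distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def candidates(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len (assumed search space)."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def median_and_center(dist, alphabet, max_len):
    """Exhaustively search for a median and a center string.

    dist maps strings to probabilities. The search is exponential in max_len,
    so this is only usable on toy inputs.
    """
    support = [s for s, p in dist.items() if p > 0]
    cands = list(candidates(alphabet, max_len))
    # Median (assumed definition): minimize the expected distance.
    median = min(cands, key=lambda x: sum(dist[s] * levenshtein(x, s) for s in support))
    # Center (assumed definition): minimize the worst-case distance over the support.
    center = min(cands, key=lambda x: max(levenshtein(x, s) for s in support))
    return median, center


if __name__ == "__main__":
    print(median_and_center({"a": 0.9, "aaa": 0.1}, "ab", 3))  # ('a', 'aa')
```

In this toy distribution the two notions differ: the median follows the high-probability string "a", while the center balances the distances to both strings in the support.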

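Reports

(4)

Research outputs
(18)

Journal articles (6) (of which 1 internationally co-authored, 6 peer-reviewed, 2 with acknowledgments, 1 open access); Conference presentations (12) (of which 3 at international conferences)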

[Journal article] Finding median and center strings for a probability distribution on a set of strings under Levenshtein distance based on integer linear programming (2017)
- Authors/Presenters
  Hayashida, M. and Koyano, H.
- Journal
  Communications in Computer and Information Science
  Vol.: 690 Pages: 108-121
- DOI
  10.1007/978-3-319-54717-6_7
- ISBN
  9783319547169, 9783319547176
- Related reports
  FY2016 Annual Research Report
- Peer-reviewed / International co-authorship
[Journal article] Maximum margin classifier working in a set of strings (2016)
- Authors/Presenters
  H. Koyano, M. Hayashida, T. Akutsu
- Journal
  Proceedings of the Royal Society A
  Vol.: 472 Issue: 2187 Pages: 20150551
- DOI
  10.1098/rspa.2015.0551
- Related reports
  FY2016 Annual Research Report; FY2015 Research-status Report
- Peer-reviewed / Acknowledgment included
[Journal article] Integer linear programming approach to median and center strings for a probability distribution on a set of strings (2016)
- Authors/Presenters
  Hayashida, M. and Koyano, H.
- Journal
  Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies
  Vol.: 3 Pages: 35-41
- DOI
  10.5220/0005666400350041
- NAID
  120005947093
- Related reports
  FY2016 Annual Research Report
- Peer-reviewed
[Journal article] Integer linear programming approach to center and median strings for a probability distribution on a set of strings (2016)
- Authors/Presenters
  Koyano, H. and Hayashida, M.
- Journal
  Communications in Computer and Information Science
  Vol.: to be determined
- Related reports
  FY2015 Research-status Report
- Peer-reviewed
[Journal article] Archaeal β diversity patterns under the seafloor along geochemical gradients (2014)
- Authors/Presenters
  Koyano, H., Tsubouchi, T., Kishino, H., and Akutsu, T.
- Journal
  Journal of Geophysical Research G.
  Vol.: 119 Issue: 9 Pages: 1770-1788
- DOI
  10.1002/2014jg002676
- NAID
  120005623238
- Related reports
  FY2014 Research-status Report
- Peer-reviewed / Open access / Acknowledgment included
[Journal article] Measuring the similarity of protein structures using image local feature descriptors SIFT and SURF (2014)
- Authors/Presenters
  Hayashida, M., Koyano, H., and Akutsu, T.
- Journal
  2014 8th International Conference on Systems Biology (ISB)
  Vol.: - Pages: 167-171
- Related reports
  FY2014 Research-status Report
- Peer-reviewed
[Conference presentation] Optimal string clustering based on a statistical theory on a topological monoid of strings (2017)
- Authors/Presenters
  Koyano, H., Hayashida, M., and Akutsu, T.
- Conference
  13th Workshop on Stochastic Models, Statistics and Their Applications
- Venue
  Berlin, Germany
- Related reports
  FY2016 Annual Research Report
- International conference
[Conference presentation] Optimal string clustering based on a Laplace-like mixture and EM algorithm on a topological monoid of strings (2016)
- Authors/Presenters
  Hitoshi Koyano
- Conference
  1st IMA Conference on Theoretical and Computational Discrete Mathematics
- Venue
  Derby, UK
- Date
  2016-03-22
- Related reports
  FY2015 Research-status Report
- International conference
[Conference presentation] Integer linear programming approach to center and median strings for a probability distribution on a set of strings (2016)
- Authors/Presenters
  Morihiro Hayashida
- Conference
  7th International Conference on Bioinformatics Models, Methods, and Algorithms
- Venue
  Rome, Italy
- Date
  2016-02-21
- Related reports
  FY2015 Research-status Report
- International conference
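[Conference presentation] Integer programming problems for median and center strings of a probability distribution on a set of strings (2016)
- Authors/Presenters
  Morihiro Hayashida, Hitoshi Koyano
- Conference
  Joint meeting of the Information Processing Society of Japan SIGs on Mathematical Modeling and Problem Solving and on Bioinformatics, and the IEICE Technical Committees on Neurocomputing and on Information-Based Induction Sciences and Machine Learning
- Venue
  Okinawa, Japan
- Related reports
  FY2016 Annual Research Report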
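[Conference presentation] Theory of a Laplace-like mixture model and the EM algorithm for statistical clustering of string data (2015)
- Authors/Presenters
  Hitoshi Koyano
- Conference
  Japan Society for Industrial and Applied Mathematics
- Venue
  Kanazawa University
- Date
  2015-09-09
- Related reports
  FY2015 Research-status Report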
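[Conference presentation] String clustering based on a Laplace-like mixture model on a set of strings and the EM algorithm (2015)
- Authors/Presenters
  Hitoshi Koyano
- Conference
  Information Processing Society of Japan
- Venue
  Okinawa Institute of Science and Technology Graduate University
- Date
  2015-06-23
- Related reports
  FY2015 Research-status Report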
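[Conference presentation] EM algorithm for a Laplace-like mixture model for string clustering (2015)
- Authors/Presenters
  Hitoshi Koyano, Morihiro Hayashida
- Conference
  77th National Convention of the Information Processing Society of Japan
- Venue
  Kyoto University
- Date
  2015-03-17 – 2015-03-19
- Related reports
  FY2014 Research-status Report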
[Conference presentation] Probability theory on a topological monoid of strings and its application to statistical machine learning (2014)
- Authors/Presenters
  Koyano, H. and Hayashida, M.
- Conference
  International Conference on Recent Advances in Pure and Applied Mathematics
- Venue
  Antalya, Turkey
- Date
  2014-11-06 – 2014-11-09
- Related reports
  FY2014 Research-status Report
[Conference presentation] Measuring the similarity of protein structures using image local feature descriptors SIFT and SURF (2014)
- Authors/Presenters
  Hayashida, M., Koyano, H., and Akutsu, T.
- Conference
  The 8th International Conference on Systems Biology and the 4th Translational Bioinformatics Conference
- Venue
  Qingdao, China
- Date
  2014-10-24 – 2014-10-27
- Related reports
  FY2014 Research-status Report
[Conference presentation] Probability theory on a topological monoid of strings and its application to machine learning (2014)
- Authors/Presenters
  Koyano, H.
- Conference
  Sweden-Kyoto Symposium co-organized by Uppsala University, Stockholm University, Royal Institute of Technology, Karolinska Institute, and Kyoto University
- Venue
  Stockholm, Sweden
- Date
  2014-09-11 – 2014-09-12
- Related reports
  FY2014 Research-status Report
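[Conference presentation] Probability theory on a metric space of strings and its application to machine learning (2014)
- Authors/Presenters
  Hitoshi Koyano, Morihiro Hayashida, Tatsuya Akutsu
- Conference
  Japan Society for Industrial and Applied Mathematics, Annual Meeting 2014
- Venue
  National Graduate Institute for Policy Studies
- Date
  2014-09-03 – 2014-09-05
- Related reports
  FY2014 Research-status Report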
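[Conference presentation] Maximum margin classifier on a metric space of strings and its application to protein science (2014)
- Authors/Presenters
  Hitoshi Koyano, Morihiro Hayashida, Tatsuya Akutsu
- Conference
  Joint meeting of the Information Processing Society of Japan SIGs on Mathematical Modeling and Problem Solving and on Bioinformatics, and the IEICE Technical Committees on Neurocomputing and on Information-Based Induction Sciences and Machine Learning
- Venue
  Okinawa Institute of Science and Technology Graduate University
- Date
  2014-06-25 – 2014-06-27
- Related reports
  FY2014 Research-status Report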
