Pattern Matching Methods for Structured Data in Bioinformatics
Project/Area Number |
16300092
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Bioinformatics/Life informatics
|
Research Institution | Kyoto University |
Principal Investigator |
AKUTSU Tatsuya Kyoto University, Institute for Chemical Research, Professor (90261859)
|
Co-Investigator(Kenkyū-buntansha) |
MIYANO Satoru The University of Tokyo, The Institute of Medical Science, Professor (50128104)
MARUYAMA Osamu Kyushu University, Faculty of Mathematics, Associate Professor (20282519)
UEDA Nobuhisa Kyoto University, Institute for Chemical Research, Assistant Professor (80346048)
HAYASHIDA Morihiro Kyoto University, Institute for Chemical Research, Assistant Professor (40402929)
|
Project Period (FY) |
2004 – 2006
|
Project Status |
Completed (Fiscal Year 2006)
|
Budget Amount |
¥9,700,000 (Direct Cost: ¥9,700,000)
Fiscal Year 2006: ¥2,900,000 (Direct Cost: ¥2,900,000)
Fiscal Year 2005: ¥2,900,000 (Direct Cost: ¥2,900,000)
Fiscal Year 2004: ¥3,900,000 (Direct Cost: ¥3,900,000)
|
Keywords | sequence alignment / triangle inequality / edit distance / Euler string / kernel method / support vector machine / chemical structure / subcellular location prediction / feature vector / tree structure / planar graph / clustering / protein sequence / biological information network / graph kernel / algorithm / protein subcellular localization prediction / pattern matching / motif extraction / international information exchange / France / position-specific scoring matrix / largest common point set / glycan |
Research Abstract |
The main results are summarized below.
1. Sequence Analysis: To speed up accurate homology search based on local alignment (the Smith-Waterman algorithm), we discovered an inequality among three sequences that is similar to the triangle inequality, and used it to skip redundant comparisons.
2. Pattern Matching of Graphs: We focused on pattern matching for tree structures. To compute approximate tree edit distance efficiently, we developed novel algorithms in which each input tree is transformed into an Euler string and the edit distance between the transformed sequences is then computed. We analyzed the worst-case approximation ratio of the proposed algorithms. In addition, to apply tree matching to a practical problem, we developed a Smith-Waterman-like algorithm for comparing glycan structures. The algorithm was implemented in a web server named KCaM.
3. Kernel Methods for Classification of Protein Sequences and Chemical Compounds: We developed a new feature vector for representing protein sequences. It was combined with support vector machines (SVMs) and implemented in a web server for predicting the subcellular location of protein sequences. Among existing methods for subcellular location prediction, the proposed method achieved the highest prediction accuracy under the condition that predictions are made from protein sequences alone. In addition, to improve the efficiency of marginalized graph kernels, which can be applied to the classification of chemical compounds, we combined them with Morgan indices and achieved a significant speedup.
4. Inference of Pre-Images for Chemical Compounds: We analyzed the computational complexity of the pre-image problem for graphs, which is to infer the original graph structure from a given feature vector. We also developed a practical branch-and-bound algorithm for inferring the original chemical structures from feature vectors based on the frequency of labeled paths.
|
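The Euler-string idea in item 2 of the abstract can be illustrated with a minimal Python sketch. This is not the project's algorithm: the tuple-based tree representation, the function names `euler_string` and `edit_distance`, and the `/label` notation for the "going up" symbol are all assumptions made for illustration, and the string comparison is plain unit-cost Levenshtein distance.

```python
from typing import List, Tuple

# Hypothetical tree representation: (label, [child, child, ...]).
Tree = Tuple[str, list]


def euler_string(tree: Tree) -> List[str]:
    """Depth-first traversal that emits a node's label on entry
    and a marked copy ("/label") on exit."""
    label, children = tree
    out = [label]
    for child in children:
        out.extend(euler_string(child))
    out.append("/" + label)
    return out


def edit_distance(a: List[str], b: List[str]) -> int:
    """Unit-cost Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


# Two small ordered trees that differ in one leaf label.
t1 = ("a", [("b", []), ("c", [])])
t2 = ("a", [("b", []), ("d", [])])
print(euler_string(t1))
print(edit_distance(euler_string(t1), euler_string(t2)))
```

Relabeling one node changes two symbols of the Euler string (the entry and the exit symbol), so the string distance only approximates the tree edit distance; this is why the abstract speaks of an approximation ratio.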
Report
(4 results)
Research Products
(84 results)
[Journal Article] Inferring a graph from path frequency (2005)
Author(s)
T. Akutsu, D. Fukagawa
Journal Title
Proc. 16th Annual Symposium on Combinatorial Pattern Matching (CPM 2005), Lecture Notes in Computer Science 3537
Pages: 371-382
NAID
Description
From "Summary of Research Results Report (Japanese)"
Related Report
Peer Reviewed
[Journal Article] A novel representation of protein sequences for prediction of subcellular location using support vector machines (2005)
Author(s)
S. Matsuda, J.-P. Vert, H. Saigo, N. Ueda, H. Toh, T. Akutsu
Journal Title
Protein Science 14
Pages: 2804-2813
Description
From "Summary of Research Results Report (European-language)"
Related Report
[Journal Article] KCaM (KEGG Carbohydrate Matcher): a software tool for analyzing the structures of carbohydrate sugar chains (2004)
Author(s)
K.F. Aoki, A. Yamaguchi, N. Ueda, T. Akutsu, H. Mamitsuka, S. Goto, M. Kanehisa
Journal Title
Nucleic Acids Research 32
Pages: 267-272
Description
From "Summary of Research Results Report (European-language)"
Related Report
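[Presentation] Inferring chemical structures from feature vectors (2006)
Author(s)
Tatsuya Akutsu, Daiji Fukagawa
Organizer
Information Processing Society of Japan (IPSJ), 4th SIG Bioinformatics Meeting
Place of Presentation
Hokkaido University
Year and Date
2006-02-10
Description
From "Summary of Research Results Report (Japanese)"
Related Report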
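[Book] バイオインフォマティクスの数理とアルゴリズム (Mathematical Models and Algorithms in Bioinformatics) (2007)
Author(s)
Tatsuya Akutsu
Total Pages
223
Publisher
Kyoritsu Shuppan
Description
From "Summary of Research Results Report (Japanese)"
Related Report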