2006 Fiscal Year Final Research Report Summary
Pattern Matching Methods for Structured Data in Bioinformatics
Project/Area Number | 16300092 |
Research Category | Grant-in-Aid for Scientific Research (B) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Bioinformatics/Life informatics |
Research Institution | Kyoto University |
Principal Investigator | AKUTSU Tatsuya Kyoto University, Institute for Chemical Research, Professor (90261859) |
Co-Investigator (Kenkyū-buntansha) |
MIYANO Satoru The University of Tokyo, The Institute of Medical Science, Professor (50128104)
MARUYAMA Osamu Kyushu University, Faculty of Mathematics, Associate Professor (20282519)
UEDA Nobuhisa Kyoto University, Institute for Chemical Research, Assistant Professor (80346048)
HAYASHIDA Morihiro Kyoto University, Institute for Chemical Research, Assistant Professor (40402929)
|
Project Period (FY) | 2004 – 2006 |
Keywords | sequence alignment / triangle inequality / edit distance / Euler string / kernel method / support vector machine / chemical structure / subcellular location prediction |
Research Abstract |
The main results are summarized below.
1. Sequence Analysis: To speed up accurate homology search based on local alignment (the Smith-Waterman algorithm), we discovered an inequality among three sequences that is similar to the triangle inequality and used it to skip redundant comparisons.
2. Pattern Matching of Graphs: We focused on pattern matching for tree structures. To compute approximate tree edit distance efficiently, we developed novel algorithms in which each input tree is transformed into an Euler string and the edit distance between the transformed strings is then computed. We analyzed the worst-case approximation ratio of the proposed algorithms. In addition, to apply tree matching to a practical problem, we developed a Smith-Waterman-like algorithm for comparing glycan structures. The algorithm was implemented in a web server named KCaM.
3. Kernel Methods for Classification of Protein Sequences and Chemical Compounds: We developed a new feature vector for representing protein sequences. It was combined with support vector machines (SVMs) and implemented in a web server for predicting the subcellular location of proteins. Among the existing methods for subcellular location prediction, the proposed method achieved the highest prediction accuracy under the condition that prediction is made from protein sequences only. In addition, to improve the efficiency of marginalized graph kernels, which can be applied to the classification of chemical compounds, we combined them with Morgan indices, which yielded a significant speedup.
4. Inference of Pre-Images for Chemical Compounds: We analyzed the computational complexity of the pre-image problem for graphs, which is to infer the original graph structure from a given feature vector. We also developed a practical branch-and-bound algorithm for inferring the original chemical structures from given feature vectors based on the frequencies of labeled paths.
|
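The Euler-string approach described in item 2 of the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact encoding or algorithm used in the project, so everything below is an assumption for illustration: trees are represented as hypothetical `(label, children)` tuples, the Euler string emits a child's label when an edge is descended and a marked copy (prefixed with `~`) when it is ascended, and the string comparison is plain unit-cost Levenshtein distance.

```python
from typing import List, Sequence, Tuple

# Hypothetical tree representation for illustration: (label, list of child trees).
Tree = Tuple[str, list]


def euler_string(tree: Tree) -> List[str]:
    """Encode an ordered rooted tree as a sequence of edge symbols.

    Assumed encoding: a depth-first traversal that emits the child's label
    when an edge is descended and "~" + label when the same edge is ascended.
    A tree with n nodes yields 2 * (n - 1) symbols.
    """
    _, children = tree
    out: List[str] = []
    for child in children:
        out.append(child[0])             # going down the edge to this child
        out.extend(euler_string(child))  # symbols for the child's subtree
        out.append("~" + child[0])       # coming back up the same edge
    return out


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Unit-cost Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


t1 = ("a", [("b", []), ("c", [])])
t2 = ("a", [("b", []), ("d", [])])
print(euler_string(t1))
print(edit_distance(euler_string(t1), euler_string(t2)))
```

In this toy case, relabeling one leaf changes two symbols of the Euler string, so the string distance only approximates the tree edit distance; the abstract states that the project analyzed the worst-case ratio of such an approximation.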
Research Products
(64 results)
[Journal Article] A novel representation of protein sequences for prediction of subcellular location using support vector machines (2005)
Author(s)
S. Matsuda, J.-P. Vert, H. Saigo, N. Ueda, H. Toh, T. Akutsu
Journal Title
Protein Science 14
Pages: 2804-2813
Description
From "Summary of Research Results Report (English)"
[Journal Article] KCaM (KEGG Carbohydrate Matcher): a software tool for analyzing the structures of carbohydrate sugar chains (2004)
Author(s)
K.F. Aoki, A. Yamaguchi, N. Ueda, T. Akutsu, H. Mamitsuka, S. Goto, M. Kanehisa
Journal Title
Nucleic Acids Research 32
Pages: 267-272
Description
From "Summary of Research Results Report (English)"
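[Book] バイオインフォマティクスの数理とアルゴリズム (Mathematics and Algorithms of Bioinformatics) (2007)
Author(s)
AKUTSU Tatsuya
Total Pages
223
Publisher
Kyoritsu Shuppan
Description
From "Summary of Research Results Report (Japanese)"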