Project/Area Number |
16300092
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Bioinformatics/Life informatics
|
Research Institution | Kyoto University |
Principal Investigator |
AKUTSU Tatsuya Kyoto University, Institute for Chemical Research, Professor (90261859)
|
Co-Investigator(Kenkyū-buntansha) |
MIYANO Satoru The University of Tokyo, The Institute of Medical Science, Professor (50128104)
MARUYAMA Osamu Kyusyu University, Faculty of Mathematics, Associate Professor (20282519)
UEDA Nobuhisa Kyoto University, Institute for Chemical Research, Assistant Professor (80346048)
HAYASHIDA Morihiro Kyoto University, Institute for Chemical Research, Assistant Professor (40402929)
|
Project Period (FY) |
2004 – 2006
|
Project Status |
Completed (Fiscal Year 2006)
|
Budget Amount |
¥9,700,000 (Direct Cost: ¥9,700,000)
Fiscal Year 2006: ¥2,900,000 (Direct Cost: ¥2,900,000)
Fiscal Year 2005: ¥2,900,000 (Direct Cost: ¥2,900,000)
Fiscal Year 2004: ¥3,900,000 (Direct Cost: ¥3,900,000)
|
Keywords | sequence alignment / triangle inequality / edit distance / Euler string / kernel method / support vector machine / chemical structure / subcellular location prediction / feature vector / tree structure / planar graph / clustering / protein sequence / biological network / graph kernel / algorithm / protein subcellular localization prediction / pattern matching / motif extraction / international exchange of information / France / position-specific scoring matrix / maximum common point subset / glycan |
Research Abstract |
Main results are summarized below.
1. Sequence Analysis: To speed up accurate homology search based on local alignment (the Smith-Waterman algorithm), we discovered an inequality among three sequences that is similar to the triangle inequality, and used it to skip redundant comparisons.
2. Pattern Matching of Graphs: We focused on pattern matching for tree structures. To compute approximate tree edit distance efficiently, we developed novel algorithms in which each input tree is transformed into an Euler string and the edit distance between the transformed strings is then computed. We analyzed the worst-case approximation ratio of the proposed algorithms. In addition, to apply tree matching to a practical problem, we developed a Smith-Waterman-like algorithm for comparing glycan structures. The algorithm was implemented in a web server named KCaM.
3. Kernel Methods for Classification of Protein Sequences and Chemical Compounds: We developed a new feature vector for representing protein sequences. It was combined with support vector machines (SVMs) and implemented in a web server for predicting the subcellular location of proteins. Among various existing methods for subcellular location prediction, the proposed method achieved the highest prediction accuracy under the condition that predictions are made only from protein sequences. In addition, to improve the efficiency of marginalized graph kernels, which can be applied to the classification of chemical compounds, we combined marginalized graph kernels with Morgan indices, which yielded a significant speedup.
4. Inference of Pre-Image for Chemical Compounds: We analyzed the computational complexity of the pre-image problem for graphs, which is to infer the original graph structure from a given feature vector. We also developed a practical branch-and-bound algorithm for inferring the original chemical structures from given feature vectors based on the frequency of labeled paths.
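For reference, the local alignment score underlying item 1 (and the Smith-Waterman-like glycan comparison in item 2) can be sketched as below. This is a minimal textbook version assuming a linear gap penalty and illustrative match/mismatch/gap scores; it does not implement the triangle-like inequality, whose exact form is not given in this summary.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b (linear gap penalty; scores are illustrative)."""
    # prev[j] holds H[i-1][j]; cur[j] holds H[i][j] of the DP matrix
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below 0, so an alignment may restart anywhere
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

For example, `smith_waterman("XXACGTYY", "ZACGTZ")` scores only the shared core `ACGT`, ignoring the unrelated flanks.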
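The Euler-string approach in item 2 can be illustrated with a small sketch. It assumes one common construction (a depth-first traversal that emits a node's label on entry and a marked closing symbol on exit) and unit-cost string edit distance; the tree representation `(label, children)` and the `/` closing marker are hypothetical choices for illustration, not necessarily those used in the project.

```python
from typing import List, Tuple

Tree = Tuple[str, list]  # hypothetical representation: (label, list of child trees)

def euler_string(tree: Tree) -> List[str]:
    """Depth-first Euler string: label on entering a node, '/label' on leaving it."""
    label, children = tree
    out = [label]
    for child in children:
        out.extend(euler_string(child))
    out.append("/" + label)  # closing symbol marks leaving the node
    return out

def edit_distance(s: List[str], t: List[str]) -> int:
    """Unit-cost edit distance (insert, delete, substitute) between two symbol sequences."""
    prev = list(range(len(t) + 1))
    for i in range(1, len(s) + 1):
        cur = [i] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[len(t)]
```

In this sketch, deleting the single leaf `c` from `("a", [("b", []), ("c", [])])` removes two symbols (`c` and `/c`) from its Euler string, so the string distance is 2 while the tree edit distance is 1; this gap between the two distances is what an approximation-ratio analysis has to bound.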
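Morgan indices, mentioned in item 3, are computed iteratively: every atom starts at 1, and at each step an atom's new index is the sum of its neighbors' previous indices. The sketch below computes them and appends them to atom labels, one way of refining labels before a graph kernel is applied. The function names and the `#` label separator are hypothetical, and how the project integrated the indices with marginalized graph kernels is not detailed in this summary.

```python
from typing import Dict, List

def morgan_indices(adj: Dict[int, List[int]], iterations: int) -> Dict[int, int]:
    """M^0(v) = 1; M^{k+1}(v) = sum of M^k(u) over neighbors u of v."""
    m = {v: 1 for v in adj}
    for _ in range(iterations):
        m = {v: sum(m[u] for u in adj[v]) for v in adj}
    return m

def augment_labels(labels: Dict[int, str], adj: Dict[int, List[int]], iterations: int) -> Dict[int, str]:
    """Hypothetical helper: attach each atom's Morgan index to its element label."""
    idx = morgan_indices(adj, iterations)
    return {v: f"{labels[v]}#{idx[v]}" for v in adj}
```

For a C-C-O chain, one iteration yields each atom's degree, so the middle carbon becomes `C#2` and can be distinguished from the terminal carbon `C#1`.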
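The pre-image problem in item 4 inverts a feature map. The sketch below shows only the forward direction: a feature vector counting labeled paths. It assumes simple paths of up to `max_vertices` vertices, with each traversal direction counted separately; the exact definition used in the project may differ, and the branch-and-bound inference itself is not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple

def path_frequency(labels: Dict[int, str], adj: Dict[int, List[int]],
                   max_vertices: int) -> Counter:
    """Count label sequences of simple paths with 1..max_vertices vertices."""
    counts: Counter = Counter()

    def extend(path: List[int]) -> None:
        key: Tuple[str, ...] = tuple(labels[v] for v in path)
        counts[key] += 1
        if len(path) == max_vertices:
            return
        for u in adj[path[-1]]:
            if u not in path:  # keep paths simple (no repeated vertices)
                extend(path + [u])

    for v in adj:
        extend([v])
    return counts
```

For the heavy atoms of ethanol (C-C-O) with paths of up to two vertices, this gives `("C",)`: 2, `("O",)`: 1, `("C", "C")`: 2, `("C", "O")`: 1, and `("O", "C")`: 1. A pre-image algorithm would search for a chemical graph whose counts match a given vector of this kind.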
|