
Modeling and prediction of genome sequence information by using information representation models

Research Project

Project/Area Number 12208010
Research Category

Grant-in-Aid for Scientific Research on Priority Areas

Allocation Type Single-year Grants
Review Section Biological Sciences
Research Institution Kyoto University (2003-2004)
The University of Tokyo (2001-2002)
The Institute of Physical and Chemical Research (2000)

Principal Investigator

YADA Tetsushi  Kyoto University, Graduate School of Informatics, Associate Professor (10322728)

Co-Investigator (Kenkyū-buntansha) ASAI Kiyoshi  The University of Tokyo, Graduate School of Frontier Sciences, Professor (30356357)
Project Period (FY) 2000 – 2004
Project Status Completed (Fiscal Year 2004)
Budget Amount
¥72,400,000 (Direct Cost: ¥72,400,000)
Fiscal Year 2004: ¥20,000,000 (Direct Cost: ¥20,000,000)
Fiscal Year 2003: ¥20,000,000 (Direct Cost: ¥20,000,000)
Fiscal Year 2002: ¥15,400,000 (Direct Cost: ¥15,400,000)
Fiscal Year 2001: ¥17,000,000 (Direct Cost: ¥17,000,000)
Keywords bioinformatics / sequence analysis / gene finding / stochastic model / machine learning / kernel methods / genome biology / biological sequence analysis / genome / statistical estimation / learning algorithm / recognition algorithm / data mining / human / gene / annotation / computer / algorithm / software
Research Abstract

In this research, we focused on gene models capable of finding genes in genome sequences.
First, we developed a general-purpose algorithm that finds genes by combining multiple existing gene-finders, and implemented it in a novel gene-finder named DIGIT. In outline, the algorithm works as follows. Existing gene-finders are first applied to an uncharacterized genomic sequence (the input sequence). Next, DIGIT generates all possible exons from the gene-finders' results and assigns each one an exon type, a reading frame and an exon score. Finally, DIGIT searches for the set of exons whose additive score is maximal under the reading-frame constraints. A Bayesian procedure is used to infer the exon scores, and a hidden Markov model (HMM) is used to search for the exon set. We designed DIGIT to combine the results of FGENESH, GENSCAN and HMMgene, and assessed its prediction accuracy on recently compiled benchmark data sets. For all data sets, DIGIT successfully discarded many false-positive exons predicted by the individual gene-finders and yielded remarkable improvements in gene-level sensitivity and specificity over the best gene-level accuracy achieved by any single gene-finder.
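The final DIGIT step, choosing the highest-scoring set of exons under reading-frame constraints, can be illustrated with a simple dynamic program. This is a minimal sketch under stated assumptions, not DIGIT itself: DIGIT infers exon scores with a Bayesian procedure and searches with an HMM, whereas here the scores are given numbers, exon types are ignored, and the frame rule (each exon must start at the codon position where the previous exon left off, the chain must start at codon position 0 and end on a complete codon) is a simplified stand-in. The `Exon` class and the `best_exon_chain` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Exon:
    start: int     # 0-based position of the first base
    end: int       # 0-based position of the last base (inclusive)
    frame_in: int  # codon position (0, 1 or 2) of the first base
    score: float   # exon score, assumed to be precomputed (hypothetical)

    @property
    def frame_out(self) -> int:
        """Codon position of the coding base that would follow this exon."""
        return (self.frame_in + self.end - self.start + 1) % 3


def best_exon_chain(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Choose ordered, non-overlapping, frame-compatible exons with maximal total score.

    A chain must start at codon position 0 and end on a complete codon.
    Returns (0.0, []) if no valid chain exists.
    """
    order = sorted(exons, key=lambda e: e.end)
    best: List[float] = []  # best[i]: best score of a chain ending with order[i]
    prev: List[int] = []    # index of the preceding exon in that chain, -1 if none
    for i, e in enumerate(order):
        # An exon can open a chain only if it starts at codon position 0.
        best_i = e.score if e.frame_in == 0 else float("-inf")
        prev_i = -1
        for j in range(i):
            f = order[j]
            # f may precede e if it ends before e starts and the frames agree.
            if f.end < e.start and f.frame_out == e.frame_in and best[j] + e.score > best_i:
                best_i, prev_i = best[j] + e.score, j
        best.append(best_i)
        prev.append(prev_i)
    # A chain may only end on an exon that closes its last codon.
    ends = [i for i, e in enumerate(order) if e.frame_out == 0 and best[i] != float("-inf")]
    if not ends:
        return 0.0, []
    i = max(ends, key=lambda k: best[k])
    total = best[i]
    chain: List[Exon] = []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return total, chain[::-1]
```

For example, given exons at 0-8 (frame 0, score 5), 20-29 (frame 0, score 4) and 40-47 (frame 1, score 3), plus a score-10 exon at 40-47 that starts in frame 0 and therefore ends mid-codon, the sketch returns the first three exons with a total of 12: the higher-scoring exon is rejected because no valid chain can end on it. This mirrors, in a much reduced form, how a frame constraint lets a combiner discard exons that individual predictors report.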
Second, we developed a novel index that precisely derives protein-coding regions from cross-species genome alignments. The index is based on frame recovery in coding-sequence alignments: when insertions or deletions (indels) of nucleotides cause frame shifts in a coding region, other indels that restore the reading frame are often observed nearby, whereas such frame recoveries are not observed in other conserved regions. We prepared two gene models: one that finds genes using sequence similarity and intrinsic gene measures (the basic model), and one that uses the frame recovery index in addition to sequence similarity and intrinsic gene measures (the frame recovery model). In our benchmark test, the frame recovery model significantly improved prediction accuracy compared with the basic model.
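The exact formula of the index is not given in this summary. As a hedged illustration only, the sketch below defines a hypothetical frame-recovery ratio over a pairwise alignment: every gap run is treated as an indel whose length shifts the frame (positive for a gap in the first row, negative for the second), and a frame-shifting indel counts as recovered if later indels within a window of alignment columns bring the net shift back to a multiple of 3. The names `indel_events` and `frame_recovery_ratio` and the `window` parameter are illustrative assumptions, not taken from the paper.

```python
import re
from typing import List, Tuple


def indel_events(top: str, bottom: str) -> List[Tuple[int, int]]:
    """List (column, shift) for each gap run in a pairwise alignment.

    A gap run of length L in `top` gives shift +L; one in `bottom` gives -L.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    events = []
    for row, sign in ((top, 1), (bottom, -1)):
        for m in re.finditer(r"-+", row):
            events.append((m.start(), sign * len(m.group())))
    return sorted(events)


def frame_recovery_ratio(top: str, bottom: str, window: int = 30) -> float:
    """Hypothetical index: fraction of frame-shifting indels whose reading
    frame is restored by later indels within `window` alignment columns.

    Returns 0.0 when the alignment contains no frame-shifting indel.
    """
    events = indel_events(top, bottom)
    shifting = recovered = 0
    i = 0
    while i < len(events):
        col, net = events[i]
        i += 1
        if net % 3 == 0:
            continue  # in-frame indel: the reading frame is preserved
        shifting += 1
        # Absorb nearby indels until the net shift is a multiple of 3 again.
        while i < len(events) and events[i][0] - col <= window:
            net += events[i][1]
            i += 1
            if net % 3 == 0:
                recovered += 1
                break
    return recovered / shifting if shifting else 0.0
```

With a one-base gap at column 3 of the first row compensated by a one-base gap at column 10 of the second row, the ratio is 1.0 with the default window but 0.0 with `window=5`; an uncompensated single-base gap also gives 0.0. Under this assumed definition, coding alignments would tend toward high ratios and other conserved regions toward low ones, matching the observation above.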
Third, we developed GeneDecoder, an HMM-based gene-finding technology for eukaryotes. Using dynamic programming and statistical models trained on annotated genome sequences, it divides the input nucleic acid sequence into meaningful segments. GeneDecoder has three additional features: (1) a multi-stream architecture, (2) incorporation of similarity search and (3) SVM-driven screening of putative splice sites. (1) In addition to the nucleic acid sequence, GeneDecoder accepts any other data streams. Typically, dicodon bigram values can be calculated in advance and aligned on a 'Direct' stream, which makes the state transition network much simpler. Any other meaningful features extracted in advance can be incorporated into the gene-finding process through this scheme. (2) Combining the calculation of coding potential with similarity search against known sequence databases yields more reliable putative exons. For this purpose, GeneDecoder can both embed known motif models in its exon models and use segments that showed similarity to known sequences in a BLAST search. (3) The support vector machine (SVM) is a pattern recognition technique known for its high classification capability and has been successfully applied to splice site prediction. GeneDecoder implements this feature alongside PWM-based splice site models: during parsing, putative splice sites that are derived from the PWM-based models but poorly supported by the SVMs designed as splice site classifiers are excluded.
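GeneDecoder's own state network, trained parameters and stream format are not given in this summary. As a minimal sketch under stated assumptions, the code below runs Viterbi dynamic programming over a hypothetical two-state HMM (N = non-coding, C = coding) and labels each base with its most probable state. The optional `extra` argument stands in for an additional precomputed stream such as the 'Direct' stream: its per-position log-scores are simply added to the nucleotide emission log-probabilities. That additive combination, and the function name `viterbi_segment`, are assumptions, not GeneDecoder's documented method.

```python
import math
from typing import Dict, List, Optional, Sequence

STATES = ("N", "C")  # hypothetical toy states: N = non-coding, C = coding


def viterbi_segment(
    seq: str,
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
    extra: Optional[Sequence[Dict[str, float]]] = None,
) -> List[str]:
    """Return the most probable state label for every position of `seq`.

    Probabilities are converted to natural-log space. If `extra` is given,
    extra[i][s] is a precomputed log-score for state s at position i from a
    second data stream (assumed format); it is added to the emission score.
    """
    if not seq:
        return []
    n = len(seq)
    score: List[Dict[str, float]] = [{} for _ in range(n)]  # best log-score ending in state s at i
    back: List[Dict[str, str]] = [{} for _ in range(n)]     # predecessor state on that best path
    for i, base in enumerate(seq):
        for s in STATES:
            e = math.log(emit[s][base])
            if extra is not None:
                e += extra[i][s]
            if i == 0:
                score[0][s] = math.log(start[s]) + e
                continue
            # Best predecessor for entering state s at position i.
            prev = max(STATES, key=lambda p: score[i - 1][p] + math.log(trans[p][s]))
            score[i][s] = score[i - 1][prev] + math.log(trans[prev][s]) + e
            back[i][s] = prev
    # Trace back from the best-scoring final state.
    state = max(STATES, key=lambda s: score[n - 1][s])
    path = [state]
    for i in range(n - 1, 0, -1):
        state = back[i][state]
        path.append(state)
    return path[::-1]
```

For instance, with a coding state that favours G and C (0.35 each, versus 0.25 for every base in the non-coding state) and a switch probability of 0.1, a 20-base G/C run flanked by A/T runs comes out as one coding segment: the gain of roughly 0.34 per G/C base in log space outweighs the cost of the two state switches (about 2.2 each). A strong extra-stream bonus for C at every position would instead label the whole sequence as coding, which shows how a precomputed stream can steer the parse.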

Report

(6 results)
  • 2004 Annual Research Report, Final Research Report Summary
  • 2003 Annual Research Report
  • 2002 Annual Research Report
  • 2001 Annual Research Report
  • 2000 Annual Research Report
Research Products

(46 results)


Journal Article (29 results) / Book (1 result) / Publications (16 results)

  • [Journal Article] Genome sequencing and analysis of Aspergillus oryzae (2005)

    • Author(s)
      Machida M., Asai K., et al.
    • Journal Title

      Nature 438

      Pages: 1157-1161

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae (2005)

    • Author(s)
      Galagan JE, Calvo SE, Cuomo C, et al.
    • Journal Title

      Nature 438

      Pages: 1105-1115

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Genome sequence of pathogenic and allergenic filamentous fungus Aspergillus fumigatus (2005)

    • Author(s)
      Nierman WC, Pain A, Anderson MJ, et al.
    • Journal Title

      Nature 438

      Pages: 1151-1155

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Extracting relations between promoter sequences and their strengths from microarray data (2005)

    • Author(s)
      Kiryu, H., Oshima, T., Asai, K.
    • Journal Title

      Bioinformatics 21

      Pages: 1062-1068

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Extracting relations between promoter sequences and their strengths from microarray data (2005)

    • Author(s)
      Hisanori Kiryu, Taku Oshima, Kiyoshi Asai
    • Journal Title

      Bioinformatics 21

      Pages: 1062-1068

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Finishing the euchromatic sequence of the human genome (2004)

    • Author(s)
      International Human Genome Sequencing Consortium
    • Journal Title

      Nature 431

      Pages: 931-945

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Annual Research Report, 2004 Final Research Report Summary
  • [Journal Article] Complete sequencing and characterization of 21,243 full-length human cDNAs (2004)

    • Author(s)
      Ota, T., Suzuki, Y., Nishikawa, T., et al.
    • Journal Title

      Nat. Genet. 36

      Pages: 40-45

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Minimizing the Cross Validation Error to Mix Kernel Matrices of Heterogeneous Biological Data (2004)

    • Author(s)
      Tsuda, K., Uda, S., Kin, T., Asai, K.
    • Journal Title

      Neural Processing Letters 19

      Pages: 63-72

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] DIGIT: a novel gene finding program by combining gene-finders (2003)

    • Author(s)
      Yada, T., Totoki, Y., Takaeda, Y., Sakaki, Y., Takagi, T.
    • Journal Title

      Proc. of Pacific Sympo. on Biocomputing '03

      Pages: 375-387

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Whole-genome screening indicates a possible burst of formation of processed pseudogenes and Alu repeats by particular L1 subfamilies in ancestral primates (2003)

    • Author(s)
      Ohshima, K., Hattori, M., Yada, T., Gojobori, T., Sakaki, Y., Okada, N.
    • Journal Title

      Genome Biol. 4

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] DIGIT: a novel gene finding program by combining gene-finders (2003)

    • Author(s)
      Yada, T., Totoki, Y., Takaeda, Y., Sakaki, Y., Takagi, T.
    • Journal Title

      Proc. of Pacific Sympo. on Biocomputing '03

      Pages: 375-387

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Statistics for Biological Sequences (2003)

    • Author(s)
      Kishino, H., Asai, K.
    • Journal Title

      Iwanami Publisher

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] A novel index which precisely derives protein coding regions from cross-species genome alignments (2002)

    • Author(s)
      Noguchi, H., Yada, T., Sakaki, Y.
    • Journal Title

      In Proc. of Genome Informatics Workshop 2002

      Pages: 183-191

    • NAID

      130003997152

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Marginalized kernels for biological sequences (2002)

    • Author(s)
      Tsuda, K., Kin, T., Asai, K.
    • Journal Title

      Bioinformatics 18

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Marginalized Kernels for RNA Sequence Data Analysis (2002)

    • Author(s)
      Kin, T., Tsuda, K., Asai, K.
    • Journal Title

      In Proc. of Genome Informatics Workshop 2002

      Pages: 112-122

    • NAID

      130003997145

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Modeling Splicing Sites with Pairwise Correlations (2002)

    • Author(s)
      Arita, M., Tsuda, K., Asai, K.
    • Journal Title

      Bioinformatics 18

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Chromosome-wide assessment of replication timing for human chromosomes 11q and 21q: disease-related genes in timing-switch regions (2002)

    • Author(s)
      Watanabe, Y., Fujiyama, A., Ichiba, Y., Hattori, M., Yada, T., Sakaki, Y., Ikemura, T.
    • Journal Title

      Human Molecular Genetics 11

      Pages: 13-21

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] A novel index which precisely derives protein coding regions from cross-species genome alignments (2002)

    • Author(s)
      Noguchi, H., Yada, T., Sakaki, Y.
    • Journal Title

      Proc. of Genome Informatics Workshop 2002

      Pages: 183-191

    • NAID

      130003997152

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Marginalized Kernels for RNA Sequence Data Analysis (2002)

    • Author(s)
      Kin, T., Tsuda, K., Asai, K.
    • Journal Title

      Proc. of Genome Informatics Workshop 2002

      Pages: 112-122

    • NAID

      130003997145

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Initial sequencing and analysis of the human genome (2001)

    • Author(s)
      International Human Genome Sequencing Consortium
    • Journal Title

      Nature 409

      Pages: 860-921

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] A physical map of the human genome (2001)

    • Author(s)
      The International Human Genome Mapping Consortium
    • Journal Title

      Nature 409

      Pages: 934-941

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] A novel bacterial gene-finding system with improved accuracy in locating start codons (2001)

    • Author(s)
      Yada, T., Totoki, Y., Takagi, T., Nakai, K.
    • Journal Title

      DNA Res. 8

      Pages: 97-106

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Differential display analysis of mutants for the transcription factor pdr1p regulating multidrug resistance in the budding yeast (2001)

    • Author(s)
      Miura, F., Yada, T., Nakai, K., Sakaki, Y., Ito, T.
    • Journal Title

      FEBS Letters 505

      Pages: 103-108

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Differential display analysis of mutants for the transcription factor pdr1p regulating multidrug resistance in the budding yeast (2001)

    • Author(s)
      Miura, F., Yada, T., Nakai, K., Sakaki, Y., Ito, T.
    • Journal Title

      FEBS Letters 505

      Pages: 103-108

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] The DNA sequence of human chromosome 21 (2000)

    • Author(s)
      The Chromosome 21 Mapping and Sequencing Consortium
    • Journal Title

      Nature 405

      Pages: 311-319

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] The DNA sequence of human chromosome 21 (2000)

    • Author(s)
      The Chromosome 21 Mapping and Sequencing Consortium
    • Journal Title

      Nature 405

      Pages: 311-319

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Multiple sequence alignment

    • Author(s)
      Osamu Gotoh, Shinsuke Yamada, Tetsushi Yada
    • Journal Title

      The Handbook on Computational Molecular Biology (in press)

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Selective integration of multiple biological data for supervised network inference

    • Author(s)
      Tsuyoshi Kato, Koji Tsuda, Kiyoshi Asai
    • Journal Title

      Bioinformatics (in press)

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Accurate extraction of functional associations between proteins based on common interaction partners and common domains

    • Author(s)
      Kinya Okada, Shigehiko Kanaya, Kiyoshi Asai
    • Journal Title

      Bioinformatics (in press)

    • Related Report
      2004 Annual Research Report
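  • [Book] Frontiers of Statistical Science 9, Statistics of Biological Sequences: Reading Information from Nucleic Acids and Proteins (統計科学のフロンティア9 生物配列の統計 : 核酸・タンパクから情報を読む) (2003)

    • Author(s)
      Hirohisa Kishino, Kiyoshi Asai
    • Total Pages
      264
    • Publisher
      Iwanami Shoten
    • Description
      From the Final Research Report Summary (Japanese)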
    • Related Report
      2004 Final Research Report Summary
  • [Publications] K.Ohshima, M.Hattori, T.Yada, T.Gojobori, Y.Sakaki, N.Okada: "Whole-genome screening indicates a possible burst of formation of processed pseudogenes and Alu repeats by particular L1 subfamilies in ancestral primates" Genome Biol. 4. R74 (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] T.Ota, Y.Suzuki, T.Nishikawa, et al.: "Complete sequencing and characterization of 21,243 full-length human cDNAs" Nat. Genet. 36. 40-45 (2004)

    • Related Report
      2003 Annual Research Report
  • [Publications] K.Tsuda, S.Uda, T.Kin, K.Asai: "Minimizing the Cross Validation Error to Mix Kernel Matrices of Heterogeneous Biological Data" Neural Processing Letters. 19. 63-72 (2004)

    • Related Report
      2003 Annual Research Report
  • [Publications] T.Kato, K.Tsuda, K.Tomii, K.Asai: "Maximum likelihood superposition of protein structures" Genome Informatics. 14. 488-489 (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] M.Suwa, T.Sato, I.Okouch, M.Arita, S.Matsumoto, S.Tsutsumi, H.Aburatani, K.Asai, Y.Akiyama: "SEVENS: The Comprehensive Collection of Seven Transmembrane Helix Receptors, hunted from Human genome" Nucleic Acids Research. 31. 1 Online summary paper (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] K.Tsuda, S.Akaho, K.Asai: "The em algorithm for kernel matrix completion with auxiliary data" Journal of Machine Learning Research. 4. 67-81 (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] T.Yada, Y.Totoki, Y.Takaeda, Y.Sakaki, T.Takagi: "DIGIT: a novel gene finding program by combining gene-finders" Proc. of Pacific Sympo. on Biocomputing '03. 375-387 (2003)

    • Related Report
      2002 Annual Research Report
  • [Publications] H.Noguchi, T.Yada, Y.Sakaki: "A novel index which precisely derives protein coding regions from cross-species genome alignments" Proc. of Genome Informatics Workshop 2002. 183-191 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] T.Kin, K.Tsuda, K.Asai: "Marginalized Kernels for RNA Sequence Data Analysis" Proc. of Genome Informatics Workshop 2002. 112-122 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] M.Arita, K.Tsuda, K.Asai: "Modeling Splicing Sites with Pairwise Correlations" Bioinformatics. 18. 27S-34S (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] K.Tsuda, T.Kin, K.Asai: "Marginalized kernels for biological sequences" Bioinformatics. 18. 268S-275S (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] Y.Watanabe, A.Fujiyama, Y.Ichiba, M.Hattori, T.Yada et al.: "Chromosome-wide assessment of replication timing for human chromosomes 11q and 21q: disease-related genes in timing-switch regions" Human Molecular Genetics. 11. 13-21 (2002)

    • Related Report
      2001 Annual Research Report
  • [Publications] T.Yada, Y.Totoki, T.Takagi, K.Nakai: "A novel bacterial gene-finding system with improved accuracy in locating start codons" DNA Res. 8. 97-106 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] F.Miura, T.Yada, K.Nakai, Y.Sakaki, T.Ito: "Differential display analysis of mutants for the transcription factor pdr1p regulating multidrug resistance in the budding yeast" FEBS Letters. 505. 103-108 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] K.Asai, Y.Ueno: "Integrated software for gene finding" Protein, Nucleic Acid and Enzyme. Vol.46 No.16. 2510-2514 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] K.Asai: "Probabilistic models and gene structure" Mathematical Sciences (Suri Kagaku). August issue. 8-15 (2001)

    • Related Report
      2001 Annual Research Report


Published: 2001-04-01   Modified: 2018-03-28  


Powered by NII kakenhi