Genome-wide DNA and protein conformational dynamics: sequence-based prediction and tissue-specific profiling
Project/Area Number |
15K00419
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Research Field |
Life / Health / Medical informatics
|
Research Institution | National Institutes of Biomedical Innovation, Health and Nutrition |
Principal Investigator |
Shandar Ahmad, National Institutes of Biomedical Innovation, Health and Nutrition, Other Departments, Researcher (80463298)
|
Project Period (FY) |
2015-04-01 – 2016-03-31
|
Project Status |
Discontinued (Fiscal Year 2015)
|
Budget Amount |
¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2017: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2016: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2015: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
|
Keywords | DNA structure / DNA dynamics / Transcription / Protein-DNA interactions / Machine learning |
Outline of Annual Research Achievements |
All three objectives outlined in the project proposal have been achieved. First, we built conformational ensembles of DNA sequences from molecular dynamics (MD) snapshot data. There are 136 unique tetranucleotide sequences, and each had four flanking bases on either terminus, giving 136 unique 12-mers. We developed a support vector machine (SVM) based model that takes a 5-mer sequence as input and returns predicted conformational ensemble populations for 12 conformational parameters in 5 bins each. Various benchmarks confirmed the high accuracy of this method. We applied the newly developed tool to predict DNA conformational dynamics across the whole mouse and human genomes. Using the genome-wide predicted values, we studied binding sites and their flanking regions for more than 1000 transcription factors (TFs) in embryonic stem (ES) cells, and in greater detail for one TF (STAT3) in four different cell types. We showed that binding-site flanking regions as far as 200 bases from the binding motif center carry significant conformational biases, which can distinguish them from the rest of the genome. Separately, we developed a method to predict DNA-binding proteins by using gene expression and sequence information together. We found that gene expression profiles and their global co-expression patterns can be useful in identifying proteins with weak DNA-binding signals at the sequence level. Some of the results of this project are available via bioRxiv, while others are being prepared for publication.
|
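The figure of 136 unique tetranucleotides in the outline can be checked by enumeration. The sketch below assumes, as the outline does not state explicitly, that a sequence and its reverse complement count as the same double-stranded tetranucleotide; the function names `reverse_complement` and `unique_kmers` are illustrative and not taken from the project's software.

```python
from itertools import product

# Watson-Crick base pairing used to build the opposite strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def unique_kmers(k: int) -> list[str]:
    """List k-mers, keeping one representative per reverse-complement pair.

    Assumption: strand equivalence is the reason 256 tetranucleotides
    reduce to 136; the source text does not spell this out.
    """
    seen = set()
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        # The lexicographically smaller strand stands for the pair.
        seen.add(min(kmer, reverse_complement(kmer)))
    return sorted(seen)


tetramers = unique_kmers(4)
# Self-complementary tetramers are their own reverse complement.
palindromes = [t for t in tetramers if t == reverse_complement(t)]
print(len(tetramers), len(palindromes))
```

Of the 4^4 = 256 tetranucleotides, 16 are their own reverse complement and the remaining 240 pair up, so (256 - 16) / 2 + 16 = 136.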
Report
(1 results)
Research Products
(1 results)