2015 Fiscal Year Annual Research Report
Prediction of cancer-driving mutations within gene expression regulatory regions by multi-omics analysis
Project/Area Number | 15F15385 |
Research Institution | Tokyo Medical and Dental University |
Principal Investigator | TSUNODA Tatsuhiko (角田 達彦), Tokyo Medical and Dental University, Medical Research Institute, Professor (10273468) |
Co-Investigator(Kenkyū-buntansha) | LOPEZ ALVAREZ YOSVANY, Tokyo Medical and Dental University, Medical Research Institute, JSPS International Research Fellow |
Project Period (FY) | 2015-11-09 – 2018-03-31 |
Keywords | mutations / transcription factors / promoter regions / liver cancer |
Outline of Annual Research Achievements |
We began working with a hepatocellular carcinoma database comprising about 300 sequenced genomes of Japanese individuals. We initially considered 20 patients whose read pairs had been aligned to the human genome (hg19) with the Burrows-Wheeler Aligner (BWA). These alignments were processed with in-house scripts to remove PCR duplicates. The pipeline also discarded reads that were not uniquely mapped to the genome, as well as reads whose mapping distances deviated greatly from the mean distance. The remaining reads were then converted to pileup format with samtools. We also designed a script to extract point mutations between the cancer and control samples of each individual, and then focused on the mutations located within the regulatory regions of genes. The coordinates of human genes were downloaded from the GENCODE (GRCh37) repository, and the region spanning 2 kbp upstream and downstream of each transcription start site (TSS) was considered to be the promoter. Additionally, RNA-seq data from the same patients were mapped to the human genome with bowtie2, and gene expression levels were measured with cufflinks. While focusing on regulatory factors heavily mutated in liver cancer, we realized that no comprehensive catalogue summarizing these factors existed. We therefore decided to conduct this survey for the scientific community. To this end, we downloaded human factors from three publicly available databases: TRANSFAC, JASPAR and UniProbe. The binding locations of these factors in human promoters were determined and are being combined with mutation data from liver cancer.
|
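The promoter definition and mutation filtering described above can be illustrated with a minimal Python sketch. This is not the in-house script used in the project: the function names, the tuple layouts for genes and mutations, and the 1-based inclusive coordinate convention are all assumptions made for illustration. Only the 2 kbp flank on each side of the TSS comes from the report.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

PROMOTER_FLANK = 2000  # 2 kbp on each side of the TSS, as stated in the report


def promoter_window(tss, flank=PROMOTER_FLANK):
    """Return the 1-based inclusive (start, end) window centred on a TSS."""
    return max(1, tss - flank), tss + flank


def mutations_in_promoters(genes, mutations, flank=PROMOTER_FLANK):
    """Map each gene to the point mutations that fall inside its promoter.

    genes:     iterable of (gene_id, chrom, tss)      -- hypothetical layout
    mutations: iterable of (chrom, pos, ref, alt)     -- hypothetical layout
    """
    # Group mutations by chromosome and sort them by position.
    by_chrom = defaultdict(list)
    for chrom, pos, ref, alt in mutations:
        by_chrom[chrom].append((pos, ref, alt))
    positions = {}
    for chrom, records in by_chrom.items():
        records.sort()
        positions[chrom] = [pos for pos, _, _ in records]

    # Binary-search the sorted positions for each promoter window.
    hits = {}
    for gene_id, chrom, tss in genes:
        start, end = promoter_window(tss, flank)
        chrom_positions = positions.get(chrom, [])
        lo = bisect_left(chrom_positions, start)
        hi = bisect_right(chrom_positions, end)
        hits[gene_id] = by_chrom.get(chrom, [])[lo:hi]
    return hits
```

Because the window is symmetric around the TSS, the strand of the gene does not change which positions are covered; strand only matters when choosing which end of the annotated gene is the TSS.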
Current Status of Research Progress |
1: Research has progressed more than originally planned.
Reason
Following the initial proposal, we have been able to analyze individual genomes of liver-cancer patients and detect point mutations within promoter regions. In doing so, we learned how to analyze next-generation sequencing data and combine them with other omics data for predicting liver-cancer phenotypes. Although a computational model has yet to be designed, an additional step toward that goal is currently being taken. During this period, we pursued a secondary objective: compiling a list of heavily mutated factors that play important roles in liver cancer development. At present, we are working on a survey that would provide the scientific community with a collection of transcription factors whose mutated binding locations may merit closer examination. To do this, a large set of sequence motifs (computational representations of regulatory factors) was produced and their binding sites in human promoters were detected. These binding sequences are currently being combined with mutation data from liver cancer.
|
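One common way to detect binding sites from a sequence motif is to score every window of a promoter sequence against a log-odds position weight matrix. The sketch below illustrates that general technique; the report does not state which scanning method or tool was actually used. The count-matrix layout, the pseudocount, the uniform background and the score threshold are assumptions, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    counts: list with one dict per motif position, mapping base -> count
            (hypothetical layout; a uniform background is assumed).
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (offset, site, score) for forward-strand windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][window[j]] for j in range(width))
        if score >= threshold:
            hits.append((i, window, score))
    return hits
```

Scanning the reference and the mutated version of the same promoter with the same matrix and threshold shows whether a point mutation creates or removes a predicted site.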
Strategy for Future Research Activity |
Future research will continue to focus on deciphering non-coding somatic mutations related to liver cancer. We intend to develop a computational model that combines different kinds of omics data, such as RNA-seq and DNase-seq, to predict the phenotype of a patient. Promoter regions have recently been reported to contain a significant number of somatic mutations, which specifically affect the binding of regulatory factors and hence the expression of downstream genes. In the coming months, a survey of heavily mutated transcription factors will be completed. This survey is expected to be made available to the scientific community and would shed light on the factors to consider when analyzing the transcription of genes expressed in liver cancer cells. Such a report would also allow us to readily assess differences in gene expression by focusing only on the promoter regions these proteins bind to. To accomplish these objectives, mutations from the Catalogue of Somatic Mutations in Cancer (COSMIC) will be integrated with transcription factor variation to pinpoint the exact binding positions that change as a result of cancer. In addition, mutations in these transcription factor binding sites will be combined with gene expression data to put forward a model capable of inferring cancer phenotypes. Linear regression models will also be used to associate the expression of genes with somatic mutations occurring within binding sequences in their promoters.
|
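The planned association between gene expression and promoter mutations could, in its simplest form, be an ordinary least-squares fit with one predictor. The sketch below assumes that predictor is the number of somatic mutations inside binding sites of a gene's promoter, with one observation per patient; the report does not specify the model form, so both the predictor and the function name are hypothetical.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    x: per-patient mutation counts in binding sites (assumed predictor)
    y: per-patient expression levels of the gene
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-product of x and y around their means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance; the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A negative slope would suggest that expression falls as mutations accumulate in the binding sites, although a real analysis would also need significance testing and correction for covariates.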
Research Products
(1 result)