FY2015 Annual Research Report

Prediction of cancer-driving mutations in gene expression regulatory regions by multi-omics analysis

Research Project

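Project/Area Number	15F15385
Research Institution	Tokyo Medical and Dental University
Principal Investigator	TSUNODA Tatsuhiko (角田達彦), Tokyo Medical and Dental University, Medical Research Institute, Professor (10273468)
Co-Investigator	LOPEZ ALVAREZ YOSVANY, Tokyo Medical and Dental University, Medical Research Institute, International Research Fellow
Project Period (FY)	2015-11-09 – 2018-03-31
Keywords	mutations / transcription factors / promoter regions / liver cancer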
Outline of Research Achievements	We began working with a hepatocellular carcinoma database that comprises about 300 sequenced genomes of Japanese individuals. We initially considered 20 patients whose read pairs had been aligned to the human genome (hg19) with the Burrows-Wheeler Aligner (BWA). These mappings were processed with in-house scripts to remove PCR duplicates. The pipeline also discarded reads that were not uniquely mapped to the genome, as well as read pairs whose mapping distance deviated strongly from the mean distance. The remaining reads were then converted to pileup format with SAMtools. Further, we designed a script to extract point mutations between the cancer and control samples of each individual. We then focused on mutations within the regulatory regions of genes. The coordinates of human genes were downloaded from the GENCODE (GRCh37) repository, and the region from 2 kbp upstream to 2 kbp downstream of each transcription start site (TSS) was treated as the promoter. Additionally, RNA-seq data from the same patients were mapped to the human genome with Bowtie 2, and gene expression levels were measured with Cufflinks. While focusing on regulatory factors that are heavily mutated in liver cancer, we found no comprehensive catalogue summarizing these factors. We therefore decided to conduct such a survey for the scientific community. To this end, we downloaded the human factors from three publicly available databases: TRANSFAC, JASPAR and UniPROBE. The binding locations of these factors in human promoters were determined and are being combined with mutation data from liver cancer.
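The promoter definition and the restriction of point mutations to promoters described above can be illustrated with a minimal Python sketch. It is not the in-house script mentioned in the report: the record types `Gene` and `Mutation`, the function names, and the toy coordinates are all hypothetical, and positions are assumed to be 1-based and inclusive. Because the window is symmetric around the TSS, strand does not change its boundaries.

```python
from collections import namedtuple

# Hypothetical record types, used only for this illustration.
Gene = namedtuple("Gene", "name chrom tss strand")
Mutation = namedtuple("Mutation", "chrom pos ref alt")

FLANK = 2000  # 2 kbp on each side of the TSS, as stated in the report


def promoter_window(gene, flank=FLANK):
    """Return (chrom, start, end), 1-based inclusive, clipped at position 1."""
    return gene.chrom, max(1, gene.tss - flank), gene.tss + flank


def mutations_in_promoters(genes, mutations, flank=FLANK):
    """Map each gene name to the point mutations inside its promoter window."""
    hits = {}
    for gene in genes:
        chrom, start, end = promoter_window(gene, flank)
        hits[gene.name] = [
            m for m in mutations
            if m.chrom == chrom and start <= m.pos <= end
        ]
    return hits


# Toy input (invented coordinates, not data from the study).
genes = [
    Gene("GENE_A", "chr1", 10_000, "+"),
    Gene("GENE_B", "chr2", 1_500, "-"),
]
mutations = [
    Mutation("chr1", 8_500, "C", "T"),   # inside the GENE_A window
    Mutation("chr1", 12_001, "G", "A"),  # one base past the GENE_A window
    Mutation("chr2", 100, "A", "G"),     # inside the clipped GENE_B window
]
hits = mutations_in_promoters(genes, mutations)
```

For genome-scale input, the nested loop would normally be replaced by an interval index or a sorted sweep; the linear scan is kept here only for readability.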
Current Status of Research Progress (Category)	1: Research has progressed beyond the initial plan. Reason: Following the initial proposal, we have been able to analyze individual genomes of liver cancer patients and detect point mutations within promoter regions. In doing so, we learned how to analyze next-generation sequencing data and combine them with other omics data for predicting liver cancer phenotypes. Although a computational model has yet to be designed, an additional step toward that goal is currently being taken. In this period, we pursued a secondary objective: compiling a list of heavily mutated factors that play important roles in liver cancer development. At present, we are working on a survey that would provide the scientific community with a collection of transcription factors whose mutated binding locations may merit closer study. To do this, a large set of sequence motifs (computational representations of regulatory factors) was produced and their binding sites in human promoters were detected. These binding sequences are currently being combined with mutation data from liver cancer.
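The report does not say which tool or scoring scheme was used to detect binding sites from the sequence motifs. As one common way to do it, the sketch below scans a sequence with a log-odds position weight matrix and shows how a single point mutation can remove a predicted site. The function names, the pseudocount, the uniform background, the threshold, and the toy "TATA" motif are all assumptions made for illustration. Only the forward strand is scanned.

```python
import math


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(
                ((column.get(base, 0) + pseudocount) / total) / background
            )
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (offset, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[base] for col, base in zip(matrix, window))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Toy motif: ten aligned sites that all read "TATA" (invented counts).
toy_counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
pwm = log_odds_matrix(toy_counts)

reference = "GGCTATAGGC"  # contains TATA at 0-based offset 3
mutated = "GGCTGTAGGC"    # same sequence with A>G at 0-based position 4

sites_reference = scan(reference, pwm, threshold=4.0)
sites_mutated = scan(mutated, pwm, threshold=4.0)
```

Comparing `sites_reference` with `sites_mutated` is the kind of check that links a promoter mutation to a gained or lost binding site. A real analysis would also scan the reverse complement and choose the threshold per motif.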
Strategy for Future Research Activity	Future research will continue to focus on deciphering non-coding somatic mutations related to liver cancer. We intend to develop a computational model that combines different kinds of omics data, such as RNA-seq and DNase-seq, to predict the phenotype of a patient. Promoter regions have recently been reported to contain a significant number of somatic mutations, which specifically affect the binding of regulatory factors and hence the expression of the downstream genes. In the coming months, a survey of heavily mutated transcription factors will be completed. This outcome is expected to be made available to the scientific community and would shed light on factors to be considered when analyzing the transcription of genes expressed in liver cancer cells. Such a report would also allow us to readily assess differences in gene expression by focusing only on the promoter regions these proteins bind to. To accomplish these objectives, mutations from the Catalogue of Somatic Mutations in Cancer (COSMIC) will be integrated with transcription factor variation to pinpoint the exact binding positions that change as a result of cancer. In addition, mutations in these transcription factor binding sites will be combined with gene expression data to put forward a model capable of inferring cancer phenotypes. Linear regression models will also be used to associate the expression of genes with somatic mutations occurring within binding sequences in their promoters.
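The planned regression step can be sketched in its simplest form: an ordinary least-squares fit with one predictor, the number of somatic mutations inside binding sites of a gene's promoter, and one response, the expression level of that gene across samples. The report does not specify the model, so the single-predictor form, the function name `fit_line`, and the synthetic numbers below are assumptions for illustration only and are not results.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic example: one gene measured in five samples.
# x = mutations inside binding sites of the promoter, y = expression level.
mutation_counts = [0, 1, 2, 3, 4]
expression = [10.0, 8.0, 6.0, 4.0, 2.0]

slope, intercept = fit_line(mutation_counts, expression)
```

A negative slope in such a fit would suggest that expression drops as binding-site mutations accumulate. With real data, several predictors (for example, one per transcription factor) and a significance test would be needed before drawing that conclusion.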

Research Products

(1 result)

All Journal Article (1 result) (of which Peer Reviewed: 1, Open Access: 1, Acknowledgement Compliant: 1)

[Journal Article] HitPredict version 4 - comprehensive reliability scoring of physical protein-protein interactions from more than 100 species (2015)
- Author(s)/Presenter(s)
  Yosvany Lopez, Kenta Nakai, Ashwini Patil
- Journal Title
  Database
  Volume: bav117 Pages: 1-10
- DOI
  10.1093/database/bav117
- Peer Reviewed / Open Access / Acknowledgement Compliant