2020 Fiscal Year Research-status Report
Comprehensive optimization of cell type-specific gene co-expression networks and construction of a cell type-specific co-expression database
Project/Area Number |
20K06609
|
Research Institution | Kyoto University |
Principal Investigator |
VANDENBON ALEXIS Kyoto University, Institute for Frontier Life and Medical Sciences, Senior Lecturer (60570140)
|
Project Period (FY) |
2020-04-01 – 2023-03-31
|
Keywords | gene expression / gene co-expression / data normalization / database / batch effects / RNA-seq / network analysis |
Outline of Annual Research Achievements |
This year, I focused on a comprehensive analysis of different workflows for processing RNA-seq data, with the goal of identifying the workflow that yields the highest-quality gene co-expression networks. First, I prepared a collection of 8,796 human and 12,114 mouse RNA-seq samples, obtained from 68 human and 76 mouse cell types and tissues, and applied different RNA-seq normalization methods and batch effect correction methods. I confirmed the biological validity of the resulting datasets. Next, I estimated gene co-expression in these datasets using Pearson and Spearman correlation. In total, 50 different workflows were used, resulting in 7,200 genome-wide gene co-expression networks. I developed an objective quality measure for gene co-expression networks and used it to identify the workflow that produces the best networks. I conducted a detailed statistical analysis of the 7,200 networks and their quality. In general, I found that upper quartile normalization, followed by batch effect correction using ComBat and co-expression estimation using Pearson correlation, resulted in the best overall co-expression networks. I confirmed the importance of large sample counts for improving network quality, but also found that choosing a suitable normalization method and batch correction method can improve quality by an amount equivalent to a >70% and >40% increase in sample counts, respectively. Finally, I started constructing a public gene co-expression database and implementing tools for analyzing the networks.
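The workflow steps named above can be illustrated with a minimal sketch. It is not the code used in this project: the function names, the toy counts, and the choice to compute the upper quartile over non-zero counts only are assumptions made for illustration, and the ComBat batch correction step is omitted entirely.

```python
import math
from statistics import quantiles


def upper_quartile_normalize(counts):
    """Scale each sample by the 75th percentile of its non-zero counts.

    `counts` maps a sample name to a list of raw counts, one per gene.
    Restricting the quartile to non-zero counts is an assumption of this sketch.
    """
    normalized = {}
    for sample, values in counts.items():
        nonzero = [v for v in values if v > 0]
        # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
        q3 = quantiles(nonzero, n=4, method="inclusive")[2]
        normalized[sample] = [v / q3 for v in values]
    return normalized


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: one sample with six genes.
toy_counts = {"sample_1": [0, 1, 2, 3, 4, 5]}
toy_normalized = upper_quartile_normalize(toy_counts)

# Expression of two hypothetical genes across four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [1.0, 4.0, 9.0, 16.0]
r_pearson = pearson(gene_a, gene_b)
r_spearman = spearman(gene_a, gene_b)
```

In this toy case the Spearman coefficient is 1 because the relationship is monotonic, while the Pearson coefficient is below 1 because it is not linear. The reported total of 7,200 networks is also consistent with 50 workflows applied to each of the 68 + 76 = 144 cell types and tissues, although the report does not state this breakdown explicitly.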
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
This year, I completed most of the goals for FY2020, including 1) collection and quality assessment of a large number of RNA-seq samples, and 2) comprehensive evaluation of the effects of data normalization, batch effect removal, and correlation measures on the quality of gene co-expression networks. I evaluated 50 different data processing workflows, generated 7,200 genome-wide gene co-expression networks, compared their quality, and summarized the results in a manuscript (Vandenbon, bioRxiv, 2021; currently under review at a peer-reviewed journal). This comprehensive evaluation has therefore advanced further than planned. On the other hand, I have not yet expanded the dataset to include additional species and tissues, as originally planned.
I also started work on some of the goals for FY2021, such as building a public gene co-expression database, which currently runs on a development workstation. In addition, I implemented tools that allow users to analyze the gene co-expression networks, and applied them to a few example datasets.
|
Strategy for Future Research Activity |
In FY2021, I will continue the project as planned, focusing on four topics: 1) Using the optimized gene co-expression networks for human and mouse tissues and cell types, I will perform a genome-wide analysis of similarities and differences between the networks. I will pay attention to network patterns shared universally across all cell types, patterns shared among the networks of similar cell types, and cell type-specific patterns. These three types of network patterns can reveal valuable insights into cell type-specific gene regulation and gene functions. 2) To make these gene co-expression networks accessible to the scientific community, I will continue working on and complete a public gene co-expression network database. 3) I will implement analysis tools for the database, allowing the scientific community to easily analyze its content. Users will be able to look up genes of interest in the networks of all cell types and conduct additional network analysis steps, such as finding genes with similar expression patterns and predicting gene functions. 4) Finally, I will expand the dataset to include additional organisms, cell types, and tissues.
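As a rough illustration of topics 1 and 3, the sketch below shows one way to separate shared from cell type-specific co-expression edges and to look up the strongest partners of a gene. The data layout (nested dictionaries of correlation values), the gene and cell type names, the 0.5 threshold, and the function names are all hypothetical. They do not describe the planned database or its tools.

```python
def edges_above(network, threshold):
    """Return the set of undirected gene pairs with correlation >= threshold."""
    return {
        frozenset((gene_a, gene_b))
        for gene_a, partners in network.items()
        for gene_b, r in partners.items()
        if gene_a != gene_b and r >= threshold
    }


def shared_and_specific(networks, threshold=0.5):
    """Split edges into those present in every network and those unique to one."""
    edge_sets = {name: edges_above(net, threshold) for name, net in networks.items()}
    shared = set.intersection(*edge_sets.values())
    specific = {}
    for name, edges in edge_sets.items():
        others = set().union(*(e for n, e in edge_sets.items() if n != name))
        specific[name] = edges - others
    return shared, specific


def top_coexpressed(network, gene, k=3):
    """Return up to k (partner, correlation) pairs, strongest correlation first."""
    partners = network.get(gene, {})
    ranked = sorted(partners.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical toy networks: cell type -> gene -> partner gene -> correlation.
toy_networks = {
    "cell_type_1": {
        "geneA": {"geneB": 0.9, "geneC": 0.2},
        "geneB": {"geneA": 0.9, "geneC": 0.7},
        "geneC": {"geneA": 0.2, "geneB": 0.7},
    },
    "cell_type_2": {
        "geneA": {"geneB": 0.8, "geneC": 0.6},
        "geneB": {"geneA": 0.8, "geneC": 0.1},
        "geneC": {"geneA": 0.6, "geneB": 0.1},
    },
}

shared_edges, specific_edges = shared_and_specific(toy_networks, threshold=0.5)
best_partner = top_coexpressed(toy_networks["cell_type_1"], "geneB", k=1)
```

With these toy values, the geneA-geneB edge appears in both networks, while geneB-geneC and geneA-geneC are each specific to one cell type. A fixed correlation threshold is used here only for simplicity. Patterns shared among groups of similar cell types would need an additional grouping step that this sketch does not cover.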
|
Research Products
(1 result)