Comprehensive optimization of cell type-specific gene co-expression networks and construction of a cell type-specific co-expression database
Project/Area Number |
20K06609
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Review Section |
Basic Section 43060:System genome science-related
|
Research Institution | Kyoto University |
Principal Investigator |
|
Project Period (FY) |
2020-04-01 – 2023-03-31
|
Project Status |
Completed (Fiscal Year 2022)
|
Budget Amount |
¥3,770,000 (Direct Cost: ¥2,900,000, Indirect Cost: ¥870,000)
Fiscal Year 2022: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2021: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2020: ¥2,210,000 (Direct Cost: ¥1,700,000, Indirect Cost: ¥510,000)
|
Keywords | bioinformatics / gene expression / gene co-expression / data normalization / batch effect correction / database / batch effects / RNA-seq / network analysis |
Outline of Research at the Start |
Understanding gene regulation is one of the key questions in biology. The computational prediction of regulatory interactions is an attractive approach, but its accuracy remains low, even in simple eukaryotes. In this project, we will comprehensively evaluate gene expression data normalization, batch effect correction, correlation measures, and downstream network processing steps, and their effect on the quality of co-expression networks, in many human and mouse cell types. The results will be made public in a database. This project will lead to better predictions of gene regulatory mechanisms.
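Two of the processing stages listed above, batch effect correction and downstream network processing, can be illustrated with a minimal sketch. This project outline does not name the specific methods evaluated, so everything here is an assumption: `center_by_batch` and `top_k_network` are hypothetical helpers, the batch correction is plain per-batch mean-centering (a simple additive-effect correction, not ComBat or any other published method), and the network step keeps each gene's k most positively correlated partners.

```python
def center_by_batch(expr, batches):
    """Hypothetical batch correction: per-batch mean-centering.

    expr: genes x samples matrix (list of rows) of log-scale expression.
    batches: one batch label per sample (column).
    For every gene, subtract that gene's mean within each batch, which
    removes purely additive batch offsets.
    """
    corrected = []
    for row in expr:
        means = {}
        for b in set(batches):
            vals = [v for v, lab in zip(row, batches) if lab == b]
            means[b] = sum(vals) / len(vals)
        corrected.append([v - means[lab] for v, lab in zip(row, batches)])
    return corrected


def top_k_network(corr, k):
    """Hypothetical network processing: keep each gene's top-k partners.

    corr: symmetric genes x genes correlation matrix (list of rows).
    Returns undirected edges as (i, j) tuples with i < j.
    """
    n = len(corr)
    edges = set()
    for i in range(n):
        # Rank all other genes by their correlation with gene i, highest first.
        partners = sorted((j for j in range(n) if j != i),
                          key=lambda j: corr[i][j], reverse=True)
        for j in partners[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges
```

Mean-centering by batch is only safe when biological groups are spread across batches; if a cell type appears in a single batch, its signal is removed together with the batch offset.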
|
Outline of Final Research Achievements |
We used a large collection of RNA-seq samples covering 68 human and 76 mouse cell types and tissues to comprehensively evaluate which data processing workflow yields the highest-quality gene co-expression networks. Our results indicate, first, that it is important to collect as many RNA-seq samples as possible. Second, researchers should use Upper Quartile normalization and correct for batch effects. Third, Pearson’s correlation should generally be used, although Spearman’s rank correlation may be preferable for small datasets. We confirmed that the optimized processing workflow produces a high-quality gene expression dataset that can serve as a reference, and we provided two illustrations of its use as a reference to support other bioinformatics analyses. Finally, we are preparing a freely accessible gene co-expression database, which will allow users to inspect gene expression and co-expression in many human and mouse tissues and cell types.
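The recommended steps (Upper Quartile normalization followed by Pearson's or Spearman's correlation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the function names are hypothetical, each sample is scaled by the 75th percentile of its non-zero counts relative to the mean upper quartile, values are then transformed with log2(x + 1), and batch effect correction is omitted.

```python
import math
import statistics


def upper_quartile_normalize(counts):
    """Hypothetical Upper Quartile normalization of a genes x samples matrix.

    Each sample (column) is divided by a factor equal to the 75th percentile
    of its non-zero counts, divided by the mean of those percentiles.
    """
    n_samples = len(counts[0])
    uqs = []
    for j in range(n_samples):
        nonzero = [row[j] for row in counts if row[j] > 0]
        # Index 2 of the three quartile cut points is the 75th percentile.
        uqs.append(statistics.quantiles(nonzero, n=4, method="inclusive")[2])
    mean_uq = sum(uqs) / n_samples
    factors = [uq / mean_uq for uq in uqs]
    return [[row[j] / factors[j] for j in range(n_samples)] for row in counts]


def log_transform(matrix):
    """Assumed log2(x + 1) transform applied after normalization."""
    return [[math.log2(x + 1) for x in row] for row in matrix]


def pearson(x, y):
    """Pearson's correlation between two expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den else float("nan")


def rank(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(rank(x), rank(y))
```

For example, `expr = log_transform(upper_quartile_normalize(counts))` followed by `pearson(expr[0], expr[1])` gives the co-expression of the first two genes; `spearman` can be swapped in for small datasets, in line with the recommendation above.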
|
Academic Significance and Societal Importance of the Research Achievements |
Gene co-expression is widely used to predict gene functions and regulatory mechanisms. Here we showed how gene expression data can be processed to obtain high-quality co-expression values. This will contribute to improved bioinformatics analyses and new insights into gene regulation.
|