2019 Fiscal Year Annual Research Report
Development of Protein Design Methods and Their Experimental Verification
Project/Area Number |
18H02395
|
Research Institution | Institute of Physical and Chemical Research |
Principal Investigator |
ZHANG KAM National Research and Development Agency RIKEN, Center for Biosystems Dynamics Research, Team Leader (60558906)
|
Project Period (FY) |
2018-04-01 – 2021-03-31
|
Keywords | Protein design / Computation |
Outline of Annual Research Achievements |
Proteins are among the most important components of living organisms. They carry out a multitude of functions and control a myriad of pathways. Malfunctioning proteins are a major cause of various diseases. The intricate function of a protein is determined by its equally sophisticated 3D structure. The ability to design a protein with a specified structure, and thereby confer on it a desired function, would have a tremendous impact on our ability to develop new therapeutics, diagnostics and biosensors. Our objective is to develop a novel computational method for the de novo design of proteins. Structure-based computational protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. Energy functions remain imperfect, however, and injecting relevant information from known structures into the design process should lead to improved designs. We propose to use a library of naturally occurring sequence segments that are known to fold into given structural fragments, which dramatically reduces the sequence space that has to be searched. We then use an evolutionary approach, such as an estimation of distribution algorithm, to search the sequence space efficiently by learning from previous populations.
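The fragment-library search described above can be illustrated with a minimal sketch of a univariate estimation of distribution algorithm. This is not the project's implementation: the fragment libraries, the target sequence, the `toy_energy` function (a Hamming distance standing in for an all-atom energy) and all parameter values are hypothetical and chosen only to make the example self-contained.

```python
import random
from collections import Counter

# Hypothetical toy data: each inner list is the candidate segment library
# for one structural fragment of the target backbone.
LIBRARIES = [
    ["MKV", "AKV", "MRL", "GGS", "PLE"],
    ["TTA", "LSA", "LSG", "KDE", "WPN"],
    ["GDE", "GNE", "ADK", "SSS", "YFH"],
    ["QRT", "FKT", "FRT", "LLL", "HPC"],
]
TARGET = "MKVLSAGDEFRT"  # hidden optimum, used only by the toy energy


def decode(libraries, choice):
    """Concatenate the chosen segment of each fragment into a full sequence."""
    return "".join(lib[i] for lib, i in zip(libraries, choice))


def toy_energy(sequence):
    """Stand-in for an all-atom energy: Hamming distance to TARGET (lower is better)."""
    return sum(a != b for a, b in zip(sequence, TARGET))


def eda_search(libraries, energy, pop_size=200, elite_frac=0.2,
               generations=30, smoothing=0.1, seed=0):
    """Univariate EDA: learn per-fragment choice probabilities from the best samples."""
    rng = random.Random(seed)
    # Start from a uniform distribution over the candidates of each fragment.
    probs = [[1.0 / len(lib)] * len(lib) for lib in libraries]
    n_elite = max(1, int(pop_size * elite_frac))
    best_seq, best_e = None, float("inf")
    for _ in range(generations):
        # Sample a population: one candidate index per fragment.
        population = [
            tuple(rng.choices(range(len(lib)), weights=p)[0]
                  for lib, p in zip(libraries, probs))
            for _ in range(pop_size)
        ]
        # Score every individual once and sort by energy (lowest first).
        scored = sorted((energy(decode(libraries, ind)), ind) for ind in population)
        if scored[0][0] < best_e:
            best_e = scored[0][0]
            best_seq = decode(libraries, scored[0][1])
        elite = [ind for _, ind in scored[:n_elite]]
        # Re-estimate the distribution from the elite, mixed with a uniform
        # prior so that no candidate ever reaches zero probability.
        for pos, lib in enumerate(libraries):
            counts = Counter(ind[pos] for ind in elite)
            uniform = 1.0 / len(lib)
            probs[pos] = [
                (1 - smoothing) * counts[i] / len(elite) + smoothing * uniform
                for i in range(len(lib))
            ]
    return best_seq, best_e
```

Calling `eda_search(LIBRARIES, toy_energy)` returns the lowest-energy sequence seen and its energy. Restricting each fragment to a handful of library segments is what shrinks the search space here: four fragments with five candidates each give 625 combinations, instead of 20^12 for a free 12-residue sequence.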
|
Current Status of Research Progress |
2: On the whole, research has progressed further than originally planned.
Reason
We have developed a CPD method with backbone flexibility called SHADES, a data-driven approach that exploits local structural environments in known protein structures, together with energy, to guide sequence design. SHADES is based on customized libraries of non-contiguous, in-contact amino acid residue motifs. We have tested SHADES on a public benchmark of 40 proteins selected from different protein families. When homologous proteins were excluded, SHADES achieved an average sequence recovery of 30% and an average sequence similarity of 46% relative to the PFAM protein family of the target protein. When homologous structures were included, the wild-type sequence recovery rate reached 93%. WD40 proteins are a subfamily of propeller proteins, with a pseudo-symmetrical fold made up of subdomains called blades. By computationally reverse-engineering the duplication, fusion and diversification events in the evolutionary history of a WD40 protein, a perfectly symmetrical homolog called Tako8 was made. We have used SHADES to redesign Tako8 to create Ika8, a four-fold symmetrical protein in which neighbouring blades carry compensating charges. Ika2 and Ika4, which carry two and four blades per subunit, respectively, were found to assemble spontaneously into a complete eight-bladed ring in solution. These artificial eight-bladed rings may find applications in bionanotechnology and as models for studying the folding and evolution of WD40 proteins.
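To make the two benchmark metrics concrete, the sketch below computes sequence recovery and sequence similarity for a pair of aligned sequences. It rests on assumptions: the report compares designs with the PFAM family of the target and does not state how similarity is defined, so the pairwise comparison and the physicochemical residue grouping in `SIMILARITY_GROUPS` are hypothetical simplifications.

```python
# Hypothetical grouping of residues by physicochemical class; the report
# does not specify the similarity definition actually used.
SIMILARITY_GROUPS = ["AVLIM", "FWY", "STNQ", "KRH", "DE", "C", "G", "P"]
GROUP_OF = {aa: idx for idx, group in enumerate(SIMILARITY_GROUPS) for aa in group}


def _check(designed, native):
    """Both sequences must be non-empty and already aligned position by position."""
    if not native or len(designed) != len(native):
        raise ValueError("sequences must be non-empty and of equal length")


def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native one."""
    _check(designed, native)
    return sum(d == n for d, n in zip(designed, native)) / len(native)


def sequence_similarity(designed, native):
    """Fraction of positions where both residues fall in the same group."""
    _check(designed, native)
    same_group = sum(
        d in GROUP_OF and GROUP_OF[d] == GROUP_OF.get(n)
        for d, n in zip(designed, native)
    )
    return same_group / len(native)
```

For example, comparing the made-up design `"MRVISAGEEP"` with the made-up native sequence `"MKVLSAGDEF"` gives six identical positions out of ten and nine positions in the same group. Similarity is never lower than recovery under this definition, because identical residues always share a group.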
|
Strategy for Future Research Activity |
RNA polymerases are ancient proteins found in all kingdoms of life and have a large, complex structure consisting of multiple domains. It remains unclear how these large, complex proteins evolved, presumably from much simpler primitive proteins. We hypothesize that the double-psi beta-barrel (DPBB) domain at the core of RNA polymerase is the ancestor of modern RNA polymerases. We plan to use SHADES to design a standalone, stably folded and functional DPBB protein that is capable of binding Mg2+, which is required for the enzymatic activity of RNA polymerases. To further reduce the complexity, we plan to design a two-fold symmetric protein with the DPBB fold, consisting of a duplicated half-DPBB sequence. This design will provide support for the notion that modern complex proteins could have evolved from simpler ancestral proteins through gene duplication, fusion and diversification events. We also plan to improve our scoring function further by taking into account not only the sequence preference but also the dihedral angle distribution in known structures. This improved scoring function will enhance our ability to design proteins with novel folds and novel functions.
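One way a dihedral angle term could enter a scoring function is as a knowledge-based penalty derived from binned backbone angles. The sketch below is an illustration under assumptions, not the planned method: the 30-degree bin width, the pseudocount and the function names are hypothetical, and the observations would in practice come from a set of known structures.

```python
import math
from collections import Counter

BIN = 30  # bin width in degrees (assumed)
N_BINS = (360 // BIN) ** 2  # number of (phi, psi) bins


def bin_of(phi, psi):
    """Map a (phi, psi) pair in degrees, range [-180, 180], to a 2D bin index."""
    per_axis = 360 // BIN
    return (int((phi + 180) // BIN) % per_axis,
            int((psi + 180) // BIN) % per_axis)


def build_dihedral_score(observations, pseudocount=1.0):
    """Build a scoring function from (amino_acid, phi, psi) observations.

    The returned function gives -log P(bin | amino acid), so frequently
    observed backbone conformations receive a lower (better) score.
    """
    counts = Counter()
    totals = Counter()
    for aa, phi, psi in observations:
        counts[(aa, bin_of(phi, psi))] += 1
        totals[aa] += 1

    def score(aa, phi, psi):
        # The pseudocount keeps unobserved bins at a finite penalty.
        p = (counts[(aa, bin_of(phi, psi))] + pseudocount) / (
            totals[aa] + pseudocount * N_BINS)
        return -math.log(p)

    return score
```

A term of this kind could be added to a sequence-preference score with a weight to be tuned; how the two would actually be combined is not specified in the report.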
|