Outline of Research Achievements
Protein structures determine their functions, so the ability to design proteins with a specified 3D structure will enable us to create proteins with new or improved functions. We propose to develop a new computational method for protein design that uses the triangle inequality to quickly identify small protein structural fragments and assembles them using an estimation of distribution algorithm (EDA). The method will allow for protein backbone flexibility, which is much needed but lacking in current approaches. It will first be assessed retrospectively on a test dataset of diverse protein folds and then validated prospectively by designing a novel protein. This protein design method has considerable potential for the development of new biosensors, therapeutics and diagnostics.
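The abstract does not say how the triangle inequality will be applied. One common way to use it for fast structural search is pivot-based pruning. For any metric d, any pivot fragment p, a query q and a database fragment x, d(q, x) >= |d(q, p) - d(x, p)|. A fragment whose lower bound already exceeds the search threshold can therefore be skipped without computing d(q, x). The minimal Python/NumPy sketch below makes several assumptions. It uses a hypothetical fragment descriptor: the upper triangle of the CA-CA distance matrix, which does not change under rotation or translation. Descriptors are compared by Euclidean distance. The names `dist_matrix_descriptor`, `build_index` and `range_search` are illustrative, and so is the random choice of pivots. None of this is the project's implementation.

```python
import numpy as np


def dist_matrix_descriptor(coords: np.ndarray) -> np.ndarray:
    """Flattened upper triangle of the CA-CA distance matrix of one fragment.

    Hypothetical descriptor: it is unchanged by rotation and translation, so
    the Euclidean distance between two descriptors obeys the triangle inequality.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dmat = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(coords), k=1)
    return dmat[iu]


def build_index(db_desc: np.ndarray, n_pivots: int = 4, seed: int = 0):
    """Pick random pivot fragments and precompute all database-to-pivot distances."""
    rng = np.random.default_rng(seed)
    pivots = rng.choice(len(db_desc), size=min(n_pivots, len(db_desc)), replace=False)
    pivot_dists = np.linalg.norm(db_desc[:, None, :] - db_desc[pivots][None, :, :], axis=-1)
    return pivots, pivot_dists


def range_search(query_desc, db_desc, pivots, pivot_dists, threshold):
    """Return (indices, distances) of fragments within `threshold`, nearest first."""
    q_to_p = np.linalg.norm(db_desc[pivots] - query_desc, axis=-1)
    # Triangle inequality: d(q, x) >= |d(q, p) - d(x, p)| for every pivot p.
    lower_bound = np.abs(pivot_dists - q_to_p[None, :]).max(axis=1)
    candidates = np.flatnonzero(lower_bound <= threshold)  # the rest are pruned
    exact = np.linalg.norm(db_desc[candidates] - query_desc, axis=-1)
    keep = exact <= threshold
    order = np.argsort(exact[keep])
    return candidates[keep][order], exact[keep][order]


# Toy usage with random 9-residue "fragments" (not real PDB data).
rng = np.random.default_rng(1)
db_coords = rng.normal(scale=5.0, size=(1000, 9, 3))
db_desc = np.stack([dist_matrix_descriptor(c) for c in db_coords])
pivots, pivot_dists = build_index(db_desc)
query_desc = dist_matrix_descriptor(db_coords[0] + rng.normal(scale=0.1, size=(9, 3)))
hits, hit_dists = range_search(query_desc, db_desc, pivots, pivot_dists, threshold=3.0)
```

The pruning never discards a true match, because every skipped fragment has a lower bound above the threshold. How many fragments it actually skips depends on the descriptor, the pivots and the data.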
Current Status of Research Progress (Category)
2: Progressing generally smoothly
Reason
Protein modeling and design often require querying the Protein Data Bank (PDB) with a structural fragment, which may contain gaps. For some applications, it is preferable to work with a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated us to develop new software to manage and query 3D protein fragments. We have developed FRAGGER, a protein fragment picker that creates and queries protein fragment databases. It supports fragments of any length, and any set of PDB files can be used to build a database. Given a query fragment and a distance threshold, FRAGGER efficiently searches a fragment database and ranks the matching fragments by their distance to the query. The query fragment can contain structural gaps. The amino acid sequences allowed to match a query can be constrained with a regular expression over one-letter amino acid codes. FRAGGER also includes a tool for high-throughput computation of the backbone RMSD between one fragment and many others. FRAGGER should be useful for protein design, loop grafting and related structural bioinformatics tasks.
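The following sketch illustrates the kind of query described above. It is not FRAGGER's code or interface, which are not shown here. The sketch ranks database fragments by backbone RMSD to a query using one vectorized Kabsch superposition. It models a structural gap as query positions that are simply left out of the superposition. Candidate sequences are filtered with Python's `re.fullmatch`. The sketch assumes equal-length fragments represented by CA coordinates only. The functions `kabsch_rmsd_one_vs_many` and `search` are hypothetical names.

```python
import re

import numpy as np


def kabsch_rmsd_one_vs_many(query: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """RMSD of query (L, 3) to each target (N, L, 3) after optimal superposition."""
    q = query - query.mean(axis=0)
    t = targets - targets.mean(axis=1, keepdims=True)
    h = np.einsum("nli,lj->nij", t, q)  # per-target covariance matrices, (N, 3, 3)
    u, s, vt = np.linalg.svd(h)
    # Flip the smallest singular value where needed so each rotation is proper.
    s[:, -1] *= np.sign(np.linalg.det(u @ vt))
    sq_err = (q ** 2).sum() + (t ** 2).sum(axis=(1, 2)) - 2.0 * s.sum(axis=1)
    return np.sqrt(np.maximum(sq_err, 0.0) / len(query))


def search(query_ca, query_mask, db_ca, db_seqs, threshold, seq_pattern=None):
    """Return [(index, rmsd), ...] for fragments within `threshold`, best first.

    query_mask[i] is False at gap positions, which are ignored in the RMSD.
    seq_pattern is an optional regular expression over one-letter codes.
    """
    idx = np.arange(len(db_ca))
    if seq_pattern is not None:
        rx = re.compile(seq_pattern)
        idx = np.array([i for i in idx if rx.fullmatch(db_seqs[i])], dtype=int)
    if idx.size == 0:
        return []
    rmsd = kabsch_rmsd_one_vs_many(query_ca[query_mask], db_ca[idx][:, query_mask])
    order = np.argsort(rmsd)
    return [(int(idx[k]), float(rmsd[k])) for k in order if rmsd[k] <= threshold]


# Toy usage: the query is fragment 42, rotated and translated, with a 2-residue gap.
rng = np.random.default_rng(0)
L, N = 9, 500
db_ca = rng.normal(scale=4.0, size=(N, L, 3))
db_seqs = ["".join(rng.choice(list("ACDEFGHIKLMNPQRSTVWY"), size=L)) for _ in range(N)]
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
query_ca = db_ca[42] @ rot.T + np.array([10.0, -3.0, 5.0])
mask = np.ones(L, dtype=bool)
mask[4:6] = False
query_ca[~mask] = 999.0  # coordinates at gap positions do not matter
hits = search(query_ca, mask, db_ca, db_seqs, threshold=1.0)
print(hits[:3])  # fragment 42 should come first with an RMSD close to 0
```

Computing every RMSD from the singular values avoids building each rotation matrix explicitly. This keeps a one-versus-many comparison to a single batched SVD call.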
Strategy for Future Research Activity
The core of the project is the implementation of the computational design method, SHADES. Starting from a protein backbone structure, we will create overlapping 9-residue fragments. These fragments will be used as queries with our FRAGGER tool to identify PDB fragments with similar structures. The sequences of the matching fragments will be compiled into a fragment library. Our software will then construct full-length protein sequences by assembling these short sequences. The side chains of the resulting residues will be built and packed using a rotamer library and evaluated with the Rosetta empirical energy function. The backbone will be relaxed by energy minimization.

Sequence assembly will be guided by an estimation of distribution algorithm (EDA). In the first iteration, we will generate ~50,000 sequences by sampling the fragments in the library from a uniform distribution. The 10% of sequences with the lowest energy will then be used to estimate, for each fragment, a distribution based on its occurrences in those sequences. In the next iteration, this estimated distribution will be used to sample fragments from the library and assemble another ~50,000 sequences. In total, four iterations will be performed, generating ~200,000 sequences. From these, the lowest-energy sequences will be selected as the designs for the given structure.
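The iteration described above can be sketched as a univariate EDA in the style of UMDA, one of the simplest members of the EDA family. The project's actual probabilistic model may differ. This minimal Python sketch makes several assumptions that are not in the proposal:

- The chain is split into non-overlapping fragment slots, whereas the real method uses overlapping 9-residue windows.
- `toy_energy` is a hypothetical stand-in for side-chain packing, backbone relaxation and Rosetta scoring.
- A pseudocount keeps every fragment sampleable.

The defaults follow the proposal's scheme (top 10%, four iterations) but use 2,000 samples per iteration instead of ~50,000 so that the example runs quickly.

```python
import numpy as np


def toy_energy(seq: str) -> float:
    """Hypothetical stand-in for packing + relaxation + Rosetta scoring (lower is better).

    Rewards hydrophobic residues at even positions and polar residues at odd ones.
    """
    hydrophobic = set("AILMFVW")
    return -float(sum((c in hydrophobic) == (i % 2 == 0) for i, c in enumerate(seq)))


def eda_design(library, energy, n_samples=2000, n_iter=4, top_frac=0.10,
               pseudocount=1.0, seed=0):
    """Univariate EDA over fragment choices.

    library: list of slots; slot k is a list of candidate fragment sequences.
    Returns the lowest-energy (sequence, energy) seen over all iterations.
    """
    rng = np.random.default_rng(seed)
    # Iteration 1: uniform distribution over the candidates in every slot.
    probs = [np.full(len(slot), 1.0 / len(slot)) for slot in library]
    best = (None, np.inf)
    for _ in range(n_iter):
        # Sample one candidate index per slot for every new design.
        choices = np.stack(
            [rng.choice(len(slot), size=n_samples, p=p) for slot, p in zip(library, probs)],
            axis=1,
        )
        seqs = ["".join(slot[c] for slot, c in zip(library, row)) for row in choices]
        energies = np.array([energy(s) for s in seqs])
        # Keep the top fraction of lowest-energy designs.
        n_elite = max(1, int(top_frac * n_samples))
        elite = np.argsort(energies)[:n_elite]
        if energies[elite[0]] < best[1]:
            best = (seqs[elite[0]], float(energies[elite[0]]))
        # Re-estimate each slot's distribution from fragment counts among the elites.
        probs = []
        for k, slot in enumerate(library):
            counts = np.bincount(choices[elite, k], minlength=len(slot)) + pseudocount
            probs.append(counts / counts.sum())
    return best


# Toy library: three slots, each with the same four hypothetical 9-residue candidates.
candidates = ["AKAKAKAKA", "KAKAKAKAK", "GGGGGGGGG", "SSSSSSSSS"]
library = [list(candidates) for _ in range(3)]
best_seq, best_energy = eda_design(library, toy_energy)
print(best_seq, best_energy)
```

Fragments are scored jointly through the energy of the full sequence. The EDA can therefore shift probability toward fragments that appear in low-energy designs. A univariate model like this one, however, ignores dependencies between slots.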