Research Abstract
Through a broad comparison of vertebrate transcription factors, I found unusually high contents of alanine (A), glycine (G), and proline (P) residues in the orthologues of particular lineages. The codons for these amino acids are GC-rich (GCN, GGN, and CCN, respectively). In contrast, the content of arginine, the remaining amino acid encoded by GC-rich codons (CGN), was nearly equal among the vertebrate orthologues. I found a significant correlation between the GC content at the third codon position across the entire coding region and the total content of alanine, glycine, and proline residues (AGP content). A similar pattern holds for other vertebrate transcription factors: AGP content varies widely, whereas arginine content is nearly constant. The positive correlation was clear regardless of the functional and structural constraints inherent to each protein. No arginine-rich region was found in any of the transcription factors examined. The present results provide a general picture of protein structure and its evolution: amino acid compositions are profoundly influenced by the nucleotide compositional constraints acting on the genomic DNA that harbors the coding sequences. As a result, the proportion of alanine, glycine, and proline residues correlates linearly with the strength of the nucleotide compositional constraints that increase GC content, and changes in these constraints have caused concomitant alterations in amino acid composition through evolution. Alanine-, glycine-, and proline-rich sequences have been identified as transcriptional activation domains of transcription factors. Moreover, artificially fusing homopolymeric proline repeats to a transcription factor significantly modulates its transcriptional activation. These findings therefore indicate that the enrichment of alanine, glycine, and proline residues in transcription factors caused by GC pressure is likely to have had a profound influence on the diversification of gene regulation mechanisms in mammals.
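The two quantities compared in the abstract, GC content at the third codon position and AGP content, can be computed from a coding sequence as in the following minimal sketch. It is an illustration only, not the procedure used in this study: the function name `composition`, the use of the standard genetic code, the removal of a terminal stop codon, and the two toy sequences are all assumptions introduced here.

```python
# Minimal sketch (assumed, not the study's actual pipeline): compute
# GC3, AGP content, and arginine content for one coding sequence.

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def composition(cds):
    """Return (GC3, AGP fraction, arginine fraction) for a coding sequence.

    Assumes an in-frame DNA sequence with unambiguous bases (A, C, G, T).
    Trailing bases that do not fill a codon are ignored, and a terminal
    stop codon is dropped before counting.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if codons and CODON_TABLE.get(codons[-1]) == "*":
        codons = codons[:-1]
    if not codons:
        raise ValueError("no sense codons in sequence")

    protein = [CODON_TABLE[c] for c in codons]
    n = len(codons)
    gc3 = sum(c[2] in "GC" for c in codons) / n    # GC at third codon position
    agp = sum(aa in "AGP" for aa in protein) / n   # Ala + Gly + Pro fraction
    arg = protein.count("R") / n                   # Arg fraction
    return gc3, agp, arg


if __name__ == "__main__":
    # Hypothetical toy sequences, both encoding Met-Ala-Gly-Pro-Arg.
    for seq in ("ATGGCCGGCCCGCGCTAA", "ATGGCTGGACCAAGATAA"):
        print(seq, composition(seq))
```

Applying such a function to each orthologue yields one (GC3, AGP content) pair per gene, and the correlation described above would then be assessed across those pairs.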