What is the relationship between exons and protein domains?

Protein domains and features

Protein domains plotted against the exon structure. ... to the domains in InterPro, or view a karyotype highlighting the locations of all transcripts with the domain.

If you know the sequence for your gene (or whole genome), paste it into GeneMark to predict all exons (if you already have the exon information ...

We chose a general approach and examined the relationship between exon borders and protein domain borders in various species to evaluate the extent of ...

In other words, one exon should code for a single protein domain. One argument, therefore, points to a statistically significant correlation between exon boundaries and protein domain boundaries. However, there are many examples where this correspondence does not hold. In many cases, single exons code for multiple domains.

For instance, protocadherin genes typically involve large exons coding for multiple domains (Wu and Maniatis). In other cases, multiple exons are required to specify a single domain. A further argument for the role of exon shuffling in protein evolution is the intron phase distribution found in the exons coding for protein domains in humans (for the significance of this, see my previous article). These are typically associated with signs indicative of their mode of origin. Perhaps common ancestry is the cause, but this must be demonstrated and not assumed.

It is to this question that I now turn.

The Problems with Domain Shuffling as an Explanation for Protein Folds

While the hypothesis of exon shuffling does, taken at face value, have some attractive elements, it suffers from a number of problems. For one thing, the model at its core presupposes the prior existence of protein domains.

In other words, the domain level is the lowest level at which self-contained stable structural modules exist. This leaves the origins of these domains in the first place unaccounted for. But stable and functional protein domains are demonstrably rare within amino-acid sequence space (e.g., Axe; Axe; Taylor et al.).

A fairly recent study examined many different combinations of E.

The researchers screened variants for features that might suggest folded structure. They failed, however, to find any folded protein structures. Reporting on this study, Axe writes: This contrasts with native-like structure, where secondary structure is locked-in to form a well defined and stable tertiary fold.

High-quality domain border annotations can be obtained by analyzing protein sequences with the domain definitions from the Pfam database, which is built from conserved protein sequences in a wide spectrum of species. Using this approach, we have recently demonstrated that correlation between the borders of protein domains and their encoding exons is a genome-wide phenomenon in multiple eukaryotic organisms (Liu and Grigoriev). Further, we have shown that exon-bordering domains probably contributed more to the expansion and diversification of proteomes than other domains as a result of duplications and exon shuffling, as they preferentially expanded into more genes than other domains during evolution (Liu et al.).
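As a rough illustration of what "correlation between domain borders and exon borders" means in practice, the sketch below maps internal exon borders onto protein coordinates and counts how many domain ends fall near one. It is not the published method: the function names, the toy coordinates, and the `window` tolerance of 5 residues are all assumptions made for this example.

```python
def exon_borders_in_protein(exon_cds_lengths):
    """Convert coding lengths of consecutive exons (nucleotides) into the
    protein positions (residues) of the internal exon-exon borders."""
    borders = []
    total = 0
    # The end of the last exon is the end of the protein, not an internal border.
    for length in exon_cds_lengths[:-1]:
        total += length
        borders.append(total / 3.0)
    return borders


def count_bordered_domain_ends(domains, exon_borders, window=5):
    """Count domain start/end positions lying within `window` residues
    of any exon border. `window` is a hypothetical tolerance."""
    hits = 0
    for start, end in domains:
        for pos in (start, end):
            if any(abs(pos - border) <= window for border in exon_borders):
                hits += 1
    return hits


# Toy transcript: four exons with coding lengths in nucleotides (made up).
exon_lengths = [90, 150, 120, 60]
borders = exon_borders_in_protein(exon_lengths)

# Toy domain annotations as (start, end) in residues (made up).
domains = [(32, 78), (95, 110)]
hits = count_bordered_domain_ends(domains, borders, window=5)
print(borders, hits)
```

In this toy case both ends of the first domain sit near an exon border and neither end of the second does. A real analysis would compare such counts against a background expectation rather than read them directly.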

In this study, we consider two main corollaries of this exon-domain correlation. Graph representation of such networks abstracts them as nodes connected by edges. Network properties are mainly analyzed from the perspective of their node connectivity, or degree distribution.

Thus there are many poorly connected nodes and very few hubs in such networks. Earlier, we have shown Liu et al.

Exon-bordering domains also co-occur with a larger number of different domains to form mosaic proteins with diverse domain architectures. This property suggests that exon-bordering domains should be found among the highly connected hubs, and that the evolution of domain networks (at least in terms of degree distribution) is likely to be largely driven by the evolution of exon-bordering domains and their propagation into genes via exon shuffling and duplication mechanisms.

Indeed, many properties of the network of co-occurring protein domains, where each human domain is a node and an edge represents co-occurrence of two domains (not necessarily adjacent) in one protein, are similar to those of other biological networks described. We found that this undirected network is also scale-free [data not shown; this result is analogous to already published reports (Apic et al.)].
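The network described here can be sketched in a few lines: each domain is a node, and two domains share an edge if they appear together in at least one protein. The protein identifiers and domain assignments below are made up for illustration, and the helper names are hypothetical. Note that a domain found only in single-domain proteins never enters this adjacency map.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_network(protein_domains):
    """Build an undirected domain co-occurrence network.

    Returns (adjacency, pair_counts): adjacency maps each domain to the set
    of domains it co-occurs with; pair_counts records how many proteins
    contain each domain pair."""
    adjacency = defaultdict(set)
    pair_counts = Counter()
    for domains in protein_domains.values():
        # Use distinct domains per protein; adjacency in the sequence is not required.
        for a, b in combinations(sorted(set(domains)), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
            pair_counts[(a, b)] += 1
    return adjacency, pair_counts


def degree_distribution(adjacency):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(len(neighbours) for neighbours in adjacency.values())


# Toy input: protein -> list of domains (assignments invented for this example).
proteins = {
    "P1": ["SH3", "SH2", "Pkinase"],
    "P2": ["SH3", "PH"],
    "P3": ["Pkinase", "C1"],
    "P4": ["SH3", "SH2"],
}
adjacency, pair_counts = build_cooccurrence_network(proteins)
degrees = degree_distribution(adjacency)
print(dict(degrees))
print(pair_counts.most_common(1))
```

A scale-free network would show many low-degree nodes and a few hubs in `degrees`; a toy input this small cannot demonstrate that, only the bookkeeping.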

Thus, most of the domain pairs can be found only in one protein per pair. Such proteins, however, are often domain-rich.

We also calculated the expected distribution of co-occurring pairs by modeling domain co-occurrence as a Bernoulli process where a pair frequency would be proportional to the product of frequencies of individual domains, derived in this case from the number of proteins containing a domain, rather than domain numbers.
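One way to read this null model is sketched below: if domains were assigned to proteins independently, the expected number of proteins carrying both domains of a pair would scale with the product of the individual domain frequencies. The text only states proportionality, so the normalization used here (`n_a * n_b / N`, with `N` the number of proteins) is an assumption, as are the function name and the toy data.

```python
from collections import Counter
from itertools import combinations


def expected_pair_counts(protein_domains):
    """Expected number of proteins containing each domain pair under
    independence: N * (n_a / N) * (n_b / N) = n_a * n_b / N, where n_x is
    the number of proteins containing domain x (not the number of domain copies)."""
    n_proteins = len(protein_domains)
    domain_counts = Counter()
    for domains in protein_domains.values():
        # set() ensures a domain repeated within one protein is counted once.
        domain_counts.update(set(domains))
    expected = {}
    for a, b in combinations(sorted(domain_counts), 2):
        expected[(a, b)] = domain_counts[a] * domain_counts[b] / n_proteins
    return expected


# Toy input (invented): note SH3 appears twice in P2 but is counted once.
toy_proteins = {
    "P1": ["SH3", "SH2", "Pkinase"],
    "P2": ["SH3", "SH3", "PH"],
    "P3": ["Pkinase", "C1"],
    "P4": ["SH3", "SH2"],
}
expected = expected_pair_counts(toy_proteins)
print(expected[("SH2", "SH3")], expected[("C1", "PH")])
```

Comparing these expected values with the observed pair counts shows which pairs co-occur more or less often than independence would predict.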

Similar findings obtained by other methods have been very recently published for the domain families in SCOP (Vogel et al.). As a group, exon-bordering domains show a much higher connectivity (Fig.).

Protein Domains and Their Corresponding Exons

We analyzed the level of network fragmentation after the removal of mobile domains by calculating the number of components, or remaining connected subgraphs, and average degree. We also estimated the distributions of these parameters for networks obtained from the network we studied by removal of the corresponding number of random nodes.

Removal of mobile domains results in substantial fragmentation of the network and a drop in the average degree, significantly different from random node removals (Fig.).
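The fragmentation test can be sketched as follows: remove a chosen set of nodes, count the connected components and the average degree of what remains, and compare against the same statistics after removing an equal number of randomly chosen nodes. The toy network, the function names, and the `trials` and `seed` parameters are assumptions for illustration; the sketch expects every node to appear as a key in the adjacency map.

```python
import random
from collections import deque


def remove_nodes(adjacency, removed):
    """Return a copy of the network without the given nodes or their edges."""
    removed = set(removed)
    return {
        node: {nbr for nbr in nbrs if nbr not in removed}
        for node, nbrs in adjacency.items()
        if node not in removed
    }


def count_components(adjacency):
    """Count connected subgraphs with a breadth-first search."""
    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return components


def average_degree(adjacency):
    """Mean number of neighbours per remaining node (0.0 for an empty network)."""
    if not adjacency:
        return 0.0
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)


def random_removal_baseline(adjacency, k, trials=1000, seed=0):
    """Statistics after removing k random nodes, repeated `trials` times."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    results = []
    for _ in range(trials):
        pruned = remove_nodes(adjacency, rng.sample(nodes, k))
        results.append((count_components(pruned), average_degree(pruned)))
    return results


# Toy network (invented): hub "H" linked to A, B, C, D, plus one A-B edge.
network = {
    "H": {"A", "B", "C", "D"},
    "A": {"H", "B"},
    "B": {"H", "A"},
    "C": {"H"},
    "D": {"H"},
}
without_hub = remove_nodes(network, ["H"])
print(count_components(network), average_degree(network))
print(count_components(without_hub), average_degree(without_hub))
baseline = random_removal_baseline(network, k=1, trials=200)
```

Removing the hub splits the toy network into three pieces, whereas most single random removals leave it connected. The baseline list gives the empirical distribution against which the targeted removal would be judged.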

Thus, mobile domains appear to be the major determinants of the network topology and evolution.

For computational prediction of protein domains in human proteins retrieved from Ensembl (Birney et al.) ... We collected statistics for only one multi-exon transcript per gene whose protein translation had at least one domain. Remarkably, when we detected correlation of the borders of protein domains with encoding exons, it was nearly always positive. However, there was one notable exception: the immunoglobulin (Ig) domain as defined by Pfam. This was rather surprising, since the immunoglobulin domain was considered to be mobile and its bounding introns to have phase 1-1, which is characteristic of mobile domains (Kolkman and Stemmer). Upon further investigation, we noticed that the Pfam definition of the Ig domain was actually 8 to 20 amino acids shorter than its counterpart domain definition from the SMART database.

For this reason, the amino acid positions immediately outside the Ig domain border boxes as defined by Pfam were actually right inside the domain border boxes as defined by SMART. This indicates a preference for the SMART domain definition, because we consistently observed lower numbers of exon borders inside Pfam-specific Ig domain border boxes (Fig.).

When we switched to using the SMART domain definition for Ig domains, we discovered that the two most prevalent Ig-related domains in SMART, IGc1 and IG, ranked 2nd and 7th, respectively, among all human mobile domains. Both showed positive correlation with exons, in contrast to the results obtained from Pfam's Ig domain definition.