Genome signatures (local deviations from the majority distribution) - evolutionary markers for new orders?

[See ref 4.]

Based on the above background of the universal strand symmetry, the almost universal majority profile, and the role of inversions/transpositions in the creation of these universals, we are now in the position to examine the deviations of a genome from them. To be sure, there should be deviations regardless of how universal these properties are, if only because the universals are the result of an asymptotic process which should leave residual areas of not yet perfect expression.

As will be shown below, there are, indeed, many genome segments that deviate considerably from these universals, and especially from the majority profile. Most surprising, however, is the demonstration that countless major deviations were created de novo over and over again during the course of evolution. Obviously, these deviations must be the result of an unknown mechanism that acts as some kind of antagonist of the actions of inversions/transpositions which continue to erase such deviations.

Signatures: Local deviations from a genome universal.

If evaluated in their entirety, most genomes expressed the strand symmetry and the similarity of their triplet profile with the majority profile to an amazingly high degree of perfection. However, not every part of every genome expressed it to the same degree. Using a small probe segment to scan continuously along a genome and testing its local deviation from these global genome properties yielded quite characteristic patterns. They will be called 'signatures'.

The preceding sections argued that numerous inversions/transpositions in the evolutionary past of every genome have asymptotically improved its strand symmetry and its similarity to the majority profile. Therefore, one may be tempted to make some simple predictions about both kinds of genome signatures. For example, one might expect that they are the more leveled out the older the species, as its genome had more time to approach its end state. Alternatively, recognizing that the genomes of all contemporary species are actually equally old, one may expect that all signatures show only minute deviations from their end states. After all, a billion years or more of evolution should suffice to reach this end state quantitatively. Yet, neither of these expectations was substantiated by the analysis reported here.

Fig.1. Illustration of genome profile signatures (= local deviation from the majority distribution determined in a probe segment of 100 kb advancing in steps of 10 kb along the genome sequence starting at the 3' end) in the case of the first 145 Mb of human chr 1. An asterisk indicates repetitive DNA that was masked by N's or n's in the genome sequence. (Abscissa: position along genome [Mb]; ordinate: fraction of deviation every 10 kb).
a. Profile signature ( = local deviation from the majority profile).
b. Local triplet profile in the area of a 'burst' (left 2 arrows), which is very different from the majority profile.
c. Local triplet profile in a 'flat' area (right 2 arrows), which is identical to the majority profile.

Profile signatures

Depending on the template used in the scanning procedure, there are strand symmetry signatures and profile signatures. The strand symmetry signatures showed very little activity. In contrast, the profile signatures revealed many complex features. In the following we will focus on them.

The scanning procedure used here consisted of selecting a probe segment of 100 kb and advancing it along the entire genome in steps of 10 kb. At every step the correlation coefficient between the triplet profile of the probe and the majority profile was measured. The procedure yielded a pattern that will be called the 'profile signature' of the genome. The present chapter suggests four properties of profile signatures that stand in stark contrast to the above mentioned expectations.
  • 1. The profile signatures of most species showed dramatic patterns of 'bursts' and 'flat regions' throughout the length of each chromosome (e.g. Fig.2).
  • 2. Descendent species seemed to inherit large parts of their profile signatures from their ancestors. Therefore, significant changes of profile signatures may be indicators of significant evolutionary change.
  • 3. Profile signatures were not simply leveled out during evolution, as one might expect from previous chapters; instead, new ones were frequently created. The present chapter argues that the times of new bursts may mark branching points of the phylogenetic tree.
  • 4. The new bursts seem to be the result of a universal mechanism that changes locally the AT contents. This conclusion is suggested by a universal 'sigmoid curve' that describes the correlation between the amplitude of profile signatures and the local AT contents of the genomes.
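The scanning procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used for the analysis: the function names are hypothetical, triplets are counted at every offset on one strand, triplets touching masked bases (N or n) are skipped, the correlation is taken to be Pearson's coefficient, and the majority profile has to be supplied by the caller as a list of 64 frequencies in the order of TRIPLETS.

```python
from itertools import product
from math import sqrt

# Fixed order of the 64 triplets; any reference profile must use the same order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_profile(segment):
    """Relative frequencies of the 64 triplets in a segment.

    Assumption: overlapping triplets on one strand. Triplets containing
    anything other than A, C, G, T (e.g. masked N/n) are skipped.
    Returns None if the segment contains no countable triplet.
    """
    counts = dict.fromkeys(TRIPLETS, 0)
    seg = segment.upper()
    for i in range(len(seg) - 2):
        triplet = seg[i:i + 3]
        if triplet in counts:
            counts[triplet] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    return [counts[t] / total for t in TRIPLETS]


def pearson(x, y):
    """Pearson correlation coefficient; None if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def profile_signature(genome, reference_profile, window=100_000, step=10_000):
    """List of (start, r) pairs, one per probe position.

    r is the correlation between the probe's triplet profile and the
    reference (majority) profile; it is None for fully masked probes.
    """
    signature = []
    for start in range(0, len(genome) - window + 1, step):
        profile = triplet_profile(genome[start:start + window])
        r = pearson(profile, reference_profile) if profile is not None else None
        signature.append((start, r))
    return signature
```

With the default probe size and step, a 145 Mb sequence yields roughly 14,500 probe positions. Values of r close to 1 correspond to 'flat' regions, lower values to 'bursts'.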
    Conservation of profile signatures.

    The pattern of alternating bursts and flat portions appeared to be highly conserved for millions of years. An example is shown in Figure 2 for chromosomes 1 of humans, chimpanzees and rhesus monkey.

    It is apparent that the signatures of these primates were very similar to each other in spite of the large evolutionary time differences between them. A closer inspection showed that human chromosome 1 compared to chimpanzee and rhesus had additional stretches inserted in various places.

    In other cases local inversions are visible in the signatures. For example, the burst in rhesus chromosome 1 around position 160 Mb appears inverted compared to the corresponding bursts at location 180 Mb in chimpanzee chromosome 1 and 200 Mb in human chromosome 1 (Fig. 2). Since each curve shown in Figure 2 condenses more than 20,000 data points into the width of the graph, one may suspect that this similarity is confined to low resolution depictions. However, it is easy to show that the similarity extended to considerably higher levels of resolution.

    The similarities shown in Figure 2 may not seem too surprising, considering the well-known high degree of sequence homology between genes of humans and chimpanzees. However, it should be noted that profile signatures also include the entirety of the non-coding regions, in which sequence homologies are not as well established as in the coding regions.

    Fig.2. Example of the conservation of profile signatures. (Axes as in Fig.1a).
    Arrows between the profile signatures of chromosome 1 of three primates (human, chimpanzee and rhesus monkey) connect corresponding groups of bursts with a high degree of similarity. Note: Each of these signatures contains approximately 20,000 data points and includes coding and non-coding regions alike.

    Profile signatures as markers of chromosome rearrangements and synteny.

    The high degree of conservation should qualify profile signatures as markers of chromosome rearrangements. In order to test this, I selected several well-known cases of chromosome concatenation and inversion and examined how their profile signatures would reflect this fact.

    Fig.3. Profile signatures as markers of chromosome concatenation. (Axes as in Fig.1a).
    The arrows between profile signatures of human chromosome 2 and chimpanzee chromosomes 12 and 13 suggest that the former is very similar to the concatenation of the two latter. This fact, well-known through the compilation of sequence homologies, can be much more rapidly established by inspection of profile signatures that also include non-coding regions. Arrows connect corresponding bursts.

    Figure 3 shows the case of human chromosome 2, which corresponds to the concatenation of chimpanzee chromosomes 12 and 13. The concatenation, known from synteny charts, is quite obvious in the profile signatures of the chromosomes. Likewise, a chromosome inversion, such as the one relating a large portion of mouse chromosome 5 to rat chromosome 14, is quite easily recognized in the profile signatures (Fig. 4).

    Fig.4. Profile signatures as markers of chromosome inversion. (Axes as in Fig.1a).
    The arrows between the profile signatures of mouse chromosome 5 and rat chromosome 14 show that a part of the former is very similar to the inversion of the latter. Arrows connect corresponding bursts.

    In general, the profile signatures of rat and mouse revealed a high degree of conservation. In contrast, I found no general similarities between the profile signatures of these rodents and the mentioned primates except in the well-known area of synteny between human chr.1 on the one hand and the inverted chr.4 of mouse and chr.5 of rat on the other (Fig. 5). To date, the demonstration of such syntenic relationships between chromosomes has been based on the rather time-consuming compilation of sequence homologies. In contrast, it takes only minutes to establish the profile signature of a chromosome several hundred Mb in size if its sequence is known. Therefore, profile signatures may offer a quick way to gain an overview of potential candidates for synteny. Subsequently, the results can be confirmed by more targeted and detailed tests of sequence homologies. Note: Chromosomes with the same number do not necessarily have related profile signatures. For example, a case like the concatenation of two small chromosomes into a large one (e.g. Fig.3) changes the order of chromosome sizes and thus their numbers.
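The comparisons in Figs. 3 to 5 rest on inspection of the signatures. As a purely illustrative complement, and not the procedure used here, a first numerical screen for direct versus inverted correspondence could correlate two signature tracks of equal length in both orientations. The function names are hypothetical, and real tracks would first have to be cut to the candidate regions and brought to the same number of probe positions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; None if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def compare_orientations(sig_a, sig_b):
    """Correlate two equally long signature tracks, once as given and
    once with the second track reversed (candidate inversion)."""
    if len(sig_a) != len(sig_b):
        raise ValueError("tracks must have the same number of probe positions")
    return {
        "forward": pearson(sig_a, sig_b),
        "inverted": pearson(sig_a, sig_b[::-1]),
    }
```

A clearly higher value for "inverted" than for "forward" would only flag a candidate inversion; as stated above, such candidates still need to be confirmed by tests of sequence homology.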

    Fig.5. Example of synteny expressed in profile signatures.
    Corresponding bursts of the profile signatures of human chr.1 (10 - 129 Mb), the inverted mouse chr.4 (154 - 58 Mb), and the inverted rat chr.5 (171 - 75 Mb) are connected by arrows.

    Total fraction of bursts (TFB).

    As mentioned earlier, the depiction of profile signatures is highly condensed in the above figures. Due to the relatively thick lines of the graphs, the total fraction of a genome covered by bursts may appear much higher than it actually is. For a more accurate measurement of that fraction, I added up the cumulative size of the 10 kb test steps at which the profile signature of the entire genome of an organism dropped below 0.8 and divided it by the size of the total genome.
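The TFB measurement just described amounts to a simple count. A minimal sketch, assuming the signature is available as a list of correlation values (one per 10 kb step) and that fully masked steps are stored as None and not counted as bursts; the function name and the handling of masked steps are assumptions:

```python
def total_fraction_of_bursts(r_values, genome_length, step=10_000, threshold=0.8):
    """Fraction of the genome covered by steps whose signature is below threshold.

    r_values      -- one correlation value per step, None for masked steps
    genome_length -- total genome size in bases
    """
    burst_steps = sum(1 for r in r_values if r is not None and r < threshold)
    return burst_steps * step / genome_length
```

For example, two steps below 0.8 in a 50 kb sequence give 2 x 10 kb / 50 kb = 0.4.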

    In order to examine examples of members of different families, classes and phyla, the results were derived from the complete genome sequences of opossum, human, chimpanzee, rhesus, rat, mouse, platypus, arabidopsis, maize, sea squirt, zebrafish, fugu, C. elegans, and C. briggsae. Figure 6 shows that the TFB value of profile signatures was highly variable, and obviously unrelated to the evolutionary age of the corresponding organism. The only possible relationship between the data appeared to be a similarity between members of the same order.

    Fig.6. Total fraction of bursts (TFB).
    The figure shows the fraction of total genome length where the profile signature of the various organisms, indicated at the abscissa, dropped below 0.8. The data comprise the entire genomes of these organisms and not merely selected chromosomes.

    Profile signatures as the result of local AT changes (universal sigmoid curve)

    I also measured the local AT content of each probe as it scanned along the tested genomes. (One might call the result an "AT signature", but it seems unnecessary to introduce yet another new term.) A profile vs. AT-content correlation plot between amplitudes of bursts in the profile signature and the local AT contents yielded a surprisingly well-defined 'sigmoid curve'. Figure 7 shows the typical example of a burst of 10 Mb size at the beginning of human chromosome 7 (Fig.7a). The sigmoid curve indicated that the bursts were not as random as they seemed, but appeared linked to a single variable with little stochastic 'noise' (Fig.7b). Furthermore, the bursts did not create random nucleotide sequences, either.
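The data behind such a correlation plot can be assembled as follows. This is a sketch with hypothetical function names: the AT content is computed over unmasked bases only (an assumption), and each probe position contributes one (AT content, correlation) point, provided both values exist.

```python
def at_content(segment):
    """Fraction of A and T among the unmasked bases; None if all are masked."""
    seg = segment.upper()
    at = seg.count("A") + seg.count("T")
    gc = seg.count("G") + seg.count("C")
    if at + gc == 0:
        return None
    return at / (at + gc)


def local_at_content(genome, window=100_000, step=10_000):
    """List of (start, AT content) pairs, using the same probe size and
    step as the profile signature."""
    return [
        (start, at_content(genome[start:start + window]))
        for start in range(0, len(genome) - window + 1, step)
    ]


def correlation_plot_points(at_values, r_values):
    """Pair AT content with the profile correlation of the same probe,
    dropping positions where either value is missing."""
    return [
        (at, r)
        for at, r in zip(at_values, r_values)
        if at is not None and r is not None
    ]
```

Plotting the resulting points with AT content on the abscissa gives the kind of diagram shown in Fig.7b; the sketch makes no claim about the functional form of the curve.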

    Fig.7. The sigmoid curve. (Abscissa: AT-content measured in probe segments of 100 kb that advanced in steps of 10 kb along a section of a genome; ordinate: deviation of the same probe segment from the majority profile.)
    (a). Profile signature of part of human chromosome 7 with a highlighted burst of 10 Mb size at its beginning.
    (b). Correlation plot between the amplitudes of the highlighted burst and the local AT contents yielding the typical shape of the sigmoid curve.

    The sigmoid curve turned out to be identical for different members of the same class. Figure 8 shows superimposed sigmoid curves of human chromosome 1, chimpanzee chromosome 1, mouse chromosome 2, rat chromosome 1, opossum chromosome 1 (0 - 320 Mb), and platypus (0 - 253 Mb). In spite of the large number of approximately 140,000 data points, the resulting superimposition remains quite well defined. Please note that platypus does not belong to the majority group and, yet, its data points fall on the same sigmoid curve as other mammals.
    In contrast, the sigmoid curves of members of other phyla and kingdoms could differ markedly in the slope and location of their inflection points.

    Relatively flat profile signatures of genomes or parts of genomes can cover only a small part of the sigmoid curve. Yet, these parts coincided quite well with the curve. Even flat profile signatures of genomes that did not belong to the majority group such as platypus and chlamydomonas coincided with the curve quite accurately. The latter genomes have an overall high GC content. Yet, their data points seemed to fall as much on parts of the same sigmoid curve as members of the majority group. It suggests that the sigmoid curve expresses more than merely a special property of genomes of the majority group.

    Another example of this larger than expected universality of the sigmoid curve resulted from the examination of 50 mitochondrial genomes.

    In summary, bursts have the following properties.
  • 1. They are local phenomena that involve stretches of 10 - 20 Mb length.
  • 2. They result in a local decrease of the AT content compared to the AT content of the genomes of the majority group.
  • 3. The reduction of the local AT content follows the universal sigmoid curve.
    Fig.8. Degree of conservation and universality of the sigmoid curve. (Abscissa: AT-content measured in probe segments of 100 kb every 10 kb along a genome; ordinate: similarity between triplet profile and the majority profile of the same segments expressed as correlation coefficient between them.)
    Superimposition of the sigmoid curves of human chromosome 1, chimpanzee chromosome 1, mouse chromosome 1, rat chromosome 2, opossum chromosome 1 (0 - 320 Mb), and platypus (0 - 253 Mb). In spite of the large number of 140,000 data points, the resulting superimposition remains quite well defined. Please note that platypus does not belong to the majority group and, yet, its data points fall on the same sigmoid curve as other mammals.

    An explanation of the sigmoid curve based on the spectra and mechanisms of point mutations.

    At present, the mechanism underlying the sigmoid curve is not known. However, computer simulations suggested that the effect of this unknown mechanism may not amount to much more than the replacement of A's and T's with the same number of C's and G's in random places of the local sequence. (see ref 4.)
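The kind of replacement described here can be illustrated with a short sketch. It is not the simulation of ref 4, whose details are not reproduced in this text. The function name is hypothetical, and the choice of which base replaces which (A by G, T by C, in equal numbers) is an assumption made for illustration.

```python
import random


def replace_at_with_gc(sequence, n_pairs, seed=None):
    """Replace n_pairs randomly chosen A's with G and n_pairs randomly
    chosen T's with C, leaving all other positions untouched."""
    rng = random.Random(seed)
    seq = list(sequence.upper())
    a_sites = [i for i, base in enumerate(seq) if base == "A"]
    t_sites = [i for i, base in enumerate(seq) if base == "T"]
    if n_pairs > min(len(a_sites), len(t_sites)):
        raise ValueError("not enough A or T sites for the requested replacements")
    for i in rng.sample(a_sites, n_pairs):
        seq[i] = "G"
    for i in rng.sample(t_sites, n_pairs):
        seq[i] = "C"
    return "".join(seq)
```

Applying the function to a local stretch with increasing n_pairs lowers its AT content step by step while the sequence length stays the same.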

    In view of the previous result concerning the spectra of vertebrate point mutations, it is not difficult to explain such a replacement mechanism. (The data of ref 5 were not known to me at the time of the research described in ref 4. Therefore, ref 4 offered a more complicated explanation.)

    As shown in ref 5, the [A->G] and [T->C] mutations are the most frequent, and both can be explained by chemical conversions. Furthermore, every [A->G] conversion amounts to a [T->C] conversion on the opposite strand and vice versa. Assuming strand symmetry of conversions, i.e. that the likelihood of, e.g., an [A->G] conversion is the same for both strands, the numbers of [A->G] and [T->C] conversions will be equal on the same strand, as the computer simulation required.
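The equivalence of an [A->G] conversion on one strand and a [T->C] conversion on the opposite strand follows directly from base pairing. A small worked example, with a made-up six-base sequence and hypothetical helper names:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Opposite strand, read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def substitute(seq, pos, new_base):
    """Return seq with the base at index pos replaced by new_base."""
    return seq[:pos] + new_base + seq[pos + 1:]


top = "CCAGTC"                          # made-up example sequence
bottom = reverse_complement(top)        # "GACTGG"

top_mut = substitute(top, 2, "G")       # A->G on the top strand: "CCGGTC"
bottom_mut = reverse_complement(top_mut)  # "GACCGG"

# The A at index 2 of the top strand pairs with index len-1-2 = 3 on the
# bottom strand, where the T has now become a C.
print(bottom[3], "->", bottom_mut[3])   # prints: T -> C
```

If conversions are equally likely on both strands, each strand therefore accumulates [A->G] and [T->C] changes in equal numbers.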

    Significance for the "functional anarchy" of genomes:

    The appearance of new patterns of bursts during the course of evolution.

    Profile signatures appear to be markedly different for members of different kingdoms, phyla, and classes. Even members of the same class of mammalia such as platypus, rat, and chimpanzee have radically different profile signatures. In contrast, the profile signatures of members of the same order of primates such as humans, chimpanzee and rhesus ranged between 'similar' and 'practically identical'.

    Bursts do not violate the strand symmetry. Therefore, local inversions and inverted transpositions would have no effect on them. However, transpositions from a far away part of the genome that is free of bursts could conceivably erode them. Indeed, I had expected the ongoing equalization of all profile signatures to be a monotonic function of evolutionary time. In other words, no evolutionarily 'younger' species should have more bursts in its profile signature than an evolutionarily 'older' ancestor genome that had more time to erase them.

    This expectation was obviously wrong. For example, the profile signatures of the phylogenetically 'older' opossum or even zebrafish were almost flat, whereas the profile signature of the 'younger' human showed 5 times as many bursts. Obviously, they must have been created de novo at some time after mammals branched away from the common ancestors of the zebrafish and other vertebrates. To be sure, these novel bursts were not drastic enough to remove the corresponding genomes from the majority class, but they clearly involved an identifiable and substantial portion of the 'new' genomes.

    Is the appearance of new patterns of bursts linked to the evolution of new orders?

    Considering that humans, chimpanzees and rhesus monkeys had very similar profile signatures, it stands to reason that their common ancestor, the 'first' primate had this or a very similar profile signature, as well. Likewise, one may speculate that the common profile signatures of rat and mouse were also shared by the 'first' rodent.

    Yet, obviously, the profile signatures of rodents were quite different from those of primates. Both, in turn, differed substantially from the profile signatures of the opossum and the platypus, which belong to the yet different and much older orders of marsupials and monotremes, respectively, within the class of mammals. It may seem, therefore, that the appearance of a pattern of new bursts in the profile signatures of certain ancestral genomes marks the beginning of a new order, and that the creation of new bursts subsequently ceases while the members of this order evolve over very long evolutionary time spans.

    Of course, there is no evidence that the appearance of new patterns of bursts was causal in the origin of new orders. Still, the underlying local and massive AT->GC conversions may present yet another example of mutations that could potentially wreak havoc on genomes, but which, on the contrary, may have played at least a part in major creative events of evolution.