Given the above background of the universal strand symmetry, the almost universal majority profile, and the role of inversions/transpositions
in the creation of these universals, we are now in a position to examine the deviations of a genome from them. To be sure, there should be
deviations regardless of how universal these properties are, if only because the universals are the result of an asymptotic process which should
leave residual areas of not yet perfect expression.
As will be shown below, there are, indeed, many genome segments that deviate considerably from these universals, and especially from the majority
profile. Most surprising, however, is the demonstration that countless major deviations were created de novo over and over again during the course
of evolution. Obviously, these deviations must be the result of an unknown mechanism that acts as some kind of antagonist of the actions of
inversions/transpositions which continue to erase such deviations.
Signatures: Local deviations from a genome universal.
If evaluated in their entirety, most genomes expressed the strand symmetry and the similarity of their triplet profile with the majority profile to
an amazingly high degree of perfection. However, not every part of every genome expressed it to the same degree. Using a small probe segment to
scan continuously along a genome and testing its local deviation from these global genome properties yielded quite characteristic patterns. They
will be called 'signatures'.
The preceding sections argued that numerous inversions/transpositions in the evolutionary past of every genome have improved asymptotically its
strand symmetry and its similarity to the majority profile.
Therefore, one may be tempted to make some simple predictions about both kinds of
genome signatures. For example, one might expect that the older the species, the more leveled out its signatures, as its genome had more time to approach
its end state. Alternatively, recognizing that the genomes of all contemporary species are actually equally old, one may expect that all signatures
show only minute deviations from their end states. After all, a billion years or more of evolution should suffice to reach this end state
quantitatively. Yet, neither of these expectations was substantiated by the analysis reported here.
Fig.1. Illustration of genome profile signatures (= local deviation from the majority distribution
determined in a probe segment of 100 kb advancing in steps of 10 kb
along the genome sequence starting at the 3' end) in the case of the first 145 Mb
of human chr 1. An asterisk indicates repetitive DNA that was masked by N's or n's
in the genome sequence. (Abscissa: position along genome [Mb]; ordinate: fraction
of deviation every 10 kb).
a. Profile signature ( = local deviation from the majority profile).
b. Local triplet profile in the area of a 'burst' (left 2 arrows),
which is very different from the majority profile.
c. Local triplet profile in a 'flat' area (right 2 arrows), which
is identical to the majority profile.
Depending on the template used in the scanning procedure, there are strand symmetry signatures and profile signatures. The strand symmetry signatures
showed very little activity. In contrast, the profile signatures revealed many complex features. In the following we will focus on them.
The scanning procedure used here consisted of selecting a probe segment of 100 kb and advancing it along the entire genome in steps of 10 kb. At
every step the correlation coefficient between the triplet profile of the probe and the majority profile was measured. The procedure yielded a
pattern that will be called the 'profile signature' of the genome. The present chapter suggests four properties of profile signatures that stand
in stark contrast to the above mentioned expectations.
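The scanning procedure can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not the program used for the analysis: it assumes that triplets are counted in overlapping fashion on the strand as given, that triplets containing masked bases (N or n) are skipped, and that the majority profile is supplied as a list of 64 relative frequencies in the order of TRIPLETS. All function names are hypothetical.

```python
from itertools import product
import math

# The 64 possible triplets in a fixed order (assumed ordering, used for all profiles).
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_profile(seq):
    """Relative frequencies of the 64 overlapping triplets in seq.

    Triplets containing anything other than A, C, G, T (e.g. masked N's)
    are skipped. Returns None if no valid triplet is found.
    """
    counts = [0] * 64
    seq = seq.upper()
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else None


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


def profile_signature(seq, majority_profile, window=100_000, step=10_000):
    """One correlation value per step of the probe segment along seq."""
    signature = []
    for start in range(0, len(seq) - window + 1, step):
        prof = triplet_profile(seq[start:start + window])
        signature.append(None if prof is None else pearson(prof, majority_profile))
    return signature
```

Fully masked probe segments yield None rather than a number, so that they can be marked separately in a plot, as the asterisks do in Fig.1.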
Conservation of profile signatures.
The pattern of alternating bursts and flat portions appeared to be highly conserved for millions of years. An example is shown in Figure 2
for chromosomes 1 of humans, chimpanzees and rhesus monkey.
It is apparent that the signatures of these primates were very similar to each other in spite of the large evolutionary time differences
between them. A closer inspection showed that human chromosome 1 compared to chimpanzee and rhesus had additional stretches inserted in
In other cases local inversions are visible in the signatures. For example, the burst in rhesus chromosome 1 around position 160 Mb appears inverted compared
to the corresponding bursts at location 180 Mb in chimpanzee chromosome 1 and 200 Mb in human chromosome 1 (Fig. 2).
Since each curve shown in Figure 2 condenses more than 20,000 data points into the width of the graph, one may suspect that this similarity is
confined to low resolution depictions. However, it is easy to show that the similarity extended to considerably higher levels of resolution.
The similarities shown in Figure 2 may not seem too surprising, considering the well-known high degree of sequence homology between
genes of humans and chimpanzees. However, it should be noted that profile signatures also include the entirety of the non-coding regions, in
which sequence homologies are not as well established as in the coding regions.
Fig.2. Example of the conservation of profile signatures. (Axes as in Fig.1a).
Arrows between the profile signatures of chromosome 1 of three
primates (human, chimpanzee and rhesus monkey) point to corresponding groups
of bursts with a high degree of similarity.
Note: Each of these signatures contains approximately 20,000 data points
and includes coding and non-coding regions alike.
Profile signatures as markers of chromosome rearrangements and synteny.
The high degree of conservation should qualify profile signatures as markers of chromosome rearrangements. In order to test this, I selected
several well-known cases of chromosome concatenation and inversion and examined how their profile signatures would reflect this fact.
Fig.3. Profile signatures as markers of chromosome concatenation. (Axes as in Fig.1a).
Figure 3 shows the case of the concatenation of chromosomes 12 and 13 of chimpanzee, which formed the human chromosome 2. The concatenation,
known from synteny charts, is quite obvious in the profile signatures of the chromosomes. Likewise, a chromosome inversion, such as the one
relating a large portion of mouse chromosome 5 to rat chromosome 14, is quite easily recognized in the profile signatures (Fig. 4).
The arrows between profile signatures of human chromosome
2 and chimpanzee chromosome 12 and 13 suggest that the former is very similar
to the concatenation of the 2 latter. This fact, well-known through the
compilation of sequence homologies, can be much more rapidly established by
inspection of profile signatures that also include non-coding regions.
Arrows connect corresponding bursts.
Fig.4. Profile signatures as markers of chromosome inversion. (Axes as in Fig.1a).
In general, the profile signatures of rat and mouse revealed a high degree of conservation. In contrast, I found no general
similarities between the profile signatures of these rodents and the mentioned primates except in the well-known area of synteny between human
chr.1 on one hand and the inverted chr.4 and chr.5 of mouse and rat, respectively (Fig. 5).
To date, the demonstration of such syntenic relationships between chromosomes is based on the rather time-consuming compilation of sequence
homologies. In contrast, it takes only minutes to establish the profile signature of a chromosome several hundred Mb in size if its sequence
is known. Therefore, profile signatures may offer a quick way to gain an overview of potential candidates for synteny. Subsequently, the results
can be confirmed by more targeted and detailed tests of sequence homologies.
Note: Chromosomes with the same number do not necessarily have related profile signatures. For example, a case like the concatenation of 2
small chromosomes into a large one (e.g. Fig.3) changes the order of chromosome sizes and thus their numbers.
The arrows between the profile signatures of mouse chromosome
5 and rat chromosome 14 show that a part of the former is very similar to the
inversion of the latter. Arrows connect corresponding bursts.
Fig.5. Example of synteny expressed in profile signatures.
Corresponding bursts of the profile signatures of human chr.1 (10 - 129 Mb), the inverted mouse chr.4
(154 - 58 Mb), and the inverted rat chr.5 (171 - 75 Mb) are connected by arrows.
Total fraction of bursts (TFB).
As mentioned earlier, the depiction of profile signatures is highly condensed in the above figures. Due to the relatively thick lines of
the graphs, the total fraction of a genome covered by bursts may appear much higher than it actually is. For a more accurate measurement of
that fraction, I added up the sizes of all 10 kb steps at which the profile signature of the entire genome of an organism
dropped below 0.8 and divided the sum by the size of the total genome.
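The TFB measurement just described amounts to a few lines of Python. The sketch below assumes that the profile signature is available as a list with one correlation value per 10 kb step, and that steps without a value (e.g. fully masked segments, marked None) do not count as bursts. The function name is hypothetical.

```python
def total_fraction_of_bursts(signature, genome_length, step=10_000, threshold=0.8):
    """Fraction of the genome whose profile signature lies below the threshold.

    signature     : one correlation value (or None) per step along the genome
    genome_length : total genome size in bases
    """
    burst_steps = sum(1 for r in signature if r is not None and r < threshold)
    return burst_steps * step / genome_length
```

For example, a signature in which 2 out of 6 steps fall below 0.8, taken from a sequence of 60 kb, gives a TFB of one third.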
In order to examine members of different families, classes and
phyla, the results were derived from the complete genome sequences of opossum, human, chimpanzee, rhesus, rat, mouse, platypus, arabidopsis,
maize, sea squirt, zebrafish, fugu, C. elegans, and C. briggsae. Figure 6 shows that the TFB value of profile signatures was highly variable,
and obviously unrelated to the evolutionary age of the corresponding organism. The only possible relationship between the data appeared to be
a similarity between members of the same order.
Fig.6. Total fraction of bursts (TFB).
The figure shows the fraction of total genome length where the
profile signature of the various organisms, indicated at the abscissa, dropped
below 0.8. The data comprise the entire genomes of these organisms and not
merely selected chromosomes.
Profile signatures as the result of local AT changes (universal sigmoid curve)
I also measured the local AT content of each probe as it scanned along the tested genomes. (One might call the result an "AT signature", but
it seems unnecessary to introduce yet another new term.) Plotting the amplitudes of the bursts in the profile signature against the local AT
content of the same probe segments yielded a surprisingly well-defined 'sigmoid curve'. Figure 7 shows the typical example of a burst of 10 Mb
size at the beginning of human chromosome 7 (Fig.7a).
The sigmoid curve indicated that the bursts were not as random as they seemed, but appeared linked to a single variable with little stochastic
'noise' (Fig.7b). Furthermore, the bursts did not create random nucleotide sequences, either.
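The data points of such a correlation plot could be collected along the following lines. This is a sketch only: it assumes that the AT content is computed over unmasked bases alone, and that a profile signature with one value per step is already available for the same probe size and step width. The function names are hypothetical.

```python
def local_at_content(seq, window=100_000, step=10_000):
    """AT fraction of each probe segment, ignoring masked or ambiguous bases."""
    values = []
    for start in range(0, len(seq) - window + 1, step):
        probe = seq[start:start + window].upper()
        at = probe.count("A") + probe.count("T")
        gc = probe.count("G") + probe.count("C")
        values.append(at / (at + gc) if at + gc else None)
    return values


def sigmoid_points(at_values, signature):
    """Pair AT content (abscissa) with the profile signature (ordinate).

    Steps where either value is missing (None) are dropped.
    """
    return [(a, r) for a, r in zip(at_values, signature)
            if a is not None and r is not None]
```

The resulting list of (AT content, correlation) pairs can be handed to any plotting program to produce a scatter plot of the kind shown in Fig.7b.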
Fig.7. The sigmoid curve. (Abscissa: AT-content measured in probe segments
of 100 kb that advanced in steps of 10 kb along a section of a genome; ordinate:
deviation of the same probe segment from the majority profile.)
(a). Profile signature of part of human chromosome 7 with a highlighted
burst of 10 Mb size at its beginning.
(b). Correlation plot between the amplitudes of the highlighted burst
and the local AT contents yielding the typical shape of the sigmoid curve.
The sigmoid curve turned out to be identical for different members of the same class. Figure 8 shows superimposed sigmoid curves of human
chromosome 1, chimpanzee chromosome 1, mouse chromosome 2, rat chromosome 1, opossum chromosome 1 (0 - 320 Mb), and platypus (0 - 253 Mb).
In spite of the large number of approximately 140,000 data points, the resulting superimposition remains quite well defined. Please note
that platypus does not belong to the majority group and, yet, its data points fall on the same sigmoid curve as other mammals.
In contrast, the sigmoid curves of members of other phyla and kingdoms could differ markedly in the slope and location of their inflection
Relatively flat profile signatures of genomes or parts of genomes can cover only a small part of the sigmoid curve. Yet, these parts coincided
quite well with the curve. Even flat profile signatures of genomes that did not belong to the majority group, such as platypus and chlamydomonas, coincided
with the curve quite accurately. The latter genomes have an overall high GC content. Yet, their data points seemed to fall on parts of the same sigmoid curve just as well as those of members of the majority group. This suggests that the sigmoid
curve expresses more than merely a special property of genomes of the majority group.
Another example of this larger than expected universality of the sigmoid curve resulted from the examination of 50 mitochondrial genomes.
In summary, bursts have the following properties.
Fig.8. Degree of conservation and universality of the sigmoid curve.
(Abscissa: AT-content measured in probe segments of 100 kb every 10 kb
along a genome; ordinate: similarity between triplet profile and the
majority profile of the same segments expressed as correlation coefficient.)
Superimposition of the sigmoid curves of human chromosome 1, chimpanzee
chromosome 1, mouse chromosome 1, rat chromosome 2, opossum chromosome 1
(0 - 320 Mb), and platypus (0 - 253 Mb). In spite of the large number
of 140,000 data points, the resulting superimposition remains quite well
defined. Please note that platypus does not belong to the majority group
and, yet, its data points fall on the same sigmoid curve as other mammals.
An explanation of the sigmoid curve based on the spectra and mechanisms of point mutations.
At present, the mechanism underlying the sigmoid curve is not known. However, computer simulations suggested that the effect of this unknown mechanism may not amount to much
more than the replacement of A's and T's with the same number of C's and G's in random places of the local sequence (see ref. 4).
In view of the previous result concerning the spectra of vertebrate point mutations, it is not difficult to
explain such a replacement mechanism. (The data of ref. 5 were not known to me at the time of the research described in
ref. 4. Therefore, ref. 4 offered a more complicated explanation.)
As shown in ref. 5, the [A->G] and [T->C] mutations are the most frequent, and both can be explained by chemical conversions. Furthermore, every [A->G]
conversion on one strand amounts to a [T->C] conversion on the opposite strand and vice versa. Assuming strand symmetry of conversions, i.e. that the likelihood of, e.g., an
[A->G] conversion is the same for both strands, the numbers of [A->G] and [T->C] conversions will be equal on the same strand, as the computer simulation required.
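The two steps of this argument can be illustrated with a short Python sketch. It is not the simulation of ref. 4, whose details are not reproduced here. It merely assumes one possible reading of the replacement rule, namely that equal numbers of A's and T's at randomly chosen positions are converted to G's and C's, respectively. The function names and the parameter n_pairs are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read in its own 5'->3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def convert_at_to_gc(seq, n_pairs, rng=None):
    """Convert n_pairs random A's to G and n_pairs random T's to C.

    Assumed replacement rule: equal numbers of [A->G] and [T->C]
    conversions on the same strand, at random positions.
    """
    rng = rng or random.Random()
    bases = list(seq.upper())
    a_pos = [i for i, b in enumerate(bases) if b == "A"]
    t_pos = [i for i, b in enumerate(bases) if b == "T"]
    n = min(n_pairs, len(a_pos), len(t_pos))
    for i in rng.sample(a_pos, n):
        bases[i] = "G"
    for i in rng.sample(t_pos, n):
        bases[i] = "C"
    return "".join(bases)
```

As a check of the strand argument, changing the first A of GATTACA to G gives GGTTACA; the opposite strands read TGTAATC and TGTAACC, which differ by exactly one [T->C] conversion.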
The appearance of new patterns of bursts during the course of evolution.
Profile signatures appear to be markedly different for members of different kingdoms, phyla, and classes. Even members of the same class, Mammalia,
such as platypus, rat, and chimpanzee have radically different profile signatures. In contrast, the profile signatures of members of the same order,
the primates, such as human, chimpanzee and rhesus ranged between 'similar' and 'practically identical'.
Bursts do not violate the strand symmetry. Therefore, local inversions and inverted transpositions would have no effect on them. However,
transpositions from a far away part of the genome that is free of bursts could conceivably erode them. Indeed, I had expected the ongoing
equalization of all profile signatures to be a monotonic function of evolutionary time. In other words, no evolutionarily 'younger' species
should have more bursts in its profile signature than the genome of an evolutionarily 'older' ancestor that had more time to erase them.
This expectation was obviously wrong. For example, the profile signatures of the phylogenetically 'older' opossum or even
zebrafish were almost flat, whereas the profile signature of the 'younger' human showed 5 times as many bursts. Obviously, these bursts must have been
created de novo at some time after mammals branched away from the common ancestors of the zebrafish and other vertebrates. To be sure, these
novel bursts were not drastic enough to remove the corresponding genomes from the majority class, but they clearly involved an identifiable
and substantial portion of the 'new' genomes.
Is the appearance of new patterns of bursts linked to the evolution of new orders?
Considering that humans, chimpanzees and rhesus monkeys had very similar profile signatures, it stands to reason that their common ancestor,
the 'first' primate had this or a very similar profile signature, as well. Likewise, one may speculate that the common profile signatures of rat
and mouse were also shared by the 'first' rodent.
Yet, obviously, the profile signatures of rodents were quite different from those of primates. Both, in turn, differed substantially from the profile signatures
of the opossum and the platypus, which belong to the different and much older orders of marsupials and monotremes, respectively, within the class of mammals. It may seem, therefore,
that the evolution of a pattern of new bursts in the profile signatures of certain ancestral genomes marks the beginning of a new order, but that the creation of new bursts subsequently ceases while the members of that order evolve over very long
evolutionary time spans.
Of course, there is no evidence that the appearance of new patterns of bursts was causal in the origin of new orders. Still, the underlying local and massive AT->GC
conversions may represent yet another example of mutations that could potentially wreak havoc on genomes but that, on the contrary, may have played at least a part in major creative events of evolution.