This most amazing

EXPLANATION: The explanation for this rule is

Surprisingly, indeed, very surprisingly, the same is true for EACH OF THE SINGLE STRANDS of the duplex.

Chargaff’s

EXPLANATION:

How is the second rule possible? There cannot be a mechanism that is able to count along a DNA strand how many e.g. TTCA's exist and adjust the number of TGAA's accordingly.

As will be shown below, this rule is almost universally valid. Therefore, the explanation for this rule

Before answering the question, let us describe a quantitative way to test this claim. While we are at it, let us also reformulate Chargaff's second parity rule, because it is somewhat arbitrary to call e.g. ACTG a group of bases and CAGT its reverse complement. Why not the other way round? Here is how one can avoid characterizing an oligonucleotide by the ambiguous term 'reverse complement' in the formulation of the second rule. This formulation will permit a very simple quantitation of its validity.

__The quantitation of the validity of the rule.__

Consider the following example

If W1 represents the number of

(PLEASE NOTE THAT BASES ARE READ FROM LEFT TO RIGHT ON THE WATSON-STRAND AND RIGHT TO LEFT ON THE CRICK-STRAND)

If Chargaff's second parity rule holds, then there will also be W2 = W1

So, if Chargaff's second rule holds, then the Watson- and the Crick-strands will have the same number W1 = C2 of TTCA's (and likewise of all other tetra-nucleotides), regardless of whether they are considered tetra-nucleotides or reverse complements of some other tetra-nucleotide.

Hence, we may reformulate

Therefore, in order to test the validity of Chargaff's second parity rule, one has to count how often each of the 64 possible triplets occurs on the Watson-strand, and then do the same for the Crick-strand. If the rule holds, then the two counts should be the same for every triplet. This can be tested by a simple correlation plot as in Figure 1.
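The counting test described above can be sketched in a few lines of Python. This is only a minimal illustration, not the program used for the analysis reported here: the function names are hypothetical, the sequence is assumed to be an uppercase string over A, C, G, T, and triplets containing any other symbol are simply ignored. The Crick-strand is obtained as the reverse complement of the Watson-strand, which corresponds to reading it right to left.

```python
import math
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_profile(seq, k=3):
    """Counts of all 4**k possible k-mers, in a fixed alphabetical order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing other symbols (e.g. N) are not looked up and so are ignored.
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of counts."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # all counts equal: the plot collapses to one point
    return cov / math.sqrt(var_x * var_y)


def strand_correlation(watson, k=3):
    """Correlation between the k-mer counts of the Watson- and Crick-strands."""
    crick = reverse_complement(watson)
    return pearson(kmer_profile(watson, k), kmer_profile(crick, k))
```

A sequence that equals its own reverse complement, such as repeats of AACGTT, gives a coefficient of 1, whereas a pure poly-A strand gives a negative value because its only triplet, AAA, never occurs on the poly-T Crick-strand.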

If a sequence complies completely, the plot generates a straight diagonal line with a correlation coefficient of c

It is obvious from the resulting straight line that the 2 kinds of counts are quite similar. How similar they are can be measured quantitatively by the correlation coefficient c

__The evidence for the almost universal validity.__

Using a computer program that I had written for this purpose, I tested genomes whose size was less than 8 Mb by direct analysis. If a genome was larger, it was cut into segments of 8 Mb, and the triplet profile of each segment was measured individually.
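The segmentation step might look like the following sketch. It is an assumption-laden illustration rather than the original program: the name `split_into_segments` is hypothetical, 8 Mb is taken to mean 8,000,000 bases, and the final, shorter piece is kept as a segment of its own.

```python
def split_into_segments(sequence, max_len=8_000_000):
    """Yield consecutive, non-overlapping pieces of at most max_len bases."""
    for start in range(0, len(sequence), max_len):
        yield sequence[start:start + max_len]
```

Each yielded piece can then be analyzed on its own, exactly like a genome that is smaller than 8 Mb.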

Based on the analysis of more than 500 genome segments of 8 Mb size or smaller, the triplet frequencies of their Watson- and Crick-strands were virtually identical. Only a subset of mitochondrial genomes violated this identity (see below). In all other cases the standard deviation of the differences between the frequencies of all triplets on the Watson-strand and the corresponding frequencies on the Crick-strand was <2%. Correspondingly, the correlation coefficients between the Watson and Crick strands c

The high degree of compliance is not a matter of randomness of the genome sequences tested. By the very definition of randomness, all triplets of a random nucleotide sequence must occur with the same frequency. Therefore, its correlation plot must degenerate into a single point on the diagonal and the correlation coefficient becomes c

More specifically, the correlation coefficients c

a. The correlation coefficients for each of the 8 Mb large segments along the entire length of human chromosome 1.

b. Average correlation coefficients c

c. The correlation coefficients of arbitrarily selected entire chromosomes of various species ranging from primates to bacteria.

The shorter the genome segment was, the more the correlation coefficient c

In the course of the above tests it appeared that human mitochondrial genomes violated the symmetry rule. In order to test to what degree the same was true for all mitochondria I tested 51 mitochondrial genomes that belonged to a wide range of organisms. They included fungi, amoebae, invertebrates, insects, plants, slime mold, arthropods, and vertebrates such as amphibians, reptiles, marsupials, and mammals. They ranged in size between 14 kb (Limulus polyphemus) and 490 kb (Oryza sativa (rice)). Seventeen mitochondrial genomes were found to comply accurately with Chargaff's second parity rule. Similar to the human mitochondrial genomes, however, 34 other mitochondrial genomes were found to violate Chargaff's second parity rule to various degrees (Fig. 3b).

There is possibly an evolutionary explanation for the violation by several mitochondrial genomes, because most of the violators belonged to recent vertebrates.

Did some of the mitochondrial genomes violate the symmetry rule because mitochondria are not autonomous organisms? In order to examine this question, I also evaluated 42 chloroplast genomes, which are not autonomous organisms either. The examples included those of seed plants as examples of the most highly evolved plants, and of non-seed plants such as protists, algae, mosses, and ferns, ranging in size between 105 kb and 201 kb (average: 150 kb, std. dev. 21 kb). Despite their dependence on host cells, all 42 chloroplast genomes complied quite accurately with Chargaff's second parity rule. Their average degree of compliance was c

a.Correlation coefficients c

b.Violation of the symmetry rule by mitochondrial genomes and lack of a size correlation between the correlation coefficients c

Of course, there is no genomic mechanism that is able to count along a DNA strand how many triplets of each kind exist and adjust the number of the reverse complementary triplets of each kind accordingly. The explanation for this remarkable intra-strand symmetry has to be sought elsewhere.

I propose a mechanism that is based on inversions and inverted transpositions. These genome variations insert sections of a chromosome in reverse order in their original location (inversions) or somewhere else (inverted transpositions).

To be sure, the inversion of the base sequence itself would have no significance for the validity of the rules, if it were not for the necessity to swap strands. In other words, the particular strand of such an inversion that was part of a Watson-strand before its excision has to be inserted into the Crick-strand and vice versa. As will be shown below, this action must equalize in an asymptotic fashion the base composition and oligonucleotide composition of the genome in question.

Assume e.g. that initially the number of G's is much larger than the number of C's on a Watson strand. Therefore, due to base pairing the Crick-strand contains correspondingly more C's than G's. Due to its strand swapping effect, every randomly located transposition/inversion must carry some of the supernumerary G's from the Watson-strand to the Crick strand while, at the same time, it carries some of the supernumerary C's from the Crick-strand to the Watson strand. The result is an ongoing equalization of the numbers of G's with C's on both strands. In a similar way, the mechanism equalizes the numbers of A's and T's on each strand. In contrast, it does not equalize the numbers of G's with A's, G's with T's, etc. because they are not paired with each other in the inverted segments.
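The strand swap can be made concrete with a small Python sketch. It rests on the assumption that inverting a duplex segment replaces the affected stretch of the Watson-strand by its reverse complement, i.e. by what used to be the Crick-strand of that segment; the function name `invert_segment` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert_segment(watson, start, length):
    """Invert a duplex segment and return the new Watson-strand.

    The excised Watson piece ends up on the Crick-strand, so the Watson-strand
    receives the reverse complement of the piece it lost.
    """
    segment = watson[start:start + length]
    swapped = segment.translate(COMPLEMENT)[::-1]
    return watson[:start] + swapped + watson[start + length:]


before = "GGGGGGGGGG"                 # Watson-strand with 10 G's and no C's
after = invert_segment(before, 2, 4)  # invert 4 bases starting at position 2
print(after, after.count("G"), after.count("C"))  # prints: GGCCCCGGGG 6 4
```

Four of the supernumerary G's have left the Watson-strand and four C's have arrived from the Crick-strand, while the sum of G's and C's on the strand is unchanged.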

The process is

The process is also

The principal effect of such large numbers of inversions/transpositions on strand symmetry is illustrated in Figure 4. Each duplex DNA is depicted as a pair of straight ribbons labeled as 'Watson' or 'Crick'. The four nucleotides are represented by shades of gray that color the various segments of the ribbons (Fig. 4a). For the sake of simplicity I assumed that all inverted transposons had a constant size (see frames in Fig. 4b, 4c, labeled 'inv/tp').

The illustration starts with the simplest possible situation of a duplex consisting of a poly-A strand and its complementary poly-T strand (Fig.4b, '0'). At this stage the Watson-strand contains only A's, AA's and AAA's, but no T's, TT's or TTT's. Likewise, the Crick-strand has only T's, TT's, and TTT's, but no A's, AA's or AAA's. Obviously, there is no symmetry between these strands.

The situation changes after the first inverted transposition has carried some T's to the Watson-strand while carrying an equal number of A's to the Crick-strand (Fig. 4b, '1', '2'). At this point, not only do the complementary nucleotides appear on either strand, they also generate some mixed triplets such as ATT, TTA, AAT, and TAA for the first time on both strands. As the process continues and the number of randomly placed inverted transpositions increases, the distributions of A's, T's, and their corresponding doublets and triplets become increasingly the same. (Please note that the sequences do not become the same, but only their mono-, di-, tri-,… nucleotide distributions do.)

A more detailed analysis shows that the equalization of the nucleotide distributions grows exponentially with the number of inversions/transpositions.

Similarly, if the initial duplex contains all four nucleotides in some arbitrary ratio, the strands become exponentially more symmetrical with the increasing number of inversions/transpositions. An example is shown in Fig.4c.
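The equalization can also be reproduced with a short simulation. The following Python sketch is an illustration under stated assumptions, not the simulation behind the figures: the genome length, the base weights of the deliberately non-compliant starting strand, the number of rounds, and all function names are arbitrary choices. Every inversion has a constant size (as in Figure 4) and replaces a randomly located piece of the Watson-strand by its reverse complement.

```python
import math
import random
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def triplet_profile(seq):
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    return [counts[t] for t in TRIPLETS]


def correlation(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(genome_len=20_000, segment_len=160, rounds=600, report_every=100, seed=1):
    """Apply random inversions; return (round, G count, C count, c_WC) tuples."""
    rng = random.Random(seed)
    # Non-compliant start (assumed weights): the Watson-strand is rich in G and A,
    # poor in C and T.
    watson = "".join(rng.choices("ACGT", weights=[4, 1, 5, 2], k=genome_len))
    history = []
    for n in range(rounds + 1):
        if n % report_every == 0:
            c_wc = correlation(triplet_profile(watson),
                               triplet_profile(reverse_complement(watson)))
            history.append((n, watson.count("G"), watson.count("C"), c_wc))
        if n < rounds:
            # Inversion with strand swap at a random location.
            start = rng.randrange(genome_len - segment_len + 1)
            piece = watson[start:start + segment_len]
            watson = (watson[:start] + reverse_complement(piece)
                      + watson[start + segment_len:])
    return history


if __name__ == "__main__":
    for n, g, c, c_wc in simulate():
        print(f"round {n:4d}: G={g:5d}  C={c:5d}  c_WC={c_wc:+.3f}")
```

With these settings the numbers of G's and C's on the Watson-strand approach each other while their sum stays constant, and the triplet correlation between the two strands rises from a negative value towards 1.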

Animated versions of Figures 4b and 4c are shown in Figures 4d and 4e. Note that the increasing numbers of inversions and inverted transpositions not only equalize the overall base counts on either strand, but the sub-segments of either strand that still violate the symmetry rule also become shorter and shorter. In other words,

a. Color coding of the 4 nucleotides by shades of gray that color the various segments of the ribbons.

b. Equalization of the numbers of A's and T's in the case of a duplex consisting of a poly-A strand and its complementary poly-T strand ('0'). Obviously, initially there is no symmetry between these strands. As the number of randomly placed inversions increases, they carry increasing numbers of T's to the Watson-strand while carrying an equal number of A's to the Crick-strand (panel b, '1', '2'). They also generate some mixed triplets such as ATT, TTA, AAT, and TAA for the first time on both strands. As the process continues and the number of randomly placed inverted transpositions increases, the distributions of A's, T's, and their corresponding doublets and triplets become increasingly the same. A more detailed analysis shows that the equalization of the nucleotide distributions grows exponentially with the number of inversions/transpositions.

c. Similarly, if the initial duplex contains all 4 nucleotides in some arbitrary ratio, the strands become exponentially more symmetrical with the increasing number of inversions/transpositions as indicated by the numbers at each duplex.

It is easy to express the above illustration mathematically and solve the resulting equations (see ref. 1). Figure 5 shows the example of the asymptotic increase of compliance and the concomitant equalization of the numbers of C's and G's with the number of inversions/inverted transpositions. The asymptotic value of the numbers of bases, duplets, triplets, etc. is always the arithmetic mean of the 2 starting values on the Watson-strand and the Crick-strand (cf. the progression of the numbers of bases in Fig. 4d and 4e).
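A minimal sketch of why the arithmetic mean is the limit, offered as an illustration and not necessarily the derivation given in ref. 1: let G_n and C_n denote the numbers of G's and C's on the Watson-strand after n inversions, and let f be the ratio (segment size)/(genome size). These symbols are introduced here for illustration only. Each inversion turns the G's inside the segment into C's and vice versa, so the sum is conserved; if the segment is placed at random and edge effects are ignored, it contains on average the fraction f of each base, so the expected difference shrinks by the factor (1 - 2f) per round:

```latex
\begin{align}
G_{n} + C_{n} &= G_{0} + C_{0}, \\
\mathrm{E}[G_{n+1} - C_{n+1}] &= (1 - 2f)\,\mathrm{E}[G_{n} - C_{n}], \\
\mathrm{E}[G_{n}] &= \frac{G_{0} + C_{0}}{2} + \frac{G_{0} - C_{0}}{2}\,(1 - 2f)^{n}
  \;\xrightarrow[n \to \infty]{}\; \frac{G_{0} + C_{0}}{2}.
\end{align}
```

Because of base pairing, the initial number of C's on the Watson-strand equals the initial number of G's on the Crick-strand, so the limit is the arithmetic mean of the two starting values of G, and the approach to it is exponential in the number of inversions.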

**Fig.5. Simulation of the convergence of a non-compliant genome to a compliant one by a recursive series of transposition/inversions. (Abscissa: number of rounds of transposition/inversions; left ordinate: number of G's or C's on the resulting Watson strand; right ordinate: degree of compliance of the resulting genome with Chargaff's second parity rule, expressed as correlation coefficient c _{WC}.) The thick line labeled 'compliance' depicts the simulated genome's degree of compliance with the intra-strand symmetry as a function of rounds of transposition/inversions. The thinner lines labeled 'G' and 'C' depict the convergence of the numbers of the corresponding nucleotides during the same process. The thin line labeled 'theoretical' depicts the theoretical curve of convergence. Note: This curve is not fitted to the simulation, but merely uses the same value of (segment size/genome size). For the sake of graphic presentation the simulation assumed a large ratio of (size of average inverted segment)/(size of whole genome) of 0.008. It appears that the theoretical description matches quite accurately the exponential convergence of a non-compliant genome to a compliant one.**

The 2 strands of a duplex have, of course, very different sequences, both locally and globally. Yet, the almost universal validity of the intra-strand symmetry means that both strands have identical statistical distributions (see the re-formulation of Chargaff's second parity rule) and, thus, identical physical properties. If, as suggested here, this amazing symmetry was created by countless 'reckless' inversions and inverted transpositions, their kind of anarchy also equalized and, thus, increased the physical stability of the 2 strands. At the same time, it may have decreased their vulnerability, which may stem from highly special configurations of bases present on only one of the strands. Their equalized physical properties may also have aided repair mechanisms and facilitated chromatin formation as well as horizontal gene transfer and, thus, may have accelerated evolution.

The proposed mechanism also suggests that the validity of Chargaff's second parity rule describes

Even if it is true that inversions and inverted transpositions created the intra-strand symmetry, why did the many other 'anarchy wreaking' mechanisms of mutation not destroy it? It will be helpful for the discussion to describe the reformulated version of Chargaff's second parity rule by the simple formula:

where N

(

Now, let us look at some of the major mechanisms of mutation (variation) and test whether they are able to destroy the strand symmetry. It will be quite obvious that they do not.

If 2 sequences S

If

If a sequences S

If

If both alleles comply with the rule, then the exchange of segments cannot change the equality of counts between the Watson- and Crick-strands.

The numbers

Base substitutions have the potential to change the triplet balance between the 2 strands. However, in reality the number of point mutations is minuscule compared to the size of most genomes. Therefore, the imbalances caused by point mutations fall within the natural range of variations of the numbers N

Thus it seems that all the other major 'anarchy-wreaking' mechanisms of variation leave the strand symmetry intact. Of course, it is possible to invent special kinds of mutations that would cause violations of the strand symmetry, but it seems that none of them exists in reality. Otherwise, it would be impossible to understand the almost universal validity of Chargaff's second parity rule. Based on this universality, one may even go so far as to predict that if there are presently unknown major mechanisms of variation, they may be discovered by searching for mutational mechanisms that leave the strand symmetry intact.

Considering the almost universal validity of Chargaff's second parity rule, there seems to be no selective advantage in obeying it. Every competitor obeys it, too. Of course, violating it could mean a severe disadvantage. For example, the almost universal use of L-amino acids represents no particular selective advantage, whereas the need to use D-amino acids would pose a severe problem for an organism here on Earth, namely to find food. However, the evolutionary success of so many mitochondrial genomes seems to suggest that there is no disadvantage associated with the violation of the rule, either, although this example may point to the need to obey the rule if the genome is to exist autonomously inside its own organism.

Therefore, until further insights come along, we may consider the compliance with the symmetry rule as an evolutionarily neutral, though inevitable, side effect of numerous transpositions, specifically the cases among them where the transposons invert and swap strands. The transpositions themselves, of course, confer a major selective advantage because, as Barbara McClintock (1902-1992) pointed out in her Nobel speech, they offer genomes the possibility to respond to unforeseeable "genome shocks".