II. Pure GA-sequences - a model of genome navigation.

[See ref 7]

The following describes in broad strokes an outline of genome navigation that is consistent with the above findings. It offers details only when there were obvious objections to be met.

The need to concentrate the sign posts into small spaces.

If genomes do indeed contain sign posts in the form of pure GA-sequences, it is rather obvious what a genome navigation system should NOT do. Imagine that it had to scan the entire genome in order to find a particular sign post, which would subsequently guide it to the desired target genes. This mechanism would offer very little advantage over no navigation mechanism at all. After all, instead of crawling along billions of bases to find a specific gene, the search mechanism would have to crawl along the same billions of bases in order to first find the appropriate sign post. Obviously, it would be much more efficient if all sign posts were concentrated in a small space, so that the search mechanism could rapidly leap from one sign post to the next.

The looping of GA-sequences.

Since all pure GA-sequences of a chromosome are lined up in tandem on the same DNA strand, there is essentially only one non-disruptive way of forcing all of them into a small space, namely by placing the pure GA-sequences side-by-side while folding the intervening stretches of DNA into loops (see Figs. 1 and 2).

The role of the upstream poly(A) stretches as binding sites for linker molecules.

A side-by-side arrangement of consecutive GA-sequences requires one or more species of linker molecules that are capable of binding to consecutive GA-sequences and to each other. Therefore, all GA-sequences should, their overall individuality notwithstanding, contain or be flanked by a common binding segment for these universal linker molecules. Based on the above results, the poly(A) stretches at the upstream end of the common GA-sequences are the most obvious candidates for such common binding sites. In this case, it would not be difficult to find corresponding poly(A) binding proteins that could serve as the linker molecules. Although these proteins are primarily known for their interaction with the 3' poly(A) tails of mRNAs, in many cases their binding specificity does not distinguish unambiguously between poly(A) and poly(dA). Hypothesizing, therefore, that certain nuclear poly(A) binding proteins align GA-complexes with their upstream poly(A)-segments, one may arrive at a basic topology of chromatin that would support a fast genome search and navigation mechanism, as depicted in Fig. 1.

The reduction of the search path.

Since the distance between consecutive GA-complexes is not constant, the parallel arrangement of GA-sequences would create a kind of 'ribbon' with loops of different sizes between them (Figure 1). In this way, each chromosome is divided into two domains, the ribbon of the GA-sequences (non-coding) and all the rest (including all genes). In other words, all genes are located on one or the other loop. The variable sizes of the loops accommodate variable numbers and sizes of genes. Searching along this ribbon instead of the entire chromosome could considerably shorten the search path for genome navigation. Consider the following rough estimate. Human chr. 1 has a length of 238 Mb and contains 19513 common GA-sequences with an average distance between consecutive GA-sequences of 12.2 Kb. At a distance of 0.3 nm per base pair, the average loop between consecutive GA-sequences would therefore measure 3655 nm. If there were a base-by-base search mechanism, it would have to crawl along this distance in order to move from one GA-sequence to the next. On the other hand, searching along the ribbon of parallel GA-sequences would shorten the distance to the next GA-sequence to one diameter of the double helix (2 nm) and, maybe, the diameter of a linker protein (e.g. 3 nm). Thus, instead of crawling for 3655 nm along a loop, the search mechanism could leap to the next sign post by moving only 5 nm, corresponding to a 730-fold shortening of the search path.
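The arithmetic behind this estimate can be retraced in a few lines. The following is a minimal sketch that uses only the figures quoted above; the function and constant names are hypothetical and chosen for illustration. With unrounded inputs it yields a loop length of about 3659 nm and a ratio of about 732, in line with the rounded values given in the text.

```python
# Rough estimate of the search-path shortening, using the figures quoted in the text.
CHR1_LENGTH_BP = 238_000_000   # length of human chr. 1 (238 Mb)
N_GA_SEQUENCES = 19_513        # common GA-sequences on chr. 1
NM_PER_BP = 0.3                # length per base pair used in the text
HELIX_DIAMETER_NM = 2.0        # diameter of the double helix
LINKER_DIAMETER_NM = 3.0       # assumed diameter of a linker protein ("e.g. 3 nm")


def search_path_estimate(length_bp=CHR1_LENGTH_BP,
                         n_sequences=N_GA_SEQUENCES,
                         nm_per_bp=NM_PER_BP,
                         helix_nm=HELIX_DIAMETER_NM,
                         linker_nm=LINKER_DIAMETER_NM):
    """Return (mean spacing in bp, mean loop length in nm,
    leap distance in nm, fold shortening of the search path)."""
    mean_spacing_bp = length_bp / n_sequences   # average distance between GA-sequences
    loop_nm = mean_spacing_bp * nm_per_bp       # path when crawling along the loop
    leap_nm = helix_nm + linker_nm              # path when leaping along the ribbon
    return mean_spacing_bp, loop_nm, leap_nm, loop_nm / leap_nm


spacing, loop, leap, fold = search_path_estimate()
print(f"mean spacing: {spacing:.0f} bp, loop: {loop:.0f} nm, "
      f"leap: {leap:.0f} nm, shortening: {fold:.0f}-fold")
```

Changing the assumed linker diameter shows how sensitive the estimate is to that single guess: without any linker (leap of 2 nm) the shortening would be 2.5 times larger.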

Fig.1. Side-by-side alignment of consecutive GA-sequences by poly(A) binding proteins (PABP).
The GA-sequences are assumed to be sign posts for a reading mechanism that uses the poly(A) segments as markers for the reading direction and as binding sites for PABPs that link the GA-sequences side-by-side. The PABPs are located preferentially at the upstream end of the GA-sequences (black + dark gray). The intervening stretches of genomic DNA have variable sizes and loop around to the next GA-sequence. The parallel arrangement of GA-sequences is called the GA-ribbon. The GA-sequences are assumed to be associated with DNA binding proteins that are specific for tetra-GA motifs (not shown).

The chromatinization of GA-ribbon and associated loops.

Of course, in reality the various loops of DNA will have to be associated with nucleosomes (Figure 2). The average-size loop of 12.2 Kb is large enough to accommodate roughly 50-60 nucleosomes, or about one 30-nm fiber. The variable lengths of the 30-nm fibers would be consistent with the variable lengths of the loops between adjacent GA-complexes. The GA-sequences whose parallel arrangement gives rise to the GA-ribbon would hardly exist as naked DNA for long, either. More likely, they are associated with GA-specific transcription factors and other GA-specific DNA-binding proteins. In view of the reported prevalence of tetra-GA motifs in GA-sequences, one would expect that these DNA-binding proteins, such as the GAGA-factor, HSF1 and others, have preferences for tetra-GA motifs.
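The figure of roughly 50-60 nucleosomes follows from dividing the loop length by the length of DNA taken up per nucleosome. As a minimal sketch, assume a nucleosome repeat length (core plus linker DNA) between about 200 and 240 bp; this range is not stated in the text and is used here only for illustration, as is the hypothetical function name.

```python
AVERAGE_LOOP_BP = 12_200   # average loop of 12.2 Kb, from the text


def nucleosomes_per_loop(loop_bp, repeat_bp):
    """Whole nucleosomes that fit on a loop of loop_bp base pairs,
    given an assumed repeat length (core + linker DNA) of repeat_bp."""
    return loop_bp // repeat_bp


# Assumed repeat lengths in bp; illustrative values, not taken from the text.
for repeat_bp in (200, 220, 240):
    count = nucleosomes_per_loop(AVERAGE_LOOP_BP, repeat_bp)
    print(f"repeat {repeat_bp} bp: {count} nucleosomes")
```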

Fig.2. Outline of a chromatin model that supports a fast genome navigation system:
By leaping from one GA-sequence to the next along the GA-ribbon in the scanning direction and 'reading' the information encoded in the proteins bound to the GA-sequence in the reading direction, the postulated 'clavisomes' (search and reading complexes) can efficiently find the appropriate GA-sequence on a search path that is more than 700-fold shorter than crawling along the variously sized loops of genomic DNA. After a clavisome has found its target GA-sequence and interacted with it, the nucleosomes in the associated loop are released and the specific coding sequences in the loop are exposed to the transcription mechanisms. In the case of some primates such as human and chimpanzee, Alu-transcripts (labeled Alu-mRNA) are released from the upstream flank of the GA-complex and may help control the general background of protein synthesis of stressed cells in order to 'make room' for the new gene products proportional to their number.

The postulate of a search mechanism ('clavisomes')

If one adopts the view that genomes contain sign posts arranged into much shortened search paths, it is consistent to postulate also that 'something' exists that searches this path. This hypothetical searching complex must (a) find the specific GA-sequence, GA-Sequ0, that belongs to the associated loop containing the gene of the target protein P0 and (b) interact with it in order to initiate transcription in the associated loop(s). Much may already be known about this entity, albeit possibly under different names. While there is no evidence that it exists in the form of a nuclear particle, for the sake of simplicity it will be treated as such in the following and called a 'clavisome' (from Latin clavis = key), as it 'unlocks' a segment of chromatin.

The hypothetical initiation of transcription by clavisomes.

How can clavisomes initiate the transcription of a specific protein P0 in response to its demand by the cell? Several reports in the literature have suggested that the activation of genes is accompanied by loosening or even breaking the association between DNA and nucleosomes. Thus, it seems conceivable that the interaction between a clavisome and GA-Sequ0 leads to the release of the nucleosomes in the associated loop, thereby exposing the coding sequences for P0 within the loop to the mechanisms of transcription. This step is depicted in the model of Figure 2.

The hypothetical recognition of target GA-sequences by clavisomes.

Of course, the above hypothesis raises the question of how clavisomes, upon a cellular demand for P0, can distinguish its particular GA-Sequ0 from all other GA-sequences. Assume that each cellular protein P0 is able to interact with a certain number of hypothetical transcription factors, tetra-GA-factor_m, each of which is specific for binding one of the 16 different tetra-GA motifs (Fig. 3a,b). The interaction may catalyze the formation of a P0-specific oligomer of these transcription factors, which is subsequently released from the P0 molecule and enters the nucleus (Fig. 3c). There it binds to the characteristic chains of the tetra-GA-motifs of GA-Sequ0 (Fig. 3d) and prevents clavisomes from interacting with GA-Sequ0. As a result, no new transcripts of P0 will be made as long as the cytoplasmic levels of P0 remain sufficiently high to produce a steady stream of the P0-specific oligomers. However, if the cytoplasmic levels of P0 drop below a certain threshold, GA-Sequ0 would become 'denuded' and allow clavisomes to initiate the transcription of the genes for P0 in the associated loop of GA-Sequ0. There are precedents for major aspects of the above scheme, which are discussed in Ref. 7. The reference also discusses predicted properties of the postulated components and consequences of the model for the co-regulation of gene expression, among other topics.
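The logic of this scheme can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions: the 16 tetra-GA motifs are taken to be all four-letter words over the letters G and A, a GA-sequence is read as a chain of non-overlapping motifs starting at its first base (the reading frame is an assumption), and the blocking of GA-Sequ0 is reduced to a simple threshold comparison. All names and numerical values are hypothetical.

```python
from itertools import product

# All tetra-GA motifs: every 4-letter word over the alphabet {A, G} (2**4 = 16).
TETRA_GA_MOTIFS = ["".join(letters) for letters in product("AG", repeat=4)]


def motif_chain(ga_sequence):
    """Split a pure GA-sequence into consecutive, non-overlapping 4-mers.
    Starting the frame at position 0 is an illustrative assumption;
    trailing bases that do not fill a complete motif are ignored."""
    if set(ga_sequence) - {"G", "A"}:
        raise ValueError("not a pure GA-sequence")
    usable = len(ga_sequence) - len(ga_sequence) % 4
    return [ga_sequence[i:i + 4] for i in range(0, usable, 4)]


def ga_sequ0_is_accessible(p0_level, threshold):
    """Toy feedback rule: while the P0 level is at or above the threshold,
    P0-specific oligomers block GA-Sequ0; below it, clavisomes can bind."""
    return p0_level < threshold


print(len(TETRA_GA_MOTIFS))                               # number of distinct motifs
print(motif_chain("AAAGAGAGGG"))                          # chain of complete motifs
print(ga_sequ0_is_accessible(p0_level=5, threshold=10))   # P0 scarce: accessible
print(ga_sequ0_is_accessible(p0_level=20, threshold=10))  # P0 abundant: blocked
```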

Fig.3. Assumed linkage between the cellular demand for protein P0 and the accessibility of the particular GA-sequence GA-Sequ0 which connects to the loop containing the P0 gene (see text).
a. As long as the cellular protein P0 is available in sufficient quantities (i.e. there is no demand for P0), one or more of the 16 conceivable tetra-GA specific transcription factors tetra-GA-factor_m can bind to it at its specific binding sites.
b. The bound tetra-GA-factor_m molecules form a P0-specific oligomer, [tetra-GA-factor_m1] [tetra-GA-factor_m2]… [tetra-GA-factor_mN].
c. The P0-specific oligomers are released from the P0 molecule and enter the nucleus.
d. They bind to the characteristic chains of the tetra-GA-motifs of GA-Sequ0 and prevent clavisomes from interacting with it. Conversely, if the cytoplasmic levels of P0 drop below a certain threshold (i.e. there is high demand for P0), no more P0-specific oligomers are formed to block GA-Sequ0. As a result, clavisomes are able to initiate the transcription of the genes for P0 in the associated loop of GA-Sequ0.

Significance for the "functional anarchy" of genomes:

There is no denying that long GA-sequences exist in large numbers and are evenly distributed throughout the human and other large genomes. Their large sizes rule out explaining them as products of stochastic coincidence. Their origin and functions are unknown.

The following idea about their origin is, of course, speculation. As pointed out earlier, the GA-sequences in the human genome could be viewed as various repetitive chains of the tetra-GA motifs AAAG, AAGG, AGAG, and GGGA. Therefore, it is conceivable that they originated from more or less random concatenations of these motifs, possibly created by invading retrovirus-like elements that replicated themselves and their tetra-GA-motifs in long chains throughout the genome. In defense, the genomes may have developed binding factors for the tetra-GA-motifs that bound to these chains in order to stop their further proliferation. Thus, genomes may have 'discovered' that the tetra-GA-motif binding factors could acquire the function of transcription factors which used these otherwise meaningless and disruptive GA-chains as sign posts for genome navigation. The heat shock factor HSF1 and the well-studied GAGA-factor may be examples of such transcription factors.

Assuming that this interpretation of the GA-sequences has merit, it would present a case where massive disruptions by chain-forming transposable elements neither wreaked havoc nor generated a particular new order or symmetry, but created a badly needed function for genome sequences, namely genome navigation. I submit that a mechanism of genome navigation is indispensable, especially for the huge genomes of mammals and other large vertebrates. It seems urgent to think about conceivable mechanisms. Whether the above model or another one explains how genomes are able to navigate their own vastness, of course, remains to be seen.
