The need to concentrate the sign posts into small spaces.
If genomes do indeed contain sign posts in the form of pure GA-sequences, it is fairly obvious what a genome navigation system should NOT do. Imagine that it had to scan the entire genome to find a particular sign post, which would then guide it to the desired target genes. Such a mechanism would offer very little advantage over no navigation mechanism at all: instead of crawling along billions of bases to find a specific gene, the search mechanism would have to crawl along the same billions of bases to find the appropriate sign post first. It would clearly be much more efficient if all sign posts were concentrated in a small space, so that the search mechanism could rapidly leap from one sign post to the next. Since all pure GA-sequences of a chromosome are lined up in tandem on the same DNA strand, there is essentially only one non-disruptive way of forcing all of them into a small space, namely by placing the pure GA-sequences side-by-side while folding the intervening stretches of DNA into loops (see Figs. 1 and 2).
The role of the upstream poly(A) stretches as binding sites for linker molecules.
A side-by-side arrangement of consecutive GA-sequences requires one or more species of linker molecules capable of binding both to consecutive GA-sequences and to each other. Therefore, all GA-sequences, their overall individuality notwithstanding, should contain or be flanked by a common binding segment for these universal linker molecules. Based on the above results, the poly(A) stretches at the upstream end of the common GA-sequences are the most obvious candidates for such common linker binding sites. In that case, it would not be difficult to find poly(A) binding proteins that could serve as the linker molecules. Although these proteins are primarily known for their interaction with the 3' poly(A) tails of mRNAs, in many cases their binding specificity does not distinguish unambiguously between poly(A) and poly(dA). Hypothesizing, therefore, that certain nuclear poly(A) binding proteins align GA-complexes via their upstream poly(A) segments, one arrives at a basic chromatin topology that would support a fast genome search and navigation mechanism, as depicted in Fig. 1.
The reduction of the search path.
Since the distance between consecutive GA-complexes is not constant, the parallel arrangement of GA-sequences would create a kind of 'ribbon' with loops of different sizes between them (Fig. 1). In this way, each chromosome is divided into two domains: the ribbon of GA-sequences (non-coding) and everything else (including all genes). In other words, every gene is located on one loop or another. The variable sizes of the loops accommodate variable numbers and sizes of genes. Searching along this ribbon instead of along the entire chromosome could considerably shorten the search path for genome navigation. Consider the following rough estimate. Human chr. 1 has a length of 238 Mb and contains 19513 common GA-sequences, with an average distance between consecutive GA-sequences of 12.2 kb. At 0.3 nm per base pair, the average loop between consecutive GA-sequences would therefore measure about 3660 nm. A base-by-base search mechanism would have to crawl along this distance to move from one GA-sequence to the next. By contrast, searching along the ribbon of parallel GA-sequences would shorten the distance to the next GA-sequence to one diameter of the double helix (2 nm) plus, perhaps, the diameter of a linker protein (e.g. 3 nm). Thus, instead of crawling about 3660 nm along a loop, the search mechanism could leap to the next sign post by moving only about 5 nm, a roughly 730-fold shortening of the search path.
Fig. 1. Side-by-side alignment of consecutive GA-sequences by poly(A) binding proteins (PABP).
The GA-sequences are assumed to be sign posts for a reading
mechanism that uses the poly(A) segments as markers for the reading direction and
as binding sites for PABPs that link the GA-sequences side-by-side. The PABPs are
located preferentially at the upstream end of the GA-sequences (black + dark gray).
The intervening stretches of genomic DNA have variable sizes and loop around to the
next GA-sequence. The parallel arrangement of GA-sequences is called the GA-ribbon.
The GA-sequences are assumed to be associated with DNA binding proteins that are
specific for tetra-GA motifs (not shown).
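The rough estimate for chromosome 1 can be retraced in a few lines. This is an illustrative sketch only: the inputs are the figures quoted in the text (238 Mb, 19513 GA-sequences, 0.3 nm per base pair, a 2 nm helix plus the text's suggested 3 nm linker protein), and the function name `search_path_estimate` is hypothetical.

```python
# Rough search-path estimate for human chromosome 1.
# Inputs are the figures quoted in the text; the 3 nm linker width is the
# text's own illustrative guess, not a measured value.

CHR1_LENGTH_BP = 238_000_000   # chromosome length used in the text
N_GA_SEQUENCES = 19_513        # common GA-sequences on chr. 1
NM_PER_BP = 0.3                # rise per base pair used in the text
HELIX_DIAMETER_NM = 2.0        # one double-helix diameter
LINKER_DIAMETER_NM = 3.0       # assumed linker protein (e.g. PABP) diameter


def search_path_estimate(length_bp, n_sign_posts, nm_per_bp=NM_PER_BP,
                         leap_nm=HELIX_DIAMETER_NM + LINKER_DIAMETER_NM):
    """Return (mean spacing in bp, mean loop contour in nm, fold shortening).

    Crawling: walk the full loop between consecutive sign posts.
    Leaping: step sideways across the GA-ribbon by leap_nm.
    """
    spacing_bp = length_bp / n_sign_posts
    loop_nm = spacing_bp * nm_per_bp
    return spacing_bp, loop_nm, loop_nm / leap_nm


spacing, loop, fold = search_path_estimate(CHR1_LENGTH_BP, N_GA_SEQUENCES)
print(f"mean spacing   : {spacing / 1000:.1f} kb")   # 12.2 kb
print(f"mean loop      : {loop:.0f} nm")             # 3659 nm
print(f"path shortening: {fold:.0f}-fold")           # 732-fold
```

With these inputs the sketch gives a mean loop of about 3660 nm and a shortening of roughly 730-fold. Passing `nm_per_bp=0.34`, the rise usually quoted for B-DNA, lengthens the loops and raises the shortening to roughly 830-fold, so 0.3 nm per base pair is the conservative choice.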
The chromatinization of the GA-ribbon and associated loops.
Of course, in reality the various loops of DNA would have to be associated with nucleosomes (Fig. 2). An average-size loop of 12.2 kb is large enough to accommodate roughly 50-60 nucleosomes, or about one 30-nm fiber. The variable lengths of the 30-nm fibers would be consistent with the variable lengths of the loops between adjacent GA-complexes. Nor would the GA-sequences whose parallel arrangement gives rise to the GA-ribbon be likely to exist as naked DNA for long. More likely, they are associated with GA-specific transcription factors and other GA-specific DNA-binding proteins. In view of the reported prevalence of tetra-GA motifs in GA-sequences, one would expect these DNA-binding proteins to prefer tetra-GA motifs, like the GAGA-factor, HSF1 and others.
Fig. 2. Outline of a chromatin model that supports a fast genome navigation system:
By leaping from one GA-sequence to the next along the GA-ribbon
in the scanning direction and 'reading' the information encoded in the proteins
bound to the GA-sequence in the reading direction, the postulated 'clavisomes'
(search and reading complexes) can efficiently find the appropriate GA-sequence
on a more than 700-fold shorter search path than by crawling along the various
size loops of genomic DNA. After a clavisome has found its target GA-sequence and
interacted with it, the nucleosomes in the associated loop are released and the
specific coding sequences in the loop are exposed to the transcription mechanisms.
In the case of some primates, such as humans and chimpanzees, Alu-transcripts
(labeled Alu-mRNA) are released from the upstream flank of the GA-complex and
may help control the general background of protein synthesis of stressed cells
in order to 'make room' for the new gene products proportional to their number.
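The nucleosome count for an average loop follows from dividing the loop length by the nucleosome repeat length (roughly 147 bp of core DNA plus a variable stretch of linker DNA). The repeat lengths used below, 200-240 bp, are an assumption for illustration rather than figures from the text, and `nucleosomes_per_loop` is a hypothetical helper.

```python
# Nucleosomes per average GA-to-GA loop.
# The repeat lengths below are illustrative assumptions, not from the text.

MEAN_LOOP_BP = 12_200  # average GA-sequence spacing on chr. 1 (from the text)


def nucleosomes_per_loop(loop_bp, repeat_length_bp):
    """Number of whole nucleosomes that fit on loop_bp, given a nucleosome
    repeat length (core DNA plus linker DNA) of repeat_length_bp."""
    if repeat_length_bp <= 0:
        raise ValueError("repeat length must be positive")
    return loop_bp // repeat_length_bp


for nrl in (200, 220, 240):   # assumed repeat lengths in bp
    print(f"repeat length {nrl} bp -> "
          f"{nucleosomes_per_loop(MEAN_LOOP_BP, nrl)} nucleosomes")
```

Under these assumptions the loop holds 61, 55 and 50 nucleosomes respectively, which brackets the roughly 50-60 nucleosomes estimated in the text.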
The postulate of a search mechanism ('clavisomes')
If one adopts the view that genomes contain sign posts arranged along greatly shortened search paths, it is consistent also to postulate that 'something' exists that searches these paths. This hypothetical searching complex must (a) find the specific GA-sequence, GA-Sequ0, that belongs to the associated loop containing the gene for the target protein P0 and (b) interact with it in order to initiate transcription in the associated loop(s). Much may already be known about this entity, albeit possibly under different names. Although there is no evidence that it exists as a nuclear particle, for the sake of simplicity it will be treated as such in the following and called a 'clavisome' (from Latin clavis = key), as it 'unlocks' a segment of chromatin.
The hypothetical initiation of transcription by clavisomes.
How can clavisomes initiate the transcription of the gene for a specific protein P0 in response to the cell's demand for it? Several reports in the literature have suggested that the activation of genes is accompanied by a loosening or even a breaking of the association between DNA and nucleosomes. It therefore seems conceivable that the interaction between a clavisome and GA-Sequ0 leads to the release of the nucleosomes in the associated loop, thus exposing the coding sequences for P0 within the loop to the transcription machinery. This step is depicted in the model of Fig. 2.
The hypothetical recognition of target GA-sequences by clavisomes.
Of course, the above hypothesis raises the question of how clavisomes, upon a cellular demand for P0, can distinguish its particular GA-Sequ0 from all other GA-sequences. Assume that each cellular protein P0 is able to interact with a certain number of hypothetical transcription factors, tetra-GA-factor_m, each specific for binding one of the 16 different tetra-GA motifs (the 2^4 possible four-letter combinations of G and A) (Fig. 3a,b). The interaction may catalyze the formation of a P0-specific oligomer of these transcription factors, which is subsequently released from the P0 molecule and enters the nucleus (Fig. 3c). There it binds to the characteristic chain of tetra-GA motifs of GA-Sequ0 (Fig. 3d) and prevents clavisomes from interacting with GA-Sequ0. As a result, no new P0 transcripts will be made as long as the cytoplasmic levels of P0 remain high enough to produce a steady stream of P0-specific oligomers. If, however, the cytoplasmic levels of P0 drop below a certain threshold, GA-Sequ0 would become 'denuded' and allow clavisomes to initiate the transcription of the genes for P0 in the loop associated with GA-Sequ0. There are precedents for major aspects of this scheme, which are discussed in Ref. 7. That reference also discusses predicted properties of the postulated components and consequences of the model for the co-regulation of gene expression, among others.
Fig. 3. Assumed linkage between the cellular demand for protein P0 and the accessibility of the particular GA-sequence GA-Sequ0, which connects to the loop containing the P0 gene (see text).
a. As long as the cellular protein P0 is available in sufficient quantities
(i.e. there is no demand for P0), one or more of the 16 conceivable
tetra-GA-specific transcription factors, tetra-GA-factor_m, can bind to it at its specific
binding sites.
b. The bound tetra-GA-factor_m molecules form a P0-specific oligomer,
[tetra-GA-factor_m1] [tetra-GA-factor_m2]… [tetra-GA-factor_mN].
c. The P0-specific oligomers are released from the P0 molecule and
enter the nucleus.
d. They bind to the characteristic chains of the tetra-GA-motifs of
GA-Sequ0 and prevent clavisomes from interacting with it. Conversely,
if the cytoplasmic levels of P0 drop below a certain threshold (i.e.
there is high demand for P0), no more P0-specific oligomers are formed
to block GA-Sequ0. As a result, clavisomes are able to initiate the
transcription of the genes for P0 in the associated loop of GA-Sequ0.
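The qualitative logic of Fig. 3 can be written down as a toy model. This is a sketch under stated assumptions, not a mechanism proposed in the text: the names `TETRA_GA_MOTIFS`, `motif_chain` and `ga_sequ0_accessible` are hypothetical, splitting a GA-sequence into consecutive non-overlapping tetra-GA motifs is an assumed reading frame, and the threshold is a free parameter. The enumeration also confirms that there are exactly 16 tetra-GA motifs.

```python
# Toy model of the hypothesized demand-gated access of clavisomes to GA-Sequ0.
# All names are hypothetical; only the qualitative logic of Fig. 3 is encoded,
# not any measured kinetics.
from itertools import product

# The 16 conceivable tetra-GA motifs: every 4-letter word over {G, A}.
TETRA_GA_MOTIFS = ["".join(p) for p in product("GA", repeat=4)]


def motif_chain(ga_sequence):
    """Split a pure GA-sequence into consecutive, non-overlapping tetra-GA
    motifs (assumed reading frame; a trailing remainder < 4 nt is dropped)."""
    if set(ga_sequence) - {"G", "A"}:
        raise ValueError("not a pure GA-sequence")
    return [ga_sequence[i:i + 4] for i in range(0, len(ga_sequence) - 3, 4)]


def ga_sequ0_accessible(p0_level, threshold):
    """Fig. 3 logic: while P0 is at or above the threshold, P0-specific
    oligomers keep GA-Sequ0 blocked; below it, GA-Sequ0 is 'denuded'."""
    oligomers_produced = p0_level >= threshold
    return not oligomers_produced


print(len(TETRA_GA_MOTIFS))                    # 16
print(motif_chain("GAGAGGAAAAGA"))             # ['GAGA', 'GGAA', 'AAGA']
print(ga_sequ0_accessible(5.0, threshold=2.0)) # False: no demand, blocked
print(ga_sequ0_accessible(1.0, threshold=2.0)) # True: demand, accessible
```

A sharp threshold is the simplest reading of the text; a graded dependence of oligomer production on P0 levels would be an equally valid refinement of this toy model.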