GENOME NAVIGATION

The need for genome navigation:

Cells cannot regulate the expression of a gene unless they can find it.

Genome navigation is a necessary, albeit still poorly understood, early step of gene regulation. The often huge mammalian genomes in particular may require a sophisticated system of genome navigation that enables polymerases, transcription factors and other DNA-binding proteins to home in quickly on their proper target areas without having to search base by base along sequences of billions of bases.


Illustration

As a small illustration of the tantalizing problem of genome navigation, imagine being inside the cytoplasm of a cell that needs a new copy of a particular protein PX. Below are 9 kb of human chromosome 2. How would you, the reader, decide whether this sequence contains the promoter or an exon of the PX gene? Keep in mind that the cytoplasm only 'knows' the protein, not its unspliced nucleotide sequence! Then consider
  • that you don't know whether the PX gene is even on chromosome 2, and
  • that human chromosome 2 alone is about 26,000 times larger than the piece below, not to mention the 45 other chromosomes of comparable size that would have to be checked, and
  • that there may be mutants and pseudogenes of PX that you don't want to use, and
  • that you have at best a few minutes to find, transcribe, splice and translate the right PX gene.

Admittedly, the example may be somewhat polemical, but it may still illustrate how difficult genome navigation must be, and how little we understand it.

  • ....TAGAGGGATTTATATGGGCAGAGAGAAGGGTGGTCTGGAGAGGGATTTATAAGGACAGAGAGAAGGGTGGTCTGGAAAGTGATTTACAAGGGTAGAGAGAAGGGTGCTCTGGAGAGGGATTTATATGGGCAGAGAGAAGGGTGGTCTGTAG AGGGATTTATAAGGACAGAGAGAAGGGTGGTCTGGAGAGGGATTTATATGGGCAGAAAGAAGGGTGGTCTGGAGAAGGATTTATATGGGCAGAGAGAAGGGTGGTCTGGAGAAGGATTTATATGGGCAGAGAGAAGGGTGGTCTGGAGAGG GATTTATAAGGACAGAGAGAAGGGTGGTCTGTAGAGGGATTTATATGGGCAGAGAGAAGGGTGGTCTGTAGAGGGATTTATATGGGCAGAGAGAAGGGTGGTCTGTAGGGGAATTTATATGGGCAGAGAGAGAAGGGTGGTCTGGAGAAGG ATTTATAAGAGCAGAGAGAAGGGTGGTCTGGAGAAGGATTTATAAGGGTGGAGAGACATGGAATTTGCCTCGCTGGCTGCATCAGCATGAATCTCGAGGTTTGGTGCTTGTGATGCAGTTGAGGGGAGGGGTGGGAGTTGAGGAGATGGAG AACTGGCTAGAACCCGAGGGGCAGCCTTGAGGGGTGGAGGGGGAGCCTGAGGAAGCCGGGGGCCCCGGGAGGGGGCTGGGCTGCATGTGGCTGGGTAGGGCTGCATCTCTGTCCTAAGAAGGAGTGAAAACCCTTGGGTGGTGGACAGCGC GGGGCTGAGACGCAGGCGGGAAACCTTGGGATGTGCATTTTGAAAAACTGAGTTGCCAGGTGGAGAAGAGACTGGAGCGGGAGCAGGTATCCTGGTCCCGCAGGCTTGTCCGAGGTAGAGATGTGGGCAGCTGGGACCATGTTGGTGGCTG TTCAGATGCAAAGAAGTAAATGGATTCAAGAGCCTCAAATGGATTCAATGTCTAGTTAAAGGCTTGATAGAAGGGATTTTTAGATGCCATTAAAAATGACAGTCTTAATCAGATTAAAAATTTGGCCACATTAAAATAAAATATCCACACA TATATATTATTTTTCAAAAACACACCATAAACCAAGTTAACTGACAGAATACAAGCTGGAAGGACGTGTGTTCTATTTATGTGACTTACAAGAGGTTTATTTCTGTATTAAATGAAGATTTTAAAAACAATAAGGCAGTGTCATAAAAGGA AAAAAATCCCGTGGTACAGACAGACAGTCCAGCAGGTGGAAACCTTTAAGGCCAGTAAACGTGGATGATGTCCAAGTTCACTTGTGATTGGAGAAATATAAATTTCCATCAATGGGGATGTCTTCTTCCCAGCTAGGAGACTGAAAAATAA AACGTGTTAAACCCAAATGTTGCCTAGTTTAGTGGATTGCCAGAAACTCACATGCAGGTTTGTAAATCATTACAAATGTCTATGAGGATCAGTTCTCTAGAAGAATTGAAGACGCGCCTTTGGGTGCAGAGTTGCATATCTGGGTGCCCTA GAAATTCCTTCGCTCCCAGCAGGATGTCCCAGCATCATCTGTATCTGCTGCCCTGCAAGACAAAAGACTTGCAGATGGCCCATGCAGCAGATAGTCCATGTTTATAAATGGTCAGTATTACTTGTAACAGGAAATAAAAATGGAAATAGCT GAAATAGAATACTGTAGGGGATTATCTTAATACACTGTATACATGTGTTATGCTTTGTTAATTTCATGCAATGAAATATTACATTACTGTTAAAATGGATTAATTGGGGCTATCTCCATCAACATATCTAAATATGTTACAGCTTTGAGAA AGGCAGTTATAGAGCACAGTATATAGTATGACCCCCTTCCTGTAATTAAAAAAAAAAGTGTATAACTGTTGTCAACAAGTAATCCCATGTTTCATTAATTTGCAGACAACTTTTTTTTGTTTTGTTTTTTGTTTTGCGACGGAGTCTCGCT 
CTGTCACCCAGGCTGGAGTTCAGTGGCGTGATCTTGGCTCACTGCAGCCTCTGCCTCCCAGGTTCCAGCCATTCTCCTGCCTCAGCCTCCTGGGTAGCTGAGATTACAGGCGCATGCCACGTGCCTGGCTAATTTTTGTATTTTTAGTAGA GATGGGGTTTTACCATGTTGGCCAGGTTGGCCTCAAACTCCTGATCTCAGGTGATCCGCCCGCCTCAGCCTCCCAAAGTGCAGGGATTACAGGCGTGAGCCACCACGCCCGGCCGACAACTTTATTTCTTATAAAGAGTTGCAGCCCTCAG GCTGGACTTTCTGACAGGCTGCCTCCAGCAGAGACCGAAAGCAGGCATTTGAAGGAGGAAGGCTGAGACAGGAATTTATGCTGAACAGGTTGGCCAAATATACATATTTAGCAGGTGGGAGATGGAGCTATGCGTATTCACGGACTGGCTC TGAGACATTTGACTGAATAAACACGTATGTTATGTGTGACACATATTCACTTTGGGGTGTGGACTTAACATTAAAATGTAGCAAAATTAGGCTCTGTGTGTTGAAAGGTGAAGCTGGTACATGAATGCCCTCAGTGCACAGCCTCTGCAAA ACCAGCCAGGACCAGATCATGGTGGGTGGTCTCTTATCAGGAGAAAATTACTGGAATCATACCTTATCCAACTGAAGCTGCTGCTGTGGCTGGTAAGAGGTTCAGTTGGCCAGGGTCTCAGAGCTGGATGAGCTATGAGTATTTTAATCTT TCTTATCTCTAGGCTAGTGCTGGTTCAGCTGCTACAGAAAAAGAAAATCTTGTGGCAGTGAGTATACGGTTTACTAAGAGCAGGGTGCACAACTCACCCTTTGCCTGGCATGGCTTTAGGTCCTGTTTGTAATTTGGTGTCTTATTGCCAC AGAGAATCTACTCTGTCAGTCTTATGATCTCTGTTTTACTGCTGTAACATCTAGTATTAATTATTCTGTAACTGTGTCATAAAAGTCTAAAAACTTCATTAGAAAATGAAATTGACTCTGTAAAGCAAAAGGTATGTAATAAGGGGAATTC TCAAGTATATCTTTTTAAAATTATATACACACATGCAGACACACGTATACACACATACACATGCAGGCATGCAGACACATGGAGCTACGTGTACACATGCAGACACATGCAATGTGCATGCACAAACATGCAAACAAATGCACATGCATGC ACACATGTAGAGACACATACGTGCACATTCACACCCAGAATATATGCATATCCACACAGGCAGGGACACACACACACATACACACATGCACACACACATGTATGTACATGTATTTATGGTGTAGGTTAAAATTGTATAAAGCTGGCATTGA ATATACAGATGCACACCATAGTGTTAAATTTTGCCTAAGAGCTCACGTATACATAAATTTTAAAAAGAAGAGTTACCGTATCTTCTTTTAATTTACTTGGTTTCTGATCACAGATAAGCATTGCTAGACAGCTCTGTGACTGTGTCAGGTG AATGTTAAAGGGTTGTCGTGAGGGCAGTGTTAGCTATGGAATGCATCAATAACTGTCAGGTTCCGAGCACACAGTGGGTGTTCTACAAATGCGGATGGTGTGAGGAGGTGACCTGTGGTGTCTGGTCAGCTCATAACTTCCATATTTTTCT TTAGGCTCTTCCCCTTCAGAGAGTGCCCCACGCCATCTCCAGTTGGGTTTCGTCCATCCCTTTTCCTCCAGGCCCTTTCAGAATTTCGTGCATGTATCTGGCATCCATGCATAAGAAATGTAAAGATATTTGTTAGTCATGGAGTTAGTTA GCAAAATATGTATGTTGTAGTATGGTAACGAGCCTCTGTTGACTCCCACTTTTTACTGGTAGGGATAAAATTTGAAAATGATTAATTCACAGTTCTTTTAAAAATCCCATTAAGGCTGGCTGATTGCTGAAGTGGCTCTTACGTTCTGTTC 
AGTGCATCAAAGCAAGGGAAACATCATCTTGCAGAAAGTAGTGAGGAGACAAGCAGCCCTGGGGAGCTGAAGGCACGTCGCTGGTGGTCAGAGGGTGTCTTCATTTACTGGCCACAATCGGCATAAGCTGCTGCTTCTGAAAGATGCCCTT GAAATTTGGTGCCCACTCCTGAGACCATGGTGGGCCCTCCAGGAATCCCTGCTTTAGTGTTCACTTATTTGTGGCAAAAGATGGGGCCTTGGTGCCCACAGCCGACTGGGTAGGTGGAGGCTGTGCAGTCAGGGAGGGGAGAGTGCATGGT GGGCAGGGTTGGGGCCGCTGTCAGACCTGGAGAAACATCAGTGAGAAAGTTTGGAAAAGGAGGAAGGAGCAGGAATGAGAGCTGTTTAGACCCAGGGCGTGAACTTCTCACCTGGTTCAGGCTGTGCTCTGCAGGGAAATATTCACAGCAT GAACGATATGCAAAGCCAATGCATGCTCAAGTGGCTGGCAAAAAAAAATTAATAATAATAATAAAATAAAAAGATTGCGTGATTTTTTTGTAAAATGTTGGCAAAGGCCTGAGCACAACGTGCTTTCACTTAATGGGCCATCACATTACAG CCCCTGTTGGCTCTCAGCAGGGTTGGCTCCCAGCAGGTGTTTTCTGAGTGCTGCTGGGCCTGGGATGCTGGTAGTTTCCAGGGTGTGAAGAGATGCCCATTGTCCTCCTCCTGTCCTGACAGGTTTCCCACTTGATGAGAATACAGCACGC ATGCAGAGCTCTATCTACAGTGCTGACAGCGCTGCATGTTTTACTAGAAGGACCTGGCATCAGCTCTTCGAGCAGGTGCTGTTAGGGGGGCAGCTGTGGAGGGTGTGAGTTGTGGACAGGCATTCTGAGCTGGGCTGCAGAGCACTGATGG AGGGACGTCATGGTGTGTATTGTGCATCAAACAGTCCCCTTTGGCTCAGCTTCATGTGATGGGCAATGAGGTTGAAATCAGACAGGCCAGATATCATGTCTGTGAGTCTTGGAGGGTCTTGGAGAACATGCTAAGAAGTTTTCATTTTAAT CCATCAATGAGTGTGTTTAGAGCAAGAGAGGGACTAGTGATTCCAAACTGGCGTTTCCTGATTCACGTGGCAATGTCATGCAGGGTTGAGTGGGAAGGGGACAAACTGGAGCTGTTTAGAAGATGGCTTTGTGAATCTGAGTGAGTGTTGT GGGAAGAGGCCATAGGTGTGCTGGGAATGTGGAGGGAGTGACCAGGTGTGGTATCCAAGGGAAGAAATGGCAAGGAGGGCAGGGTCAAAGTGGCTCCAAGCTTTGGAGATTGCACACTCTTCTGGAAAGAAAGGAAATGGAAATTCATTGA AAAACTATAATGTCACAGCAGGCTGTTGAAACTCTACAGACATTCTCTAAGTTAATGATAGTTCCACTGGTAATTATGCTGCAGGATTAGGCAGTTTTAGCTGAGGGATAACCACAGTGCTAAGGAGGCTCATGCGAGGTGGGCGACCTGG TATTAACATCAACACAGAGGCCAGGCGTGGTGGCTCAGGTCTGTAATCCCAGCACTTTGGGAGGCCGAGGCAGGTGGATCACCTGAGGTCAGGAGACTAGTGTGGCCAACATGGTGAAACCCCGTCTCTACAAAAAAAAGAAAAATTAGCC GGGTGTGGTGGCGGGCACCTGTAATCCCAGCTACTTGGGAAGCTGAGGTAGGAGAATTGCTTGAACTCCGGAGGCAGAGGTTGCAGTGAGCTGAGATCGTGCCACTGCACTCCAGCTTGGGGGACAGAGCGAGACTCCATCTCAAAGAAAA ACAACAAGAACAACACAGAAACACTATGGGCCACTGACAGTTATTCCCAATAGCCATTCAGGCATGAATCTCTTCTGATATTAACAATGGTGCATTTCAAGTACACAAGCTGATGATGAATAAAATCCCATTTTATACACATACGTGCATG 
CTCTTCCAGCCTTCATAGACGTGGGGTGGATGGTGGAGGACTCCAGGGCAATGCCCAGCAGGTGAATGGCAGGGCCACTTCCTAAGGGCCCAGGCATCTGGGCTTGGGCCCAGCTCCCCTGGCTCCAGCATGAATGCCTTTCCTCCCACTT CTGGCCCCTGGACTGGCCAGATGGTGAGTGAGGGAACCACTTACAACCATCAGCAGGCAGGCCACGGTGGAGCCTCAGAGGGACATCTGCAGAAGGACTTATTTTTACCAAAGAGCTGTTCTGAAAACCCACTCAGCATCTGCACTGTCCG AATGGGATCTGATTGTGCTGGGTCTGTGTACTCATTCCTGCACTTCTCCAAATTTGTAGGAATTTTTTTTTTAAAAAAATTCCAACAGAAAAGTTGGAAGAATAGTCTAATAATATACTCCTCATCCATATTCAATTACTAATATTTTAAT ATATTTATCTTTGTGTTTTGTTCTTGCTGATTATTAAAAATCATTTGCAAAATTCAGGAATTTTAACCTCAAACATGTCAAACTGTCTGCCACATCAGGAAGTCCCATTATTCATGAGCTGAGTTTTTCACTTAAGAGAGTAACTCACAGA CTTCTCTGTTTTTAACCATATGTATATATATATTTTTTGCCAATAGTAACTGTTAGACAGCATGAAAATATCCTACCAACAATTTTTGCTTAATAGCTTTAGCATTTAATGATGATCCTTCTGAGTCAATTATTTCATTGAACCTGGAAAA ATGGTGATTTTTGTATTACATTTTTAATGACCTTTCTCCAAAGAAGAATAGCTTTTCCTCATCAACTATAGAGACACTGTGGTTCCTCTTAAAATGAAAGGTGAATGATTAATTCTCTCTTGAATCGCCAATTTTCACAGTAAGGAATCGT GTTTTGATCTCTTCCAGTGGGGGCAATCAAGCCATGTTTGTCTCTTCCTCTCACTTGCTTTCTCTGTAATTTTTTGAGTATTATTACATATACATGGTGCACTACAGTCAGTTACGGTAGCTACTATCTTTGACAGTTAAATAGACCCAAA TGTGATTAGTAGGAGGTTCAAGCTGGTTTCTGTATCTTGTTGACCTGCACCCATAAAAGTCTGTTGATCAGTAGCTCCTTTTGTTTGATAATAAAAAATATCTCAAGCACGCTCTCATAATACTTTGCTGAATCCAGAAATGGCATGCCCC ATTTTCCAAGGAGATATATTGCTTTTTAAAATCCATCATTTTAACAGGGTATTTGGAAACCAAAAGCTTGCAGGGACATTTTCTGGACACATCAAGCAAATATATTTTAAAATTTTGTTTAAATATATTTTTTAATATTTTACTCAAATTT GATATAACTTATTTTTTGCCTTGACTTCTAGGAATTTTCTTTTTTTTTTCTTTTGCTTTAGGGCCAGAGTCTTTCTGTGTCATCCAGACTGAAGTGCAGCACTATGATCATAACTCATTGCAGCCTCCACCTCCTGGCACCTCCTGGCTCA AGCAATCCTCCCACCCCAGCCTCCTGAGTAGCTGGGACTATAGGAATGCACCACCATGCCCGGATTTATTATTATTATTAATTTTTGTAAAGATGGGGGTCTCACTTTTTTGCTCAGGCTGGTCTGAAACTCCTGGCCTCAAGTCATCCTC CTGCCTTGGCCTCCCAAAGTGCTGCAATTAAAAGTGTGAGCCACCACACCTGGCTTGAATTTTGTTTTATGTGCTTTTTTTTTTTTTTTTTTTAGTGAAAACATAAGCTCCTAATAATCCTATCATGCTAAATTACTTATTCTATGTGTAC ATGATACATTTTCAAAATTTTAGTACACTAGTTACACGTGAACTAGTACAGCCACTGTGGAGAAGAGTATGGAGGTTCCTCATAAAGTTATGAATAGAGCACAGAGGTGAACTGAAAACTATGAATTGAGCCAGTGTTCCCATTACTGGGC 
ATTTATCCGAAGGAAAGGAAATCCTGCGCTCCTGTGAATCCCTAGGTTCTTGCAAGTCCCTGATTTCCTGTGAGTCCCTGCGTTCCCATGAATCCCTTTGTTCCCATAAATTCCTTTGTTTTTGTGAGCCCCTGTGTTCCCGTGAATCCTT GCGCTCACGTGAACCACCTGTGTTCCCGTGAGCCTGTGCGTTCCCGTGAATCCCTGCATTCCCGTGAGGACCTGCGTTCTCAGGAGCCTCTGCGTTCTCGTTAGCCCCTGCATTGCCGTGAATCTGAGTTCCCATGATTGCCTGAGTTCCC ATGAGTCCCTGAGTTCCCGTGAGTCCCTGAGTTCCCGAGTCCCTGAGTTCCCGTGAGTCCTTGCATTCCCGTCAGTCCCTGAGTTCCCCTGAATCCCTGAGTTCCTGTGATTCCGAGTTTCTGATTCCCTGTGTTCCTCTGAGTCCCTGTA TCCTCGAGTATTCCTGTACCCTTGTGAAGCCCCCGAGTTTTTGTGCATACTTGCCTAGTACATGGCAAGTGATCAGGTACATTTTTTATTAATAAGGGATTGCATTGGGATTGGTGATTATCAAAATGGGTCAGTGCTAAATAGTGAATTT TCCCTCTGTCATTTCTTCTTTCCCCCTTCCATGAGAATCTCCTTGTGTAGCTGAGAAGCTGCAGTTAACATTGTAGCATCTTGATAGCGAATATAAATCCATAGACCTTGAGAAACAAGGCCTAGGGACTGTCGTCTGGAGAGTTAAGATT TACTTTAAGAAACAAGAGAAGATAAAGGAGATAAATGGCTTCAGCCAGTTGGCCAGGGTAGGAGGATGTGAGACTGGCAGCAAATGAGAGTCTCCCAGGCACACACAAGCCTCTGAGCTGCTACAAGTGGGAGACACCAGGAAAGAGGATT TTTTAATGTGGTTCAGAAATAAATCCTTTTTGACTTTACATTTAACCCACTATCAGTAGATATTCACATAGGTTTTTTTTTTTTAATTTTATTTTCTGGCCCGTGTCTTGAGGAGAGAATTAGTCTGGGCCCATGTTATTTATCCTGTTTG AAATGAGGGATTTGAGTAATCACCCCCCAGCCTGGCCTGGACGGTCACTTTACTCTTGGGCCATCTCCTTCTCTTGGCAGCAGTAGAAGGATGTGGCTCCAAGATTGTGTTTTCTCCTGTAGTGTGAGGTGAGTGGCCATGACATTGGGCT GAATTGATGCTGTGTTGGAGGCTGGATGGTGGTAAATCTATTTTTATTTAAATTTTCTGC....
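
To put the numbers from the list above in perspective, here is a minimal arithmetic sketch. The figures in it are my assumptions for illustration, not measurements: chromosome 2 is taken as roughly 242 million bases, and 'a few minutes' is taken as 3 minutes. All names in the code are hypothetical.

```python
# Assumed figures (not taken from the text): chromosome 2 is treated as
# roughly 242 million bases, and "a few minutes" is treated as 3 minutes.
CHR2_LENGTH_BP = 242_000_000
WINDOW_BP = 9_000          # size of the piece shown above (9 kb)
TIME_BUDGET_S = 3 * 60     # assumed search budget in seconds


def windows_needed(total_bp: int, window_bp: int) -> float:
    """Number of window-sized pieces needed to cover total_bp bases."""
    return total_bp / window_bp


def seconds_per_window(total_bp: int, window_bp: int, budget_s: float) -> float:
    """Time available per window if every window had to be inspected in turn."""
    return budget_s / windows_needed(total_bp, window_bp)


n_windows = windows_needed(CHR2_LENGTH_BP, WINDOW_BP)
per_window = seconds_per_window(CHR2_LENGTH_BP, WINDOW_BP, TIME_BUDGET_S)

print(f"9 kb windows in chromosome 2: about {n_windows:,.0f}")
print(f"time available per window: about {per_window * 1000:.1f} ms")
```

Under these assumptions an exhaustive, window-by-window search of chromosome 2 alone would leave only a few milliseconds per 9 kb piece, and that is before the other chromosomes are considered.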




    Many facets of the mechanisms of genome navigation are presumably already known. For example, the distribution and binding of transcription factors along the genome and the pausing of polymerases undoubtedly play important roles. On the other hand, it is not fully understood which mechanisms transport these components to their respective target sites. The often tacit assumption that they reach their targets by molecular diffusion is doubtful: molecules that have to diffuse randomly through the large and dense chromatin matrix may not provide a sufficiently fast and accurate genomic search mechanism. This objection applies especially to stress responses such as heat shock, where a cell needs to mount a very rapid multi-component response.

    In order to learn some of the most basic requirements for navigating large genomes, one may look to our own information technology, which, like genomes, operates under the constraints of limited storage space and the need for short access times. There, one of the most basic requirements is a dense distribution of formatting markers. These markers tell the mechanisms of data storage and data retrieval, at every moment of their search for a target locus, where in memory they currently are.
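
    The information-technology side of this analogy can be sketched in a few lines. The sketch below is only a toy illustration under my own assumptions: a 'memory' of sorted, named loci, with a marker placed at every 1000th locus. The function names and the locus names are hypothetical, and nothing here is meant as a model of how a cell actually works.

```python
from bisect import bisect_right


def linear_lookup(loci, target):
    """Scan locus by locus; return (index, number of loci examined)."""
    for steps, name in enumerate(loci, start=1):
        if name == target:
            return steps - 1, steps
    return -1, len(loci)


def build_markers(loci, spacing):
    """Place a marker at every `spacing`-th locus: (name at marker, position)."""
    return [(loci[i], i) for i in range(0, len(loci), spacing)]


def marker_lookup(loci, markers, spacing, target):
    """Jump to the right block via the markers, then scan only that block.

    Returns (index, number of loci examined inside the block). The binary
    search over the markers costs roughly log2(len(markers)) further
    comparisons, which are not counted here. Assumes `loci` is sorted.
    """
    keys = [name for name, _ in markers]
    block = bisect_right(keys, target) - 1
    if block < 0:
        return -1, 0
    start = markers[block][1]
    steps = 0
    for i in range(start, min(start + spacing, len(loci))):
        steps += 1
        if loci[i] == target:
            return i, steps
    return -1, steps


# Toy "memory": 100,000 sorted, hypothetical locus names.
loci = [f"locus{i:06d}" for i in range(100_000)]
markers = build_markers(loci, spacing=1000)

print(linear_lookup(loci, "locus087654"))
print(marker_lookup(loci, markers, 1000, "locus087654"))
```

    The point of the comparison is the number of loci examined: the unmarked scan must pass every locus in front of the target, whereas the marked search only scans a single block once the markers have told it where it is.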

    I submit that large genomes may require similar signposts. If this is correct, and if such signposts can be found in mammalian genomes, they would provide an important starting point for the search for the mechanisms of genome navigation.

    The medical significance of genome navigation:

    Imagine the havoc that misdirected transcription could cause in a cell!

    It seems quite important to understand the mechanism(s) of genome navigation, because failures or even slight deviations could have catastrophic medical consequences. For example, even a 'mild' slow-down of the search mechanisms may cause numerous diseases by delaying the synthesis and/or turnover of vital gene products and moving them out of a required synchrony. Worse, even a small mutation in the direction-giving elements may misdirect the search mechanism. By sending large numbers of polymerases to the wrong targets, such a mutation may produce diseases that have no single cause but result from hundreds or thousands of improper gene expressions that seem functionally unrelated, which would render such diseases almost intractable. One wonders whether cancer or various dementias are diseases of this kind.