For some of the examinations of genomes presented here, it will be advantageous to depict genome sequences in a novel way.
Instead of describing them as strings of letters, we will present them as optical images by the
GPxI method described earlier. Briefly, the method assigns to the bases of a DNA sequence the following gray-tone values: A: black, G: white,
C: dark gray and T: light gray (Fig.A1a). This assignment is, of course, arbitrary, but must remain the same throughout. It transforms the consecutive bases
of the sequence into a continuous line of pixels with varying gray values. In addition, the method requires the choice of an arbitrary, but also fixed
image width W. Whenever the line of pixels reaches W, it wraps around like any other text would, and continues at the beginning of the next line
immediately underneath. For example, the GPxI of a computer-constructed, random DNA sequence appears as the featureless dot-pattern shown in
Fig.A1b.
It is, of course, also possible to choose the image width equal to the size of the depicted sequences. In this way, an
array of sequences (e.g. the Alu-sequences) can be written in register.
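The mapping and wrapping steps described above can be sketched in a few lines of Python. This is only an illustration, not the original GPxI software: the function names `gpxi_rows` and `to_pgm` are hypothetical, the intermediate gray values 85 and 170 are assumed (the text fixes only the order black, dark gray, light gray, white), and skipping characters other than A, C, G and T is likewise an assumption.

```python
# Gray values per base (0 = black, 255 = white).
# The text gives A: black, C: dark gray, T: light gray, G: white;
# the exact values 85 and 170 are assumptions made for this sketch.
GRAY = {"A": 0, "C": 85, "T": 170, "G": 255}


def gpxi_rows(sequence, width):
    """Return the image as a list of rows, each a list of gray values.

    Characters other than A, C, G, T (e.g. N) are skipped, which is an
    assumption of this sketch.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = [GRAY[base] for base in sequence.upper() if base in GRAY]
    # Wrap the pixel line whenever it reaches the image width W.
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def to_pgm(rows, width, pad=255):
    """Serialize the rows as a plain-text PGM (P2) image.

    A short last row is padded with the value `pad`.
    """
    lines = ["P2", f"{width} {len(rows)}", "255"]
    for row in rows:
        padded = row + [pad] * (width - len(row))
        lines.append(" ".join(str(value) for value in padded))
    return "\n".join(lines) + "\n"


rows = gpxi_rows("ACGTACGTAC", width=4)
print(rows)  # [[0, 85, 255, 170], [0, 85, 255, 170], [0, 85]]
```

Writing the string returned by `to_pgm` to a file with a `.pgm` extension gives an image that most viewers can open. Choosing `width` equal to the length of each sequence places one sequence per row.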
Fig.A1. Basic principle of the 'genome pixel image' (GPxI) method.
(a). DNA sequence written in the traditional way and the assignment of a certain graytone to each base (inset).
(b). Writing the above DNA sequence from left to right while expressing each base as a single pixel with the assigned gray-value yields a line of pixels with varying graytones.
(c). Whenever the pixel line has reached the edge of the image, it wraps around and continues on the left margin.
(d). By omitting the white spaces between the consecutive lines, the Genome Pixel Image (GPxI) emerges.
(e). Examples of the GPxIs of a random DNA file and a highly structured part of the human X-chromosome.
Fig.A1f. Animation of the basic principle of the 'genome pixel image' (GPxI) method.
Examples of GPxIs and their interpretations
Figure A2 shows the GPxI of the first 150 Kb of the human X chromosome (Fig.A2a). While a size of 150 Kb is already too large for many applications of the traditional alignment methods, the striking patterns visible in the GPxI near the 5’ end immediately highlight the exact location of candidate repetitive sequences without any prior knowledge of any special properties of the sequences in this location. Furthermore, one can see immediately that these special sequences occur in 2 clusters separated by a large stretch of non-repetitive DNA. Their pseudo-repetitive character becomes obvious through the 2 consecutive magnifications shown in Figures A2b and A2c: The larger the magnification, the less detectable the repetitions of any patterns become.
Fig.A2. GPxI of the first 150 Kb of the human X chromosome (Un-sequenced portions are omitted). (Scales: 50[b]/division)
(a). The appearance of several pseudo-repetitive sequences as various, seemingly repetitive patterns. The appearance of identical repetition
vanishes with increasing magnification of the GPxI, demonstrating the power of the human visual sense to still detect rules and relationships between
DNA sequences even after mutations and variations have obliterated them to a large degree.
(b). Enlargement of the portion of the GPxI within the black frame in panel a.
(c). Enlargement of the portion of the GPxI within the black frame in panel b.
The distinction between repetitive and pseudo-repetitive sequences can be tested by GPxIs in a much more objective way, too. Obviously, the appearance of any patterns depends on the width of the GPxI, as it determines which downstream part of a sequence is written directly below it. Given a series of truly repetitive motifs there will exist specific values for the GPxI-width where the motifs fall into perfect register and, thus, generate a pattern of vertical lines. As shown in Figure A3a, at a GPxI-width of 610 [b] the obliquely striped patterns shown in Fig.A2a seem to turn into vertical lines (marked as ‘1’ in Figure A3). However, as shown by the magnified inset at the right hand side, the vertical lines are not perfect. Instead, they fall into 3 groups shifted out of register by 2 insertions. Furthermore, they are interrupted by numerous point mutations that appear as differently colored pixels within many vertical lines. Both properties identify them not only as pseudo-repeats, but also identify the causes of their differences.
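The width dependence described above can also be explored numerically. The following sketch scans a range of widths and, for each one, measures the fraction of pixels that are identical to the pixel directly underneath; at a width where motifs fall into register this fraction approaches 1, whereas it stays near 0.25 for a random sequence with equal base frequencies. The score and the function names `register_score` and `best_widths` are assumptions made for illustration and are not part of the original GPxI method.

```python
def register_score(sequence, width):
    """Fraction of bases identical to the base one image row below.

    In an image of the given width, the base at position i sits directly
    above the base at position i + width.
    """
    n = len(sequence) - width
    if width <= 0 or n <= 0:
        return 0.0
    matches = sum(1 for i in range(n) if sequence[i] == sequence[i + width])
    return matches / n


def best_widths(sequence, min_width, max_width, top=3):
    """Return the `top` widths with the highest register score.

    The result is a list of (width, score) pairs, best first; ties are
    broken in favor of the smaller width.
    """
    scored = [(register_score(sequence, w), w)
              for w in range(min_width, max_width + 1)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(w, s) for s, w in scored[:top]]


# A perfect repeat with period 5 falls into register at width 5.
print(best_widths("ACGGT" * 20, 2, 9, top=1))  # [(5, 1.0)]
```

Multiples of the true period score equally well, and pseudo-repeats with insertions or point mutations yield scores below 1, so a high-scoring width is best treated as a candidate to inspect visually rather than as a final answer.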
Fig.A3. Effect of GPxI-width on pattern appearance and recognition on a portion of the GPxI of Figure A2. The numbers 1, 2, and 3 indicate
the same domains on each panel. Enlargements of these domains are shown on the right hand side. (Scale: 50[b]/division)
(a). GPxI-width = 610 [b]. The pattern at '1' turns vertical but, as shown by the enlargement, contains deviations in the form of 2 shifts
(=insertions) and single deviant pixels (=point mutations).
(b). GPxI-width = 568 [b]. The domains '2' and '3' appear almost random.
(c). GPxI-width = 551 [b]. Domain '2' shows a clear periodicity with few deviations. Domain '3' shows pseudo-repetitive patterns.
The method of changing the width of the GPxI may also bring out the existence of otherwise easily overlooked relationships. For example, the domains labeled as ‘2’ and ‘3’ in Figure A3 may appear rather unstructured and, thus, unrelated at a GPxI-width of 568 [b], whereas pseudo-repetitive patterns become clearly visible at 551 [b] GPxI-width.
In the above examples, the GPx Images depicted continuous genome sequences that continued in the next line underneath because the string of pixels had reached the margin of the image. In this way, the above patterns became quite visible. However, most parts of natural genomes do not contain as many pseudo-repetitive sequences as the above examples. Nevertheless, visual pattern recognition can be used to detect homologies, similarities or deviations from homologies etc. very easily using the GPxI method. After placing isolated segments of related genome sequences underneath each other and in register, one can compare them visually with each other, even though they belong to different parts of genomes or different genomes of different species. For example, Fig. A4 shows a collection of Alu-sequences from human chromosome 1 placed in register. It is easy to detect their common features as well as their individual differences (mutations) in this way.
Fig.A4. Visual comparison of 100 selected segments of human chromosome 1, each containing an Alu-sequence with up to 50 point mutations compared to AluY. The segments were placed in register with their
downstream ends (= right hand side, where the poly-A ('pA') portion begins).
The labels indicate uf: upstream flank, df: downstream flank, pA: typical poly-A portion at
the downstream end of Alu-sequences.
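Placing isolated segments in register at their downstream ends can be sketched as follows. Each segment becomes one image row, and shorter segments are padded on the left so that all right-hand ends line up. The function name `stack_in_register`, the filler value 128 and the intermediate gray values are assumptions of this sketch, and how the segments themselves are selected is outside its scope.

```python
# Gray values per base; 85 and 170 are assumed intermediate values.
GRAY = {"A": 0, "C": 85, "T": 170, "G": 255}
BACKGROUND = 128  # hypothetical filler for padding and unknown bases


def stack_in_register(segments, align="right"):
    """Return one row of gray values per segment, all of equal width.

    With align="right" the downstream (right-hand) ends are in register;
    any other value aligns the upstream (left-hand) ends instead.
    """
    if not segments:
        return []
    width = max(len(segment) for segment in segments)
    rows = []
    for segment in segments:
        pixels = [GRAY.get(base, BACKGROUND) for base in segment.upper()]
        filler = [BACKGROUND] * (width - len(pixels))
        rows.append(filler + pixels if align == "right" else pixels + filler)
    return rows


# Three toy segments sharing a poly-A tail at their downstream end.
for row in stack_in_register(["GGAAA", "CAAA", "TTTGAAA"]):
    print(row)
```

In the resulting array, columns that are uniform from top to bottom correspond to shared features, while isolated deviant pixels mark positions where an individual segment differs.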
Statistical properties such as AT- or GC-richness can also be depicted by the GPxI method. In this case one may use color coding and superimpose the color on the GPx Image. As the GPx Image is generated line by line, the computer tallies up the ratio r = (A+T)/(G+C) cumulatively as it writes a line of the image. As long as r remains close to unity, the computer tints the pixel green. However, if a particular line is AT-rich, r will rise above unity from left to right and the tint of the pixels will correspondingly become increasingly blue. On the other hand, if the line is GC-rich it will be made increasingly red corresponding to the decrease of r from left to right.
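A possible implementation of this tinting is sketched below. The text does not specify how r is translated into a color, so several choices here are assumptions: a pseudocount is added to both tallies so that r starts at unity and drifts gradually instead of jumping after the first base, the tint is derived from log2(r) clipped to the range -1 to 1, and the tint is superimposed on the gray value by simple averaging. The names `line_tints` and `superimpose` are hypothetical.

```python
import math


def line_tints(line, pseudocount=10):
    """Return one RGB tint per base of a single image line.

    r = (A+T)/(G+C) is tallied cumulatively from left to right, with
    `pseudocount` added to both tallies (an assumption of this sketch).
    Green means r is close to 1, blue means AT-rich, red means GC-rich.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    at = gc = 0
    tints = []
    for base in line.upper():
        if base in "AT":
            at += 1
        elif base in "GC":
            gc += 1
        r = (at + pseudocount) / (gc + pseudocount)
        # Clip log2(r) so that r >= 2 is fully blue and r <= 0.5 fully red.
        bias = max(-1.0, min(1.0, math.log2(r)))
        strength = round(255 * abs(bias))
        if bias >= 0:
            tints.append((0, 255 - strength, strength))
        else:
            tints.append((strength, 255 - strength, 0))
    return tints


def superimpose(gray, tint):
    """Blend a gray value (0..255) with an RGB tint by averaging."""
    return tuple((gray + channel) // 2 for channel in tint)


# An AT-only line drifts from green towards blue, left to right.
print(line_tints("A" * 10)[0], line_tints("A" * 10)[-1])
```

The tallies are reset at the start of each call, so the function is meant to be applied once per image line, matching the line-by-line assessment described above.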
Fig.A5. Line-by-line assessment of the ratio r = (A+T)/(G+C) in the human Y-chromosome between positions 2,496,000 and 2,808,000.
As the ratio increases from left
to right in AT-rich genome segments, the color becomes increasingly blue. If it decreases from left to right in GC-rich genome segments, the color becomes increasingly red. Green lines
indicate balanced ratios along the line.
GPx Images, like all other images, can be animated, provided one knows a computer-generated or observed time sequence of the depicted DNA sequence. Fig. A6 shows the example of a computer-generated depiction of random transpositions on a portion of the human X-chromosome.
Fig.A6. Animation of a segment of the human X-chromosome to illustrate the effect of (computer-generated) transpositions acting on the segment.
Transpositions appear as shifts of a
portion of the sequence, preceded by the temporary appearance of a white segment where the transposon originated.
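A time sequence of this kind could be generated along the following lines. The sketch models a transposition as a simple cut-and-paste event and produces three frames: the original sequence, the sequence with the source region blanked, and the sequence after the excised portion has been reinserted elsewhere. The cut-and-paste model, the placeholder character "-" for the blanked region and the function names are assumptions made for illustration; the simulation behind Fig.A6 is not described in the text.

```python
import random


def transposition_frames(sequence, start, length, target):
    """Return three frames of one cut-and-paste transposition.

    Frame 0: the original sequence.
    Frame 1: the source region replaced by '-' (to be drawn in white).
    Frame 2: the excised portion reinserted at index `target`, counted
             in the sequence that remains after excision.
    """
    if length <= 0 or start < 0 or start + length > len(sequence):
        raise ValueError("segment lies outside the sequence")
    segment = sequence[start:start + length]
    remainder = sequence[:start] + sequence[start + length:]
    if not 0 <= target <= len(remainder):
        raise ValueError("target lies outside the remaining sequence")
    blanked = sequence[:start] + "-" * length + sequence[start + length:]
    moved = remainder[:target] + segment + remainder[target:]
    return [sequence, blanked, moved]


def random_transposition(sequence, max_length, rng):
    """Pick a random segment and target, then build the three frames."""
    length = rng.randint(1, min(max_length, len(sequence) - 1))
    start = rng.randrange(0, len(sequence) - length + 1)
    target = rng.randrange(0, len(sequence) - length + 1)
    return transposition_frames(sequence, start, length, target)


for frame in transposition_frames("AAAACCCCGGGGTTTT", 4, 4, 8):
    print(frame)
```

Rendering each frame as a GPx Image of fixed width and displaying the frames in order gives the animation; repeated events can be chained by feeding the last frame of one call into the next.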