The Human Genome: A Multifractal Analysis

Pedro A Moreno et al. BMC Genomics. 12:506.

Abstract

Background: Several studies have shown that genomes can be studied using a multifractal formalism. Recently, we used a multifractal approach to study the genetic information content of the Caenorhabditis elegans genome. Here we investigate whether the human genome shows behavior similar to that observed in the nematode.

Results: We report here multifractality in the human genome sequence. This behavior correlates strongly with the presence of Alu elements and, to a lesser extent, with CpG islands and (G+C) content. In contrast, little or no relationship was found for LINE, MIR, MER, and LTR elements or for DNA regions poor in genetic information. Gene function, clusters of orthologous genes, metabolic pathways, and exons tended to increase in frequency across the ranges of multifractality, and large gene families were located in genomic regions with varied multifractality. Additionally, we propose a multifractal map and a classification of the human chromosomes.

Conclusions: Based on these findings, we propose a descriptive non-linear model for the structure of the human genome, with some biological implications. This model reveals (1) a multifractal regionalization in which many regions far from equilibrium coexist, and (2) that this non-linear organization has significant molecular and medical genetic implications for understanding the role of Alu elements in genome stability and in the structure of the human genome. Given the role of Alu sequences in gene regulation, genetic diseases, human genetic diversity, adaptation, and phylogenetic analyses, these quantifications are especially useful.

Figures

Figure 1
Analyses of multifractal parameters: A: Chaos game representation (CGR) of an H. sapiens chromosome 1 fragment (~80,000 bp). B: Generalized dimension spectra for the two chromosome fragments with the highest (blue) and lowest (red) multifractality; a fragment with medium multifractality (green) is shown for comparison. C: Multifractal spectrum τ(q) for the fragments in B. D: Number of chromosome fragments per range of multifractality (RM). E: Distribution of the 2-D points (Dq (q = 1), Dq (q = -1)) of the human genome; Dq (q = 1) is called the information dimension.
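Panels A to C rest on two standard constructions: the chaos game representation (CGR) of a DNA sequence and the generalized dimensions Dq of the resulting point set. The Python sketch below shows one way to estimate them; it is not the authors' pipeline. It assumes the common CGR corner layout (A, C, G, T at (0,0), (0,1), (1,1), (1,0)), box sizes ε = 2^-k for k = 2 to 6, and an ordinary least-squares fit of the log partition function against log ε, with τ(q) = (q − 1)Dq. All function names are hypothetical.

```python
import math
import random

# Assumed CGR corner layout (a common convention; other layouts exist).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos game representation: start at the centre of the unit square and,
    for each nucleotide, move halfway toward that nucleotide's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:  # skip N and other ambiguity codes
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def box_probabilities(points, k):
    """Fraction of points in each non-empty box of a 2^k x 2^k grid (box size 2^-k)."""
    n = 2 ** k
    counts = {}
    for x, y in points:
        key = (min(int(x * n), n - 1), min(int(y * n), n - 1))
        counts[key] = counts.get(key, 0) + 1
    total = len(points)
    return [c / total for c in counts.values()]


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


def generalized_dimension(points, q, ks=range(2, 7)):
    """Estimate D_q from how the partition function scales with box size eps = 2^-k."""
    log_eps, log_z = [], []
    for k in ks:
        p = box_probabilities(points, k)
        log_eps.append(-k * math.log(2))
        if q == 1:
            # Information dimension: sum(p log p) ~ D_1 * log(eps)
            log_z.append(sum(pi * math.log(pi) for pi in p))
        else:
            # sum(p^q) ~ eps^tau(q), with tau(q) = (q - 1) * D_q
            log_z.append(math.log(sum(pi ** q for pi in p)))
    s = slope(log_eps, log_z)
    return s if q == 1 else s / (q - 1)


def random_sequence(n, seed=0):
    """i.i.d. uniform sequence; its CGR fills the square, so every D_q should be near 2."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))
```

For an i.i.d. uniform sequence the CGR fills the square evenly and Dq stays near 2 for every q. A fragment whose Dq falls markedly as q increases is multifractal, which is the kind of contrast panel B displays.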
Figure 2
Analyses of molecular parameters: Relationships between MD and A: Alu content (R² ~0.86, p < 0.05); B: Alu subfamilies Alu-S (R² ~0.84, p < 0.05), Alu-J (R² ~0.7), and Alu-Y (R² ~0.52). C: Alu content per range of ΔDq. D: Multifractality versus log(CGI) (R² ~0.64, p < 0.05). E: LINE, MIR, MER, and LTR content per RM. F: Distribution of the 3-D points (Dq (q = 1), Dq (q = -1), Alu content) of the human genome. We used a cut-off of ≥ 217.9 Alus (blue dots), following paragraph 1.4.
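The R² values in this and later captions are coefficients of determination. Assuming an ordinary least-squares line fitted to the plotted variables (with the x-axis log-transformed in panel D), a minimal sketch follows; the p-values would additionally require a significance test on the slope, which is omitted here. The function name is hypothetical.

```python
def r_squared(xs, ys):
    """Coefficient of determination for an ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx      # fitted slope
    a = my - b * mx    # fitted intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
    ss_tot = sum((y - my) ** 2 for y in ys)                       # total sum of squares
    return 1.0 - ss_res / ss_tot
```

For a simple linear fit this equals the squared Pearson correlation; for panel D one would pass the logarithms of the CGI values as `xs`.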
Figure 3
Multifractal map of the human genome: overview of MD (green) and Alu density (purple) across the human chromosomes. (*): VSTRs.
Figure 4
Genomic location of the most multifractal chromosome fragments: A: Discrimination method based on three parameters. Each chromosome fragment is characterized by three quantities: its MD (x-axis), its Alu content density (y-axis), and the correlation coefficient τ(q) (z-axis). Blue indicates fragments with ΔDq ≥ 1.159 and Alu content ≥ 217.9. B: Distribution of chromosome fragments with high multifractality (top) and of fragments with LMM (bottom).
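The blue subset in panel A is defined by two cut-offs. A minimal sketch of that filter, assuming each fragment is stored as a record holding its ΔDq and Alu content (the `Fragment` type, its fields, and the function name are hypothetical; only the two thresholds come from the caption):

```python
from dataclasses import dataclass

DQ_CUT = 1.159   # ΔDq threshold from the caption
ALU_CUT = 217.9  # Alu content threshold from the caption


@dataclass
class Fragment:
    """One chromosome fragment (hypothetical record layout)."""
    chrom: str
    index: int          # position of the fragment along the chromosome
    delta_dq: float     # degree of multifractality, ΔDq
    alu_content: float  # Alu content of the fragment


def highly_multifractal(fragments, dq_cut=DQ_CUT, alu_cut=ALU_CUT):
    """Keep fragments meeting both cut-offs (the blue points in panel A)."""
    return [f for f in fragments if f.delta_dq >= dq_cut and f.alu_content >= alu_cut]
```

Both comparisons are inclusive (≥), matching the caption's wording.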
Figure 5
Distributions by gene function, gene family, and gene length: A: Gene functional distributions per RM; these distributions are strongly significant for up to 80% of the ranges. B: Percentage of gene families per RM. Gene families: CA: carbonic anhydrase; CD: cluster of differentiation; GPR: G protein-coupled receptors; KCN: potassium channels; OR: olfactory receptors; RPS: ribosomal proteins; SLC: solute carriers; SNORA: small nucleolar RNA; USP: ubiquitin-specific peptidases; ZNF: zinc fingers, C2H2-type. C: Degree of gene fragmentation per RM. AGL: average gene length (R² ~0.55); AEL: average exon length (R² ~0.91); AIL: average intron length (R² ~0.74).
Figure 6
Multifractal classification of the human chromosomes: A: Distributions of the average degree of multifractality (Av. ΔDq) and Alu content per chromosome. B: Discrimination method based on the multifractal formalism, applied to a distribution of two-dimensional points (R² ~0.967, p < 0.05). Top: hierarchical clustering of the multifractal parameters averaged by chromosome, for Dq with q from -20 to 20 (color scale bar indicated). Minimum similarities are indicated near the nodes, and asterisks mark the only two exceptions found.
Figure 7
Multifractality averaged per chromosome region: A: Multifractal distribution per chromosome, where each chromosome region (each bar) has equal length. Blue marks chromosome regions with high averaged multifractality (ΔDq > 1.04); a blue-to-red gradient marks medium multifractality (ΔDq ≤ 1.04); red marks low multifractality. B: Correspondence between the averaged gene, CGI (R² ~0.62, p < 0.05), and Alu (R² ~0.95, p < 0.05) contents and the averaged multifractality across human chromosome 1.
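Panel A averages the multifractality of consecutive fragments over equal-length regions. A minimal sketch, assuming each chromosome is given as a list of per-fragment ΔDq values in positional order and that a region is a run of (nearly) equal numbers of consecutive fragments; the function names are hypothetical. Only the ΔDq > 1.04 threshold for high multifractality comes from the caption, so lower averages are reported jointly as medium or low.

```python
HIGH_CUT = 1.04  # averaged ΔDq above this counts as high multifractality (from the caption)


def region_averages(values, n_regions):
    """Split per-fragment ΔDq values (in chromosome order) into n_regions consecutive
    regions of nearly equal length and return the mean of each region."""
    n = len(values)
    if not 0 < n_regions <= n:
        raise ValueError("n_regions must be between 1 and len(values)")
    # Integer boundaries; every region gets at least one fragment because n / n_regions >= 1.
    bounds = [i * n // n_regions for i in range(n_regions + 1)]
    return [sum(values[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


def label_region(avg, high_cut=HIGH_CUT):
    """Label a region's averaged ΔDq; the caption gives no cut between medium and low."""
    return "high" if avg > high_cut else "medium or low"
```

Integer division keeps the boundaries deterministic; rounding to the nearest index could, with round-half-to-even, produce empty regions.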
Figure 8
Summary diagram, a conceptual non-linear model for the human genome: multifractality increases from left to right. A: Multifractality profile for 9,379 chromosome fragments (from 0.79 to 1.56). B: Figure 1D. C: Alu content profile for 9,379 chromosome fragments (top) and Figure 2C (bottom). D: Figure 2E. E: Figure 5A, C. F: Figure 5B. G: Figure 6B.


