PLoS One. 2012;7(10):e48375.
doi: 10.1371/journal.pone.0048375. Epub 2012 Oct 31.

NetView: A High-Definition Network-Visualization Approach to Detect Fine-Scale Population Structures From Genome-Wide Patterns of Variation

Free PMC article

Markus Neuditschko et al. PLoS One. 2012.

Abstract

High-throughput sequencing and single nucleotide polymorphism (SNP) genotyping can be used to infer complex population structures. Fine-scale population structure analysis that traces individual ancestry remains one of the major challenges. Based on network theory and recent advances in SNP chip technology, we investigated an unsupervised network clustering method called Super Paramagnetic Clustering (Spc). When applied to whole-genome marker data, it identifies the natural divisions of groups of individuals into population clusters without the use of prior ancestry information. Furthermore, we optimised an analysis pipeline called NetView, a high-definition network visualization, which starts with the computation of genetic distances, followed by clustering using Spc and finally visualization of the clusters with Cytoscape. We compared NetView against commonly used methodologies, including Principal Component Analysis (PCA) and a model-based algorithm, Admixture, on genome-wide SNP data derived from three previously described data sets: simulated (2.5 million SNPs, 5 populations), human (1.4 million SNPs, 11 populations) and cattle (32,653 SNPs, 19 populations). We demonstrate that individuals can be effectively allocated to their correct population whilst simultaneously revealing fine-scale structure within the populations. Analyzing the human HapMap populations, we identified unexpected genetic relatedness among individuals, and population stratification within the Indian, African and Mexican samples. In the cattle data set, we correctly assigned all individuals to their respective breeds and detected fine-scale population sub-structures reflecting different sample origins and phenotypes. The NetView pipeline is computationally highly efficient and can easily be applied to large-scale genome-wide data sets to assign individuals to particular populations and to reproduce fine-scale population structures without prior knowledge of individual ancestry. NetView can be used on any data from which a genetic relationship or distance between individuals can be calculated.
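
The first stage of the pipeline described above (a genetic distance matrix, then a nearest-neighbour network whose node sizes reflect connectivity) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state which distance measure NetView uses, so an allele-sharing distance is assumed here. Spc clustering and the Cytoscape step are not reproduced, and the function names and toy genotypes are hypothetical.

```python
import numpy as np


def allele_sharing_distance(genotypes):
    """Pairwise distances from an (individuals x SNPs) matrix of 0/1/2 allele counts.

    Assumption: distance = mean absolute genotype difference / 2, which lies in
    [0, 1]. The paper's own distance measure is not given in the abstract.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        dist[i] = np.abs(g - g[i]).mean(axis=1) / 2.0
    return dist


def knn_edges(dist, k):
    """Undirected k-nearest-neighbour edges as {(i, j): distance}, with i < j."""
    edges = {}
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i], kind="stable")
        neighbours = [int(j) for j in order if j != i][:k]
        for j in neighbours:
            edges[(min(i, j), max(i, j))] = float(dist[i, j])
    return edges


def node_degrees(edges, n):
    """Number of edges per node, the quantity a node size could be scaled by."""
    degrees = np.zeros(n, dtype=int)
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees


# Toy data (hypothetical): two populations with very different allele frequencies.
rng = np.random.default_rng(0)
pop_a = rng.binomial(2, 0.1, size=(5, 200))
pop_b = rng.binomial(2, 0.9, size=(5, 200))
genotypes = np.vstack([pop_a, pop_b])

dist = allele_sharing_distance(genotypes)
edges = knn_edges(dist, k=2)
degrees = node_degrees(edges, len(genotypes))
```

In this toy case each individual's nearest neighbours come from its own population, so the network splits into two groups before any clustering is applied. The edge distances and node degrees are the kind of attributes that a network viewer could map to edge thickness and node size.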

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Spc and NetView analysis of the simulated data set.
(A) Population structure used for creating the simulated data (adapted from Lawson et al. [19]). (B) Spc tree of clusters representing the grouping of individuals with k-NN = 10. The individuals have been separated into 5 clusters, representing the three main populations and the additional existence of two sub-populations (PopA1 and PopA2, PopB1 and PopB2). Each cluster is represented by a box; the Y-axis positions indicate the stability of each cluster, whilst the X-axis positions indicate the proximity between clusters. (C) High-definition network visualization (NetView) of the simulated population structure. Each individual is represented by a node, with the different shades denoting the sample origin. The thickness of the edges varies in proportion to the genetic distance and has been used to visualize individual relationships within and between populations. The node size varies in proportion to the number of edges per node, and illustrates how well each individual is connected within the population.
Figure 2. PCA scatter plots of the simulated data set.
Projection of individuals from 5 populations onto a two-dimensional (X,Y) subspace of four PCs. Panels A to D show pairwise comparisons of PC combinations. Each individual is represented by a data point. Each sub-population is denoted by a separate colour. The variation captured by each PC is indicated in parentheses next to the axis label.
Figure 3. Cluster assignment of the simulated population data following analysis by Admixture using 2–5 clusters (K).
Each individual is represented by a single vertical column divided into K colours. Each colour represents one cluster, and the length of the coloured segment corresponds to the individual's estimated proportion of membership in that cluster. For each K, 10 iterations were performed. Panels A to D represent the cluster patterns at K = 2 to 5.
Figure 4. Spc and NetView analysis of human HapMap reference population after removal of closely related individuals.
(A) Spc tree of clusters representing the grouping of 1,159 unrelated individuals. All individuals have been separated into 11 clusters, representing 9 distinct populations and the existence of sub-structures within GIH and MKK samples. (B) NetView of the 1,159 assumed unrelated individuals. The topology of the network highlights the sub-structures within GIH, MXL and MKK and reveals a close relationship between CEU and TSI as well as between ASW and MKK. The identified outliers and key individuals of the population are indicated by their HapMap ID.
Figure 5. Alternative (organic) NetView of populations with evidence of internal sub-structures.
Organic visualization style of (A) ASW/MKK and (B) TSI/CEU as implemented in the Cytoscape software. The network structure of this visualization highlights the existence of sub-structures and clearly identifies cross-linking individuals.
Figure 6. Spc and NetView analysis of the Bovine HapMap data.
(A) Spc tree of clusters representing the grouping of 477 animals represented in the Bovine HapMap data set. The animals have been allocated into 19 clusters, representing 18 out of 19 breeds and the existence of sub-structures within JER (JER_1 and JER_2), and a merged Angus cluster (ANG and RGU). (B) NetView of 477 bovine HapMap samples from Bos taurus, Bos indicus and admixed origins. The topology of the network reflects the genetic relatedness between cattle breeds and reveals sub-structures within the LMS, SHK and ANG clusters.

Grant support

The authors have no support or funding to report.