Front Genet. 2019 Nov 15;10:1169.
doi: 10.3389/fgene.2019.01169. eCollection 2019.

Towards the Complete Goat Pan-Genome by Recovering Missing Genomic Segments From the Reference Genome

Free PMC article

Ran Li et al. Front Genet. 2019.

Abstract

It is broadly expected that next-generation sequencing will ultimately yield complete genomes, and the latest goat reference genome (ARS1) is considered one of the most contiguous assemblies in livestock. However, the rich diversity of goat breeds worldwide indicates that a genome from a single individual is insufficient to represent the full genomic content of goats. By comparing nine de novo assemblies from seven sibling species of domestic goat with ARS1, and using resequencing and transcriptome data from goats for verification, we identified a total of 38.3 Mb of sequences that were absent from ARS1. These pan-sequences contain genic fractions with considerable expression. Using the pan-genome (ARS1 together with the pan-sequences) as the reference genome appreciably improves variant calling. A total of 56,657 spurious SNPs per individual were removed and, on average, 24,414 novel SNPs per individual were recovered as a result of better read mapping quality. The transcriptomic mapping rate also increased by ∼1.15%. Our study demonstrates that comparing de novo assemblies from closely related species is an efficient and reliable strategy for finding sequences missing from the reference genome, and that it could be applied to other species. A pan-genome can serve as an improved reference genome in animals for better exploration of the underlying genomic variation, and could increase the probability of finding genotype-phenotype associations by providing a more comprehensive variation database that captures many more differences between individuals. We have constructed a goat pan-genome web interface for data visualization (http://animal.nwsuaf.edu.cn/panGoat).
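
The two SNP figures reported above pull in opposite directions, so the net effect on the per-individual call set is easy to misread. The arithmetic can be sketched as follows. This is a minimal illustration only: the function and variable names are hypothetical, and it assumes both figures are per-individual counts that can be combined directly.

```python
# Per-individual SNP figures quoted in the abstract.
SPURIOUS_REMOVED = 56_657   # spurious SNPs no longer called with the pan-genome
NOVEL_RECOVERED = 24_414    # novel SNPs newly called with the pan-genome


def net_snp_change(spurious_removed: int, novel_recovered: int) -> int:
    """Net change in SNP count per individual when switching reference.

    Hypothetical helper: negative means fewer SNPs are called overall.
    """
    return novel_recovered - spurious_removed


def total_calls_affected(spurious_removed: int, novel_recovered: int) -> int:
    """Number of SNP calls that differ between the two references."""
    return spurious_removed + novel_recovered


net = net_snp_change(SPURIOUS_REMOVED, NOVEL_RECOVERED)
changed = total_calls_affected(SPURIOUS_REMOVED, NOVEL_RECOVERED)

print(f"Net SNP change per individual: {net}")         # -32243
print(f"SNP calls that differ per individual: {changed}")  # 81071
```

Under these assumptions, each individual has about 32,000 fewer SNP calls against the pan-genome, while roughly 81,000 calls differ between the two references.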

Keywords: de novo assembly; goats; pan-genome; pan-sequences; reference genome.

Figures

Figure 1
Phylogenetic relationship of Caprini species (A) and their genomic divergence (B). Each of the representative genomes of other Caprini species was compared with the goat reference genome to estimate the genomic divergence.
Figure 2
Characteristics of pan-sequences. (A) Length distribution of pan-sequences. (B) Homolog identification of pan-sequences within seven non-Caprini bovid species. (C) Frequency distribution of pan-sequences in domestic goats. (D) The cumulative size of pan-sequences obtained by sequentially adding de novo assemblies of eight Caprini species (blue line), compared with the simulated sequence length obtained by adding goat individuals (red line). The simulated sequence length was calculated using the formula described in the Methods.
Figure 3
The source of pan-sequences. (A) An example of pan-sequences resulting from insertions. A region of 18.8 kb, identified by comparing Oar4.0 with ARS1, was found to be present in goat, as supported by read mapping information. (B) An example of pan-sequences resulting from assembly errors in ARS1. The dot plots show a region of 148 kb identified from chr20 of Oar4.0 that is missing in chr23 of ARS1. The presence of this region is supported by synteny with chr23 of CHIR2.0 and by read mapping information.
Figure 4
Improvement of read mapping for resequencing data using the pan-genome versus ARS1. (A) Comparison of the mapping ratio of resequencing data using the pan-genome versus ARS1. (B) The mapping quality of reads from pan-sequences compared with their original mapping quality on ARS1. (C) The number of SNPs identified for the 10 goat samples using the pan-genome versus ARS1. (D) Read mapping quality was improved within the red rectangle, accompanied by repression of false SNPs and removal of the low-quality mapped reads. Pan-base refers specifically to the ARS1 portion of the pan-genome when the pan-genome is used as the mapping reference, whereas ARS1 refers to using ARS1 alone as the mapping reference. A t-test was used for the comparison. ** P < 0.01.
Figure 5
Improvement of read mapping for transcriptomic data using the pan-genome versus ARS1. (A) Comparison of the mapping ratio of transcriptomic data using the pan-genome versus ARS1. (B) The mapping quality of reads from pan-sequences compared with their original mapping quality on ARS1. (C) The expression of pan-sequences across nine tissues. A t-test was used for the comparison. ** P < 0.01.
Figure 6
Overview of goat pan-genome database features.
