Comparative Study
Nat Commun. 2019 Jan 16;10(1):260.
doi: 10.1038/s41467-018-08260-0.

Chromosome-level assembly of the water buffalo genome surpasses human and goat genomes in sequence contiguity

Wai Yee Low et al. Nat Commun.

Abstract

Rapid innovation in sequencing technologies and improvement in assembly algorithms have enabled the creation of highly contiguous mammalian genomes. Here we report a chromosome-level assembly of the water buffalo (Bubalus bubalis) genome using single-molecule sequencing and chromatin conformation capture data. PacBio Sequel reads, with a mean length of 11.5 kb, helped to resolve repetitive elements and generate sequence contiguity. All five B. bubalis sub-metacentric chromosomes were correctly scaffolded with centromeres spanned. Although the index animal was partly inbred, 58% of the genome was haplotype-phased by FALCON-Unzip. This new reference genome improves the contig N50 of the previous short-read-based buffalo assembly more than a thousand-fold and contains only 383 gaps. It surpasses the human and goat references in sequence contiguity and facilitates the annotation of hard-to-assemble gene clusters such as the major histocompatibility complex (MHC).
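
The contig N50 mentioned in the abstract is the contig length at which contigs of that length or longer account for at least half of the assembled bases. The sketch below shows one common way to compute it. The function name `n50` and the toy length lists are illustrative assumptions, not values or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assemblies (arbitrary units), for illustration only.
fragmented = [2, 3, 3, 4, 5, 5, 6]
contiguous = [8000, 12000, 20000, 35000]

print(n50(fragmented))   # many small contigs give a small N50
print(n50(contiguous))   # a few long contigs give a large N50
```

Because N50 is weighted by bases rather than by contig count, replacing many short contigs with a few long ones raises it sharply, which is why long-read assemblies can improve it by orders of magnitude.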

Conflict of interest statement

S.B.K. is an employee of Pacific Biosciences; T.S. is an employee of Dovetail Genomics.

Figures

Fig. 1
An overview of assembly methods. Contig assembly was carried out with the diploid assembler FALCON-Unzip to produce primary contigs and haplotigs. It began with the selection of the longest “seed” reads, to which shorter reads were aligned to create pre-assembled reads using a consensus approach. The primary contigs were carried forward to the scaffolding step, which began with Chicago reads for short-range scaffolding (1–100 kb) with HiRise. Long-range scaffolding (10–10,000 kb) was then carried out with Hi-C reads to cluster scaffolds to the chromosome level. Each join of contigs to create a scaffold was checked against an LD map and for conservation of synteny with cattle and goat. Long reads were then used to fill gaps and polish the sequence, followed by indel correction with short reads.
Fig. 2
A circos plot of B. bubalis chromosome mapping to B. taurus. Chromosomes 1–5 in B. bubalis are sub-metacentric and map clearly to the expected homologous B. taurus (UMD3.1) chromosomes. Conservation of synteny of all B. bubalis chromosomes to B. taurus matched the whole-genome RH map.
Fig. 3
Structural differences between UMD_CASPUR_WB_2.0 and UOA_WB_1. a Venn diagram of structural differences called in UMD_CASPUR_WB_2.0 and haplotigs when UOA_WB_1 was used as the reference. The 8664 unique and 1313 overlapping differences in haplotigs represent heterozygous alleles. Structural differences present only in UMD_CASPUR_WB_2.0 are likely assembly errors. b Total bases of structural differences in categories deletion, insertion, repeat contraction, repeat expansion, tandem contraction, and tandem expansion. For example, for deletion, we report the number of bases found in UOA_WB_1 but missing in UMD_CASPUR_WB_2.0. c Count of structural differences in the categories from part b, partitioned by size.
Fig. 4
Comparisons of gaps and sequence contiguity between human, goat, and water buffalo assemblies. a Barplot of the number of gaps by chromosome. b Distribution of un-gapped contig lengths between the assemblies of the three species. Wilcoxon rank sum, one-sided test (water buffalo (n = 480) against human (n = 687), W = 212,810; water buffalo (n = 480) against goat (n = 680), W = 165,300; p-value after Bonferroni correction <0.05).
Fig. 5
Resolution of hard-to-assemble repetitive and polymorphic regions. a Violin plot of repeat lengths >2 kb for LINE/L1, LINE/RTE-BovB and satellite/centromeric repeats for ARS1, UMD_CASPUR_WB_2.0 and UOA_WB_1 assemblies. b Dot plot of a ~218 kb region of MHC class II in UOA_WB_1 (horizontal) against UMD_CASPUR_WB_2.0 (vertical) showing a substantial level of repetition throughout the region. c Resolved MHC class II genes present on the single contig in UOA_WB_1 also shown in b. Protein-coding genes in UOA_WB_1 are shown for the same single contig, with assembly gaps for the same region in UMD_CASPUR_WB_2.0.
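
The contig-length comparison in Fig. 4b reports a one-sided Wilcoxon rank sum test with Bonferroni correction. The sketch below illustrates how such a W statistic and a corrected p-value might be computed. It assumes the convention in which W counts pairs where the first sample exceeds the second (ties count 0.5), uses a plain normal approximation without tie or continuity correction, and assumes the alternative is that the first sample tends to be larger. The function names and toy data are hypothetical, and the authors' actual software and settings are not stated here.

```python
import math


def rank_sum_w(x, y):
    """W statistic: number of (a, b) pairs with a > b, ties counting 0.5."""
    w = 0.0
    for a in x:
        for b in y:
            if a > b:
                w += 1.0
            elif a == b:
                w += 0.5
    return w


def one_sided_p_greater(x, y):
    """Approximate p-value for the alternative 'x tends to exceed y'.

    Assumption: normal approximation, no tie or continuity correction.
    """
    n, m = len(x), len(y)
    w = rank_sum_w(x, y)
    mean = n * m / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    z = (w - mean) / sd
    # Upper-tail probability of a standard normal at z.
    return 0.5 * math.erfc(z / math.sqrt(2))


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]


# Hypothetical contig lengths (kb), for illustration only.
assembly_a = [900, 1500, 2200, 4100, 8000]
assembly_b = [300, 450, 700, 1200, 2500]
assembly_c = [200, 600, 950, 1800, 3000]

raw = [one_sided_p_greater(assembly_a, assembly_b),
       one_sided_p_greater(assembly_a, assembly_c)]
print(rank_sum_w(assembly_a, assembly_b), bonferroni(raw))
```

With two comparisons against the same assembly, the Bonferroni step simply doubles each raw p-value before it is compared with the 0.05 threshold.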
