BMC Genomics. 2012 Jul 20;13:324.
doi: 10.1186/1471-2164-13-324.

Composition and organization of active centromere sequences in complex genomes

Karen E Hayden et al. BMC Genomics.

Abstract

Background: Centromeres are sites of chromosomal spindle attachment during mitosis and meiosis. While the sequence basis for centromere identity remains a subject of considerable debate, one approach is to examine the genomic organization at these active sites that are correlated with epigenetic marks of centromere function.

Results: We have developed an approach to characterize both satellite and non-satellite centromeric sequences that are missing from current assemblies in complex genomes, using the dog genome as an example. Combining this genomic reference with an epigenetic dataset corresponding to sequences associated with the histone H3 variant centromere protein A (CENP-A), we identify active satellite sequence domains that appear to be both functionally and spatially distinct within the overall definition of satellite families.

Conclusions: These findings establish a genomic and epigenetic foundation for exploring the functional role of centromeric sequences in the previously sequenced dog genome and provide a model for similar studies within the context of less-characterized genomes.

Figures

Figure 1
General strategy for informatic and functional analysis of centromere satellite domains in complex genomes. The diagram and underlying flow chart highlight three phases involved in the sequence processing and centromeric database construction. Phase I defines the sequences that are unassigned to a specific chromosome in the current genome reference assembly (all reads that are unassembled, as well as those that constitute the assembled but unmapped contigs, or canFam2.0 chrUn). Of the tandemly repeated satellite sequence families within this database, seven were enriched in centromeric regions, resulting in an inventory of all satellites and any adjacent non-satellite sequences. Phase II reformats the read database from Phase I into a list of unique k-mers shown to be specific to the pericentromere, each classified as single-copy or multi-copy based on its observed frequency in the genome. These k-mers form a library describing all inherent sequence variation in centromeric regions and are useful in Phase III for investigating enrichment trends in next-generation sequencing datasets, such as CENP-A ChIP sequence reads. Comparative analyses result in a list of functional k-mers that define the genomic context of the centromere. K-mers are mapped back to the read and paired-read dataset to study regional sequence organization.
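The Phase II step described in this caption, turning a read database into a k-mer library and labelling each k-mer by copy number, can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: the function names `count_kmers` and `classify_copy_number`, the value of `k`, the toy reads, and the simple count threshold are hypothetical, and the caption does not state how the authors estimated genomic frequency. The sketch also ignores reverse-complement strands and read-depth normalization.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping k-mer (step of 1) across a collection of reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip windows containing ambiguous bases
                counts[kmer] += 1
    return counts


def classify_copy_number(counts, multi_copy_threshold=2):
    """Label each k-mer by raw count (hypothetical rule, not the authors' criterion)."""
    return {
        kmer: "multi-copy" if n >= multi_copy_threshold else "single-copy"
        for kmer, n in counts.items()
    }


# Toy input standing in for the unassembled/unmapped read database (assumption).
toy_reads = ["ACGTACGT", "TTGACGTA"]
kmer_counts = count_kmers(toy_reads, k=4)
kmer_classes = classify_copy_number(kmer_counts)
```

On real data the raw counts would need to be scaled by sequencing depth before a copy-number label is meaningful; the fixed threshold is only a placeholder for that estimate.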
Figure 2
Characterizing functional satellite sequence features. Centromere sequence features associated with CENP-A ChIP sequences. (A) Reads were initially mapped to canFam2.0 and characterized by sequence classification, as indicated in the pie chart. (B) Both CarSat1 and CarSat2 are highly enriched in the CENP-A ChIPseq dataset (p < 0.01) relative to genomic background estimates (indicated by the red dotted line). Other satellite families showed no evidence of enrichment and are combined into one data point. (C) CarSat satellite families (CarSat1 and CarSat2) show enrichment of select sequences in the CENP-A ChIP dataset on an xy-plot of two replicate enrichment estimates (log-transformed relative enrichment scores). K-mers that are enriched in both comparisons, as delineated by grey dotted lines, are highlighted in red in the upper right quadrant. (D) CarSat k-mers that are enriched (red) compared with those that are not enriched (black), as a function of their observed frequency in the genome. Both high-copy and low-copy number k-mers are enriched in both satellite families.
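Panel C plots a log-transformed relative enrichment score for each k-mer in two replicates and highlights the k-mers enriched in both. One way such a score could be computed is sketched below. This is an illustration under stated assumptions, not the authors' procedure: the base-2 logarithm, the pseudocount, the cutoff of 1.0, the function names, and all counts are hypothetical.

```python
import math


def log_enrichment(chip_count, chip_total, bg_count, bg_total, pseudocount=1.0):
    """log2 ratio of a k-mer's relative frequency in ChIP reads vs. background.

    The pseudocount (an assumption) avoids division by zero for unseen k-mers.
    """
    chip_freq = (chip_count + pseudocount) / (chip_total + pseudocount)
    bg_freq = (bg_count + pseudocount) / (bg_total + pseudocount)
    return math.log2(chip_freq / bg_freq)


def enriched_in_both(scores_rep1, scores_rep2, threshold=1.0):
    """Return k-mers whose score exceeds the (hypothetical) cutoff in both replicates."""
    return sorted(
        kmer
        for kmer in scores_rep1
        if kmer in scores_rep2
        and scores_rep1[kmer] > threshold
        and scores_rep2[kmer] > threshold
    )


# Toy counts (all values invented for illustration).
background = {"AAAA": 10, "CCCC": 10}
replicate1 = {"AAAA": 80, "CCCC": 10}
replicate2 = {"AAAA": 60, "CCCC": 12}
total = 1000  # same library size assumed for every dataset

scores1 = {k: log_enrichment(replicate1[k], total, background[k], total) for k in background}
scores2 = {k: log_enrichment(replicate2[k], total, background[k], total) for k in background}
shared_enriched = enriched_in_both(scores1, scores2)
```

Requiring the cutoff to be met in both replicates corresponds to selecting the upper right quadrant of the xy-plot; the actual position of the grey dotted lines is not given in the caption.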
Figure 3
CarSat satellite family contains functional sequence subtypes. (A) Phylogenetic analysis of reads containing a full-length CarSat1 monomer illustrates largely distinct clades of reads associated with CENP-A (CENP-A[+]; red) or not associated with CENP-A (CENP-A[−]; gray). (B) The subset of reads containing full-length monomers was further characterized by a sliding-window 200 bp clustering approach (see Methods) and assigned to distinct sequence subgroups, as indicated by different colors. CENP-A[−] reads are highly similar in the 3’ end of the monomer but divide into definable major subgroups in the 5’ end; CENP-A[+] reads appear to have the inverted similarity pattern. Phylogenetic analysis of the 5’ end of CarSat1 reads shows distinct clades that distinguish CENP-A[+] from CENP-A[−] sequences. A similar analysis of the 3’ end of CarSat1 reads. Overall, [+] and [−] reads could be classified into four predominant monomer types, shown as turquoise-black and blue-black for CENP-A[−], and maroon-red and maroon-yellow for CENP-A[+]. There are smaller subfamilies, one in CENP-A[−] (pink-purple) and one in CENP-A[+] (maroon-yellow), that are far less abundant and appear to group in the same clade. (C) Paired-read frequency patterns between monomer cluster types predict that the CENP-A-containing satellites (CENP-A[+]) are spatially distinct from the non-CENP-A-containing satellites (CENP-A[−]) at dog centromeres. Relative node sizes represent read depth for each of the 200 bp windows, while lines represent a minimum threshold for paired-read connectivity. Three sequence groups are identified: one CENP-A[+] array, highlighted in red, and two CENP-A[−] arrays in grey. Both CENP-A[−] arrays are minimally connected to the CENP-A[+] domain through transitional monomer clusters. Model of predicted genomic organization at dog centromeres, indicating the two major types (CENP-A[+] and CENP-A[−]) and predicted transition monomers at bottom.
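The paired-read connectivity idea in panel C, where monomer clusters are nodes and a line is drawn only when enough read pairs link two clusters, can be sketched as a small graph computation. This is a hedged illustration only: the cluster labels, read identifiers, `min_pairs` threshold, and function names are all hypothetical, and the source does not describe the authors' implementation.

```python
from collections import Counter


def paired_read_edges(read_pairs, cluster_of, min_pairs=2):
    """Count read pairs linking two different clusters; keep edges at or above the threshold."""
    edge_counts = Counter()
    for left, right in read_pairs:
        a, b = cluster_of.get(left), cluster_of.get(right)
        if a is None or b is None or a == b:
            continue  # unassigned mates and within-cluster pairs add no edge
        edge_counts[tuple(sorted((a, b)))] += 1
    return {edge: n for edge, n in edge_counts.items() if n >= min_pairs}


def connected_groups(nodes, edges):
    """Group clusters into connected components using a depth-first search."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbours[node] - seen)
        groups.append(sorted(group))
    return groups


# Toy assignment of reads to monomer clusters (labels are invented).
cluster_of = {
    "r1": "plus_A", "r2": "plus_B", "r3": "plus_A", "r4": "plus_B",
    "r5": "minus_A", "r6": "minus_B", "r7": "minus_A", "r8": "minus_B",
    "r9": "plus_A", "r10": "minus_A",
}
read_pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6"), ("r7", "r8"), ("r9", "r10")]

edges = paired_read_edges(read_pairs, cluster_of, min_pairs=2)
groups = connected_groups(set(cluster_of.values()), edges)
```

In this toy input the single pair linking `plus_A` and `minus_A` falls below the threshold, so the two cluster types separate into distinct groups, which mirrors the kind of spatial separation the caption infers from sparse connectivity.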
