BMC Genomics. 2008 Nov 26;9:562.
doi: 10.1186/1471-2164-9-562.

Refined annotation and assembly of the Tetrahymena thermophila genome sequence through EST analysis, comparative genomic hybridization, and targeted gap closure

Robert S Coyne et al. BMC Genomics. 2008.

Abstract

Background: Tetrahymena thermophila, a widely studied model for cellular and molecular biology, is a binucleated single-celled organism with a germline micronucleus (MIC) and somatic macronucleus (MAC). The recent draft MAC genome assembly revealed low sequence repetitiveness, a result of the epigenetic removal of invasive DNA elements found only in the MIC genome. Such low repetitiveness makes complete closure of the MAC genome a feasible goal; achieving it would require standard closure methods as well as removal of minor MIC contamination from the MAC genome assembly. Highly accurate preliminary annotation of Tetrahymena's coding potential was hindered by the lack of both comparative genomic sequence information from close relatives and significant amounts of cDNA evidence. This limited the value of the genomic information and left certain questions unanswered, such as the frequency of alternative splicing.

Results: We addressed the problem of MIC contamination using comparative genomic hybridization with purified MIC and MAC DNA probes against a whole genome oligonucleotide microarray, allowing the identification of 763 genome scaffolds likely to contain MIC-limited DNA sequences. We also employed standard genome closure methods to essentially finish over 60% of the MAC genome. For the improvement of annotation, we have sequenced and analyzed over 60,000 verified EST reads from a variety of cellular growth and development conditions. Using this EST evidence, a combination of automated and manual reannotation efforts led to updates that affect 16% of the current protein-coding gene models. By comparing EST abundance, many genes showing apparent differential expression between these conditions were identified. Rare instances of alternative splicing and uses of the non-standard amino acid selenocysteine were also identified.

Conclusion: We report here significant progress in genome closure and reannotation of Tetrahymena thermophila. Our experience to date suggests that complete closure of the MAC genome is attainable. Using the new EST evidence, automated and manual curation has resulted in substantial improvements to the over 24,000 gene models, which will be valuable to researchers studying this model organism as well as for comparative genomics purposes.

Figures

Figure 1
Results of MIC/MAC comparative genomic hybridization. A: Distribution of MIC scaffold ratios. Red line: proposed separation of MAC-destined (maD) DNA scaffolds (on the left) and MIC-limited (miL) scaffolds (on the right). B: Scatter plot of MIC scaffold ratios as a function of scaffold length. Pink and aqua points: maD and miL DNA, respectively, by the log2 ratio criterion in Figure 1A. Black diamonds and small black circles, respectively: miL and maD scaffolds with high sequence identity to miL transposon genes. The maD distribution is more diffuse as the length decreases to the minimum scaffold length (1,000 bp). This is attributed to the fact that the number of probes is roughly proportional to scaffold length. Given a uniform intrinsic variability in hybridization ratios for each probe, the variance of the scaffold means is expected to vary inversely with scaffold length. The secondary peak in the maD distribution (around log2 ratio = -0.45) in 1A and the multimodality of the maD distribution in 1B (most clearly seen for scaffolds > 50 kb) are caused by the partial loss of MIC chromosome segments in the cells used for the MIC DNA preps (Orias and Hamilton, unpublished observations).
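
The length dependence described in this caption can be illustrated with a minimal Python sketch. It assumes that probes are independent, evenly spaced, and share one standard deviation for their log2 ratios. The probe spacing, the per-probe standard deviation, and the cutoff value are hypothetical placeholders chosen for illustration, not values reported by the authors.

```python
import math
from statistics import mean
from typing import Sequence

# Hypothetical values for illustration only; none of these come from the paper.
PROBE_SPACING_BP = 500     # assumed: one probe per 500 bp of scaffold
PER_PROBE_SD = 0.8         # assumed: uniform SD of a single probe's log2 ratio
CUTOFF_LOG2_RATIO = 1.0    # assumed: position of the red line in panel A


def probe_count(scaffold_length_bp: int) -> int:
    """Number of probes, taken as roughly proportional to scaffold length."""
    return max(1, scaffold_length_bp // PROBE_SPACING_BP)


def sd_of_scaffold_mean(scaffold_length_bp: int) -> float:
    """SD of the mean of n independent probes is sd / sqrt(n)."""
    return PER_PROBE_SD / math.sqrt(probe_count(scaffold_length_bp))


def classify(probe_log2_ratios: Sequence[float]) -> str:
    """Label a scaffold maD (left of the cutoff) or miL (right of it)."""
    return "miL" if mean(probe_log2_ratios) > CUTOFF_LOG2_RATIO else "maD"


if __name__ == "__main__":
    # Variance of the scaffold mean shrinks in proportion to 1 / length.
    for length in (1_000, 10_000, 100_000):
        print(length, round(sd_of_scaffold_mean(length) ** 2, 5))
```

Under these assumptions, a 100-fold increase in scaffold length gives a 100-fold decrease in the variance of the scaffold mean, which is consistent with the more diffuse maD distribution near the 1,000 bp minimum.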
Figure 2
Distribution of EST gene hits. The x-axis is divided into bins by the total number of validated ESTs (from all libraries) hitting a given gene. The y-axis depicts the percent of ESTs from each of the six conditions that fall into the indicated x-axis bin. For example, the bin containing genes matched by between 2 and 10 ESTs contains 8,426 matches from the conjugation condition (TTE and FCO libraries). The total of all CNJ ESTs is 18,837 (see Table 1). The percent of total CNJ ESTs in this bin is therefore 8,426/18,837 = 44.7%. Abbreviations as in Table 1.
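
The binning and the worked percentage in this caption can be reproduced with a short Python sketch. The two totals (8,426 and 18,837) are taken from the caption. The bin edges other than the 2 to 10 bin, the gene identifiers, and the function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict


def bin_label(total_hits: int) -> str:
    """Assign a gene to an x-axis bin by its total validated EST count.

    Only the 2-10 bin is named in the caption; the other edges are assumed.
    """
    if total_hits <= 1:
        return "1"
    if total_hits <= 10:
        return "2-10"
    return ">10"


def percent_by_bin(
    gene_total_hits: Dict[str, int],
    gene_condition_hits: Dict[str, int],
) -> Dict[str, float]:
    """Percent of one condition's ESTs that fall into each bin.

    gene_total_hits: gene -> ESTs from all libraries (defines the bin).
    gene_condition_hits: gene -> ESTs from the condition of interest.
    """
    per_bin: Counter = Counter()
    for gene, n in gene_condition_hits.items():
        per_bin[bin_label(gene_total_hits[gene])] += n
    total = sum(gene_condition_hits.values())
    return {label: 100.0 * n / total for label, n in per_bin.items()}


if __name__ == "__main__":
    # Worked example from the caption: CNJ ESTs in the 2-10 bin.
    print(round(100.0 * 8426 / 18837, 1))  # 44.7
```

Note that a gene's bin is set by its EST count across all libraries, while the numerator counts only ESTs from the condition being plotted.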
Figure 3
Venn diagrams of the overlap in EST representation for all genes detected in (A) the four vegetative growth conditions and (B) the combined vegetative pool vs. starvation or conjugation.
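
The region counts behind Venn diagrams of this kind can be computed with plain set operations, as in the following Python sketch. The gene identifiers and condition labels are hypothetical, and the sketch assumes a gene counts as detected in a condition when at least one EST from that condition matches it.

```python
from typing import Dict, Iterable, Set


def combined_pool(gene_sets: Iterable[Set[str]]) -> Set[str]:
    """Union of the genes detected in several conditions (as in panel B)."""
    pool: Set[str] = set()
    for genes in gene_sets:
        pool |= genes
    return pool


def venn_regions(a: Set[str], b: Set[str]) -> Dict[str, int]:
    """Sizes of the three regions of a two-set Venn diagram."""
    return {
        "only_a": len(a - b),
        "both": len(a & b),
        "only_b": len(b - a),
    }


if __name__ == "__main__":
    # Hypothetical detected-gene sets, for illustration only.
    vegetative = [{"geneA", "geneB"}, {"geneB", "geneC"}]
    starvation = {"geneC", "geneD"}
    pool = combined_pool(vegetative)
    print(venn_regions(pool, starvation))
```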
