Plant Physiol. 2011 Sep;157(1):14-28.
doi: 10.1104/pp.111.179663. Epub 2011 Jul 5.

A White Spruce Gene Catalog for Conifer Genome Analyses

Free PMC article

Philippe Rigault et al., Plant Physiol.

Abstract

Several angiosperm plant genomes, including those of Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), poplar (Populus trichocarpa), and grapevine (Vitis vinifera), have been sequenced, but the lack of reference genomes in gymnosperm phyla limits our understanding of plant evolution and restricts the potential impact of genomics research. A gene catalog was developed for the conifer tree Picea glauca (white spruce) through large-scale expressed sequence tag (EST) sequencing and full-length cDNA (FL-cDNA) sequencing to facilitate genome characterization, comparative genomics, and gene mapping. The resource incorporates new and publicly available sequences into 27,720 cDNA clusters, 23,589 of which are represented by full-length insert cDNAs. ESTs, mate-pair cDNA clone analysis, and custom sequencing were integrated through an iterative process to improve the accuracy of clustering outcomes. The entire catalog spans 30 Mb of unique transcribed sequence. Accounting for incomplete, partially sequenced, and unsampled transcripts, we estimated that the P. glauca nuclear genome contains up to 32,520 transcribed genes and that its transcriptome could span up to 47 Mb. These estimates are in the same range as those for the Arabidopsis and rice transcriptomes. Next-generation sequencing methods confirmed and enhanced the catalog by providing deeper coverage of rare transcripts, by extending many incomplete clusters, and by increasing overall transcriptome coverage to 38 Mb of unique sequence. Sample sequencing of genomic DNA equivalent to 8.5% of the 19.8-Gb P. glauca genome identified 1,495 cDNA clusters representing highly repeated sequences. With a conifer transcriptome in full view, functional and protein domain annotations clearly highlighted the divergences between conifers and angiosperms, likely reflecting their respective evolutionary paths.
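The figures in the abstract can be related to one another with some back-of-envelope arithmetic. These ratios are derived here from the numbers quoted above and are not values reported by the authors; the last line in particular assumes, purely for illustration, one cDNA cluster per gene.

```latex
\begin{aligned}
\text{genomic DNA sample-sequenced} &\approx 0.085 \times 19.8\ \text{Gb} \approx 1.68\ \text{Gb} \\
\text{clusters with full-length insert cDNAs} &= 23{,}589 / 27{,}720 \approx 0.851 \\
\text{mean unique sequence per cluster} &\approx 30\ \text{Mb} / 27{,}720 \approx 1.08\ \text{kb} \\
\text{share of estimated transcriptome (catalog)} &\approx 30 / 47 \approx 0.64 \\
\text{share of estimated transcriptome (with next-generation data)} &\approx 38 / 47 \approx 0.81 \\
\text{clusters relative to estimated genes (assumed 1:1)} &\approx 27{,}720 / 32{,}520 \approx 0.85
\end{aligned}
```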

Figures

Figure 1.
The GCAT process applied to P. glauca. A, Overview of the EST, clone, and FL-cDNA analysis steps. The clone analysis and the FL-cDNA sequencing enable iterative clustering, ultimately helping to optimize gene models, annotations, and downstream applications. B, Size distribution and sequence completion of the representative cDNA clones for the 27,720 cDNA clusters. For incomplete clones, a minimum length was estimated from the available sequence.
Figure 2.
Occurrence and classification of protein family domains in P. glauca relative to angiosperms. The total number of P. glauca cDNA clusters containing each Pfam-A domain was compared with Arabidopsis, rice, and poplar genes (for the entire list, see Supplemental Table S6), using angiosperm data normalized to account for the overall number of genes in each species. The numbers of overrepresented (A) and underrepresented (B) protein domains in P. glauca were determined by χ² tests (P < 0.05, with Bonferroni correction) for Pfam domains found six or more times in at least one of the species compared and differing by at least 50% between species. The Pfam domains that were statistically different in at least two of the three comparisons were classified into major biological groups based on TAIR annotations of Arabidopsis homologs (Gene Ontology process) and on InterPro and Pfam descriptions. DUF, domain of unknown function.
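The domain comparison described for Figure 2 can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes a Pearson χ² test (1 df, no continuity correction) on a 2×2 table of genes with versus without a domain in P. glauca versus one angiosperm, reads "normalized angiosperm data" as comparing per-species domain frequencies, reads "a 50% difference" as a frequency ratio of at least 1.5, and applies Bonferroni correction over all domains passing the six-occurrence filter. The function names, the `min_fold` parameter, and the toy counts are hypothetical.

```python
import math


def chi2_2x2(a, n1, b, n2):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, n1 - a], [b, n2 - b]]: rows are species, columns are genes
    with / without the domain. Returns (statistic, p-value with 1 df)."""
    c, d = n1 - a, n2 - b
    denom = n1 * n2 * (a + b) * (c + d)
    if denom == 0:
        return 0.0, 1.0
    stat = (n1 + n2) * (a * d - b * c) ** 2 / denom
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


def compare_domains(pgl_counts, pgl_total, ang_counts, ang_total,
                    alpha=0.05, min_count=6, min_fold=1.5):
    """Return (over, under): domains over- or underrepresented in P. glauca.

    *_counts map a Pfam domain to the number of genes (clusters) containing
    it; *_total is that species' overall gene count (hypothetical inputs).
    """
    # Assumed filter: domain seen at least `min_count` times in either species.
    tested = sorted(
        dom for dom in set(pgl_counts) | set(ang_counts)
        if max(pgl_counts.get(dom, 0), ang_counts.get(dom, 0)) >= min_count
    )
    over, under = [], []
    for dom in tested:
        a, b = pgl_counts.get(dom, 0), ang_counts.get(dom, 0)
        # Normalize by each species' total number of genes.
        f_pgl, f_ang = a / pgl_total, b / ang_total
        hi, lo = max(f_pgl, f_ang), min(f_pgl, f_ang)
        if lo > 0 and hi / lo < min_fold:
            continue  # assumed reading: less than a 50% difference
        _, p = chi2_2x2(a, pgl_total, b, ang_total)
        # Bonferroni: divide alpha by the number of domains tested.
        if p < alpha / len(tested):
            (over if f_pgl > f_ang else under).append(dom)
    return over, under


# Toy counts; 27000 is a hypothetical angiosperm gene total.
over, under = compare_domains(
    {"X": 300, "Y": 10, "Z": 50, "W": 3}, 27720,
    {"X": 100, "Y": 80, "Z": 49, "W": 2}, 27000,
)
print(over, under)  # ['X'] ['Y']
```

Running the comparison once per angiosperm (Arabidopsis, rice, poplar) and keeping domains flagged in at least two of the three runs would mirror the caption's final filtering step.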
Figure 3.
Relative frequencies of major Pfam domains found in transcription factors (TF). Frequencies were determined for Pfam-A domains with hits in three or more genes in P. glauca and calculated relative to the total number of genes containing TF Pfam domains within each species: P. glauca (Pgl), poplar (Ptr), Arabidopsis (Ath), rice (Osa), and grapevine (Vvi). Asterisks indicate frequencies that are significantly different from those of Arabidopsis and rice.
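The relative-frequency calculation behind Figure 3 can be sketched as follows. This is a minimal sketch, assuming a hypothetical input layout of gene ID sets per TF domain per species, and assuming that a gene carrying two TF domains is counted once in the species' denominator. The function name, the toy gene IDs, and the domain labels are illustrative only.

```python
def tf_relative_frequencies(domain_genes, ref="Pgl", min_ref_genes=3):
    """domain_genes: {species: {tf_domain: set of gene IDs}} (hypothetical).

    Denominator per species = number of distinct genes carrying any TF
    Pfam domain. Only domains hit in at least `min_ref_genes` genes of
    the reference species `ref` are reported.
    """
    kept = sorted(
        dom for dom, genes in domain_genes[ref].items()
        if len(genes) >= min_ref_genes
    )
    result = {}
    for species, domains in domain_genes.items():
        # Union of all TF-domain genes, so each gene is counted once.
        total = len(set().union(*domains.values()))
        result[species] = {
            dom: (len(domains.get(dom, set())) / total if total else 0.0)
            for dom in kept
        }
    return result


# Toy data: NAC has only one P. glauca gene, so it is not reported,
# but its gene still counts toward the TF total.
data = {
    "Pgl": {"MYB": {"g1", "g2", "g3"}, "AP2": {"g3", "g4", "g5"}, "NAC": {"g6"}},
    "Ath": {"MYB": {"a1", "a2"}, "AP2": {"a3"}, "NAC": {"a4"}},
}
freqs = tf_relative_frequencies(data)
print(freqs)
# {'Pgl': {'AP2': 0.5, 'MYB': 0.5}, 'Ath': {'AP2': 0.25, 'MYB': 0.5}}
```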
Figure 4.
Coverage and validation of EST clusters with GS-FLX (454) ESTs. A, Validation of a cDNA cluster with low clone coverage in the 5′ proximal region (GQ04008_F14; C3HC4-type RING finger protein sequence). B, Confirmation of an unspliced intron, suggested by sequence similarity analyses, in a unique cDNA clone (GQ0011_p18; RING 1A sequence containing a putative unspliced intron).
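The coverage comparison illustrated in Figure 4A can be sketched by computing per-base read depth along a cluster. This is a minimal sketch, not the authors' method: it assumes reads and clone sequences have already been aligned to the cluster consensus and are given as 0-based, half-open (start, end) intervals, and it uses a difference array to build the depth profile. The function names, interval format, and toy coordinates are hypothetical.

```python
def per_base_depth(length, intervals):
    """Per-base depth over a cluster consensus of `length` bases.

    `intervals` are 0-based, half-open (start, end) alignment spans
    (hypothetical format). Difference array: +1 at each start,
    -1 at each end, then a running sum.
    """
    diff = [0] * (length + 1)
    for start, end in intervals:
        start, end = max(0, start), min(length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


def mean_depth(depth, start, end):
    """Mean depth over positions [start, end), e.g. a 5' proximal window."""
    window = depth[start:end]
    return sum(window) / len(window) if window else 0.0


# Toy example: clone sequences cover only the 3' part of a 100-bp
# cluster, while 454 reads also reach the 5' end.
clones = per_base_depth(100, [(40, 100)])
reads_454 = per_base_depth(100, [(0, 60), (10, 100)])
print(mean_depth(clones, 0, 40), mean_depth(reads_454, 0, 40))  # 0.0 1.75
```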

Cited by 68 articles
