BMC Bioinformatics, 20 (1), 441

An Equivalence Approach to the Integrative Analysis of Feature Lists

Alex Sánchez-Pla et al. BMC Bioinformatics.

Abstract

Background: Although a few comparison methods based on the biological meaning of gene lists have been developed, the goProfiles approach is one of the few that are being used for that purpose. It consists of projecting lists of genes into predefined levels of the Gene Ontology, in such a way that a multinomial model can be used for estimation and testing. Of particular interest is the fact that it may be used for proving equivalence (in the sense of "enough similarity") between two lists, instead of proving differences between them, which seems conceptually better suited to the end goal of establishing similarity among gene lists. An equivalence method has been derived that uses a distance-based approach and the confidence interval inclusion principle. Equivalence is declared if the upper limit of a one-sided confidence interval for the distance between two profiles is below a pre-established equivalence limit.
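The confidence interval inclusion rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the goProfiles implementation: it assumes a normal-approximation interval, takes the standard error of the distance estimate as a given input, and uses hypothetical profile vectors and function names.

```python
from statistics import NormalDist


def squared_euclidean(p, q):
    """Squared Euclidean distance between two profiles of equal length."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return sum((a - b) ** 2 for a, b in zip(p, q))


def upper_limit(d_hat, se, alpha=0.05):
    """Upper limit of a one-sided (1 - alpha) interval for the distance.

    Assumption: a normal approximation with a known standard error `se`.
    """
    return d_hat + NormalDist().inv_cdf(1 - alpha) * se


def declare_equivalence(d_hat, se, delta, alpha=0.05):
    """Equivalence is declared when the upper limit lies below delta."""
    return upper_limit(d_hat, se, alpha) < delta


# Hypothetical profiles over four GO categories (illustrative values only).
p = [0.40, 0.30, 0.20, 0.10]
q = [0.35, 0.30, 0.25, 0.10]
d_hat = squared_euclidean(p, q)
se = 0.004  # hypothetical standard error of d_hat

print(declare_equivalence(d_hat, se, delta=0.025))
print(declare_equivalence(d_hat, se, delta=0.01))
```

With these made-up numbers the same estimated distance leads to equivalence under the looser limit and not under the stricter one, which shows how strongly the conclusion depends on the pre-established limit.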

Results: In this work, this method is extended to establish the equivalence of any number of gene lists. Additionally, an algorithm is presented that obtains the smallest equivalence limit at which equivalence between two or more lists could be declared. This algorithm is the basis of an iterative graphical visualization method that orders gene lists from most to least equivalent. These methods deal adequately with the problem of adjusting for multiple testing. The applicability of these techniques is illustrated in two typical situations: (i) a collection of cancer-related gene lists, suggesting which of them are more reasonable to combine (as claimed by the authors), and (ii) a collection of pathogenesis-based transcript sets, showing which of these are more closely related. The methods developed are available in the goProfiles Bioconductor package.
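One way to read the "smallest equivalence limit" idea for pairs of lists is sketched below, under stated assumptions. With the inclusion rule, any limit above the upper confidence limit of a pair's distance would allow equivalence to be declared, so that upper limit can serve as the pair's threshold. The function names, the input format and the numbers are hypothetical. No multiplicity adjustment is applied here, and the abstract does not say how the published method performs it.

```python
from statistics import NormalDist


def smallest_limits(estimates, alpha=0.05):
    """Map each pair of lists to the upper one-sided confidence limit.

    estimates: dict mapping (name_a, name_b) -> (d_hat, se), where d_hat is
    the estimated distance and se its standard error (both assumed given).
    """
    z = NormalDist().inv_cdf(1 - alpha)
    return {pair: d_hat + z * se for pair, (d_hat, se) in estimates.items()}


def rank_pairs(limits):
    """Order pairs from most to least equivalent (smallest threshold first)."""
    return sorted(limits, key=limits.get)


def equivalent_pairs(limits, delta):
    """Pairs that would be declared equivalent at the limit delta."""
    return [pair for pair in rank_pairs(limits) if limits[pair] < delta]


# Hypothetical estimates for three gene lists A, B and C.
estimates = {
    ("A", "B"): (0.005, 0.004),
    ("A", "C"): (0.020, 0.006),
    ("B", "C"): (0.050, 0.010),
}
limits = smallest_limits(estimates)
print(rank_pairs(limits))
print(equivalent_pairs(limits, delta=0.025))
```

A table of such pairwise thresholds can be treated as a dissimilarity matrix and passed to a standard hierarchical clustering routine, which is one plausible route to a dendrogram of lists.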

Conclusions: The method provides a simple yet powerful and statistically well-grounded way to classify a set of gene lists or other feature lists by establishing their equivalence at a given equivalence threshold. The classification results can be viewed using standard visualization methods. The method may be applied to a variety of problems, from deciding whether a series of datasets generating the lists can be combined to simplifying groups of lists.

Keywords: Equivalence tests; Feature lists; Functional profiles; Gene lists.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Power curve of the equivalence test, as a function of the true squared Euclidean distance. Balanced case of two gene lists of size 200 with 20 genes in common. Equivalence limit at Δ=0.25. The null hypothesis of the equivalence test states that the true squared Euclidean distance, d, is greater than or equal to Δ, that is to say, that both lists are sufficiently dissimilar according to the Δ limit criterion. Thus, rejecting this hypothesis corresponds to declaring equivalence. When the true simulated distance is d<Δ, not rejecting the null hypothesis (not declaring equivalence) corresponds to a false negative. When d≥Δ, declaring equivalence is a false positive.
Fig. 2
Power curve of the equivalence test, as a function of the true squared Euclidean distance. Balanced case of two gene lists of size 1,000 with 100 genes in common. Equivalence limit at Δ=0.25. The null hypothesis of the equivalence test states that the true squared Euclidean distance, d, is greater than or equal to Δ, that is to say, that both lists are sufficiently dissimilar according to the Δ limit criterion. Thus, rejecting this hypothesis corresponds to declaring equivalence. When the true simulated distance is d<Δ, not rejecting the null hypothesis (not declaring equivalence) corresponds to a false negative. When d≥Δ, declaring equivalence is a false positive.
Fig. 3
Power curve of the equivalence test, as a function of the true squared Euclidean distance. Balanced case of two gene lists of size 200 with 20 genes in common. Equivalence limit at Δ=0.025. The null hypothesis of the equivalence test states that the true squared Euclidean distance, d, is greater than or equal to Δ, that is to say, that both lists are sufficiently dissimilar according to the Δ limit criterion. Thus, rejecting this hypothesis corresponds to declaring equivalence. When the true simulated distance is d<Δ, not rejecting the null hypothesis (not declaring equivalence) corresponds to a false negative. When d≥Δ, declaring equivalence is a false positive.
Fig. 4
Dendrogram produced from the equivalence analysis of kidney gene lists made at level 3 of the BP ontology. The lists are grouped naturally depending on the type of process in which the genes of the lists are involved. See Additional file 3 for supplementary figures at levels 2 to 8 of all three (MF, CC and BP) ontologies.
Fig. 5
Dendrogram produced from the equivalence analysis of cancer gene lists made at level 3 of the BP ontology. In this case there are no natural groupings; instead, the dendrogram may be used to suggest which lists may be combined more reasonably than others. See Additional file 4 for supplementary figures at levels 2 to 8 of all three (MF, CC and BP) ontologies.
