Detecting novel associations in large data sets
- PMID: 22174245
- PMCID: PMC3325791
- DOI: 10.1126/science.1205438
Abstract
Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations, both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
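To make the idea of a grid-based dependence score concrete, the following is a minimal sketch, not the authors' algorithm. It discretizes both variables into equal-frequency bins, computes the mutual information of each grid, normalizes by the logarithm of the smaller grid dimension, and keeps the largest score. The function names, the grid-size bound `n ** 0.6`, and the normalization are assumptions for illustration, since the abstract states none of them; restricting the search to equal-frequency grids is a simplification that can underestimate a score obtained from a fuller grid search.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal numbers of points."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices, smallest value first
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def grid_mutual_information(x_bins, y_bins):
    """Mutual information (in nats) of two discretized variables."""
    n = len(x_bins)
    joint = Counter(zip(x_bins, y_bins))
    px = Counter(x_bins)
    py = Counter(y_bins)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def naive_mic(x, y, alpha=0.6):
    """Largest normalized mutual information over equal-frequency grids.

    Assumption: grids are limited to at most n ** alpha cells, and each
    score is divided by log(min(nx, ny)) so that it lies between 0 and 1.
    """
    n = len(x)
    max_cells = n ** alpha
    best = 0.0
    for nx in range(2, int(max_cells // 2) + 1):
        xb = equal_frequency_bins(x, nx)
        for ny in range(2, int(max_cells // nx) + 1):
            yb = equal_frequency_bins(y, ny)
            score = grid_mutual_information(xb, yb) / math.log(min(nx, ny))
            best = max(best, score)
    return best


if __name__ == "__main__":
    xs = [2 * i / 499 - 1 for i in range(500)]  # evenly spaced points in [-1, 1]
    rng = random.Random(0)
    noise_a = [rng.gauss(0, 1) for _ in range(500)]
    noise_b = [rng.gauss(0, 1) for _ in range(500)]
    print("linear:     ", naive_mic(xs, xs))
    print("parabola:   ", naive_mic(xs, [v * v for v in xs]))
    print("independent:", naive_mic(noise_a, noise_b))
```

Under these assumptions, a noiseless parabola should score close to 1 even though its linear correlation with x is near zero, while two independent samples should score near 0. This is the behaviour the abstract describes for functional relationships without noise.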
Comment in
- Mathematics. A correlation for the 21st century. Science. 2011 Dec 16;334(6062):1502-3. doi: 10.1126/science.1215894. PMID: 22174235.
By ‘functional relationship’ we mean a distribution (X,Y) in which Y is a function of X, potentially with independent noise added.
