High-dimensional variable selection accounting for heterogeneity in regression coefficients across multiple data sources
- PMID: 39319323
- PMCID: PMC11417471
- DOI: 10.1002/cjs.11793
Abstract
When analyzing data combined from multiple sources (e.g., hospitals, studies), the heterogeneity across different sources must be accounted for. In this paper, we consider high-dimensional linear regression models for integrative data analysis. We propose a new adaptive clustering penalty (ACP) method to simultaneously select variables and cluster source-specific regression coefficients with sub-homogeneity. We show that the estimator based on the ACP method enjoys a strong oracle property under certain regularity conditions. We also develop an efficient algorithm based on the alternating direction method of multipliers (ADMM) for parameter estimation. We conduct simulation studies to compare the performance of the proposed method to three existing methods (a fused LASSO with adjacent fusion, a pairwise fused LASSO, and a multi-directional shrinkage penalty method). Finally, we apply the proposed method to the multi-center Childhood Adenotonsillectomy Trial to identify sub-homogeneity in the treatment effects across different study sites.
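To illustrate the kind of ADMM iteration the abstract refers to, the following is a minimal sketch for a plain LASSO problem on a single data source. It is not the ACP method: the adaptive clustering penalty and the source-specific coefficients are not reproduced here, and the function names `soft_threshold` and `admm_lasso`, the objective scaling, and the defaults for `rho`, `n_iter`, and `tol` are assumptions made for illustration only.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding, the proximal operator of kappa * |.|_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def admm_lasso(X, y, lam, rho=1.0, n_iter=500, tol=1e-8):
    """Scaled-form ADMM for: minimize 0.5 * ||y - X b||^2 + lam * ||z||_1, s.t. b = z.

    Illustrative only: a plain LASSO, not the ACP estimator from the paper.
    """
    p = X.shape[1]
    M = X.T @ X + rho * np.eye(p)   # system matrix of the ridge-type b-update
    Xty = X.T @ y
    z = np.zeros(p)                 # sparse copy of the coefficients
    u = np.zeros(p)                 # scaled dual variable
    for _ in range(n_iter):
        # b-update: quadratic subproblem, solved exactly
        b = np.linalg.solve(M, Xty + rho * (z - u))
        # z-update: soft-thresholding sets small coefficients exactly to zero
        z_old = z
        z = soft_threshold(b + u, lam / rho)
        # dual update: accumulate the constraint violation b - z
        u = u + b - z
        # stop when both primal and dual residuals are small
        if np.linalg.norm(b - z) < tol and rho * np.linalg.norm(z - z_old) < tol:
            break
    return z
```

Returning `z` rather than `b` yields exact zeros for the excluded variables, which is what makes the output usable for variable selection. A fusion-type penalty would replace the soft-thresholding step with an update on differences between source-specific coefficients, which this sketch does not attempt.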
Keywords: ADMM; coefficient clustering; data heterogeneity; k-means; variable selection.
MSC 2020: Primary 62J07; secondary 62J05.