Can J Stat. 2024 Sep;52(3):900-923.
doi: 10.1002/cjs.11793. Epub 2023 Aug 19.

High-dimensional variable selection accounting for heterogeneity in regression coefficients across multiple data sources

Tingting Yu et al. Can J Stat. 2024 Sep.

Abstract

When analyzing data combined from multiple sources (e.g., hospitals, studies), the heterogeneity across different sources must be accounted for. In this paper, we consider high-dimensional linear regression models for integrative data analysis. We propose a new adaptive clustering penalty (ACP) method to simultaneously select variables and cluster source-specific regression coefficients with sub-homogeneity. We show that the estimator based on the ACP method enjoys a strong oracle property under certain regularity conditions. We also develop an efficient algorithm based on the alternating direction method of multipliers (ADMM) for parameter estimation. We conduct simulation studies to compare the performance of the proposed method to three existing methods (a fused LASSO with adjacent fusion, a pairwise fused LASSO, and a multi-directional shrinkage penalty method). Finally, we apply the proposed method to the multi-center Childhood Adenotonsillectomy Trial to identify sub-homogeneity in the treatment effects across different study sites.
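The abstract names ADMM as the computational engine but does not give the ACP penalty or the authors' update steps. As a generic illustration only, the sketch below applies ADMM to the ordinary LASSO, following the standard splitting described in Boyd et al. (2011). It is not the authors' algorithm; the function names `soft_threshold` and `lasso_admm` and the parameters `rho`, `n_iter`, and `tol` are assumptions made for this example.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding, the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def lasso_admm(X, y, lam, rho=1.0, n_iter=500, tol=1e-8):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 with ADMM.

    Generic LASSO sketch (not the ACP method): the problem is split as
    b = z, with the smooth loss on b and the L1 penalty on z.
    """
    p = X.shape[1]
    Xty = X.T @ y
    # Factor (X'X + rho I) once; it is reused in every b-update.
    L = np.linalg.cholesky(X.T @ X + rho * np.eye(p))

    z = np.zeros(p)  # sparse copy of the coefficients
    u = np.zeros(p)  # scaled dual variable
    for _ in range(n_iter):
        # b-update: ridge-type linear system solved via the Cholesky factor.
        rhs = Xty + rho * (z - u)
        b = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # z-update: soft-thresholding enforces sparsity.
        z_old = z
        z = soft_threshold(b + u, lam / rho)
        # Dual update: accumulate the constraint violation b - z.
        u = u + b - z
        # Stop when primal and dual residuals are both small.
        if np.linalg.norm(b - z) < tol and rho * np.linalg.norm(z - z_old) < tol:
            break
    return z
```

A quick sanity check is an identity design matrix, where the LASSO solution is known in closed form as the soft-thresholded response. A clustering or fusion penalty of the kind discussed in the abstract would replace the soft-thresholding step with the proximal step of that penalty.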

Keywords: ADMM; coefficient clustering; data heterogeneity; k-means; variable selection. MSC 2020: Primary 62J07; Secondary 62J05.
