How easily can omission of patients, or selection amongst poorly-reproducible measurements, create artificial correlations? Methods for detection and implications for observational research design in cardiology

Int J Cardiol. 2013 Jul 15;167(1):102-13. doi: 10.1016/j.ijcard.2011.12.018. Epub 2012 Jan 27.


Background: When reported correlation coefficients seem too high to be true, does investigative verification of source data provide suitable reassurance? This study tests how easily omission of patients, or selection amongst irreproducible measurements, generates fictitious strong correlations without any data fabrication.

Method and results: Two forms of manipulation are applied to a pair of normally-distributed, uncorrelated variables: first, excluding the patients least favourable to a hypothesised association and, second, making multiple poorly-reproducible measurements per patient and choosing the most supportive. Excluding patients raises correlations powerfully, from 0.0 ± 0.11 (no patients omitted) to 0.40 ± 0.11 (one-fifth omitted), 0.59 ± 0.08 (one-third omitted) and 0.78 ± 0.05 (half omitted). Study size offers no protection: omitting just one-fifth of 75 patients (i.e. publishing 60) makes 92% of correlations statistically significant. Worse, simply selecting the most favourable amongst several measurements raises correlations from 0.0 ± 0.12 (single measurement of each variable) to 0.73 ± 0.06 (best of 2) and 0.90 ± 0.03 (best of 4). All (100%) of the correlation coefficients then become statistically significant. Scatterplots may reveal a telltale "shave sign" or "bite sign". Simple statistical tests are presented for detecting these suspicious signatures in single or multiple studies.
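The two manipulations described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure: the abstract does not state how "least favourable" patients or "most supportive" measurements were scored, so the criteria used here (lowest product of centred values for omission, and the x-y pairing closest to the line y = x for selection) are assumptions, as are the function names. The resulting correlations need not match the figures reported above.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def omit_least_favourable(xs, ys, fraction):
    """Drop the given fraction of patients that least support a positive
    association. ASSUMPTION: a patient is scored by the product of centred
    values (x - mean_x) * (y - mean_y); the paper's criterion may differ."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ranked = sorted(zip(xs, ys), key=lambda p: (p[0] - mx) * (p[1] - my))
    kept = ranked[int(round(fraction * n)):]  # discard the lowest scores
    return [p[0] for p in kept], [p[1] for p in kept]


def best_of_k(n_patients, k, rng):
    """For each patient, take k measurements of each variable and keep the
    (x, y) pairing closest to the line y = x. ASSUMPTION: measurements are
    pure standard-normal noise and "most supportive" means closest to y = x."""
    xs, ys = [], []
    for _ in range(n_patients):
        x_meas = [rng.gauss(0, 1) for _ in range(k)]
        y_meas = [rng.gauss(0, 1) for _ in range(k)]
        x, y = min(
            ((x, y) for x in x_meas for y in y_meas),
            key=lambda p: abs(p[0] - p[1]),
        )
        xs.append(x)
        ys.append(y)
    return xs, ys


def simulate(n_trials=200, n_patients=75, seed=0):
    """Mean correlation across simulated studies for each scenario."""
    rng = random.Random(seed)
    results = {"none": [], "omit_fifth": [], "best_of_4": []}
    for _ in range(n_trials):
        xs = [rng.gauss(0, 1) for _ in range(n_patients)]
        ys = [rng.gauss(0, 1) for _ in range(n_patients)]
        results["none"].append(pearson_r(xs, ys))
        results["omit_fifth"].append(
            pearson_r(*omit_least_favourable(xs, ys, 0.2))
        )
        results["best_of_4"].append(pearson_r(*best_of_k(n_patients, 4, rng)))
    return {name: sum(rs) / len(rs) for name, rs in results.items()}


if __name__ == "__main__":
    for name, mean_r in simulate().items():
        print(f"{name}: mean r = {mean_r:.2f}")
```

Under these assumptions the mean correlation stays near zero for the unmanipulated data and rises clearly under both manipulations, even though every value is a genuine draw and nothing is fabricated. Changing the scoring rule in either function is an easy way to explore how sensitive the inflation is to the selection criterion.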

Conclusion: Correlations are vulnerable to data manipulation. Cardiology is especially vulnerable to patient deletion (because we cardiologists may ourselves completely control enrolment and measurement) and to selection of "best" measurements (because alternative heartbeats are numerous, and some modalities are poorly reproducible). Source data verification cannot detect either manipulation, but statistical tests might highlight suspicious data and, by aggregating across studies, unreliable laboratories or research fields. Cardiological correlation research needs adequately-informed planning and guarantees of integrity, with teeth.

Publication types

  • Observational Study

MeSH terms

  • Cardiology / methods*
  • Cardiology / standards*
  • Humans
  • Patient Selection*
  • Reproducibility of Results*
  • Research Design / standards*