Many epidemiological studies now rely on the reuse of large healthcare administrative databases. In such studies, most of the time is spent on data management and basic statistical analyses, leaving little for complex statistical and medical analysis; as a result, the potential of these databases is sometimes underexploited. The objective of this work is to build SAF4SUHAD, a statistical analysis framework for secondary use of healthcare administrative databases, using literature-based specifications. A literature review was performed on PubMed in four medical domains: caesarean deliveries, cholecystectomies, hip replacement surgeries and bariatric surgeries. We identified 22 papers reporting analyses of large databases. The epidemiological indicators they reported (e.g. mean age) were abstracted into features (e.g. univariate description of a quantitative variable), which were then implemented as 32 user-facing functions in the R programming language. For instance, one function draws a histogram and computes the mean with its confidence interval, quantiles, etc. These comprise 4 functions for data management, 9 for univariate analysis, 8 for bivariate analysis and 11 for multivariate analysis, supported by many other intermediate functions. The functions were successfully used to analyze a French database of 250 million discharge summaries. This set of ready-to-use R functions could help secure repetitive tasks and refocus efforts on expert analysis.
Keywords: Healthcare epidemiology; Medico-administrative databases; Statistics.
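
To make the feature "univariate description of a quantitative variable" concrete, the following is a minimal sketch of what such a function might compute. It is written in Python for illustration and is not SAF4SUHAD code: the function name `describe_quantitative`, its `level` parameter, the returned dictionary layout and the sample ages are all hypothetical. The confidence interval uses a normal approximation, which is an assumption that is reasonable for the large samples typical of administrative databases; the framework itself may use a different method.

```python
from math import sqrt
from statistics import NormalDist, mean, quantiles, stdev


def describe_quantitative(values, level=0.95):
    """Univariate description of a quantitative variable (hypothetical helper)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    m = mean(values)
    # Standard error of the mean, from the sample standard deviation.
    se = stdev(values) / sqrt(n)
    # Assumption: normal approximation for the confidence interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Quartiles with the default "exclusive" method of statistics.quantiles.
    q1, median, q3 = quantiles(values, n=4)
    return {
        "n": n,
        "mean": m,
        "ci": (m - z * se, m + z * se),
        "quartiles": (q1, median, q3),
    }


# Made-up ages, for illustration only.
ages = [23, 31, 35, 38, 42, 47, 51, 58, 64, 70]
summary = describe_quantitative(ages)
```

Returning all indicators from a single call mirrors the idea of one ready-to-use function per feature, so that a repetitive descriptive step is written once and reused across studies.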