Background: On-site source data verification is a common and expensive activity, with little evidence that it is worthwhile. Central statistical monitoring (CSM) is a cheaper alternative, where data checks are performed by the coordinating centre, avoiding the need to visit all sites. Several publications have suggested methods for CSM; however, few have described their use in real trials.
Methods: R-programs were created to check data at either the subject level (7 tests within 3 programs) or the site level (9 tests within 8 programs), using previously described methods or new ones we developed. The checks targeted three groups of problems. The first was possible data errors (outliers, incorrect dates or anomalous data patterns). The second was digit preference, values too close to or too far from the means, unusual correlation structures and extreme variances, any of which may indicate fraud or procedural errors. The third was under-reporting of adverse events. The methods were applied to three trials: one that had closed and been published, one in follow-up, and a third to which fabricated data were added. We examined how well the methods worked and discussed their strengths and limitations.
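Digit preference, one of the site-level checks listed above, can be illustrated with a short sketch. It assumes a chi-square goodness-of-fit test of the last recorded digit against a uniform distribution over 0-9. This is a common way to test for digit preference, but it is not necessarily the method used in the authors' R-programs. The function names (`chi2_sf`, `terminal_digit_test`, `flag_sites`), the 0.01 threshold and the example data are hypothetical. Python is used only for illustration.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Uses the closed-form series for integer degrees of freedom
    (Abramowitz & Stegun 26.4.4-26.4.5), so no external libraries are needed."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_r x^((2r-1)/2) / (1*3*...*(2r-1))
    total, term = 0.0, math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def terminal_digit_test(values: Iterable[float], decimals: int = 0) -> Tuple[float, float]:
    """Chi-square test that the last recorded digit is uniform on 0-9.

    `decimals` is the number of decimal places the variable was recorded to
    (assumption: e.g. 0 for blood pressure in mmHg, 1 for temperature)."""
    digits = [int(round(abs(v) * 10 ** decimals)) % 10 for v in values]
    if not digits:
        raise ValueError("no values to test")
    counts = Counter(digits)
    expected = len(digits) / 10.0
    stat = sum((counts[d] - expected) ** 2 / expected for d in range(10))
    return stat, chi2_sf(stat, df=9)  # 10 categories -> 9 degrees of freedom


def flag_sites(site_values: Dict[str, List[float]], decimals: int = 0,
               alpha: float = 0.01) -> List[str]:
    """Return the (sorted) sites whose terminal-digit p-value falls below alpha."""
    flagged = []
    for site, values in site_values.items():
        _, p = terminal_digit_test(values, decimals)
        if p < alpha:
            flagged.append(site)
    return sorted(flagged)


# Usage: site "A" has evenly spread last digits; site "B" only records values ending in 0 or 5.
sites = {
    "A": [float(100 + i) for i in range(200)],
    "B": [float(100 + 5 * i) for i in range(200)],
}
flagged = flag_sites(sites, decimals=0, alpha=0.01)
print(flagged)  # ['B']
```

The chi-square approximation is usually considered adequate only when each expected count is at least about 5, which means roughly 50 or more values per site. When many sites are tested at once, some allowance for multiple testing would also be needed before a site is queried.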
Results: The R-programs produced simple tables or easy-to-read figures. Few data errors were found in the first two trials, and the errors added to the third were easily detected. The programs were able to identify patients with outliers based on single or multiple variables. They also detected (1) fabricated patients, generated either to have values too close to the multivariate mean or to have too little variance across repeated measurements, and (2) sites with unusual correlation structures or too few adverse events. Some methods were unreliable when applied to centres with few patients, or when data were fabricated in a way that did not fit the assumptions used to create the programs. Outputs from the R-programs are interpreted using examples.
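The subject-level idea of flagging patients whose values lie too close to, or too far from, the multivariate mean can be sketched with the squared Mahalanobis distance. The sketch assumes roughly multivariate normal data, so that the squared distance is approximately chi-square with p degrees of freedom (p = number of variables). It flags rows in the lower 1% tail as possibly fabricated "inliers" and rows in the upper 1% tail as possible outliers. The thresholds, function names (`squared_mahalanobis`, `flag_patients`) and simulated data are hypothetical illustrations, not the authors' R implementation.

```python
import math
import random
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("covariance matrix is (near) singular")
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def squared_mahalanobis(rows: Sequence[Sequence[float]]) -> List[float]:
    """Squared Mahalanobis distance of each row from the sample mean (sample covariance)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    d2 = []
    for r in rows:
        diff = [r[j] - mean[j] for j in range(p)]
        z = _solve(cov, diff)  # z = S^-1 (x - mean)
        d2.append(sum(diff[j] * z[j] for j in range(p)))
    return d2


def flag_patients(rows: Sequence[Sequence[float]], lower: float,
                  upper: float) -> Tuple[List[int], List[int]]:
    """Indices of rows whose squared distance is below `lower` or above `upper`."""
    d2 = squared_mahalanobis(rows)
    too_close = [i for i, d in enumerate(d2) if d < lower]  # candidate fabricated records
    too_far = [i for i, d in enumerate(d2) if d > upper]    # candidate outliers / data errors
    return too_close, too_far


# Usage with simulated data: 200 genuine patients with two correlated measurements,
# one fabricated patient placed exactly at the genuine sample mean, and one gross outlier.
rng = random.Random(2024)
rows = []
for _ in range(200):
    x1 = rng.gauss(120, 15)
    rows.append([x1, 80 + 0.4 * (x1 - 120) + rng.gauss(0, 8)])
genuine_mean = [sum(r[j] for r in rows) / len(rows) for j in range(2)]
rows.append(genuine_mean)   # index 200: fabricated, "too average"
rows.append([220.0, 40.0])  # index 201: implausible outlier

# For p = 2 the chi-square quantile has the closed form -2 ln(1 - q).
lower = -2 * math.log(1 - 0.01)  # 1st percentile, about 0.020
upper = -2 * math.log(0.01)      # 99th percentile, about 9.21
too_close, too_far = flag_patients(rows, lower, upper)
print(200 in too_close, 201 in too_far)  # True True
```

By chance, a few genuine patients will also fall in each 1% tail, so flagged rows are candidates for review rather than proof of error or fraud. The covariance estimate is also unstable when only a handful of patients are available, which is one reason checks of this kind can be unreliable at small centres.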
Limitations: Detecting data errors is relatively straightforward; however, there are several limitations in the detection of fraud. Some programs cannot be applied to small trials or to centres with few patients (<10), and data falsified in a manner that does not fit a program's assumptions may not be detected. In addition, many tests require a visual assessment of the output (showing flagged participants or sites) before data queries are made or on-site visits are performed.
Conclusions: CSM is a worthwhile alternative to on-site data checking and may be used to limit the number of site visits by targeting only sites that are flagged by the programs. We summarise the methods, show how they are implemented and demonstrate that their output can be easy to interpret. The methods can identify incorrect or unusual data for a trial subject, or centres whose data, considered together, differ too much from those of other centres and should therefore be reviewed, possibly through an on-site visit.