Since 1983, hospitals in the United States have received prospective payment for inpatient admissions covered under Medicare. Under this scheme, each patient is assigned to a group by a classification system known as Diagnosis-Related Groups (DRGs), and the Health Care Financing Administration reimburses the hospital according to a predetermined group average, adjusted for hospital-level characteristics such as size, location, and teaching activity. Recent interest has focused on refining the DRG system or on considering entirely different classification systems. Studies that compare, within the same dataset, how well different systems account for between-patient variability in resource consumption lead to a problem of model selection between large non-nested regressions, in which resource consumption, measured by length of hospital stay or by cost, is regressed on dummy indicator variables representing the different patient groups. We use a simple measure of fit to develop a symmetric test of the null hypothesis that two systems account equally well for variability in resource consumption. Unlike methods such as Akaike's information criterion (AIC), this method lets us quantify the probability of a false positive and thereby limit the probability of choosing one system over another when it is in fact no better at accounting for variability in resource consumption.
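
To make the regression setup concrete, the following is a minimal sketch of how two classification systems might be compared on the same patients. It assumes the measure of fit is the ordinary R^2, which the text above does not specify, and it uses made-up data and a hypothetical helper name (`r_squared_from_groups`). It illustrates only the fit comparison, not the symmetric test itself.

```python
from collections import defaultdict


def r_squared_from_groups(y, groups):
    """R^2 from regressing y on one dummy indicator per patient group.

    With a full set of group dummies, the least-squares fitted value for
    each patient is the mean of that patient's group, so the residual sum
    of squares is the within-group sum of squares.
    """
    n = len(y)
    grand_mean = sum(y) / n
    total_ss = sum((v - grand_mean) ** 2 for v in y)

    # Collect the outcome values belonging to each group.
    by_group = defaultdict(list)
    for value, group in zip(y, groups):
        by_group[group].append(value)

    # Residual (within-group) sum of squares around each group mean.
    within_ss = 0.0
    for values in by_group.values():
        group_mean = sum(values) / len(values)
        within_ss += sum((v - group_mean) ** 2 for v in values)

    return 1.0 - within_ss / total_ss


# Hypothetical toy data (an assumption, not from any real dataset):
# length of stay in days for eight patients, each classified under
# two invented grouping systems, A and B.
length_of_stay = [2, 3, 4, 10, 12, 11, 5, 6]
system_a = ["a1", "a1", "a1", "a2", "a2", "a2", "a3", "a3"]
system_b = ["b1", "b2", "b1", "b2", "b1", "b2", "b1", "b2"]

r2_a = r_squared_from_groups(length_of_stay, system_a)
r2_b = r_squared_from_groups(length_of_stay, system_b)
print(f"R^2 system A: {r2_a:.3f}")
print(f"R^2 system B: {r2_b:.3f}")
print(f"Difference:   {r2_a - r2_b:.3f}")
```

Because neither set of group indicators is a subset of the other, the two regressions are non-nested, so a standard nested F-test does not apply to the observed difference in fit. A test of the kind described above would instead need the sampling distribution of that difference under the null hypothesis of equal fit.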