It is sometimes supposed that standardizing tests of mouse behavior will ensure similar results in different laboratories. We evaluated this supposition by conducting behavioral tests with identical apparatus and test protocols in independent laboratories. Eight genetic groups of mice, including equal numbers of males and females, were either bred locally or shipped from the supplier and then tested on six behaviors simultaneously in three laboratories (Albany, NY; Edmonton, AB; Portland, OR). The behaviors included locomotor activity in a small box, the elevated plus maze, accelerating rotarod, visible platform water escape, cocaine activation of locomotor activity, and ethanol preference in a two-bottle test. A preliminary report of this study presented a conventional analysis of conventional measures that revealed strong effects of both genotype and laboratory as well as noteworthy interactions between genotype and laboratory. We now report a more detailed analysis of additional measures and view the data for each test in different ways. Whether mice were shipped from a supplier or bred locally had negligible effects for almost every measure in the six tests, and sex differences were also absent or very small for most behaviors, whereas genetic effects were almost always large. For locomotor activity, cocaine activation, and elevated plus maze, the analysis demonstrated that genetic differences in behavior depended strongly on the laboratory administering the tests. For ethanol preference and water escape learning, on the other hand, the three labs obtained essentially the same results for key indicators of behavior. Thus, the degree to which results depend on the specific laboratory itself depends on the task in question.
Our results suggest that there may be advantages to test standardization, but laboratory environments probably can never be made sufficiently similar to guarantee identical results on a wide range of tests in a wide range of labs. Interpretations of our results by colleagues in neuroscience as well as the mass media are reviewed. Pessimistic views of mouse behavioral tests as highly unreliable, prevalent in the media but relatively uncommon among neuroscientists, are contradicted by our data. Despite the presence of noteworthy interactions between genotype and lab environment, most of the larger differences between inbred strains were replicated across the three labs. Strain differences of moderate effect size, on the other hand, often differed markedly among labs, especially those involving three 129-derived strains. Implications for behavioral screening of targeted and induced mutations in mice are discussed.
Copyright 2003 Wiley Periodicals, Inc.