In animal experiments, animals, husbandry and test procedures are traditionally standardized to maximize test sensitivity and minimize animal use, assuming that this will also guarantee reproducibility. However, by reducing within-experiment variation, standardization may limit inference to the specific experimental conditions. Indeed, we have recently shown in mice that standardization may generate spurious results in behavioral tests, accounting for poor reproducibility, and that this can be avoided by population heterogenization through systematic variation of experimental conditions. Here, we examined whether a simple form of heterogenization effectively improves reproducibility of test results in a multi-laboratory situation. Each of six laboratories independently ordered 64 female mice of two inbred strains (C57BL/6NCrl, DBA/2NCrl) and examined them for strain differences in five commonly used behavioral tests under two different experimental designs. In the standardized design, experimental conditions were standardized as much as possible in each laboratory, while they were systematically varied with respect to the animals' test age and cage enrichment in the heterogenized design. Although heterogenization tended to improve reproducibility by increasing within-experiment variation relative to between-experiment variation, the effect was too weak to account for the large variation between laboratories. However, our findings confirm the potential of systematic heterogenization for improving reproducibility of animal experiments and highlight the need for effective and practicable heterogenization strategies.