Results from better-quality studies should in some sense be more valid or more accurate than results from other studies, and as a consequence should tend to be distributed differently from results of other studies. To date, however, quality scores have been poor predictors of study results. We discuss possible reasons and remedies for this problem. It appears that 'quality' (whatever leads to more valid results) is of fairly high dimension and possibly non-additive and nonlinear, and that quality dimensions are highly application-specific and hard to measure from published information. Unfortunately, quality scores are often used to contrast, model, or modify meta-analysis results without regard to these problems, as when they are used to modify the weights or contributions of individual studies directly in an ad hoc manner. Even if quality could be captured in one dimension, use of quality scores in summarization weights would produce biased estimates of effect. Such use would be justified only if this bias were more than offset by variance reduction. From this perspective, quality weighting should be evaluated against formal bias-variance trade-off methods such as hierarchical (random-coefficient) meta-regression. Because it is unlikely that a low-dimensional appraisal will ever be adequate (especially across different applications), we argue that response-surface estimation based on quality items is preferable to quality weighting. Quality scores may be useful in the second stage of a hierarchical response-surface model, but only if the scores are reconstructed to maximize their correlation with bias.
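
The bias-variance argument above can be illustrated with a small numerical sketch. Everything in it is a hypothetical assumption made for illustration, not data or a method from this paper: six studies with equal variance, a single binary quality item (`flaw`) that adds a fixed bias `FLAW_BIAS` to the expected estimate, and a one-dimensional quality score that halves the weight of flawed studies. The regression step is a plain fixed-effects weighted least-squares fit on the quality item, used here as a simple stand-in for the hierarchical (random-coefficient) meta-regression mentioned above. Expected values are used in place of simulated estimates, so the bias and variance of each summary are exact under these assumptions.

```python
import numpy as np

# All numbers below are hypothetical, chosen only to illustrate the argument.
TRUE_EFFECT = 0.5   # assumed true effect
FLAW_BIAS = 0.3     # assumed bias added when the quality item is violated

flaw = np.array([0, 0, 0, 1, 1, 1], dtype=float)  # quality item: 1 = flaw present
variance = np.full(6, 0.04)                       # within-study variances
expected_y = TRUE_EFFECT + FLAW_BIAS * flaw       # expected study estimates
quality = 1.0 - 0.5 * flaw                        # hypothetical quality score


def weighted_mean(weights):
    """Expected value and variance of a weighted average of study estimates."""
    w = weights / weights.sum()
    return float(w @ expected_y), float(w**2 @ variance)


def regression_intercept():
    """Expected value and variance of the WLS intercept (predicted effect at flaw = 0)."""
    X = np.column_stack([np.ones_like(flaw), flaw])
    W = np.diag(1.0 / variance)
    cov = np.linalg.inv(X.T @ W @ X)
    beta = cov @ X.T @ W @ expected_y
    return float(beta[0]), float(cov[0, 0])


def summarize():
    """Bias, variance, and mean squared error of each summary estimator."""
    estimators = {
        "inverse_variance": weighted_mean(1.0 / variance),
        "quality_weighted": weighted_mean(quality / variance),
        "meta_regression": regression_intercept(),
    }
    return {
        name: {
            "bias": mean - TRUE_EFFECT,
            "variance": var,
            "mse": (mean - TRUE_EFFECT) ** 2 + var,
        }
        for name, (mean, var) in estimators.items()
    }


results = summarize()
for name, stats in results.items():
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

In this toy setting, quality weighting shrinks the bias of the inverse-variance summary but does not remove it, because flawed studies still receive nonzero weight. The regression intercept is unbiased but has a larger variance. Which summary has the smaller mean squared error depends on the assumed bias and variances, which is why the comparison has to be made explicitly rather than assumed.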