Public report cards and confidential, collaborative peer education represent distinct approaches to cardiac surgery quality assessment and improvement. This review discusses the controversies regarding their methodology and relative effectiveness. Report cards have been the more commonly used approach, typically as a result of state legislation. They are based on the presumption that publication of outcomes effectively motivates providers, and that market forces will reward higher quality. Numerous studies have challenged the validity of these hypotheses. Furthermore, although states with report cards have reported significant decreases in risk-adjusted mortality, it is unclear whether this improvement resulted from public disclosure or, rather, from the development of internal quality programs by hospitals. An additional confounding factor is the nationwide decline in heart surgery mortality, including in states without quality monitoring. Finally, report cards may engender negative behaviors such as high-risk case avoidance and "gaming" of the reporting system, especially if individual surgeon results are published. The alternative approach, continuous quality improvement, may provide an opportunity to enhance performance and reduce interprovider variability while avoiding the unintended negative consequences of report cards. This collaborative method, which uses exchange visits between programs and determination of best practice, has been highly effective in northern New England and in the Veterans Affairs Administration. However, despite their potential advantages, quality programs based solely on confidential continuous quality improvement do not address the issue of public accountability. For this reason, some states may continue to mandate report cards. In such instances, it is imperative that appropriate statistical techniques and report formats be used, and that professional organizations simultaneously implement continuous quality improvement programs.
The statistical methodology underlying current report cards is flawed, and does not justify the degree of accuracy presented to the public. All existing risk-adjustment methods have substantial inherent imprecision, and this is compounded when the results of such patient-level models are aggregated and used inappropriately to assess provider performance. Specific problems include sample size differences, clustering of observations, multiple comparisons, and failure to account for the random component of interprovider variability. We advocate the use of hierarchical or multilevel statistical models to address these concerns, as well as report formats that emphasize the statistical uncertainty of the results.
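The sample-size concern can be illustrated with a minimal sketch. It is not the method of any actual report card and not a full hierarchical model: it assumes a simple beta-binomial style partial pooling in which each provider's observed mortality is pulled toward the pooled rate, more strongly when case volume is low. The provider names, counts, and the `prior_strength` value are hypothetical, and risk adjustment is omitted entirely.

```python
from math import sqrt

# Hypothetical data: provider -> (deaths, cases). Not taken from any report card.
providers = {"A": (2, 40), "B": (30, 1000), "C": (0, 25)}


def pooled_rate(data):
    """Overall mortality across all providers (total deaths / total cases)."""
    deaths = sum(d for d, _ in data.values())
    cases = sum(n for _, n in data.values())
    return deaths / cases


def shrunken_rate(deaths, cases, prior_mean, prior_strength):
    """Posterior mean under an assumed Beta prior centred on prior_mean.

    prior_strength acts like a number of pseudo-cases; it is an assumed
    constant here, whereas a real multilevel model would estimate the
    between-provider variance from the data.
    """
    return (deaths + prior_strength * prior_mean) / (cases + prior_strength)


def wilson_interval(deaths, cases, z=1.96):
    """Approximate 95% Wilson score interval for a raw mortality proportion."""
    p = deaths / cases
    denom = 1 + z * z / cases
    centre = (p + z * z / (2 * cases)) / denom
    half = z * sqrt(p * (1 - p) / cases + z * z / (4 * cases * cases)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


overall = pooled_rate(providers)
summary = {}
for name, (deaths, cases) in providers.items():
    raw = deaths / cases
    low, high = wilson_interval(deaths, cases)
    summary[name] = {
        "raw": raw,
        "shrunken": shrunken_rate(deaths, cases, overall, prior_strength=200),
        "interval": (low, high),
    }

for name, row in summary.items():
    low, high = row["interval"]
    print(f"{name}: raw={row['raw']:.3f} shrunken={row['shrunken']:.3f} "
          f"95% interval=({low:.3f}, {high:.3f})")
```

In this sketch the low-volume providers have wide intervals and their estimates move substantially toward the pooled rate, while the high-volume provider is barely affected. A report format that shows the interval alongside the point estimate makes that uncertainty visible to the reader.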