This article focuses on the psychometric properties required of a patient-reported outcome (PRO) measure. Topics include the importance of reliability and validity, the psychometric approaches used to estimate them, the kinds of evidence needed to show that a PRO instrument has sufficient reliability and validity, contexts that may affect psychometric properties, methods available to evaluate PRO instruments when the context varies, and the types of reliability and validity testing appropriate during different phases of clinical trials. A central perspective is that reliability and validity lie on a continuum: the more evidence one has, the greater the confidence in the value of the PRO data. Construct validity is the type of validity most frequently used with PRO instruments, because few "gold standards" exist to allow the use of criterion validity, and content validity by itself provides only beginning evidence of validity. Several guidelines are recommended for establishing sufficient evidence of reliability and validity. For clinical trials, a minimum reliability threshold of 0.70 is recommended. Samples used for testing should include at least 200 cases, and results should be replicated in at least one additional sample. At least one full report on the development of the instrument and one on its use are deemed necessary to evaluate the psychometric properties of a PRO instrument. Psychometric testing ideally occurs before the initiation of Phase III trials. When it does not, there is considerable risk that the use of the PRO data cannot be substantiated. Various qualitative approaches (e.g., focus groups, behavioral coding, cognitive interviews) and quantitative approaches (e.g., differential item functioning testing) are useful in evaluating the reliability and validity of PRO instruments.
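As an illustration of how the 0.70 reliability threshold and the 200-case sample size might be checked in practice, the following minimal sketch computes Cronbach's alpha for a set of item responses. The choice of Cronbach's alpha (an internal-consistency coefficient) is an assumption made here for illustration; the text does not say which reliability coefficient the threshold applies to. The function names `cronbach_alpha` and `check_guidelines` are likewise hypothetical.

```python
from statistics import variance

MIN_RELIABILITY = 0.70  # minimum reliability recommended for clinical trials
MIN_CASES = 200         # minimum recommended number of cases for testing


def cronbach_alpha(rows):
    """Cronbach's alpha (assumed coefficient) for respondent-by-item scores.

    rows: list of respondents, each a list of numeric item scores
    of equal length.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def check_guidelines(rows):
    """Compare one sample against the two numeric guidelines above."""
    alpha = cronbach_alpha(rows)
    return {
        "alpha": alpha,
        "n_cases": len(rows),
        "meets_reliability": alpha >= MIN_RELIABILITY,
        "meets_sample_size": len(rows) >= MIN_CASES,
    }
```

A check of this kind covers only the numeric thresholds. Replication in an additional sample and the validity evidence discussed above would still need to be assessed separately.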