The growing amount of data in operational electronic health record systems provides unprecedented opportunity for its reuse for many tasks, including comparative effectiveness research. However, there are many caveats to the use of such data. Electronic health record data from clinical settings may be inaccurate, incomplete, transformed in ways that undermine their meaning, unrecoverable for research, of unknown provenance, of insufficient granularity, and incompatible with research protocols. However, the quantity and real-world nature of these data provide impetus for their use, and we develop a list of caveats to inform would-be users of such data as well as provide an informatics roadmap that aims to insure this opportunity to augment comparative effectiveness research can be best leveraged.