Personalized medicine is being realized by our ability to measure biological and environmental information about patients. Much of these data are being stored in electronic health records yielding big data that presents challenges for its management and analysis. Here, we review several areas of knowledge that are necessary for next-generation scientists to fully realize the potential of biomedical big data. We begin with an overview of big data and its storage and management. We then review statistics and data science as foundational topics followed by a core curriculum of artificial intelligence, machine learning and natural language processing that are needed to develop predictive models for clinical decision making. We end with some specific training recommendations for preparing next-generation scientists for biomedical big data.