Identification of cell phenotypic states within heterogeneous populations, along with elucidation of their switching dynamics, is a central challenge in modern biology. Conventional single-cell analysis methods typically provide only indirect, static phenotypic readouts. Transmitted light images, on the other hand, provide direct morphological readouts and can be acquired over time, yielding a rich data source for dynamic cell phenotypic state identification. Here, we describe an end-to-end deep learning platform, UPSIDE (Unsupervised Phenotypic State IDEntification), for discovering cell states and their dynamics from transmitted light movies. UPSIDE uses a variational autoencoder architecture to learn latent cell representations, which are then clustered for state identification, decoded for feature interpretation, and linked across movie frames for transition rate inference. Using UPSIDE, we identified distinct blood cell types in a heterogeneous dataset. We then analyzed movies of patient-derived acute myeloid leukemia cells, from which we identified stem cell-associated morphological states as well as the transition rates to and from these states. UPSIDE opens up the use of transmitted light movies for systematic exploration of cell state heterogeneity and dynamics in biology and medicine.
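As a rough illustration of the last step of the pipeline described above (linking cluster assignments across movie frames to infer transitions), the following minimal sketch counts state changes between consecutive frames of tracked cells and normalizes them into per-frame transition frequencies. It is not UPSIDE's actual inference procedure: the function name `estimate_transition_matrix`, the input layout (one list of integer state labels per tracked cell), and the simple count-based estimator are all assumptions made for illustration, and the output is a per-frame transition probability rather than a continuous-time rate.

```python
from collections import Counter


def estimate_transition_matrix(tracks, n_states):
    """Estimate per-frame transition frequencies between states.

    tracks: list of label sequences, one per tracked cell (hypothetical
            layout); each label is an int in [0, n_states).
    Returns an n_states x n_states list of lists whose row i holds the
    fraction of frame-to-frame steps leaving state i that ended in each
    state. Rows with no observed steps are left as zeros.
    """
    counts = Counter()
    for labels in tracks:
        # Pair each frame's label with the label in the next frame.
        for src, dst in zip(labels, labels[1:]):
            counts[(src, dst)] += 1

    matrix = []
    for src in range(n_states):
        row_total = sum(counts[(src, dst)] for dst in range(n_states))
        if row_total == 0:
            matrix.append([0.0] * n_states)
        else:
            matrix.append(
                [counts[(src, dst)] / row_total for dst in range(n_states)]
            )
    return matrix


# Toy example: three tracked cells, two states (made-up labels, not real data).
tracks = [[0, 0, 1, 1], [1, 1, 0], [0, 1, 1, 1]]
P = estimate_transition_matrix(tracks, n_states=2)
```

Dividing each off-diagonal entry by the frame interval would give a crude rate estimate, valid only when transitions are rare relative to the imaging frequency.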