Endosomes are subcellular organelles that serve as important transport compartments in eukaryotic cells. Fluorescence microscopy is widely used to study endosomes at the subcellular level. A single microscopy image can contain a large number of organelles, including many endosomes. Detecting and annotating endosomes in fluorescence microscopy images is a critical part of the study of subcellular trafficking processes. Such annotation is usually performed by human inspection, which is time-consuming and error-prone when carried out by inexperienced analysts. This paper proposes a two-stage method for the automated detection of ring-like endosomes. The method consists of a localization stage followed by an identification stage. Given a test microscopy image, the localization stage generates a voting map by locally comparing query endosome patches with the test image using a bag-of-words model. The voting map is then used to determine a number of candidate endosome patches. Subsequently, in the identification stage, a support vector machine (SVM) is trained on endosome patches and background-pattern patches. Each candidate patch is classified by the SVM to rule out patches that contain endosome-like background patterns. The performance of the proposed method is evaluated on real microscopy images of human myeloid endothelial cells. The results show that the proposed method significantly outperforms several state-of-the-art competing methods across multiple performance metrics.
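To make the localize-then-identify cascade concrete, the following is a minimal, dependency-free sketch and not the authors' implementation. It makes several simplifying assumptions that the abstract does not confirm: a "visual word" is just a uniform intensity bin, patches are compared by histogram intersection, candidates are selected by a fixed threshold on the voting map, and the trained SVM is represented by a hypothetical callable `is_endosome`. All function names are illustrative.

```python
from collections import Counter


def bow_histogram(patch, n_words=4, max_value=256):
    """Normalized visual-word histogram of a 2-D patch (list of rows).

    Assumption: a visual word is a uniform intensity bin over integer
    intensities in [0, max_value).
    """
    counts = Counter(min(v * n_words // max_value, n_words - 1)
                     for row in patch for v in row)
    total = sum(counts.values())
    return [counts.get(w, 0) / total for w in range(n_words)]


def similarity(h1, h2):
    """Histogram intersection; close to 1 for near-identical histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def crop(image, top, left, size):
    """Return the size-by-size window whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def voting_map(image, query_patches, size, n_words=4):
    """Stage 1: score every window by its best match to any query patch."""
    queries = [bow_histogram(q, n_words) for q in query_patches]
    rows = len(image) - size + 1
    cols = len(image[0]) - size + 1
    return [[max(similarity(bow_histogram(crop(image, r, c, size), n_words), q)
                 for q in queries)
             for c in range(cols)]
            for r in range(rows)]


def candidates(votes, threshold):
    """Window positions whose vote reaches the threshold."""
    return [(r, c) for r, row in enumerate(votes)
            for c, v in enumerate(row) if v >= threshold]


def detect(image, query_patches, size, threshold, is_endosome):
    """Stage 2: keep only the candidates accepted by the classifier."""
    votes = voting_map(image, query_patches, size)
    return [(r, c) for r, c in candidates(votes, threshold)
            if is_endosome(crop(image, r, c, size))]


# Toy data: a ring (bright border, dark centre) and a background pattern
# that contains the same pixel values in a different arrangement.
ring = [[200, 200, 200], [200, 0, 200], [200, 200, 200]]
fake = [[0, 200, 200], [200, 200, 200], [200, 200, 200]]
image = [ring[i] + [0] + fake[i] for i in range(3)]

# Placeholder for the trained SVM: accept patches with a dark centre pixel.
found = detect(image, [ring], size=3, threshold=0.9,
               is_endosome=lambda patch: patch[1][1] < 50)
```

In this toy example the voting map proposes both the ring and the look-alike pattern, because a bag-of-words histogram discards spatial layout; only the second stage removes the look-alike. This is the role the abstract assigns to the SVM, which in practice would be trained on labeled endosome and background patches rather than hand-written as a rule.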