Chloroplasts originated from an ancient cyanobacterium-like endosymbiont. Several tens of chloroplast proteins are encoded by the chloroplast genome, whereas hundreds or more are encoded by the nuclear genome in plants and algae; the exact number and identity of the nuclear-encoded chloroplast proteins remain unknown. Here we describe an attempt to identify a large number of as-yet-unidentified chloroplast proteins of endosymbiont origin (CPRENDOs). Our strategy consists of whole-genome protein clustering by the homolog group method, optimized for the number of organisms, followed by phylogenetic profiling that extracts groups conserved in cyanobacteria and photosynthetic eukaryotes. An initial minimal set of CPRENDOs was predicted without relying on targeting prediction and was experimentally validated.
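The phylogenetic profiling step can be illustrated with a minimal sketch. It assumes that clustering has already produced homolog groups, each represented as the set of organisms containing at least one member protein. The function name `conserved_groups`, the `min_fraction` threshold, and the toy organism and group labels are hypothetical and serve only as illustration; they are not the actual pipeline or data.

```python
from typing import Dict, List, Set


def conserved_groups(
    groups: Dict[str, Set[str]],
    cyanobacteria: Set[str],
    photo_eukaryotes: Set[str],
    min_fraction: float = 1.0,
) -> List[str]:
    """Return IDs of homolog groups found in at least `min_fraction` of the
    cyanobacteria and at least `min_fraction` of the photosynthetic eukaryotes.

    `min_fraction` is an assumed, illustrative conservation threshold.
    """
    selected = []
    for group_id, organisms in sorted(groups.items()):
        # Fraction of each reference set in which the group is present.
        cyano_frac = len(organisms & cyanobacteria) / len(cyanobacteria)
        euk_frac = len(organisms & photo_eukaryotes) / len(photo_eukaryotes)
        if cyano_frac >= min_fraction and euk_frac >= min_fraction:
            selected.append(group_id)
    return selected


# Toy presence profiles (hypothetical labels, not real data).
CYANOBACTERIA = {"cyano_A", "cyano_B"}
PHOTO_EUKARYOTES = {"plant_A", "alga_A"}
GROUPS = {
    "grp1": {"cyano_A", "cyano_B", "plant_A", "alga_A"},
    "grp2": {"cyano_A", "cyano_B", "other_bacterium"},
    "grp3": {"plant_A", "alga_A", "fungus_A"},
    "grp4": {"cyano_A", "plant_A", "alga_A"},
}

# Strict profile: the group must be present in every organism of both sets.
CANDIDATES = conserved_groups(GROUPS, CYANOBACTERIA, PHOTO_EUKARYOTES)
```

Lowering `min_fraction` tolerates gene loss or incomplete annotation in some genomes, at the cost of admitting less strictly conserved groups. Note that no targeting prediction enters the filter, in keeping with the strategy described above.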