The short response latencies of face-selective neurons in the inferotemporal cortex impose major constraints on models of visual processing. It appears that visual information must propagate in an essentially feed-forward fashion, with most neurons having time to fire only one spike. We hypothesize that flashed stimuli can be encoded by the order of firing of ganglion cells in the retina, and we propose a neuronal mechanism, possibly related to fast shunting inhibition, that decodes such information. Based on these assumptions, we built a three-layered neural network of retinotopically organized neuronal maps. We showed, by using a learning rule involving spike-timing-dependent plasticity, that neuronal maps in the output layer can be trained to recognize natural photographs of faces. Not only was the model able to generalize to novel views of the same faces, it was also remarkably resistant to image noise and reductions in contrast.
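To make the order-of-firing idea concrete, the following is a minimal sketch and not the implementation used in the model. It assumes that more strongly activated inputs fire earlier, and that each successive spike is attenuated by a fixed factor, a simple stand-in for a progressive desensitization such as the one fast shunting inhibition might provide. The function names, the `modulation` parameter, and the geometric attenuation are all illustrative assumptions.

```python
import numpy as np


def rank_order_encode(intensities):
    """Return input indices in firing order (assumption: strongest fires first)."""
    intensities = np.asarray(intensities, dtype=float)
    # Stable sort on negated values so that ties keep their original order.
    return np.argsort(-intensities, kind="stable")


def rank_order_decode(firing_order, weights, modulation=0.9):
    """Activation of a hypothetical order-sensitive neuron.

    The spike arriving at rank r contributes modulation**r times the weight
    of the input that fired it, so early spikes count more than late ones.
    The geometric attenuation is an assumed form, chosen for simplicity.
    """
    firing_order = np.asarray(firing_order)
    weights = np.asarray(weights, dtype=float)
    gains = modulation ** np.arange(len(firing_order))
    return float(np.sum(gains * weights[firing_order]))


# Example: three inputs, and the same pattern at half contrast.
order = rank_order_encode([0.2, 0.9, 0.5])
order_low_contrast = rank_order_encode([0.1, 0.45, 0.25])

# Weights tuned to the firing order above: the earliest input gets the largest weight.
weights = np.zeros(3)
weights[order] = 0.5 ** np.arange(3)

matched = rank_order_decode(order, weights, modulation=0.5)
reversed_order = rank_order_decode(order[::-1], weights, modulation=0.5)
```

In this toy setting, halving every intensity leaves the firing order unchanged, which gives some intuition for why an order-based code could tolerate reductions in contrast. The decoder also responds more strongly when spikes arrive in the order its weights are tuned to (`matched`) than when the order is reversed (`reversed_order`).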