In crowding, the perception of a target strongly deteriorates when neighboring elements are presented. Crowding is usually assumed to have the following characteristics. (a) Crowding is determined only by nearby elements within a restricted region around the target (Bouma's law). (b) Increasing the number of flankers can only deteriorate performance. (c) Target-flanker interference is feature-specific. These characteristics are usually explained by pooling models, which are well in the spirit of classic models of object recognition. In this review, we summarize recent findings showing that crowding is not determined by the above characteristics, thus, challenging most models of crowding. We propose that the spatial configuration across the entire visual field determines crowding. Only when one understands how all elements of a visual scene group with each other, can one determine crowding strength. We put forward the hypothesis that appearance (i.e., how stimuli look) is a good predictor for crowding, because both crowding and appearance reflect the output of recurrent processing rather than interactions during the initial phase of visual processing.