Understanding the goals or intentions of other people requires a broad range of evaluative processes, including decoding biological motion, knowing about object properties, and recognizing task space requirements and social contexts. It is becoming increasingly evident that some of this decoding is based in part on the simulation of other people's behavior within our own nervous system. This review focuses on aspects of action understanding that rely on embodied cognition, that is, knowledge of the body and how it interacts with the world. This form of cognition provides an essential knowledge base from which action simulation can be used to decode at least some actions performed by others. Recent functional imaging studies of action understanding are interpreted with the goal of defining the conditions under which simulation operations occur and how they relate to other constructs, including top-down versus bottom-up processing and the functional distinctions between action observation and social networks. From this it is argued that action understanding emerges from the engagement of highly flexible computational hierarchies driven by simulation, object properties, social context, and kinematic constraints, and in which the hierarchy is driven by task structure rather than by functional or strict anatomic rules.