Estimating distances between people and robots plays a crucial role in understanding social Human-Robot Interaction (HRI) from an egocentric view. It is a key step if robots should engage in social interactions, and to collaborate with people as part of human-robot teams. For distance estimation between a person and a robot, different sensors can be employed, and the number of challenges to be addressed by the distance estimation methods rise with the simplicity of the technology of a sensor. In the case of estimating distances using individual images from a single camera in a egocentric position, it is often required that individuals in the scene are facing the camera, do not occlude each other, and are fairly visible so specific facial or body features can be identified. In this paper, we propose a novel method for estimating distances between a robot and people using single images from a single egocentric camera. The method is based on previously proven 2D pose estimation, which allows partial occlusions, cluttered background, and relatively low resolution. The method estimates distance with respect to the camera based on the Euclidean distance between ear and torso of people in the image plane. Ear and torso characteristic points has been selected based on their relatively high visibility regardless of a person orientation and a certain degree of uniformity with regard to the age and gender. Experimental validation demonstrates effectiveness of the proposed method.
Keywords: Human–Robot Interaction; distance estimation; single RGB image; social interaction.