A Neural-Dynamic Architecture for Concurrent Estimation of Object Pose and Identity

Front Neurorobot. 2017 Apr 28:11:23. doi: 10.3389/fnbot.2017.00023. eCollection 2017.

Abstract

Handling objects or interacting with a human user about objects on a shared tabletop requires that objects be identified after learning from a small number of views and that object pose be estimated. We present a neurally inspired architecture that learns object instances by storing features extracted from a single view of each object. Input features are color and edge histograms from a localized area that is updated during processing. The system finds the best-matching view for the object in a novel input image while concurrently estimating the object's pose, aligning the learned view with current input. The system is based on neural dynamics, computationally operating in real time, and can handle dynamic scenes directly off live video input. In a scenario with 30 everyday objects, the system achieves recognition rates of 87.2% from a single training view for each object, while also estimating pose quite precisely. We further demonstrate that the system can track moving objects, and that it can segment the visual array, selecting and recognizing one object while suppressing input from another known object in the immediate vicinity. Evaluation on the COIL-100 dataset, in which objects are depicted from different viewing angles, revealed recognition rates of 91.1% on the first 30 objects, each learned from four training views.

Keywords: neural dynamics; object recognition; pose estimation; recurrent process; top-down feedback.