Sorting Objects from a Conveyor Belt Using POMDPs with Multiple-Object Observations and Information-Gain Rewards

Ady-Daniel Mezei; Levente Tamás; Lucian Buşoniu

doi:10.3390/s20092481


Sensors (Basel). 2020 Apr 27;20(9):2481. doi: 10.3390/s20092481.

Authors

Ady-Daniel Mezei¹, Levente Tamás¹, Lucian Buşoniu¹

Affiliation

¹ Department of Automation, Technical University of Cluj-Napoca, Str. George Bariţiu, Nr. 26-28, 400027 Cluj-Napoca, Romania.

Abstract

We consider a robot that must sort objects transported by a conveyor belt into different classes. Multiple observations must be performed before deciding the class of each object, because imperfect sensing sometimes reports the wrong class. The objective is to sort the sequence of objects in a minimal number of observation and decision steps. We describe this task in the framework of partially observable Markov decision processes (POMDPs), and we propose a reward function that explicitly accounts for the information gain of the viewpoint-selection actions. The DESPOT algorithm is applied to solve the problem, automatically producing a sequence of observation viewpoints and class-decision actions. Observations are made either only for the object in the first position on the conveyor belt or for multiple adjacent positions at once. We compare the performance of the single- and multiple-position variants and analyze the impact of including the information gain. Real-life experiments with a Baxter robot and an industrial conveyor belt are provided.

Keywords: POMDP; active perception; information-gain rewards; robotics.
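The class beliefs that a POMDP formulation like this one maintains, together with an entropy-based information-gain term, can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper. It uses a symmetric sensor model governed by a single hypothetical `accuracy` parameter, treats observation noise as independent for each object, and measures information gain as the realized drop in Shannon entropy of each observed object's class belief. The authors' actual observation model and reward may differ. All function names (`entropy`, `update_belief`, `observe_positions`, etc.) are hypothetical.

```python
import math


def entropy(belief):
    """Shannon entropy (in bits) of a discrete belief over object classes."""
    return -sum(p * math.log2(p) for p in belief if p > 0.0)


def observation_likelihood(obs_class, true_class, n_classes, accuracy):
    """Assumed symmetric sensor model (hypothetical): the true class is reported
    with probability `accuracy`, and each wrong class shares the remainder equally."""
    if obs_class == true_class:
        return accuracy
    return (1.0 - accuracy) / (n_classes - 1)


def update_belief(belief, obs_class, accuracy):
    """Bayes rule: b'(c) is proportional to P(obs | c) * b(c)."""
    n = len(belief)
    unnormalized = [
        observation_likelihood(obs_class, c, n, accuracy) * belief[c]
        for c in range(n)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def observe_positions(beliefs, observations, accuracy):
    """Multiple-position observation: update the beliefs of several conveyor
    positions at once, assuming independent noise per object.

    `observations` maps a position index to the class reported for it.
    Returns the new beliefs and the total realized information gain
    H(prior) - H(posterior), summed over the observed positions. The gain
    can be negative when an observation contradicts a confident belief.
    """
    new_beliefs = list(beliefs)
    gain = 0.0
    for pos, obs in observations.items():
        posterior = update_belief(beliefs[pos], obs, accuracy)
        gain += entropy(beliefs[pos]) - entropy(posterior)
        new_beliefs[pos] = posterior
    return new_beliefs, gain


if __name__ == "__main__":
    # Two classes, three objects on the belt, uniform prior for each.
    prior = [[0.5, 0.5] for _ in range(3)]
    # Observe positions 0 and 1 simultaneously; both are reported as class 0.
    posterior, gain = observe_positions(prior, {0: 0, 1: 0}, accuracy=0.8)
    print(posterior, gain)
```

With two classes, a uniform prior and `accuracy=0.8`, observing positions 0 and 1 as class 0 moves each of those beliefs to [0.8, 0.2] and yields a total gain of about 0.556 bits. Position 2 stays unchanged. A reward built on this term would weigh the gain against step costs. An online planner such as DESPOT then searches over sequences of such belief updates instead of choosing actions greedily.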