Depth Image-Based Deep Learning of Grasp Planning for Textureless Planar-Faced Objects in Vision-Guided Robotic Bin-Picking

Ping Jiang; Yoshiyuki Ishihara; Nobukatsu Sugiyama; Junji Oaki; Seiji Tokura; Atsushi Sugahara; Akihito Ogawa

doi:10.3390/s20030706

Sensors (Basel). 2020 Jan 28;20(3):706. doi: 10.3390/s20030706.

Authors

Ping Jiang¹, Yoshiyuki Ishihara¹, Nobukatsu Sugiyama¹, Junji Oaki¹, Seiji Tokura¹, Atsushi Sugahara¹, Akihito Ogawa¹

Affiliation

¹ Corporate Research & Development Center, Toshiba Corporation, 1, Komukai-Toshiba-cho, Saiwai-ku, Kawasaki 212-8582, Japan.

Abstract

Bin-picking of small parcels and other textureless planar-faced objects is a common task in warehouses. A conventional color image-based vision-guided robot picking system requires feature extraction and the preparation of goal images for various objects. However, feature extraction for goal image matching is difficult for textureless objects, and preparing huge numbers of goal images in advance is impractical in a warehouse. In this paper, we propose a novel depth image-based vision-guided robot bin-picking system for textureless planar-faced objects. Our method uses a deep convolutional neural network (DCNN) model, trained on 15,000 annotated depth images synthetically generated in a physics simulator, to predict grasp points directly without object segmentation. Unlike previous studies that predicted grasp points for a robot suction hand with only one vacuum cup, our DCNN also predicts the optimal grasp pattern for a hand with two vacuum cups (left cup on, right cup on, or both cups on). Further, we propose a surface feature descriptor that extracts surface features (center position and normal) and refines the predicted grasp point position. This removes the need for texture features in vision-guided robot control and for sim-to-real modification in DCNN model training. Experimental results demonstrate the efficiency of our system: a robot with 7 degrees of freedom can pick randomly posed textureless boxes in a cluttered environment with a 97.5% success rate at speeds exceeding 1000 pieces per hour (an average cycle time under 3.6 s per pick).
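The abstract does not specify how the surface feature descriptor computes a surface's center position and normal from the depth image. One common way to obtain those two quantities is sketched below, purely as an illustration and not as the authors' method: back-project a small depth patch around a predicted grasp pixel into 3D using a pinhole camera model, then fit a plane by least squares. The function names `back_project` and `fit_plane`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the patch size `half` are hypothetical. The sketch also assumes that the visible surface can be written as z = a·x + b·y + c in the camera frame, which holds for faces not seen edge-on.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def back_project(depth: Sequence[Sequence[float]], u0: int, v0: int, half: int,
                 fx: float, fy: float, cx: float, cy: float) -> List[Vec3]:
    """Hypothetical helper: convert the (2*half+1)^2 depth patch centred on pixel
    (u0, v0) into camera-frame 3D points with the pinhole model.
    Pixels with zero/negative depth (missing data) are skipped."""
    points: List[Vec3] = []
    for v in range(max(0, v0 - half), min(len(depth), v0 + half + 1)):
        row = depth[v]
        for u in range(max(0, u0 - half), min(len(row), u0 + half + 1)):
            z = row[u]
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def fit_plane(points: List[Vec3]) -> Tuple[Vec3, Vec3]:
    """Hypothetical helper: least-squares fit of z = a*x + b*y + c.
    Returns (centre, unit normal); the normal is oriented towards the
    camera, i.e. it has a negative z component."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    mz = sum(p[2] for p in points) / n
    # Second moments of the centred points; the least-squares plane passes
    # through the centroid, so only the slopes a and b remain to be solved.
    sxx = syy = sxy = sxz = syz = 0.0
    for x, y, z in points:
        dx, dy, dz = x - mx, y - my, z - mz
        sxx += dx * dx
        syy += dy * dy
        sxy += dx * dy
        sxz += dx * dz
        syz += dy * dz
    # Solve the 2x2 normal equations [[sxx, sxy], [sxy, syy]] [a, b]^T = [sxz, syz]^T.
    det = sxx * syy - sxy * sxy
    if det <= 1e-12 * sxx * syy:
        raise ValueError("degenerate patch: points are (nearly) collinear")
    a = (sxz * syy - sxy * syz) / det
    b = (sxx * syz - sxy * sxz) / det
    # The gradient of z - a*x - b*y is (-a, -b, 1); negate it so it faces the camera.
    norm = math.sqrt(a * a + b * b + 1.0)
    return (mx, my, mz), (a / norm, b / norm, -1.0 / norm)


if __name__ == "__main__":
    # Flat surface 1.0 m in front of the camera, seen in a 5x5 depth image.
    depth = [[1.0] * 5 for _ in range(5)]
    pts = back_project(depth, 2, 2, 1, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
    centre, normal = fit_plane(pts)
    print(centre, normal)
```

A plane fit over a patch, rather than reading a single pixel, averages out depth noise. In a pipeline of the kind the abstract describes, the fitted center could be used to refine a predicted grasp point and the normal could set the suction approach direction, but that coupling is an assumption of this sketch.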

Keywords: bin picking; deep learning; grasp planning; textureless; visual servoing.