An Efficient Robotic Pushing and Grasping Method in Cluttered Scene

IEEE Trans Cybern. 2024 Apr 17:PP. doi: 10.1109/TCYB.2024.3381639. Online ahead of print.

Abstract

Pushing and grasping (PG) are crucial skills that enable intelligent robots to perform complex grasping tasks in various scenarios. Existing PG methods can be categorized into single-stage and multistage approaches: single-stage methods are faster but less accurate, while multistage methods achieve high accuracy at the expense of time efficiency. To address this trade-off, a novel end-to-end PG method called the efficient PG network (EPGNet) is proposed in this article, achieving both high accuracy and efficiency. To optimize performance with fewer parameters, EfficientNet-B0 is used as the backbone of EPGNet. In addition, a novel cross-fusion module is introduced to enhance network performance on robotic PG tasks; this module fuses local and global features, helping the network handle objects of varying sizes in different scenes. EPGNet consists of two branches that predict pushing and grasping actions, respectively, and both branches are trained simultaneously within a Q-learning framework. Training data are collected through trial and error as the robot performs PG actions. To bridge the gap between simulation and reality, a dedicated PG dataset is proposed, and a YOLACT network is trained on this dataset for object detection and segmentation. A comprehensive set of experiments is conducted in simulated environments and real-world scenarios. The results demonstrate that EPGNet outperforms single-stage methods and offers competitive performance compared to multistage methods while using fewer parameters. A video is available at https://youtu.be/HNKJjQH0MPc.
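To make the two-branch Q-learning setup concrete, the following is a minimal sketch (not the authors' code) of how a pixel-wise action might be selected from separate push and grasp Q-value maps and how a one-step Q-learning target could be formed. The map shapes, epsilon, reward values, and discount factor are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of pixel-wise action selection over push/grasp Q-maps
# and a one-step Q-learning target; all parameter values are assumptions.
import numpy as np

def select_action(q_push, q_grasp, epsilon=0.1, rng=None):
    """Epsilon-greedy choice over the stacked push/grasp Q-maps.

    q_push, q_grasp: (H, W) arrays of predicted Q-values, one per pixel.
    Returns (primitive, row, col) where primitive is 'push' or 'grasp'.
    """
    rng = rng or np.random.default_rng()
    q_all = np.stack([q_push, q_grasp])      # shape (2, H, W)
    if rng.random() < epsilon:
        flat_idx = int(rng.integers(q_all.size))  # explore: random pixel/primitive
    else:
        flat_idx = int(q_all.argmax())            # exploit: best Q-value overall
    p, r, c = np.unravel_index(flat_idx, q_all.shape)
    return ("push" if p == 0 else "grasp"), int(r), int(c)

def q_target(reward, q_push_next, q_grasp_next, gamma=0.5, done=False):
    """One-step Q-learning target: r + gamma * max over next push/grasp Q-values."""
    if done:
        return reward
    return reward + gamma * max(float(q_push_next.max()), float(q_grasp_next.max()))
```

In this sketch both branches share one greedy argmax, which mirrors the idea of training the push and grasp branches jointly under a single Q-learning objective rather than with separate policies.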