DT-HRL: Mastering Long-Sequence Manipulation with Reimagined Hierarchical Reinforcement Learning

Biomimetics (Basel). 2025 Sep 1;10(9):577. doi: 10.3390/biomimetics10090577.

Abstract

Robotic manipulators in warehousing and logistics often face complex tasks involving multiple steps, frequent task switching, and long-term dependencies. Inspired by the hierarchical structure of human motor control, this paper proposes a Hierarchical Reinforcement Learning (HRL) framework built on a multi-task goal-conditioned Decision Transformer (MTGC-DT). The high-level policy treats the Markov decision process as a sequence modeling problem, allowing the agent to capture temporal dependencies, while the low-level policy consists of parameterized action primitives that handle physical execution. This design improves long-horizon reasoning and generalization. The method is evaluated on two common logistics manipulation tasks, sequential stacking and spatial sorting, under sparse rewards and a low-quality dataset. The main contributions are (i) an HRL framework that integrates a Decision Transformer (DT) with task and goal embeddings and a path-efficiency loss (PEL) correction, and (ii) a parameterized, learnable primitive skill library for low-level control that enhances generalization and reusability. Experimental results demonstrate that the proposed Decision Transformer-based Hierarchical Reinforcement Learning (DT-HRL) achieves a success rate over 10% higher and an average reward over 8% higher than the baseline, as well as a normalized-score increase of over 2% in the ablation experiments.
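The high-level policy's sequence-modeling view can be illustrated with a minimal sketch of how a goal-conditioned Decision Transformer input might be assembled: returns-to-go, states, and actions are interleaved into one token sequence, prefixed with task and goal conditioning tokens. This is a hypothetical illustration under assumed conventions (the token layout, and the helper names `returns_to_go` and `build_tokens`, are not from the paper; the abstract does not specify the MTGC-DT input format).

```python
def returns_to_go(rewards):
    """Suffix sums of rewards: R_t = sum over t' >= t of r_{t'}.

    Decision Transformers condition each timestep on the return still
    to be achieved, so the sequence is built from these suffix sums
    rather than raw rewards.
    """
    rtg, total = [], 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return list(reversed(rtg))


def build_tokens(states, actions, rewards, task_id, goal):
    """Assemble one conditioning-prefixed trajectory sequence.

    Hypothetical layout: [task, goal, (rtg_0, s_0, a_0), (rtg_1, s_1, a_1), ...]
    Each element is tagged with its modality so a real model could route
    it through the corresponding embedding table.
    """
    rtg = returns_to_go(rewards)
    tokens = [("task", task_id), ("goal", goal)]
    for r, s, a in zip(rtg, states, actions):
        tokens += [("rtg", r), ("state", s), ("action", a)]
    return tokens


# Toy 3-step trajectory: 2 prefix tokens + 3 timesteps * 3 modalities = 11 tokens.
seq = build_tokens(states=["s0", "s1", "s2"],
                   actions=["a0", "a1", "a2"],
                   rewards=[1.0, 0.0, 2.0],
                   task_id="stacking",
                   goal="block_on_block")
```

At inference time, such a model would be prompted with a target return-to-go and the task/goal tokens, then decode actions autoregressively; the abstract's PEL correction would additionally penalize inefficient paths during training.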

Keywords: decision transformer; hierarchical reinforcement learning; long-sequence tasks; robotic manipulation.