Accelerating Signal-Temporal-Logic-Based Task and Motion Planning of Bipedal Locomotion using Benders Decomposition

TL;DR: We present a novel Benders Decomposition approach with time-shifted cuts accelerating bipedal robot long-horizon task and motion planning by 20x for logistics and automation tasks.

Abstract

Task and motion planning under Signal Temporal Logic constraints is known to be NP-hard. A common class of approaches formulates these hybrid problems, which involve discrete task scheduling and continuous motion planning, as mixed-integer programs (MIP). However, in applications for bipedal locomotion, introduction of non-convex constraints such as kinematic reachability and footstep rotation exacerbates computational complexity of MIPs. In this work, we present a method based on Benders Decomposition to address this challenge where solving the entire monolithic optimization problem can be prohibitively intractable. Benders Decomposition proposes an iterative cutting-plane technique that partitions the problem into a master problem to prototype a plan that meets the task specification, and a series of subproblems for kinematics and dynamics feasibility checks. Our experiments demonstrate that this method achieves faster planning compared to alternative algorithms for solving the resulting optimization program with nonlinear constraints.

Benchmark Scenarios

Our benchmark environments cover a range of application scenarios in which the robot is tasked to access a number of PoIs subject to certain logical and temporal conditions. Some of these environments are adapted from prior works, but we extend the planning horizon to accommodate finer time discretization for footsteps and more complex task specifications. Below, we present four benchmark scenarios, each with a unique task specification and a corresponding video demonstration.

1. Delivery Environment

The robot must collect a box from one shelf of each color before it can pass through Region 7 and ultimately reach the table to unload at Point 15. The task specification is shown below:

\[ \varphi = \neg \pi^{7} \ \mathcal U_{[0, 70]}\ (\pi^9 \vee \pi^{10}) \wedge \neg \pi^{7} \ \mathcal U_{[0, 70]}\ (\pi^{11} \vee \pi^{13}) \wedge \neg \pi^{7} \ \mathcal U_{[0, 70]}\ (\pi^{12} \vee \pi^{14}) \wedge \Diamond_{[0, 70]} \pi^{15} \]

2. Store Environment

The robot is instructed to reach one of Points 8, 9, or 10 within 15 steps after the start, and then sequentially reach one of Points 11 or 12, followed by one of Points 13, 14, and finally Point 15.

\[ \varphi = \Diamond_{[0, 15]} (\pi^8 \vee \pi^9 \vee \pi^{10}) \wedge \neg (\pi^{13} \vee \pi^{14}) \ \mathcal U_{[10, 50]}\ (\pi^{11} \vee \pi^{12}) \wedge \neg \pi^{15} \ \mathcal U_{[40, 70]}\ (\pi^{13} \vee \pi^{14}) \wedge \Diamond_{[50, 70]} \pi^{15} \]

3. Door Puzzle Environment

The robot is required to reach Point 17 at the end which can only be accessed by passing through several doors. Each door is paired with a key that must be collected from a designated PoI. The door-key pairings are outlined in the task specification:

\[ \varphi = \neg \pi^{7} \ \mathcal U_{[0, 130]}\ \pi^{14} \wedge \neg \pi^{8} \ \mathcal U_{[0, 130]}\ \pi^{12} \wedge \neg \pi^{9} \ \mathcal U_{[0, 130]}\ \pi^{13} \wedge \neg \pi^{11} \ \mathcal U_{[0, 130]}\ \pi^{15} \wedge \neg \pi^{10} \ \mathcal U_{[0, 130]}\ \pi^{16} \wedge \Diamond_{[0, 130]} \pi^{17} \]

4. Factory Environment

The robot must visit the charging stations at Points 12 and 13 within 50 steps each time it leaves a charging station. Additionally, after retrieving a box from one of the tables at Points 6, 8, 9, or 11, the robot must visit either Point 7 or 10 within 20 steps to unload it onto the shelves.

\[ \varphi = \square_{[0, 130]} \left(\neg (\pi^{12} \vee \pi^{13})\Rightarrow \Diamond_{[0, 50]}(\pi^{12} \vee \pi^{13})\right) \wedge \square_{[0, 130]} \left((\pi^{6}\vee \pi^{8}\vee \pi^{9}\vee \pi^{11}) \Rightarrow\Diamond_{[0, 20]}(\pi^{7} \vee \pi^{10})\right) \wedge \Diamond_{[0, 130]}\pi^6 \wedge \Diamond_{[0, 130]}\pi^8 \wedge \Diamond_{[0, 130]}\pi^9 \wedge \Diamond_{[0, 130]}\pi^{10} \]

Results and Comparisons

Results

In the figure below, we show the iterations of solutions generated by our proposed method in the Delivery environment. The stripe at the bottom shows the time spent by the BMP and the BSPs, along with the number of cuts added to the BMP at the end of each iteration. The black stars indicate moments when the BSPs are solved as infeasible and multiple feasibility cuts are added, while the yellow stars represent iterations when all BSPs are solved as feasible and only an optimality cut is added. We highlight three key moments: an infeasible solution (A) which is shown as the cyan dots on the top left figure, the first feasible solution (B) which is shown by a sequence of LIP models at each contact instant on the top left figure, and the optimal solution (C) that is represented by a sequence of LIP models at each contact instant on the top right figure.

We observe that while the single cut method may not find a feasible solution within 100 iterations, our proposed method is able to consistently find the optimal solution within 20 iterations.

Comparisons

As a baseline for comparison, we solve our optimization problem using the well-established decomposition approach ADMM. Our ADMM implementation decomposes the problem into an MICP subproblem and an NLP projection step. The constraints in the MICP and NLP subproblems are identical to those in our BMP and BSP, respectively, except that in the NLP, the task schedule is not fixed. Instead, ADMM subproblems promote convergence through additional consensus terms in their objective functions that penalize deviations between residual variables from other subproblems using L2 norms weighted by a penalty parameter. The algorithm alternates between the MICP and NLP steps and updates the dual variables until stopping criteria, such as maximum residual tolerance or maximum iteration count, are met.

The above figure showcases the ADMM convergence for Store environment, showing primal residuals of the position, direction and foothold over iterations. We also compare the result after ADMM hits the threshold and the optimal solution obtained from the BD method in (A) and (B). The Store environment is one of the best convergence scenarios for ADMM among all scenarios.

ADMM convergence is non-monotonic, and despite large penalties for consensus violations, residuals often remain significant after tens of iterations. This indicates that the MICP solution from ADMM does not fully satisfy the foot kinematic reachability constraints, and the NLP solution does not exactly satisfy the STL constraints. In comparison, all the feasible solutions that the BD method obtains strictly satisfy all the constraints.

Conclusion

This work presents a Benders Decomposition approach with a novel time-shifted cut generation mechanism that significantly accelerates STL-guided TAMP for bipedal locomotion with nonlinear kinematic and dynamic constraints. Our method handles long-horizon planning scenarios with over 100 footsteps and achieves solving speeds more than 20 times faster than state-of-the-art methods across all benchmark environments, enabling practical deployment of humanoid robots for logistics and automation tasks.