Blending Data-Driven Priors in Dynamic Games

RSS 2024

Abstract

As intelligent robots like autonomous vehicles are increasingly deployed in the presence of people, the extent to which these systems should leverage model-based game-theoretic planners versus data-driven policies for safe, interaction-aware motion planning remains an open question. Existing dynamic game formulations assume all agents are task-driven and behave optimally. In reality, however, humans tend to deviate from the decisions prescribed by these models, and their behavior is better approximated under a noisy-rational paradigm. In this work, we investigate a principled methodology to blend a data-driven reference policy with an optimization-based game-theoretic policy. We formulate KLGame, an algorithm for solving non-cooperative dynamic games with Kullback-Leibler (KL) regularization with respect to a general, stochastic, and possibly multi-modal reference policy. Our method incorporates, for each decision maker, a tunable parameter that permits modulation between task-driven and data-driven behaviors. We propose an efficient algorithm for computing multi-modal approximate feedback Nash equilibrium strategies of KLGame in real time. Through a series of simulated and real-world autonomous driving scenarios, we demonstrate that KLGame policies can more effectively incorporate guidance from the reference policy and account for noisily-rational human behaviors than non-regularized baselines.

Contributions

We introduce KLGame, a novel stochastic dynamic game that blends interaction-aware task optimization with closed-loop policy guidance via Kullback-Leibler (KL) regularization. We provide an in-depth analysis in the linear-quadratic (LQ) setting with Gaussian reference policies and show that KLGame admits an analytical global feedback Nash equilibrium, which naturally generalizes the solution of the maximum-entropy game. We propose an efficient and scalable trajectory optimization algorithm for computing approximate feedback Nash equilibria of KLGame with general nonlinear dynamics, costs, and multi-modal reference policies. Experimental results on Waymo’s Open Motion Dataset demonstrate the efficacy of KLGame in leveraging data-driven priors compared to state-of-the-art methods.
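To make the connection to maximum-entropy games concrete, recall the standard identity from KL-regularized control (the notation below is illustrative and may differ from the paper's exact formulation): at each stage, player i's regularized best response takes the form

\[
\pi^{i,\star}(u \mid x) \;\propto\; \tilde{\pi}^{i}(u \mid x)\, \exp\!\left(-\tfrac{1}{\lambda^{i}}\, Q^{i}(x, u)\right),
\]

where Q^i denotes player i's state-action value, π̃^i the reference policy, and λ^i the regularization strength. Choosing a uniform reference recovers the familiar maximum-entropy (soft) policy, which is the sense in which KLGame generalizes the maximum-entropy game.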

Approach

Our proposed approach can seamlessly integrate data-driven policies with optimization-based dynamic game solutions. Data-driven behavior predictors, such as transformer-based methods, provide marginal (i.e., single-agent) priors for human motion prediction, but may struggle to model closed-loop, multi-agent interactions.

KLGame allows a robot to incorporate guidance from data-driven rollouts while performing online game-theoretic planning in closed-loop. This method uses a tunable parameter λ that modulates behaviors on a spectrum: λ=0 gives a deterministic dynamic game (task-optimal), and λ→∞ gives multi-modal behavior cloning. We call this tunability policy blending.
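As a simplified illustration of policy blending, the sketch below considers a single agent at a single decision step with a quadratic task model and a Gaussian reference policy; the KL-regularized blend is again Gaussian, with a mean that interpolates between the task optimum (λ → 0) and the reference mean (λ → ∞). The function name, the single-step quadratic model, and the single-agent setting are illustrative assumptions, not KLGame's actual multi-agent solver.

import numpy as np

def blended_gaussian_policy(mu_ref, Sigma_ref, H, g, lam):
    """Illustrative KL-regularized blend (not the paper's full solver).

    Task model:       Q(u) = 0.5 * u^T H u + g^T u, with H symmetric positive definite.
    Reference policy: pi_ref(u) = N(mu_ref, Sigma_ref).
    Regularized optimum: pi*(u) proportional to pi_ref(u) * exp(-Q(u) / lam),
    which is again Gaussian (a product of two Gaussians).
    """
    if lam == 0.0:
        # Pure task optimum: the deterministic dynamic-game limit.
        return -np.linalg.solve(H, g), np.zeros_like(Sigma_ref)
    prec_ref = np.linalg.inv(Sigma_ref)
    prec = prec_ref + H / lam                       # blended precision
    mean = np.linalg.solve(prec, prec_ref @ mu_ref - g / lam)
    return mean, np.linalg.inv(prec)

# Small 1D check of the two ends of the spectrum.
mu_ref, Sigma_ref = np.array([1.0]), np.array([[0.5]])
H, g = np.array([[2.0]]), np.array([-4.0])          # task optimum at u = 2
print(blended_gaussian_policy(mu_ref, Sigma_ref, H, g, lam=1e-6)[0])  # ~2: task-optimal limit
print(blended_gaussian_policy(mu_ref, Sigma_ref, H, g, lam=1e6)[0])   # ~1: reference (cloning) limit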

KLGame incentivizes the trajectories of planning agents to not only optimize hand-crafted cost heuristics, but also to adhere to a reference policy. In this work, we assume that the reference policy is distilled from data or expert knowledge, is stochastic, and may be multi-modal in general. In contrast to other integrated prediction and planning methods, KLGame (i) provides an analytically and computationally sound methodology for planning under strategy uncertainty, exactly solving the regularized stochastic optimization problem; and (ii) incorporates tunable multi-modal, data-driven motion predictions in the optimal policy through a scalar parameter, allowing the planner to modulate between purely data-driven and purely optimal behaviors.
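Schematically, and with notation assumed for illustration (player i's stage cost c^i, policy π^i, reference policy π̃^i, and blending weight λ^i; the paper's exact formulation may differ), each agent in KLGame solves a regularized problem of the form

\[
\min_{\pi^{i}} \; \mathbb{E}\!\left[\sum_{t=0}^{T} c^{i}_{t}(x_t, u^{1}_{t}, \ldots, u^{N}_{t}) + \lambda^{i}\, D_{\mathrm{KL}}\!\left(\pi^{i}_{t}(\cdot \mid x_t)\,\middle\|\,\tilde{\pi}^{i}_{t}(\cdot \mid x_t)\right)\right],
\]

so that λ^i = 0 recovers the unregularized dynamic game and λ^i → ∞ drives the policy toward the reference.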

We study the role of the reference policy in helping KLGame find diverse game solutions more amenable to execution than existing methods. We compare our method against a mixture of deterministic, stochastic, and data-driven baselines on three simulated interaction scenarios with nonconvex costs and mode uncertainty.

In the first experiment, we show that mixing the game’s payoff with the expert’s reference policy allows the game solver to break out of a suboptimal equilibrium. We then demonstrate that a blended multi-modal KLGame policy can balance competitiveness and cautiousness in fast, rivalrous interactions such as car racing. Finally, we validate that the KLGame planner can integrate a state-of-the-art, large-scale, data-driven behavior model with an optimization-based game policy at scale, improving safety in complex tasks such as urban autonomous driving.

Citation


@article{lidard2024blending,
  title={Blending Data-Driven Priors in Dynamic Games},
  author={Lidard, Justin and Hu, Haimin and Hancock, Asher and Zhang, Zixu and Contreras, Albert Gim{\'o} and Modi, Vikash and DeCastro, Jonathan and Gopinath, Deepak and Rosman, Guy and Leonard, Naomi and Santos, Mar{\'i}a and Fisac, Jaime Fern{\'a}ndez},
  journal={arXiv preprint arXiv:2402.14174},
  year={2024}
}