Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments

Mohamed Elnoor, Kasun Weerakoon, Gershom Seneviratne, Ruiqi Xian, Tianrui Guan, Mohamed Khalid M Jaffar, Vignesh Rajagopal, Dinesh Manocha

Published on arXiv, 2024

Abstract

We present a novel autonomous robot navigation algorithm for outdoor environments that is capable of handling diverse terrain traversability conditions. Our approach, VLM-GroNav, uses vision-language models (VLMs) and integrates them with physical grounding to assess intrinsic terrain properties such as deformability and slipperiness. We use proprioceptive sensing, which provides direct measurements of these physical properties and enhances the overall semantic understanding of the terrain. Our formulation uses in-context learning to ground the VLM's semantic understanding with proprioceptive data, allowing dynamic updates of traversability estimates based on the robot's real-time physical interactions with the environment. We use the updated traversability estimates to inform both the local and global planners for real-time trajectory replanning. We validate our method on a legged robot (Ghost Vision 60) and a wheeled robot (Clearpath Husky) in diverse real-world outdoor environments with different deformable and slippery terrains. In practice, we observe significant improvements over state-of-the-art methods, with up to a 50% increase in navigation success rate.
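
To make the grounding step concrete, below is a minimal Python sketch, not the paper's implementation, of how proprioceptive measurements might be folded into an in-context prompt for the VLM and used to update a semantic traversability estimate before it is passed to the planners. All class and function names, thresholds, and the blending rule are illustrative assumptions; the actual formulation is described in the paper.

# Minimal sketch (not the authors' implementation) of physically grounding a
# VLM's semantic terrain estimate with proprioceptive measurements.
# All names and numbers below are hypothetical.

from dataclasses import dataclass


@dataclass
class ProprioceptiveReading:
    """Hypothetical physical measurements gathered while traversing a terrain patch."""
    sinkage_cm: float   # proxy for deformability
    slip_ratio: float   # proxy for slipperiness (0 = no slip, 1 = full slip)


def build_grounding_prompt(terrain_label: str, reading: ProprioceptiveReading) -> str:
    """Compose an in-context example that pairs the VLM's semantic terrain label
    with the robot's measured interaction, so later queries are grounded."""
    return (
        f"Terrain (from image): {terrain_label}.\n"
        f"Measured sinkage: {reading.sinkage_cm:.1f} cm, slip ratio: {reading.slip_ratio:.2f}.\n"
        "Given these measurements, rate traversability from 0 (impassable) to 1 (easy)."
    )


def update_traversability(prior: float, reading: ProprioceptiveReading,
                          sinkage_limit_cm: float = 10.0) -> float:
    """Toy update rule: down-weight the semantic prior when the measured
    deformability or slipperiness is high."""
    deformability_penalty = min(reading.sinkage_cm / sinkage_limit_cm, 1.0)
    slip_penalty = reading.slip_ratio
    physical_score = 1.0 - max(deformability_penalty, slip_penalty)
    # Blend the semantic prior with the physically grounded score.
    return 0.5 * prior + 0.5 * physical_score


if __name__ == "__main__":
    reading = ProprioceptiveReading(sinkage_cm=6.0, slip_ratio=0.3)
    print(build_grounding_prompt("wet grass", reading))
    vlm_prior = 0.8  # semantic traversability guessed from the camera image alone
    print(f"Updated traversability estimate: {update_traversability(vlm_prior, reading):.2f}")

In this toy version the updated estimate drops as sinkage or slip grows, which mirrors the idea of letting real-time physical interaction correct an overly optimistic visual assessment before replanning.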


The paper is available here. Please cite our work if you find it useful:

@misc{elnoor2024vlmoutdoornav,
      title={Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments}, 
      author={Mohamed Elnoor and Kasun Weerakoon and Gershom Seneviratne and Ruiqi Xian and Tianrui Guan and Mohamed Khalid M Jaffar and Vignesh Rajagopal and Dinesh Manocha},
      year={2024},
      eprint={2409.20445},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2409.20445}, 
}