LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation

Tianrui Guan, Yurou Yang, Harry Cheng, Muyuan Lin, Richard Kim, Rajasimman Madhivanan, Arnie Sen, Dinesh Manocha

Published in the 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024

Abstract

In this paper, we present LOC-ZSON, a novel Language-driven Object-Centric image representation for the object navigation task within complex scenes. We propose an object-centric image representation and corresponding losses for visual-language model (VLM) fine-tuning, which can handle complex object-level queries. In addition, we design novel LLM-based augmentation and prompt templates for stability during training and zero-shot inference. We implement our method on the Astro robot and deploy it in both simulated and real-world environments for zero-shot object navigation. We show that our proposed method achieves an improvement of 1.38% to 13.38% in text-to-image recall across different benchmark settings for the retrieval task. For object navigation, we show the benefit of our approach in simulation and in the real world, with 5% and 16.67% improvements in navigation success rate, respectively.
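The retrieval results above are reported as text-to-image recall. As a reminder of what that metric measures, here is a minimal sketch of Recall@K, assuming each text query has exactly one ground-truth image and that a query-by-image similarity matrix has already been computed (for example, cosine similarity between VLM text and image embeddings). The function name `recall_at_k` and the one-match-per-query setup are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]],
                gt_index: Sequence[int],
                k: int) -> float:
    """Fraction of text queries whose ground-truth image appears in the top-k.

    Hypothetical helper for illustration: similarity[i][j] is the score
    between text query i and image j; gt_index[i] is the index of the single
    image assumed to match query i.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not similarity or len(similarity) != len(gt_index):
        raise ValueError("need one ground-truth index per (non-empty) query list")
    hits = 0
    for scores, gt in zip(similarity, gt_index):
        # Rank image indices by descending similarity to this query.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if gt in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy example with made-up scores (not data from the paper).
similarity = [
    [0.9, 0.1, 0.3],  # query 0: true image 0 is ranked first
    [0.2, 0.4, 0.8],  # query 1: true image 1 is ranked second
    [0.5, 0.6, 0.1],  # query 2: true image 2 is ranked third
]
ground_truth = [0, 1, 2]
for k in (1, 2, 3):
    print(f"Recall@{k} = {recall_at_k(similarity, ground_truth, k):.3f}")
# → Recall@1 = 0.333, Recall@2 = 0.667, Recall@3 = 1.000
```

Recall@K only checks whether the correct image is among the top K candidates, so it rewards a representation that ranks the right object-level match highly even when it is not the single best score.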

Video

Paper: LOC-ZSON
Video / Demo: Video

Please cite our work if you find it useful:

@misc{guan2024loczson,
      title={LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation}, 
      author={Tianrui Guan and Yurou Yang and Harry Cheng and Muyuan Lin and Richard Kim and Rajasimman Madhivanan and Arnie Sen and Dinesh Manocha},
      year={2024},
      eprint={2405.05363},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2405.05363}, 
}