Deepti Hegde

I am a PhD student advised by Dr. Vishal Patel at the Vision and Image Understanding Lab at Johns Hopkins University, where I work on 3D computer vision and deep learning. Most recently, my research has focused on vision-language models for 3D scene understanding and autonomous driving.

In Summer 2024, I interned at Qualcomm, where I worked on leveraging vision-language models for end-to-end planning for autonomous driving. In Spring 2024, I interned at Microsoft Research, working on the application of large language models to visual reasoning tasks for 3D telehealth. Before that, I spent two summers as an intern at Mitsubishi Electric Research Labs, working on 3D object detection with LiDAR for autonomous driving scenarios.

I am currently looking for full-time opportunities in these areas.

dhegde1[at]jhu[dot]edu  /  CV  /  Google Scholar  /  Twitter  /  Github

Research Statement


News

  • August 2024 - Accepted to the ECCV 2024 Doctoral Consortium!
  • July 2024 - Our paper "Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection" accepted to ECCV 2024
  • June 2024 - Started my summer internship at Qualcomm
  • April 2024 - Started my spring internship at Microsoft Research
  • February 2024 - Our work "MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models" accepted to CVPR 2024
  • October 2023 - One paper accepted to WACV 2024
  • August 2023 - CLIP goes 3D accepted to OpenSUN3D workshop at ICCV

Research
Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection
ECCV 2024

Deepti Hegde*, Suhas Lohit, Kuan-Chuan Peng, Mike Jones, Vishal M. Patel

arXiv Video

We propose a spatio-temporal equivariant learning framework for self-supervised pre-training on LiDAR point clouds for the task of 3D object detection. Our experiments show that the best performance arises from a pre-training approach that encourages equivariance to translation, scaling, flipping, rotation, and scene flow. For spatial augmentations, we find that, depending on the transformation, either a contrastive objective or an equivariance-by-classification objective yields the best results.

CLIP goes 3D: Leveraging Prompt Tuning for Language-Grounded 3D Recognition
OpenSun3D @ ICCV 2023

Deepti Hegde*, Jeya Maria Jose Valanarasu*, Vishal M. Patel

arXiv code

CLIP is not well suited to extracting 3D geometric features, as it was trained only on images and text with natural-language supervision. We address this limitation and propose CG3D (CLIP Goes 3D), a framework in which a 3D encoder is trained to exhibit zero-shot recognition capabilities.

Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection
WACV 2024

Deepti Hegde, Vishal M. Patel

arXiv code

To address the limitations of traditional feature-aggregation methods for prototype computation in the presence of noisy labels, we use a transformer module to identify outlier ROIs that correspond to incorrect, over-confident annotations, and compute an attentive class prototype. Under an iterative training strategy, the losses associated with noisy pseudo-labels are down-weighted, and the labels are refined in the process of self-training.

Uncertainty-aware Mean Teacher for Source-free Unsupervised Domain Adaptive 3D Object Detection
ICRA 2023

Deepti Hegde, Vishwanath Sindagi, Velat Kilic, A. Brinton Cooper, Mark Foster, Vishal M. Patel

arXiv

To avoid reinforcing errors caused by label noise, we propose an uncertainty-aware mean teacher framework that implicitly filters incorrect pseudo-labels during training. Leveraging model uncertainty allows the mean teacher network to perform this filtering by down-weighting losses corresponding to uncertain pseudo-labels.

LiDAR Light Scattering Augmentation (LISA): Physics-based Simulation of Adverse Weather Conditions for 3D Object Detection

Velat Kilic, Deepti Hegde, Vishwanath Sindagi, A. Brinton Cooper, Mark Foster, Vishal M. Patel

arXiv code

We propose a physics-based approach to simulate LiDAR point clouds of scenes in adverse weather conditions. These augmented datasets can then be used to train LiDAR-based detectors to improve their all-weather reliability. Specifically, we introduce a hybrid Monte Carlo-based approach that treats (i) the effects of large particles by placing them randomly and comparing their back-reflected power against the target, and (ii) attenuation effects on average through calculation of scattering efficiencies from Mie theory and particle size distributions.
