Deepti Hegde

I am a PhD student advised by Dr. Vishal Patel at the Vision and Image Understanding Lab at Johns Hopkins University, where I work on 3D computer vision and deep learning. Most recently, my research has focused on vision-language models for 3D scene understanding and autonomous driving.

In Summer 2024, I interned at Qualcomm, where I worked on leveraging vision-language models for end-to-end planning for autonomous driving. In Spring 2024, I interned at Microsoft Research, working on the application of large language models to visual reasoning tasks for 3D telehealth. Before that, I spent two summers as an intern at Mitsubishi Electric Research Labs, working on 3D object detection with LiDAR for autonomous driving scenarios.

I am currently looking for full-time opportunities in these areas.

dhegde1[at]jhu[dot]edu  /  CV  /  Google Scholar  /  Twitter  /  Github

Research Statement


News

  • August 2024 - Accepted to the ECCV 2024 Doctoral Consortium!
  • July 2024 - Our paper "Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection" accepted to ECCV 2024
  • June 2024 - Started my summer internship at Qualcomm
  • April 2024 - Started my spring internship at Microsoft Research
  • February 2024 - Our work "MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models" accepted to CVPR 2024
  • October 2023 - One paper accepted to WACV 2024
  • August 2023 - CLIP goes 3D accepted to OpenSUN3D workshop at ICCV

Research
Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection
ECCV 2024

Deepti Hegde*, Suhas Lohit, Kuan-Chuan Peng, Mike Jones, Vishal M. Patel

arXiv Video

We propose a spatio-temporal equivariant learning framework for self-supervised pre-training on LiDAR point clouds for the task of 3D object detection. Our experiments show that the best performance arises from a pre-training approach that encourages equivariance to translation, scaling, flipping, rotation, and scene flow. For spatial augmentations, we find that, depending on the transformation, either a contrastive objective or an equivariance-by-classification objective yields the best results.

CLIP goes 3D: Leveraging Prompt Tuning for Language-Grounded 3D Recognition
OpenSun3D @ ICCV 2023

Deepti Hegde*, Jeya Maria Jose Valanarasu*, Vishal M. Patel

arXiv code

CLIP is not well suited to extracting 3D geometric features, as it was trained only on images and text with natural-language supervision. We address this limitation and propose CG3D (CLIP Goes 3D), a framework in which a 3D encoder is trained to exhibit zero-shot recognition capabilities.

Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection
WACV 2024

Deepti Hegde, Vishal M. Patel

arXiv code

To address the limitations of traditional feature-aggregation methods for prototype computation in the presence of noisy labels, we use a transformer module to identify outlier ROIs that correspond to incorrect, over-confident annotations, and compute an attentive class prototype. Under an iterative training strategy, the losses associated with noisy pseudo-labels are down-weighted, and the labels are refined in the process of self-training.

Uncertainty-aware Mean Teacher for Source-free Unsupervised Domain Adaptive 3D Object Detection
ICRA 2023

Deepti Hegde, Vishwanath Sindagi, Velat Kilic, A. Brinton Cooper, Mark Foster, Vishal M. Patel

arXiv

To avoid reinforcing errors caused by label noise, we propose an uncertainty-aware mean teacher framework that implicitly filters incorrect pseudo-labels during training. Leveraging model uncertainty allows the mean teacher network to perform this filtering by down-weighting losses corresponding to uncertain pseudo-labels.

LiDAR Light Scattering Augmentation (LISA): Physics-based Simulation of Adverse Weather Conditions for 3D Object Detection

Velat Kilic, Deepti Hegde, Vishwanath Sindagi, A. Brinton Cooper, Mark Foster, Vishal M. Patel

arXiv code

We propose a physics-based approach to simulate LiDAR point clouds of scenes in adverse weather conditions. These augmented datasets can then be used to train LiDAR-based detectors to improve their all-weather reliability. Specifically, we introduce a hybrid Monte Carlo-based approach that treats (i) the effects of large particles by placing them randomly and comparing their back-reflected power against the target, and (ii) attenuation effects on average through calculation of scattering efficiencies from Mie theory and particle size distributions.
