Deepti Hegde
I am a PhD student advised by Dr. Vishal Patel in the Vision and Image Understanding Lab at Johns Hopkins University, where I work on 3D computer vision and deep learning.
Most recently, my research has focused on vision-language models for 3D scene understanding and autonomous driving.
In Summer 2024, I interned at Qualcomm, where I worked on leveraging vision-language models for end-to-end planning in autonomous driving. In Spring 2024, I interned at Microsoft Research, working on the application of large language models to visual reasoning tasks for 3D telehealth. Before that, I spent two summers as an intern at Mitsubishi Electric Research Labs, working on LiDAR-based 3D object detection for autonomous driving scenarios.
I am currently looking for full-time opportunities in these areas.
dhegde1[at]jhu[dot]edu  / 
CV  / 
Google Scholar  / 
Twitter  / 
Github
Research Statement
News
- August 2024 - Accepted to the ECCV 2024 Doctoral Consortium!
- July 2024 - Our paper "Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection" accepted to ECCV 2024
- June 2024 - Started my summer internship at Qualcomm
- April 2024 - Started my spring internship at Microsoft Research
- February 2024 - Our work "MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models" accepted to CVPR 2024
- October 2023 - One paper accepted to WACV 2024
- August 2023 - Our paper "CLIP goes 3D" accepted to the OpenSUN3D workshop at ICCV 2023
Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection
ECCV 2024
Deepti Hegde*,
Suhas Lohit, Kuan-Chuan Peng, Mike Jones,
Vishal M. Patel
arXiv
Video
We propose a spatio-temporal equivariant learning framework for self-supervised pre-training on LiDAR point clouds for the task of 3D object detection. Our experiments show that the best performance comes from a pre-training approach that encourages equivariance to translation, scaling, flipping, rotation, and scene flow. For spatial augmentations, we find that, depending on the transformation, either a contrastive objective or an equivariance-by-classification objective yields the best results.
CLIP goes 3D: Leveraging Prompt Tuning for Language-Grounded 3D Recognition
OpenSUN3D @ ICCV 2023
Deepti Hegde*,
Jeya Maria Jose Valanarasu*,
Vishal M. Patel
arXiv
code
CLIP is not well suited to extracting 3D geometric features, as it was trained only on images and text with natural language supervision. We address this limitation by proposing CG3D (CLIP Goes 3D), a framework in which a 3D encoder is trained to exhibit zero-shot recognition capabilities.
Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection
WACV 2024
Deepti Hegde,
Vishal M. Patel
arXiv
code
To address the limitations of traditional feature aggregation methods for prototype computation in the presence of noisy labels, we use a transformer module to identify outlier ROIs that correspond to incorrect, over-confident annotations and to compute an attentive class prototype. Under an iterative training strategy, losses associated with noisy pseudo-labels are down-weighted, and the labels are refined in the process of self-training.
Uncertainty-aware Mean Teacher for Source-free Unsupervised Domain Adaptive 3D Object Detection
ICRA 2023
Deepti Hegde,
Vishwanath Sindagi,
Velat Kilic,
A. Brinton Cooper,
Mark Foster,
Vishal Patel
arXiv
To avoid reinforcing errors caused by label noise, we propose an uncertainty-aware mean teacher framework that implicitly filters incorrect pseudo-labels during training. Leveraging model uncertainty allows the mean teacher network to perform this implicit filtering by down-weighting losses corresponding to uncertain pseudo-labels.
Lidar Light Scattering Augmentation (LISA): Physics-based Simulation of Adverse Weather Conditions for 3D Object Detection
Velat Kilic,
Deepti Hegde,
Vishwanath Sindagi,
A. Brinton Cooper,
Mark Foster,
Vishal Patel
arXiv
code
We propose a physics-based approach to simulating LiDAR point clouds of scenes in adverse weather conditions. These augmented datasets can then be used to train LiDAR-based detectors to improve their all-weather reliability. Specifically, we introduce a hybrid Monte Carlo-based approach that treats (i) the effects of large particles, by placing them randomly and comparing their back-reflected power against the target, and (ii) attenuation effects on average, through calculation of scattering efficiencies from Mie theory and particle size distributions.