CVPR 2024 Paper Alert
Paper Title: 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features
A few pointers from the paper:
In this paper, the authors present "3DiffTection", a state-of-the-art method for 3D object detection from single images that leverages features from a 3D-aware diffusion model. Annotating large-scale image data for 3D detection is resource-intensive and time-consuming.
Recently, pretrained large image diffusion models have become prominent as effective feature extractors for 2D perception tasks. However, these models are initially trained on paired text and image data, so their features are not optimized for 3D tasks and often exhibit a domain gap when applied to the target data.
Their approach bridges these gaps through two specialized tuning strategies: geometric and semantic. For geometric tuning, they fine-tuned a diffusion model to perform novel view synthesis conditioned on a single image, by introducing a novel epipolar warp operator.
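The epipolar warp can be pictured as resampling source-view features along each target pixel's epipolar line, so the network can gather geometry-consistent evidence from another view. Below is a minimal NumPy sketch of that idea under a shared pinhole-intrinsics assumption; the function names and the simple mean aggregation are illustrative stand-ins, not the paper's actual (learned) operator:

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def epipolar_warp(feat_src, K, R, t, uv_tgt, n_samples=16):
    """Aggregate source-view features along the epipolar line of a target pixel.

    feat_src : (H, W, C) source-view feature map
    K        : (3, 3) pinhole intrinsics (assumed shared by both views)
    R, t     : relative rotation / translation from target to source camera
    uv_tgt   : (u, v) pixel coordinate in the target view
    """
    H, W, _ = feat_src.shape
    # Fundamental matrix F = K^-T [t]x R K^-1 maps a target pixel to its
    # epipolar line l = F @ x in the source image.
    Kinv = np.linalg.inv(K)
    F = Kinv.T @ skew(t) @ R @ Kinv
    x = np.array([uv_tgt[0], uv_tgt[1], 1.0])
    a, b, c = F @ x                       # line: a*u + b*v + c = 0
    # Sample the line across the source image width and average features.
    us = np.linspace(0, W - 1, n_samples)
    vs = -(a * us + c) / b                # assumes the line is not vertical
    samples = []
    for u, v in zip(us, vs):
        ui, vi = int(round(u)), int(round(v))
        if 0 <= ui < W and 0 <= vi < H:
            samples.append(feat_src[vi, ui])
    # Mean aggregation is a stand-in; the paper's operator is learned.
    return np.mean(samples, axis=0) if samples else np.zeros(feat_src.shape[-1])
```

The key property is that the warp only needs relative camera poses, which is why training on posed (but unannotated) video frames is enough.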
This task meets two essential criteria: it necessitates 3D awareness, and it relies solely on posed image data, which is readily available (e.g., from videos) and requires no manual annotation.
For semantic refinement, the authors further trained the model on target data with detection supervision. Both tuning phases employ ControlNet to preserve the integrity of the original feature capabilities.
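ControlNet's core trick, which both tuning phases reuse, can be sketched abstractly: freeze the pretrained weights, train a copy on the conditioning signal, and merge the two branches through a zero-initialized projection, so at the start of tuning the original features pass through untouched. A toy NumPy version with a single linear layer (everything here is illustrative, not the actual ControlNet architecture):

```python
import numpy as np

class ControlledLayer:
    """ControlNet-style tuning pattern (illustrative toy, single linear layer):
    keep the pretrained weights W frozen, train a copy W_c on the conditioning
    signal, and merge through a zero-initialized projection Z. Because Z starts
    at zero, the layer initially reproduces the frozen features exactly."""

    def __init__(self, W):
        self.W = W                  # frozen pretrained weights
        self.W_c = W.copy()         # trainable copy, initialized from W
        self.Z = np.zeros_like(W)   # zero-initialized merge projection

    def forward(self, x, cond):
        base = self.W @ x           # frozen branch: original features
        ctrl = self.W_c @ (x + cond)  # conditioned trainable branch
        return base + self.Z @ ctrl   # zero at init -> base preserved
```

This is why the tuned model never degrades below the pretrained diffusion features: gradient updates only ever add a learned residual on top of them.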
In the final step, they harnessed these enhanced capabilities to conduct a test-time prediction ensemble across multiple virtual viewpoints. Through their methodology, they obtained 3D-aware features that are tailored for 3D detection and excel in identifying cross-view point correspondences.
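The test-time ensemble amounts to running the detector from several virtual viewpoints and fusing the per-view predictions. A hedged sketch of that loop (the `detect` callable, the pose representation, and the simple mean fusion are assumptions for illustration, not the paper's interface):

```python
import numpy as np

def ensemble_detections(detect, feats, virtual_poses):
    """Run `detect` once per virtual viewpoint and average the predictions.

    detect(feats, pose) -> (boxes, scores), with boxes already expressed in
    a common canonical frame so that averaging across views is meaningful.
    This interface is an illustrative assumption, not the paper's API.
    """
    all_boxes, all_scores = [], []
    for pose in virtual_poses:
        boxes, scores = detect(feats, pose)
        all_boxes.append(boxes)
        all_scores.append(scores)
    # Mean fusion across views; assumes each view predicts the same objects
    # in the same order (a real system would match boxes first, e.g. via NMS).
    return np.mean(all_boxes, axis=0), np.mean(all_scores, axis=0)
```

The ensemble works precisely because the geometric tuning made the features view-consistent: predictions from the virtual views agree well enough to be fused.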
Consequently, their model emerges as a powerful 3D detector, substantially surpassing previous benchmarks; for example, it outperforms Cube-RCNN, a prior single-view 3D detector, by 9.43% in AP3D on the Omni3D-ARkitscene dataset. Furthermore, 3DiffTection showcases robust data efficiency and generalization to cross-domain data.
Organization: @nvidia, @UCBerkeley, @VectorInst, @UofT, @TechnionLive
Paper Authors: @Chenfeng_X, @HuanLing6, @FidlerSanja, @orlitany
Read the Full Paper here: [2311.04391] 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features (https://arxiv.org/abs/2311.04391)
Project Page: https://research.nvidia.com/labs/toronto-ai/3difftection/
Code: Coming
Be sure to watch the attached demo video (sound on).
Music by Umasha Pros from @pixabay
Found this valuable?
QT and teach your network something new
Follow me, @NaveenManwani17, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.
#CVPR2024 #3dobjectdetection