bnew

Veteran
Joined
Nov 1, 2015
Messages
55,229
Reputation
8,195
Daps
156,174

1/1
🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans

🌟Few pointers from the paper

🎯In this paper, the authors present "DiffHuman", a probabilistic method for photorealistic 3D human reconstruction from a single RGB image. Despite the ill-posed nature of this problem, most methods are deterministic and output a single solution, often resulting in a lack of geometric detail and blurriness in unseen or uncertain regions.

🎯In contrast, DiffHuman predicts a probability distribution over 3D reconstructions conditioned on an input 2D image, which allows them to sample multiple detailed 3D avatars that are consistent with the image. DiffHuman is implemented as a conditional diffusion model that denoises pixel-aligned 2D observations of an underlying 3D shape representation.

🎯 During inference, the authors sample 3D avatars by iteratively denoising 2D renders of the predicted 3D representation. Furthermore, they also introduce a generator neural network that approximates rendering with considerably reduced runtime (a 55x speed-up), resulting in a novel dual-branch diffusion framework.
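
For readers who want a concrete picture of that iterative sampling loop, here is a minimal, hypothetical PyTorch sketch. The function and tensor names (denoiser, refine_observation, the 8-channel observation maps) are placeholders assumed for illustration, not the authors' code, and the update rule is a simplified ancestral-style step.

```python
import torch

def sample_avatar(input_image, denoiser, refine_observation, timesteps, sigmas):
    """Hypothetical sketch: iteratively denoise pixel-aligned 2D observations.

    denoiser(x_t, cond, t)   -> predicted clean observation maps (assumed signature)
    refine_observation(obs)  -> re-render from the implied 3D shape (slow branch) or
                                approximate that render with the generator (fast branch)
    """
    x_t = torch.randn(1, 8, 256, 256)                 # start from noise; 8 channels assumed
    obs_pred = x_t
    for t, sigma in zip(reversed(timesteps), reversed(sigmas)):
        obs_pred = denoiser(x_t, input_image, t)      # predict clean 2D observations
        obs_pred = refine_observation(obs_pred)       # keep them consistent with one 3D shape
        noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        x_t = obs_pred + sigma * noise                # simplified ancestral-style update
    return obs_pred                                   # final observations -> reconstruct the avatar
```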

🎯 Their experiments showed that DiffHuman can produce diverse and detailed reconstructions for the parts of the person that are unseen or uncertain in the input image, while remaining competitive with the state-of-the-art when reconstructing visible surfaces.

🏢Organization: @Google Research, @Cambridge_Uni

🧙Paper Authors: @AkashSengupta97 , @thiemoall , @nikoskolot , @enric_corona , Andrei Zanfir, @CSminchisescu

1️⃣Read the Full Paper here: [2404.00485] DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans

2️⃣Project Page: DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans

🎥 Be sure to watch the attached Informational Video. Sound on 🔊🔊

🎵 Music by Dmytro Kuvalin from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,229
Reputation
8,195
Daps
156,174

1/1
🚨 Paper Alert 🚨

➡️Paper Title: 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

🌟Few pointers from the paper

🎯Current multimodal and multitask foundation models like 4M or UnifiedIO show promising results, but in practice their out-of-the-box abilities to accept diverse inputs and perform diverse tasks are limited by the (usually rather small) number of modalities and tasks they are trained on.

🎯In this paper, the authors expand upon their capabilities by training a single model on tens of highly diverse modalities and by performing co-training on large-scale multimodal datasets and text corpora.

🎯This includes training on several semantic and geometric modalities, feature maps from recent state-of-the-art models like DINOv2 and ImageBind, pseudo labels from specialist models like SAM and 4DHumans, and a range of new modalities that allow for novel ways to interact with the model and steer the generation, for example image metadata or color palettes.

🎯A crucial step in this process is performing discrete tokenization on various modalities, whether they are image-like, neural network feature maps, vectors, structured data like instance segmentation or human poses, or data that can be represented as text.
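
As a toy illustration of what "discrete tokenization" of such heterogeneous modalities can look like, the sketch below vector-quantizes an image-like feature map against a codebook and bins a plain vector into discrete ids. The codebook size, dimensions, and function names are illustrative assumptions, not the 4M-21 tokenizers.

```python
import torch

def vq_tokenize(feature_map, codebook):
    """Vector-quantize an image-like feature map (C, H, W) into codebook indices."""
    c, h, w = feature_map.shape
    flat = feature_map.permute(1, 2, 0).reshape(-1, c)      # (H*W, C)
    dists = torch.cdist(flat, codebook)                     # distance to every code
    return dists.argmin(dim=1).reshape(h, w)                # (H, W) discrete token ids

def bin_tokenize(vector, num_bins=256, lo=-1.0, hi=1.0):
    """Quantize a plain vector (e.g. metadata or a color palette) into bin ids."""
    return ((vector.clamp(lo, hi) - lo) / (hi - lo) * (num_bins - 1)).long()

codebook = torch.randn(1024, 64)                            # 1024 codes of dimension 64 (assumed)
image_tokens = vq_tokenize(torch.randn(64, 16, 16), codebook)
vector_tokens = bin_tokenize(torch.rand(9) * 2 - 1)
print(image_tokens.shape, vector_tokens.shape)              # torch.Size([16, 16]) torch.Size([9])
```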

🎯Through this, the authors expanded the out-of-the-box capabilities of multimodal models and specifically showed that it is possible to train one model to solve at least 3x more tasks/modalities than existing ones, without a loss in performance.

🎯This enables more fine-grained and controllable multimodal generation capabilities and allowed the authors to study the distillation of models trained on diverse data and objectives into a unified model. They successfully scaled the training to a three-billion-parameter model using tens of modalities and different datasets.

🏢Organization: Swiss Federal Institute of Technology Lausanne (@EPFL ), @Apple

🧙Paper Authors: @roman__bachmann ,@oguzhanthefatih , @dmizrahi_ , @aligarjani , @mingfei_gao , David Griffiths, Jiaming Hu, @afshin_dn , @zamir_ar

1️⃣Read the Full Paper here: [2406.09406] 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

2️⃣Project Page: 4M: Massively Multimodal Masked Modeling

3️⃣Code: GitHub - apple/ml-4m: 4M: Massively Multimodal Masked Modeling

4️⃣Demo: 4M Demo - a Hugging Face Space by EPFL-VILAB

🎥 Be sure to watch the attached Video. Sound on 🔊🔊

🎵Music by Yevhen Onoychenko from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,229
Reputation
8,195
Daps
156,174

1/1
🚨Paper Alert 🚨

➡️Paper Title: High-Fidelity Facial Albedo Estimation via Texture Quantization

🌟Few pointers from the paper

🎯Recent 3D face reconstruction methods have made significant progress in shape estimation, but high-fidelity facial albedo reconstruction remains challenging. Existing methods depend on expensive light-stage captured data to learn facial albedo maps. However, a lack of diversity in subjects limits their ability to recover high-fidelity results.

🎯 In this paper, the authors present a novel facial albedo reconstruction model, "HiFiAlbedo", which recovers the albedo map directly from a single image without the need for captured albedo data.

🎯Their key insight is that the albedo map is the illumination-invariant texture map, which enables them to use inexpensive texture data to derive an albedo estimate by eliminating illumination.

🎯 To achieve this, they first collected large-scale ultra-high-resolution facial images and trained a high-fidelity facial texture codebook. Using the FFHQ dataset and limited UV textures, they then fine-tuned the encoder for texture reconstruction from the input image with adversarial supervision in both image and UV space.

🎯Finally, they trained a cross-attention module and utilized a group identity loss to learn the adaptation from facial texture to the albedo domain. Extensive experimentation has demonstrated that their method exhibits excellent generalizability and is capable of achieving high-fidelity results for in-the-wild facial albedo recovery.
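
To make the texture-to-albedo adaptation step more tangible, here is a hedged sketch of a cross-attention adapter in which learned albedo queries attend to quantized texture features. The module layout, dimensions, and class name are assumptions for illustration; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn

class TextureToAlbedoAdapter(nn.Module):
    """Assumed illustration: learned albedo queries cross-attend to texture features."""

    def __init__(self, dim=256, heads=8, num_queries=1024):
        super().__init__()
        self.albedo_queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, texture_tokens):                        # (B, N, dim) quantized texture features
        b = texture_tokens.size(0)
        queries = self.albedo_queries.unsqueeze(0).expand(b, -1, -1)
        out, _ = self.attn(queries, texture_tokens, texture_tokens)  # attend to texture features
        return self.proj(out)                                 # albedo-domain features for a decoder

adapter = TextureToAlbedoAdapter()
print(adapter(torch.randn(2, 1024, 256)).shape)               # torch.Size([2, 1024, 256])
```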

🏢Organization: University of Technology Sydney, Australia, @sjtu1896 , @DeepGlint , China, Insightface, China, @ZJU_China , @imperialcollege

🧙Paper Authors: Zimin Ran, Xingyu Ren, Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jia Guo, Linchao Zhu, @JiankangDeng

1️⃣Read the Full Paper here: [2406.13149] High-Fidelity Facial Albedo Estimation via Texture Quantization

2️⃣Project Page: High-Fidelity Facial Albedo Estimation via Texture Quantization

🎥 Be sure to watch the attached Demo Video. Sound on 🔊🔊

🎵 Music by Gregor Quendel from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,229
Reputation
8,195
Daps
156,174

1/1
🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features

🌟Few pointers from the paper

🎯In this paper, the authors present "3DiffTection", a state-of-the-art method for 3D object detection from single images, leveraging features from a 3D-aware diffusion model. Annotating large-scale image data for 3D detection is resource-intensive and time-consuming.

🎯Recently, pretrained large image diffusion models have become prominent as effective feature extractors for 2D perception tasks. However, these features are initially trained on paired text and image data, which are not optimized for 3D tasks, and often exhibit a domain gap when applied to the target data.

🎯Their approach bridges these gaps through two specialized tuning strategies: geometric and semantic. For geometric tuning, they fine-tuned a diffusion model to perform novel view synthesis conditioned on a single image, by introducing a novel epipolar warp operator.

🎯This task meets two essential criteria: the necessity for 3D awareness and reliance solely on posed image data, which are readily available (e.g., from videos) and do not require manual annotation.

🎯 For semantic refinement, the authors further trained the model on target data with detection supervision. Both tuning phases employ ControlNet to preserve the integrity of the original feature capabilities.

🎯In the final step, they harnessed these enhanced capabilities to conduct a test-time prediction ensemble across multiple virtual viewpoints. Through their methodology, they obtained 3D-aware features that are tailored for 3D detection and excel in identifying cross-view point correspondences.
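
A minimal sketch of what a test-time prediction ensemble across virtual viewpoints could look like is given below. The detector, warp_to_view, and pose inputs are hypothetical placeholders standing in for the paper's geometry-aware components.

```python
import torch

def ensemble_detect(image, detector, warp_to_view, view_poses):
    """Hypothetical sketch: average per-proposal logits over several virtual views.

    warp_to_view(image, pose) stands in for producing geometry-aware features for a
    virtual viewpoint; detector(features) returns (num_proposals, num_classes) logits.
    """
    all_logits = []
    for pose in view_poses:
        features = warp_to_view(image, pose)       # features rendered for this virtual view
        all_logits.append(detector(features))      # detection scores in that view
    return torch.stack(all_logits).mean(dim=0)     # ensembled scores for final 3D boxes
```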

🎯Consequently, their model emerges as a powerful 3D detector, substantially surpassing previous benchmarks; for example, it outperforms Cube-RCNN, a precedent in single-view 3D detection, by 9.43% in AP3D on the Omni3D-ARkitscene dataset. Furthermore, 3DiffTection showcases robust data efficiency and generalization to cross-domain data.

🏢Organization: @nvidia , @UCBerkeley , @VectorInst , @UofT , @TechnionLive

🧙Paper Authors: @Chenfeng_X , @HuanLing6 , @FidlerSanja , @orlitany

1️⃣Read the Full Paper here: [2311.04391] 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features

2️⃣Project Page: https://research.nvidia.com/labs/toronto-ai/3difftection/

3️⃣Code: Coming 🔜

🎥 Be sure to watch the attached Demo Video. Sound on 🔊🔊

🎵 Music by Umasha Pros from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024 #3dobjectdetection


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,229
Reputation
8,195
Daps
156,174

1/1
🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: The Manga Whisperer: Automatically Generating Transcriptions for Comics

🌟Few pointers from the paper

🎯In the past few decades, Japanese comics, commonly referred to as Manga, have transcended both cultural and linguistic boundaries to become a true worldwide sensation. Yet, the inherent reliance on visual cues and illustration within manga renders it largely inaccessible to individuals with visual impairments.

🎯 The authors of this paper seek to address this substantial barrier, with the aim of ensuring that manga can be appreciated and actively engaged with by everyone. Specifically, they tackled the problem of diarisation, i.e., generating a transcription of who said what and when, in a fully automatic way.

🎯To this end, they made the following contributions:
➕They presented a unified model, "Magi", that is able to
(a) detect panels, text boxes and character boxes,
(b) cluster characters by identity (without knowing the number of clusters a priori), and
(c) associate dialogues to their speakers (a rough pipeline sketch follows this list).
➕They also proposed a novel approach that is able to sort the detected text boxes in their reading order and generate a dialogue transcript;
➕They annotated an evaluation benchmark for this task using publicly available [English] manga pages.
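
Here is the rough pipeline sketch referenced above, showing how the detection, clustering, speaker-association, ordering, and OCR stages could be chained. All helper functions are hypothetical placeholders, not the released Magi API.

```python
def transcribe_page(page_image, detector, embedder, cluster, associate, order, ocr):
    """Hypothetical pipeline sketch; every helper is a placeholder, not the Magi API."""
    panels, text_boxes, char_boxes = detector(page_image)        # (a) detect panels/texts/characters
    char_ids = cluster(embedder(page_image, char_boxes))         # (b) cluster characters by identity
    speakers = associate(text_boxes, char_boxes, char_ids)       # (c) one speaker id per text box
    lines = []
    for i in order(panels, text_boxes):                          # indices in reading order
        lines.append(f"{speakers[i]}: {ocr(page_image, text_boxes[i])}")
    return "\n".join(lines)                                      # the dialogue transcript
```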

🏢Organization: Visual Geometry Group, Dept. of Engineering Science, University of Oxford (@Oxford_VGG )

🧙Paper Authors: @RagavSachdeva , Andrew Zisserman

1️⃣Read the Full Paper here: [2401.10224] The Manga Whisperer: Automatically Generating Transcriptions for Comics

2️⃣Code: GitHub - ragavsachdeva/magi: Generate a transcript for your favourite Manga: Detect manga characters, text blocks and panels. Order panels. Cluster characters. Match texts to their speakers. Perform OCR. (CVPR'24)

3️⃣Try Here: Magi Demo - a Hugging Face Space by ragavsachdeva

🎥 Be sure to watch the attached Demo Video. Sound on 🔊🔊

🎵 Music by Vlad Krotov from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024 #manga


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,229
Reputation
8,195
Daps
156,174

1/1
🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: MaGGIe: Masked Guided Gradual Human Instance Matting

🌟Few pointers from the paper

🎯Human matting is a foundation task in image and video processing where human foreground pixels are extracted from the input. Prior works either improve the accuracy by additional guidance or improve the temporal consistency of a single instance across frames.

🌋In this paper, the authors propose a new framework, "MaGGIe: Masked Guided Gradual Human Instance Matting", which predicts alpha mattes progressively for each human instance while maintaining computational cost, precision, and consistency.

🎯Their method leverages modern architectures, including transformer attention and sparse convolution, to output all instance mattes simultaneously without exploding memory and latency.

🎯While keeping inference costs constant in the multi-instance scenario, their framework achieves robust and versatile performance on their proposed synthesized benchmark.

🎯In addition to higher-quality image and video matting benchmarks, a novel multi-instance synthesis approach built from publicly available sources is introduced to increase the generalization of models in real-world scenarios.
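
As a hedged illustration of mask-guided matting for several instances in a single forward pass, the stub below computes shared image features once and predicts one alpha matte per guidance mask. The tiny convolutional layers are stand-ins; the real model relies on transformer attention and sparse convolution as noted above.

```python
import torch
import torch.nn as nn

class InstanceMattingStub(nn.Module):
    """Toy stand-in: shared image features + one guidance mask -> one alpha per instance."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 32, 3, padding=1)        # placeholder feature extractor
        self.head = nn.Conv2d(32 + 1, 1, 3, padding=1)        # features + guidance mask -> alpha

    def forward(self, image, instance_masks):                 # image (B,3,H,W), masks (B,N,H,W)
        feats = self.backbone(image)                          # computed once, shared by all instances
        b, n, h, w = instance_masks.shape
        feats = feats.unsqueeze(1).expand(-1, n, -1, -1, -1).reshape(b * n, -1, h, w)
        masks = instance_masks.reshape(b * n, 1, h, w)
        alphas = torch.sigmoid(self.head(torch.cat([feats, masks], dim=1)))
        return alphas.reshape(b, n, h, w)                     # one alpha matte per instance

model = InstanceMattingStub()
print(model(torch.randn(1, 3, 64, 64), torch.rand(1, 2, 64, 64)).shape)  # torch.Size([1, 2, 64, 64])
```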

🏢Organization: @UofMaryland , College Park, @AdobeResearch

🧙Paper Authors: @RyanHuynh1108 , Seoung Wug Oh, @abhi2610 , Joon-Young Lee

1️⃣Read the Full Paper here: [2404.16035] MaGGIe: Masked Guided Gradual Human Instance Matting

2️⃣Project Page: CVPR'24 - MaGGIe

3️⃣Code: GitHub - hmchuong/MaGGIe: [CVPR24] MaGGIe: Mask Guided Gradual Human Instance Matting

4️⃣Dataset: chuonghm/MaGGIe-HIM Β· Datasets at Hugging Face

🎥 Be sure to watch the attached Demo Video. Sound on 🔊🔊

🎵 Music by Jeremiah Alves from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,229
Reputation
8,195
Daps
156,174

1/1
🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

🌟Few pointers from the paper

🎯In this paper, the authors present "Paint-it", a text-driven high-fidelity texture map synthesis method for 3D meshes via neural re-parameterized texture optimization. Paint-it synthesizes texture maps from a text description by synthesis-through-optimization, exploiting Score-Distillation Sampling (SDS).

🎯The authors observed that directly applying SDS yields undesirable texture quality due to its noisy gradients, which revealed the importance of texture parameterization when using SDS.

🎯Specifically, they proposed a Deep Convolutional Physically-Based Rendering (DC-PBR) parameterization, which re-parameterizes the physically-based rendering (PBR) texture maps with randomly initialized convolution-based neural kernels, instead of a standard pixel-based parameterization.

🎯They showed that DC-PBR inherently schedules the optimization curriculum according to texture frequency and naturally filters out the noisy signals from SDS.
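
The following sketch shows the core idea of re-parameterizing texture maps with a randomly initialized convolutional network (in the spirit of a deep image prior) and optimizing the network weights by backpropagation. The layer sizes are assumptions, and the placeholder loss marks where the actual SDS gradient from a text-conditioned diffusion model would enter; this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class DCPBRStub(nn.Module):
    """Toy DC-PBR-style parameterization: fixed random input -> conv net -> PBR maps."""

    def __init__(self, res=256):
        super().__init__()
        self.register_buffer("z", torch.randn(1, 16, res // 4, res // 4))  # fixed input code
        self.net = nn.Sequential(
            nn.Conv2d(16, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear"),
            nn.Conv2d(64, 5, 3, padding=1), nn.Sigmoid(),     # 3 albedo + 1 roughness + 1 metallic
        )

    def forward(self):
        maps = self.net(self.z)
        return maps[:, :3], maps[:, 3:4], maps[:, 4:5]

model = DCPBRStub()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(10):                                        # a real run optimizes against SDS
    albedo, roughness, metallic = model()
    loss = albedo.mean()                                      # placeholder where the SDS loss would go
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```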

🎯 In experiments, Paint-it obtains remarkable-quality PBR texture maps within 15 minutes, given only a text description. The authors demonstrated the generalizability and practicality of Paint-it by synthesizing high-quality texture maps for large-scale mesh datasets and showing test-time applications such as relighting and material control using a popular graphics engine.

🏢Organization: @uni_tue , Tübingen AI Center, Germany, Max Planck Institute for Informatics, Germany, Dept. of Electrical Engineering, @postech2020 , Grad. School of AI, @postech2020 , Institute for Convergence Research and Education in Advanced Technology, @yonsei_u

🧙Paper Authors: @kim_youwang , @Tae_Hyun_Oh , @GerardPonsMoll1

1️⃣Read the Full Paper here: [2312.11360] Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

2️⃣Project Page: Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

3️⃣Code: GitHub - postech-ami/Paint-it: [CVPR'24] Official PyTorch Implementation of "Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering"

🎥 Be sure to watch the attached Demo Video. Sound on 🔊🔊

🎵 Music by Pavel Bekirov from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,229
Reputation
8,195
Daps
156,174

1/1
🚨CVPR 2024 Best Paper Alert 🚨

➡️Paper Title: Rich Human Feedback for Text-to-Image Generation

🌟Few pointers from the paper

🎯Recent Text-to-Image (T2I) generation models such as Stable Diffusion and Imagen have made significant progress in generating high-resolution images based on text descriptions.

🎯However, many generated images still suffer from issues such as artifacts/implausibility, misalignment with text descriptions, and low aesthetic quality.

🎯 Inspired by the success of Reinforcement Learning with Human Feedback (RLHF) for large language models, prior works collected human-provided scores as feedback on generated images and trained a reward model to improve the T2I generation.

🎯In this paper, the authors enriched the feedback signal by:
➕marking image regions that are implausible or misaligned with the text, and
➕annotating which words in the text prompt are misrepresented or missing on the image.

🎯The authors collected such rich human feedback on 18K generated images (RichHF-18K) and trained a multimodal transformer to predict the rich feedback automatically.

🎯They showed that the predicted rich human feedback can be leveraged to improve image generation, for example, by selecting high-quality training data to finetune and improve the generative models, or by creating masks with predicted heatmaps to inpaint the problematic regions.
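
As a toy illustration of the "heatmap to inpainting mask" use case, the snippet below thresholds a predicted implausibility heatmap and dilates it into a binary mask. The threshold and dilation size are arbitrary choices for illustration, not values from the paper.

```python
import torch
import torch.nn.functional as F

def heatmap_to_mask(heatmap, threshold=0.5, dilate=7):
    """heatmap: (H, W) in [0, 1] -> binary mask of regions to regenerate."""
    mask = (heatmap > threshold).float()[None, None]                      # (1, 1, H, W)
    mask = F.max_pool2d(mask, dilate, stride=1, padding=dilate // 2)      # dilate the flagged region
    return mask[0, 0]                                                     # (H, W) mask for inpainting

mask = heatmap_to_mask(torch.rand(64, 64))
print(mask.shape)                                                         # torch.Size([64, 64])
```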

🎯Notably, the improvements generalize to models (Muse) beyond those used to generate the images on which human feedback data were collected (Stable Diffusion variants).

🏢Organization: @UCSanDiego , @Google Research, @USC , @Cambridge_Uni , @BrandeisU

🧙Paper Authors: Youwei Liang, Junfeng He, Gang Li, Peizhao Li, Arseniy Klimovskiy, Nicholas Carolan, Jiao Sun, Jordi Pont-Tuset, Sarah Young, Feng Yang, Junjie Ke, Krishnamurthy Dj Dvijotham, Katie Collins, Yiwen Luo, Yang Li, Kai J Kohlhoff, Deepak Ramachandran, Vidhya Navalpakkam

1️⃣Read the Full Paper here: [2312.10240] Rich Human Feedback for Text-to-Image Generation

2️⃣RichHF-18K dataset: GitHub - google-research-datasets/richhf-18k: RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with the file name of the associated labeled images (no urls or images are included in this dataset).

🥳Heartfelt congratulations to all the talented authors! 🥳

🎥 Be sure to watch the attached Paper Demo Video. Sound on 🔊🔊

🎵 Music by Sergio Prosvirini from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#cvpr2024 #bestpaper


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,229
Reputation
8,195
Daps
156,174

1/1
🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: RoMa: Robust Dense Feature Matching

🌟Few pointers from the paper

🎯Feature matching is an important computer vision task that involves estimating correspondences between two images of a 3D scene, and dense methods estimate all such correspondences. The aim is to learn a robust model, i.e., a model able to match under challenging real-world changes.

🎯In this paper, the authors propose a model that leverages frozen pretrained features from the foundation model DINOv2. Although these features are significantly more robust than local features trained from scratch, they are inherently coarse.

🎯The authors therefore combine them with specialized ConvNet fine features, creating a precisely localizable feature pyramid. To further improve robustness, they propose a tailored transformer match decoder that predicts anchor probabilities, which enables it to express multimodality.
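
A hedged sketch of the coarse-plus-fine feature pyramid idea is shown below: a frozen backbone (standing in for DINOv2) supplies robust coarse features while a small trainable ConvNet supplies precisely localizable fine features. Layer sizes, the assumption that the backbone returns a spatial feature map, and the class name are illustrative, not RoMa's actual design.

```python
import torch
import torch.nn as nn

class CoarseFinePyramid(nn.Module):
    """Illustrative two-level pyramid: frozen coarse backbone + trainable fine ConvNet."""

    def __init__(self, coarse_backbone, coarse_dim=1024):
        super().__init__()
        self.coarse = coarse_backbone                          # frozen, e.g. a DINOv2-like encoder
        for p in self.coarse.parameters():
            p.requires_grad = False
        self.fine = nn.Sequential(                             # trainable fine-detail features
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1),
        )
        self.reduce = nn.Conv2d(coarse_dim, 256, 1)

    def forward(self, image):
        with torch.no_grad():
            coarse = self.coarse(image)                        # assumed to return (B, coarse_dim, h, w)
        return {"coarse": self.reduce(coarse), "fine": self.fine(image)}
```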

🎯Finally, they also propose an improved loss formulation through regression-by-classification with subsequent robust regression. The authors conducted a comprehensive set of experiments showing that their method, RoMa, achieves significant gains, setting a new state of the art. In particular, they achieved a 36% improvement on the extremely challenging WxBS benchmark.

🏢Organization: @liu_universitet , East China University of Science and Technology, @chalmersuniv

🧙Paper Authors: Johan Edstedt, Qiyu Sun, Georg Bökman, Mårten Wadenbäck, Michael Felsberg

1️⃣Read the Full Paper here: [2305.15404] RoMa: Robust Dense Feature Matching

2️⃣Project Page: RoMa: Robust Dense Feature Matching

3️⃣Code: GitHub - Parskatt/RoMa: [CVPR 2024] RoMa: Robust Dense Feature Matching; RoMa is the robust dense feature matcher capable of estimating pixel-dense warps and reliable certainties for almost any image pair.

🎥 Be sure to watch the attached Demo Video. Sound on 🔊🔊

🎵 Music by Grand_Project from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,229
Reputation
8,195
Daps
156,174

1/1
🚨CVPR 2024 Best Paper Alert 🚨

➡️Paper Title: Generative Image Dynamics

🌟Few pointers from the paper

🎯In this paper, the authors present an approach to modeling an image-space prior on scene motion. Their prior is learned from a collection of motion trajectories extracted from real video sequences depicting the natural, oscillatory dynamics of objects such as trees, flowers, candles, and clothes swaying in the wind.

🎯They model dense, long-term motion in the Fourier domain as spectral volumes, which the authors found are well suited to prediction with diffusion models.

🎯Given a single image, their trained model uses a frequency-coordinated diffusion sampling process to predict a spectral volume, which can be converted into a motion texture that spans an entire video.

🎯 Along with an image-based rendering module, the predicted motion representation can be used for a number of downstream applications, such as turning still images into seamlessly looping videos, or allowing users to interact with objects in real images, producing realistic simulated dynamics (by interpreting the spectral volumes as image-space modal bases).
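
To make the spectral-volume idea concrete, here is a toy conversion from per-pixel Fourier coefficients of motion to a per-frame motion texture via complex exponentials. The shapes, number of frequency terms, and normalization are illustrative assumptions, not the paper's exact formulation.

```python
import math
import torch

def spectral_volume_to_motion(coeffs, num_frames):
    """coeffs: complex tensor (K, H, W, 2) of K low-frequency terms for (dx, dy) motion."""
    k = coeffs.shape[0]
    t = torch.arange(num_frames, dtype=torch.float32)                     # frame indices
    freqs = torch.arange(k, dtype=torch.float32)                          # frequency indices 0..K-1
    phase = 2 * math.pi * freqs[:, None] * t[None, :] / num_frames        # (K, T)
    basis = torch.exp(1j * phase)                                         # complex exponentials
    motion = torch.einsum("khwc,kt->thwc", coeffs, basis).real            # (T, H, W, 2) displacements
    return motion

motion = spectral_volume_to_motion(torch.randn(16, 32, 32, 2, dtype=torch.cfloat), 60)
print(motion.shape)                                                       # torch.Size([60, 32, 32, 2])
```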

🏢Organization: @Google Research

🧙Paper Authors: @zhengqi_li , Richard Tucker, Noah Snavely, Aleksander Holynski

1️⃣Read the Full Paper here: [2309.07906] Generative Image Dynamics

2️⃣Project Page: Generative Image Dynamics

🥳Heartfelt congratulations to all the talented authors! 🥳

🎥 Be sure to watch the attached Demo Video. Sound on 🔊🔊

🎵 Music by Bohdan Kuzmin from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,229
Reputation
8,195
Daps
156,174

1/1
🚨CVPR 2024 Best Paper Runners-Up Alert 🚨

➡️Paper Title: pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

🌟Few pointers from the paper

🎯In this paper, the authors introduce "pixelSplat", a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images.

🎯Their model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time.

🎯To overcome local minima inherent to sparse and locally supported representations, the authors predict a dense probability distribution over 3D and sample Gaussian means from that probability distribution.

🎯They make this sampling operation differentiable via a reparameterization trick, allowing them to back-propagate gradients through the Gaussian splatting representation.
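
Below is a generic stand-in for differentiable sampling from a per-pixel depth distribution, using a straight-through Gumbel-softmax sample so gradients can flow back through the chosen depths. This is an assumed illustration of the reparameterization idea, not necessarily the paper's exact estimator.

```python
import torch
import torch.nn.functional as F

def sample_depth(depth_logits, depth_bins, tau=1.0):
    """depth_logits: (B, D, H, W) over D candidate depths; depth_bins: (D,) metric depths."""
    b, d, h, w = depth_logits.shape
    logits = depth_logits.permute(0, 2, 3, 1).reshape(-1, d)              # one distribution per pixel
    onehot = F.gumbel_softmax(logits, tau=tau, hard=True)                 # differentiable discrete sample
    depth = (onehot * depth_bins).sum(dim=-1)                             # sampled depth per ray
    return depth.reshape(b, h, w)                                         # unproject -> Gaussian means

depth = sample_depth(torch.randn(1, 64, 32, 32), torch.linspace(1.0, 10.0, 64))
print(depth.shape)                                                        # torch.Size([1, 32, 32])
```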

🎯They benchmark their method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where they outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field.

🏢Organization: @MIT , @SFU , @UofT

🧙Paper Authors: @DavidCharatan , @sizhe_lester_li , @taiyasaki , @vincesitzmann

1️⃣Read the Full Paper here: [2312.12337] pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

2️⃣Project Page: pixelSplat: 3D Gaussian Splats from Image Pairs

3️⃣Code: GitHub - dcharatan/pixelsplat: [CVPR 2024 Oral, Best Paper Runner-Up] Code for "pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction" by David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann

4️⃣Pre-trained Models: checkpoints – Google Drive

🥳Heartfelt congratulations to all the talented authors! 🥳

🎥 Be sure to watch the attached Demo Video. Sound on 🔊🔊

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,229
Reputation
8,195
Daps
156,174

China's AI model glut is a 'significant waste of resources' due to scarce real-world applications for 100+ LLMs, says Baidu CEO

News

By Jowi Morales

published yesterday

Business is unsustainable for the 100+ LLMs competing in China.



(Image credit: Shutterstock)

Robin Li Yanhong, the founder and CEO of Baidu, the biggest search engine in China, said that the country has too many large language models and too few practical applications. Li made these remarks during a recent panel discussion at the World Artificial Intelligence Conference (WAIC) held in Shanghai, as covered by the South China Morning Post.

"In 2023, intense competition among over 100 LLMs has emerged in China, resulting in a significant waste of resources, particularly computing power," said Li. "I've noticed that many people still primarily focus on foundational models. But I want to ask: How about real-world applications? Who has benefitted from them?"

The World Intellectual Property Organization (WIPO) reported last Friday that China has outpaced the U.S. in AI patents six-to-one over the past ten years. However, WIPO data also showed that the country is falling behind in terms of citations, with the Chinese Academy of Sciences the only Chinese institution in the list of the top 20 organizations with the most research citations.

Publicly available LLMs in China need to go through regulatory approval, to ensure that the Chinese Communist Party (CCP) can effectively control the Chinese people. Over 200 AI firms have applied for a license as of March 2024, with 117 getting a nod from Beijing. Having this many LLMs means that they're all fighting for a slice of the pie, and not everyone will win. Yan Junjie, CEO of AI startup MiniMax, said that he "expects major industry consolidation in the future, with LLMs being primarily developed by just five companies."

Many large firms have started to rush in to capture the market that OpenAI will leave when its API becomes inaccessible in China on July 9. The largest tech companies, Tencent, Baidu, and Alibaba, have started offering discounts and plans to entice customers to adopt their products, something that smaller companies might not be able to sustain.

While competition is good for any market, too many options could also lead to decision fatigue for customers, who are simply overwhelmed by the number of services priced at similar levels. Bigger companies would likely be the winners here, as they have larger war chests that they can use to either acquire smaller competitors or run them into the ground. As Bernard Leong, CEO of Singapore-based Dorje AI, said, "There's probably going to be a bloodbath of the large language models and I suspect that there's probably going to be very few players left."
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,229
Reputation
8,195
Daps
156,174

Google claims new AI training tech is 13 times faster and 10 times more power efficient: DeepMind's new JEST optimizes training data for impressive gains

Dallin Grimm

Sun, July 7, 2024 at 3:40 PM EDT · 3 min read


Google DeepMind, Google's AI research lab, has published new research on training AI models that claims to improve both training speed and energy efficiency by an order of magnitude, yielding 13 times more performance and ten times higher power efficiency than other methods. The new JEST training method comes in a timely fashion, as conversations about the environmental impact of AI data centers are heating up.

DeepMind's method, dubbed JEST or joint example selection, breaks apart from traditional AI model training techniques in a simple fashion. Typical training methods focus on individual data points for training and learning, while JEST trains based on entire batches. The JEST method first creates a smaller AI model that will grade data quality from extremely high-quality sources, ranking the batches by quality. Then it compares that grading to a larger, lower-quality set. The small JEST model determines the batches most fit for training, and a large model is then trained from the findings of the smaller model.
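
A back-of-envelope sketch of the "score candidate batches with a small model and keep the best ones" idea is shown below. The scoring rule (plain loss from a small reference model) is a simple placeholder, not DeepMind's exact learnability criterion.

```python
import torch

def select_batches(candidate_batches, small_model, loss_fn, keep=2):
    """Rank candidate (x, y) batches with a small scoring model and keep the top ones."""
    scores = []
    with torch.no_grad():
        for x, y in candidate_batches:
            scores.append(loss_fn(small_model(x), y).item())   # one score per whole batch
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [candidate_batches[i] for i in ranked[:keep]]       # train the large model on these
```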

The paper itself, available here, provides a more thorough explanation of the processes used in the study and the future of the research.

DeepMind researchers make it clear in their paper that this "ability to steer the data selection process towards the distribution of smaller, well-curated datasets" is essential to the success of the JEST method. Success is the correct word for this research; DeepMind claims that "our approach surpasses state-of-the-art models with up to 13× fewer iterations and 10× less computation."

Graphs displaying efficiency and speed gains over traditional AI training methods.

Of course, this system relies entirely on the quality of its training data, as the bootstrapping technique falls apart without a human-curated data set of the highest possible quality. Nowhere is the mantra "garbage in, garbage out" truer than for this method, which attempts to "skip ahead" in its training process. This makes the JEST method much more difficult for hobbyists or amateur AI developers to match than most others, as expert-level research skills are likely required to curate the initial highest-grade training data.

The JEST research comes not a moment too soon, as the tech industry and world governments are beginning discussions on artificial intelligence's extreme power demands. AI workloads took up about 4.3 GW in 2023, almost matching the annual power consumption of the nation of Cyprus. And things are definitely not slowing down: a single ChatGPT request costs 10x more than a Google search in power, and Arm's CEO estimates that AI will take up a quarter of the United States' power grid by 2030.

If and how JEST methods are adopted by major players in the AI space remains to be seen. GPT-4o reportedly cost $100 million to train, and future larger models may soon hit the billion-dollar mark, so firms are likely hunting for ways to save their wallets in this department. Hopefuls think that JEST methods will be used to keep current training productivity rates at much lower power draws, easing the costs of AI and helping the planet. However, much more likely is that the machine of capital will keep the pedal to the metal, using JEST methods to keep power draw at maximum for hyper-fast training output. Cost savings versus output scale, who will win?
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,229
Reputation
8,195
Daps
156,174

AI models that cost $1 billion to train are underway, $100 billion models coming; the largest current models take 'only' $100 million to train: Anthropic CEO

News

By Jowi Morales

published yesterday

AI training costs are growing exponentially year after year.



Anthropic CEO Dario Amodei

(Image credit: Norges Bank Investment Management / YouTube)

Anthropic CEO Dario Amodei said in the In Good Company podcast that AI models in development today can cost up to $1 billion to train. Current models like ChatGPT-4o only cost about $100 million, but he expects the cost of training these models to go up to $10 or even $100 billion in as little as three years from now.

"Right now, 100 million. There are models in training today that are more like a billion." Amodei also added, "I think if we go to ten or a hundred billion, and I think that will happen in 2025, 2026, maybe 2027, and the algorithmic improvements continue a pace, and the chip improvements continue a pace, then I think there is in my mind a good chance that by that time we'll be able to get models that are better than most humans at most things."

The Anthropic CEO mentioned these numbers when he discussed the development of AI from generative artificial intelligence (like ChatGPT) to artificial general intelligence (AGI). He said that there wouldn't be a single point where we suddenly reach AGI. Instead, it would be a gradual development where models build upon the developments of past models, much like how a human child learns.

So, if AI models grow ten times more powerful each year, we can rationally expect the hardware required to train them to be at least ten times more powerful, too. As such, hardware could be the biggest cost driver in AI training. Back in 2023, it was reported that ChatGPT would require more than 30,000 GPUs, with Sam Altman confirming that ChatGPT-4 cost $100 million to train.

Last year, over 3.8 million GPUs were delivered to data centers. With Nvidia's latest B200 AI chip costing around $30,000 to $40,000, we can surmise that Dario's billion-dollar estimate is on track for 2024. If advancements in model/quantization research grow at the current exponential rate, then we expect hardware requirements to keep pace unless more efficient technologies like the Sohu AI chip become more prevalent.

We can already see this exponential growth happening. Elon Musk wants to purchase 300,000 B200 AI chips, while OpenAI and Microsoft are reportedly planning a $100 billion AI data center. With all this demand, we could see GPU data center deliveries next year balloon to 38 million if Nvidia and other suppliers can keep up with the market.
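
For scale, here is a quick back-of-envelope calculation using only the figures quoted in this article (the B200 price range, Musk's reported order, and the GPU delivery numbers).

```python
# All figures below are quoted in the article; the arithmetic is the only addition.
b200_price_low, b200_price_high = 30_000, 40_000        # USD per B200 chip
musk_order = 300_000                                     # chips Musk reportedly wants to buy
print(musk_order * b200_price_low / 1e9)                 # 9.0  -> ~$9 billion
print(musk_order * b200_price_high / 1e9)                # 12.0 -> ~$12 billion

gpus_2023 = 3_800_000                                    # data-center GPUs delivered last year
gpus_next_year = 38_000_000                              # the "balloon to 38 million" scenario
print(gpus_next_year / gpus_2023)                        # 10.0x growth
```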

However, aside from the supply of the actual chip hardware, these AI firms need to be concerned with power supply and related infrastructure, too. The total estimated power consumption of all data center GPUs sold just last year could power 1.3 million homes. If data center power requirements continue to grow exponentially, then we could run short of economically priced electricity. Furthermore, while these data centers need power plants, they also need an entirely upgraded grid that can handle all the electrons the power-hungry AI chips need to run. For this reason, many tech companies, including Microsoft, are now considering modular nuclear power for their data centers.

Artificial intelligence is quickly gathering steam, and hardware innovations seem to be keeping up. So, Anthropic's $100 billion estimate seems to be on track, especially if manufacturers like Nvidia, AMD, and Intel can deliver. However, as our AI technologies perform exponentially better every new generation, one big question still remains: how will it affect the future of our society?
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,229
Reputation
8,195
Daps
156,174


1/11
Luma's start and end keyframes are a game changer. With a sequence of keyframes from the original film, we can seamlessly remaster stop motion classics like "Jason and the Argonauts" as modern single-take action scenes.

2/11
It's interesting how the uncanny movements of the original stop motion skeletons are preserved in traditional frame interpolation. Maybe it's the lack of motion blur on the skeletons?

3/11
I really, truly hope this is sarcasm.

Because there is no way a person with an even halfway functioning brain would look at this and think it’s in any way serviceable.

4/11
I thought it was obvious!

But I guess I haven't posted much in year(s), so people aren't familiar with my typical humor.

5/11
Hi Jonathan this looks like shyt thanks for sharing 👍

6/11
wow this looks like fukking garbage

7/11
Was the idea to make it actively worse?

8/11
seamlessly?

9/11
"remaster"
"Game changer"

10/11
Never do that again.

11/11
this is ass


 