🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features

🌟Few pointers from the paper

🎯In this paper the authors present “3DiffTection”, a state-of-the-art method for 3D object detection from single images that leverages features from a 3D-aware diffusion model. Annotating large-scale image data for 3D detection is resource-intensive and time-consuming.

🎯Recently, pretrained large image diffusion models have become prominent as effective feature extractors for 2D perception tasks. However, these features are initially trained on paired text and image data, which are not optimized for 3D tasks, and often exhibit a domain gap when applied to the target data.

🎯Their approach bridges these gaps through two specialized tuning strategies: geometric and semantic. For geometric tuning, they fine-tuned a diffusion model to perform novel view synthesis conditioned on a single image, by introducing a novel epipolar warp operator.

🎯This task meets two essential criteria: the necessity for 3D awareness and reliance solely on posed image data, which are readily available (e.g., from videos) and do not require manual annotation.

🎯 For semantic refinement, authors further trained the model on target data with detection supervision. Both tuning phases employ ControlNet to preserve the integrity of the original feature capabilities.

🎯In the final step, they harnessed these enhanced capabilities to conduct a test-time prediction ensemble across multiple virtual viewpoints. Through their methodology, they obtained 3D-aware features that are tailored for 3D detection and excel in identifying cross-view point correspondences.
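To make the test-time ensembling step concrete, here is a minimal, hypothetical sketch: detections are produced from several virtual viewpoints, mapped back to the source view, and their confidences aggregated. The function names, box format, and the simple score averaging are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of test-time prediction ensembling across virtual views.
# `detect_3d` and `warp_boxes_to_source_view` are placeholders standing in for
# the paper's detector and its view-warping machinery.
from typing import List, Dict
import numpy as np

def detect_3d(image: np.ndarray, pose: np.ndarray) -> List[Dict]:
    """Placeholder detector: returns boxes as dicts with 'corners' (8x3) and 'score'."""
    rng = np.random.default_rng(0)
    return [{"corners": rng.normal(size=(8, 3)), "score": float(rng.random())}]

def warp_boxes_to_source_view(boxes: List[Dict], pose: np.ndarray) -> List[Dict]:
    """Placeholder: express each box in the source-camera frame (identity here)."""
    return boxes

def ensemble_over_virtual_views(image, virtual_poses):
    """Gather detections from several virtual viewpoints and aggregate their scores."""
    all_boxes = []
    for pose in virtual_poses:
        boxes = detect_3d(image, pose)
        all_boxes.extend(warp_boxes_to_source_view(boxes, pose))
    # In practice overlapping boxes would be matched (e.g., by 3D IoU) before averaging;
    # here we simply average all scores as a stand-in for that aggregation step.
    mean_score = float(np.mean([b["score"] for b in all_boxes])) if all_boxes else 0.0
    return all_boxes, mean_score

if __name__ == "__main__":
    image = np.zeros((256, 256, 3), dtype=np.uint8)
    poses = [np.eye(4) for _ in range(3)]   # three virtual viewpoints
    boxes, score = ensemble_over_virtual_views(image, poses)
    print(len(boxes), round(score, 3))
```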

🎯Consequently, their model emerges as a powerful 3D detector, substantially surpassing previous baselines; for example, it outperforms Cube-RCNN, a precedent in single-view 3D detection, by 9.43% in AP3D on the Omni3D-ARKitScenes dataset. Furthermore, 3DiffTection showcases robust data efficiency and generalization to cross-domain data.

🏢Organization: @nvidia , @UCBerkeley , @VectorInst , @UofT , @TechnionLive

🧙Paper Authors: @Chenfeng_X , @HuanLing6 , @FidlerSanja , @orlitany

1️⃣Read the Full Paper here: [2311.04391] 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features

2️⃣Project Page: https://research.nvidia.com/labs/toronto-ai/3difftection/

3️⃣Code: Coming 🔜

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Umasha Pros from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024 #3dobjectdetection


🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: The Manga Whisperer: Automatically Generating Transcriptions for Comics

🌟Few pointers from the paper

🎯In the past few decades, Japanese comics, commonly referred to as Manga, have transcended both cultural and linguistic boundaries to become a true worldwide sensation. Yet, the inherent reliance on visual cues and illustration within manga renders it largely inaccessible to individuals with visual impairments.

🎯 The authors of this paper seek to address this substantial barrier, with the aim of ensuring that manga can be appreciated and actively engaged with by everyone. Specifically, they tackle the problem of diarisation, i.e., generating a transcription of who said what and when, in a fully automatic way.

🎯To this end, they made the following contributions (a minimal pipeline sketch follows the list below):
➕They presented a unified model, “Magi”, that is able to
(a) detect panels, text boxes and character boxes,
(b) cluster characters by identity (without knowing the number of clusters a priori), and
(c) associate dialogues to their speakers.
➕They also proposed a novel approach that sorts the detected text boxes into their reading order and generates a dialogue transcript.
➕They annotated an evaluation benchmark for this task using publicly available [English] manga pages.
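To make the diarisation pipeline concrete, below is a minimal, hypothetical sketch of a Magi-style flow: detect boxes, cluster character crops by embedding without fixing the number of clusters, and assign each text box to the nearest character. Every function, threshold, and data format here is an illustrative assumption and not the authors' API; the official code is linked below.

```python
# Hypothetical sketch of a Magi-style manga diarisation pipeline (not the official API).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def detect_boxes(page: np.ndarray):
    """Placeholder detector: returns text and character boxes as [x1, y1, x2, y2]."""
    texts = np.array([[10, 10, 60, 30], [200, 40, 260, 70]], dtype=float)
    chars = np.array([[5, 40, 80, 160], [190, 80, 270, 210]], dtype=float)
    return texts, chars

def embed_characters(page: np.ndarray, char_boxes: np.ndarray) -> np.ndarray:
    """Placeholder identity embedding per character crop (random features here)."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(char_boxes), 32))

def cluster_identities(embeddings: np.ndarray) -> np.ndarray:
    # No fixed number of clusters: cut the dendrogram at a distance threshold instead.
    return AgglomerativeClustering(n_clusters=None, distance_threshold=5.0).fit_predict(embeddings)

def assign_speakers(text_boxes: np.ndarray, char_boxes: np.ndarray) -> list:
    """Assign each text box to the character box whose centre is nearest; a crude
    stand-in for the learned speaker-association head described in the paper."""
    t_centres = (text_boxes[:, :2] + text_boxes[:, 2:]) / 2
    c_centres = (char_boxes[:, :2] + char_boxes[:, 2:]) / 2
    return [int(np.argmin(np.linalg.norm(c_centres - t, axis=1))) for t in t_centres]

if __name__ == "__main__":
    page = np.zeros((300, 300, 3), dtype=np.uint8)
    texts, chars = detect_boxes(page)
    ids = cluster_identities(embed_characters(page, chars))
    speakers = assign_speakers(texts, chars)
    for i, s in enumerate(speakers):
        print(f"text {i} -> character {s} (identity cluster {ids[s]})")
```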

🏢Organization: Visual Geometry Group, Dept. of Engineering Science, University of Oxford (@Oxford_VGG )

🧙Paper Authors: @RagavSachdeva , Andrew Zisserman

1️⃣Read the Full Paper here: [2401.10224] The Manga Whisperer: Automatically Generating Transcriptions for Comics

2️⃣Code: GitHub - ragavsachdeva/magi: Generate a transcript for your favourite Manga: Detect manga characters, text blocks and panels. Order panels. Cluster characters. Match texts to their speakers. Perform OCR. (CVPR'24)

3️⃣Try Here: Magi Demo - a Hugging Face Space by ragavsachdeva

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Vlad Krotov from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024 #manga


🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: MaGGIe: Masked Guided Gradual Human Instance Matting

🌟Few pointers from the paper

🎯Human matting is a fundamental task in image and video processing, where human foreground pixels are extracted from the input. Prior works either improve accuracy with additional guidance or improve the temporal consistency of a single instance across frames.

🎯In this paper the authors propose a new framework, “MaGGIe: Masked Guided Gradual Human Instance Matting”, which predicts alpha mattes progressively for each human instance while maintaining computational cost, precision, and consistency.

🎯Their method leverages modern architectures, including transformer attention and sparse convolution, to output all instance mattes simultaneously without exploding memory and latency.

🎯While keeping inference costs constant in the multiple-instance scenario, their framework achieves robust and versatile performance on their proposed synthesized benchmarks.

🎯To provide higher-quality image and video matting benchmarks, a novel multi-instance synthesis approach based on publicly available sources is introduced, increasing the generalization of models to real-world scenarios.

🏢Organization: @UofMaryland , College Park, @AdobeResearch

🧙Paper Authors: @RyanHuynh1108 , Seoung Wug Oh, @abhi2610 , Joon-Young Lee

1️⃣Read the Full Paper here: [2404.16035] MaGGIe: Masked Guided Gradual Human Instance Matting

2️⃣Project Page: CVPR'24 - MaGGIe

3️⃣Code: GitHub - hmchuong/MaGGIe: [CVPR24] MaGGIe: Mask Guided Gradual Human Instance Matting

4️⃣Dataset: chuonghm/MaGGIe-HIM · Datasets at Hugging Face

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Jeremiah Alves from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

🌟Few pointers from the paper

🎯In this paper the authors present “Paint-it”, a text-driven high-fidelity texture-map synthesis method for 3D meshes via neural re-parameterized texture optimization. Paint-it synthesizes texture maps from a text description by synthesis-through-optimization, exploiting Score Distillation Sampling (SDS).

🎯Authors observed that directly applying SDS yields undesirable texture quality due to its noisy gradients. They revealed the importance of texture parameterization when using SDS.

🎯Specifically, they proposed Deep Convolutional Physically-Based Rendering (DC-PBR) parameterization, which re-parameterizes the physically-based rendering (PBR) texture maps with randomly initialized convolution-based neural kernels, instead of a standard pixel-based parameterization.

🎯They showed that DC-PBR inherently schedules the optimization curriculum according to texture frequency and naturally filters out the noisy signals from SDS.
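As a rough illustration of the re-parameterization idea: instead of optimizing per-pixel texture values directly, the PBR maps are produced by a small, randomly initialized CNN over a fixed input, and only the CNN weights are optimized. The sketch below swaps the paper's SDS-through-rendering gradient for a dummy placeholder loss, so it only shows the parameterization, not Paint-it itself.

```python
# Hypothetical sketch of deep-convolutional texture re-parameterization (DC-PBR style).
# The SDS loss is replaced by a dummy objective; only the idea of optimizing CNN
# weights instead of raw pixels is illustrated.
import torch
import torch.nn as nn

class TextureGenerator(nn.Module):
    """Randomly initialized conv net mapping a fixed noise tensor to PBR maps."""
    def __init__(self, channels_out=9):  # e.g. 3 albedo + 3 normal + roughness + metallic + specular
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(8, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels_out, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(z)

gen = TextureGenerator()
z = torch.randn(1, 8, 128, 128)          # fixed input, never optimized
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

for step in range(5):                     # a handful of steps just to show the loop
    pbr_maps = gen(z)                     # texture maps are a function of the CNN weights
    # Placeholder objective standing in for SDS through a differentiable PBR renderer.
    loss = (pbr_maps - 0.5).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(step, float(loss))
```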

🎯 In experiments, Paint-it obtains remarkably high-quality PBR texture maps within 15 minutes, given only a text description. The authors demonstrated the generalizability and practicality of Paint-it by synthesizing high-quality texture maps for large-scale mesh datasets and showing test-time applications such as relighting and material control using a popular graphics engine.

🏢Organization: @uni_tue , Tübingen AI Center, Germany; Max Planck Institute for Informatics, Germany; Dept. of Electrical Engineering, @postech2020 ; Grad. School of AI, @postech2020 ; Institute for Convergence Research and Education in Advanced Technology, @yonsei_u

🧙Paper Authors: @kim_youwang , @Tae_Hyun_Oh , @GerardPonsMoll1

1️⃣Read the Full Paper here: [2312.11360] Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

2️⃣Project Page: Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

3️⃣Code: GitHub - postech-ami/Paint-it: [CVPR'24] Official PyTorch Implementation of "Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering"

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Pavel Bekirov from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024


🚨CVPR 2024 Best Paper Alert 🚨

➡️Paper Title: Rich Human Feedback for Text-to-Image Generation

🌟Few pointers from the paper

🎯Recent Text-to-Image (T2I) generation models such as Stable Diffusion and Imagen have made significant progress in generating high-resolution images based on text descriptions.

🎯However, many generated images still suffer from issues such as artifacts/implausibility, misalignment with text descriptions, and low aesthetic quality.

🎯 Inspired by the success of Reinforcement Learning with Human Feedback (RLHF) for large language models, prior works collected human-provided scores as feedback on generated images and trained a reward model to improve the T2I generation.

🎯In this paper, authors enriched the feedback signal by:
➕marking image regions that are implausible or misaligned with the text, and
➕annotating which words in the text prompt are misrepresented or missing on the image.

🎯The authors collected such rich human feedback on 18K generated images (RichHF-18K) and trained a multimodal transformer to predict the rich feedback automatically.

🎯They showed that the predicted rich human feedback can be leveraged to improve image generation, for example, by selecting high-quality training data to finetune and improve the generative models, or by creating masks with predicted heatmaps to inpaint the problematic regions.
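As a small illustration of the inpainting use mentioned above, a predicted artifact heatmap can be thresholded and dilated into a binary mask for an off-the-shelf inpainting model. The threshold and dilation size below are arbitrary assumptions, not values from the paper.

```python
# Hypothetical sketch: turn a predicted per-pixel artifact heatmap into an inpainting mask.
import numpy as np
from scipy.ndimage import binary_dilation

def heatmap_to_mask(heatmap: np.ndarray, threshold: float = 0.5, dilate_px: int = 5) -> np.ndarray:
    """Threshold the heatmap, then dilate so the mask comfortably covers the flagged region."""
    mask = heatmap > threshold
    structure = np.ones((dilate_px, dilate_px), dtype=bool)
    return binary_dilation(mask, structure=structure)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    heatmap = rng.random((64, 64))          # stand-in for the model's predicted heatmap
    mask = heatmap_to_mask(heatmap)
    print("masked fraction:", mask.mean())  # pass `mask` to an inpainting model of choice
```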

🎯Notably, the improvements generalize to models (Muse) beyond those used to generate the images on which human feedback data were collected (Stable Diffusion variants).

🏢Organization: @UCSanDiego , @Google Research, @USC , @Cambridge_Uni , @BrandeisU

🧙Paper Authors: Youwei Liang, Junfeng He, Gang Li, Peizhao Li, Arseniy Klimovskiy, Nicholas Carolan, Jiao Sun, Jordi Pont-Tuset, Sarah Young, Feng Yang, Junjie Ke, Krishnamurthy Dj Dvijotham, Katie Collins, Yiwen Luo, Yang Li, Kai J Kohlhoff, Deepak Ramachandran, Vidhya Navalpakkam

1️⃣Read the Full Paper here: [2312.10240] Rich Human Feedback for Text-to-Image Generation

2️⃣RichHF-18K dataset: GitHub - google-research-datasets/richhf-18k: RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with the file name of the associated labeled images (no urls or images are included in this dataset).

🥳Heartfelt congratulations to all the talented authors! 🥳

🎥 Be sure to watch the attached Paper Demo Video-Sound on 🔊🔊

🎵 Music by Sergio Prosvirini from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#cvpr2024 #bestpaper


🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: RoMa: Robust Dense Feature Matching

🌟Few pointers from the paper

🎯Feature matching is an important computer vision task that involves estimating correspondences between two images of a 3D scene, and dense methods estimate all such correspondences. The aim is to learn a robust model, i.e., a model able to match under challenging real-world changes.

🎯In this paper the authors propose a model that leverages frozen pretrained features from the foundation model DINOv2. Although these features are significantly more robust than local features trained from scratch, they are inherently coarse.

🎯Therefore, the authors combined them with specialized ConvNet fine features, creating a precisely localizable feature pyramid. To further improve robustness, they proposed a tailored transformer match decoder that predicts anchor probabilities, which enables it to express multimodality.
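A minimal sketch of that two-level idea: a frozen coarse encoder (standing in for DINOv2) provides robust but low-resolution features, while a small trainable ConvNet provides precisely localizable fine features, and the two form a simple pyramid. The modules below are placeholders, not the RoMa architecture.

```python
# Hypothetical sketch of a coarse (frozen) + fine (trainable) feature pyramid, RoMa-style.
import torch
import torch.nn as nn

class FrozenCoarseEncoder(nn.Module):
    """Stand-in for frozen DINOv2 features: strided convs with parameters frozen."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(64, 256, 3, stride=4, padding=1),
        )
        for p in self.parameters():
            p.requires_grad_(False)   # frozen, mirroring the use of pretrained features

    def forward(self, x):
        return self.net(x)

class FineEncoder(nn.Module):
    """Small trainable ConvNet providing high-resolution, precisely localizable features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)

coarse, fine = FrozenCoarseEncoder(), FineEncoder()
img = torch.randn(1, 3, 224, 224)
pyramid = {"coarse": coarse(img), "fine": fine(img)}   # matched coarsely, refined finely
print({k: tuple(v.shape) for k, v in pyramid.items()})
```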

🎯Finally, they have also proposed an improved loss formulation through regression-by-classification with subsequent robust regression. The authors conducted a comprehensive set of experiments showing that their method, RoMa, achieves significant gains, setting a new state of the art. In particular, they achieved a 36% improvement on the extremely challenging WxBS benchmark.

🏢Organization: @liu_universitet , East China University of Science and Technology, @chalmersuniv

🧙Paper Authors: Johan Edstedt, Qiyu Sun, Georg Bökman, Mårten Wadenbäck, Michael Felsberg

1️⃣Read the Full Paper here: [2305.15404] RoMa: Robust Dense Feature Matching

2️⃣Project Page: RoMa: Robust Dense Feature Matching

3️⃣Code: GitHub - Parskatt/RoMa: [CVPR 2024] RoMa: Robust Dense Feature Matching; RoMa is the robust dense feature matcher capable of estimating pixel-dense warps and reliable certainties for almost any image pair.

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Grand_Project from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024


🚨CVPR 2024 Best Paper Alert 🚨

➡️Paper Title: Generative Image Dynamics

🌟Few pointers from the paper

🎯In this paper the authors present an approach to modeling an image-space prior on scene motion. Their prior is learned from a collection of motion trajectories extracted from real video sequences depicting natural, oscillatory dynamics of objects such as trees, flowers, candles, and clothes swaying in the wind.

🎯They model dense, long-term motion in the Fourier domain as spectral volumes, which the authors found to be well suited to prediction with diffusion models.

🎯Given a single image, their trained model uses a frequency-coordinated diffusion sampling process to predict a spectral volume, which can be converted into a motion texture that spans an entire video.
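To make the representation concrete: a spectral volume stores, per pixel, the Fourier coefficients of its 2D motion trajectory at a few low frequencies, and an inverse FFT over the frequency axis recovers per-frame displacement fields (the motion texture). The array shapes and frequency count below are illustrative assumptions.

```python
# Hypothetical sketch: decode a spectral volume into per-frame 2D displacement fields.
import numpy as np

K, H, W = 16, 64, 64          # number of retained frequencies, spatial resolution
T = 60                        # number of output frames

rng = np.random.default_rng(0)
# Spectral volume: complex Fourier coefficients of the (x, y) trajectory of every pixel.
spectral_volume = rng.normal(size=(K, H, W, 2)) + 1j * rng.normal(size=(K, H, W, 2))

# Treat the volume as a truncated half-spectrum and inverse-FFT over the frequency axis;
# irfft zero-pads the missing high frequencies and returns real displacements.
motion_texture = np.fft.irfft(spectral_volume, n=T, axis=0)   # (T, H, W, 2)

print(motion_texture.shape)   # each frame's field can drive image-based rendering
```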

🎯 Along with an image-based rendering module, the predicted motion representation can be used for a number of downstream applications, such as turning still images into seamlessly looping videos, or allowing users to interact with objects in real images, producing realistic simulated dynamics (by interpreting the spectral volumes as image-space modal bases).

🏢Organization: @Google Research

🧙Paper Authors: @zhengqi_li , Richard Tucker, Noah Snavely, Aleksander Holynski

1️⃣Read the Full Paper here: [2309.07906] Generative Image Dynamics

2️⃣Project Page: Generative Image Dynamics

🥳Heartfelt congratulations to all the talented authors! 🥳

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Bohdan Kuzmin from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


🚨CVPR 2024 Best Paper Runners-Up Alert 🚨

➡️Paper Title: pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

🌟Few pointers from the paper

🎯In this paper the authors introduce “pixelSplat”, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images.

🎯Their model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time.

🎯To overcome local minima inherent to sparse and locally supported representations, the authors predict a dense probability distribution over 3D and sample Gaussian means from that probability distribution.

🎯They make this sampling operation differentiable via a reparameterization trick, allowing them to back-propagate gradients through the Gaussian splatting representation.
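One common way to make such a discrete sampling step differentiable is the Gumbel-softmax relaxation; the sketch below uses it to draw a per-pixel depth (the Gaussian mean along each ray) from predicted bucket probabilities while keeping gradients flowing to the predictor. This is an illustrative stand-in: the exact reparameterization used in pixelSplat differs in its details.

```python
# Hypothetical sketch: differentiable sampling of per-pixel depths from a predicted
# distribution over depth buckets, via the Gumbel-softmax relaxation (illustrative only).
import torch
import torch.nn.functional as F

B, H, W, D = 1, 32, 32, 64                              # batch, height, width, depth buckets
logits = torch.randn(B, H, W, D, requires_grad=True)    # stand-in for network output
bucket_centers = torch.linspace(0.5, 20.0, D)           # metric depth of each bucket

# Soft one-hot sample over buckets; gradients flow to `logits` through the relaxation.
one_hot = F.gumbel_softmax(logits, tau=0.5, hard=True, dim=-1)
depth = (one_hot * bucket_centers).sum(dim=-1)          # (B, H, W) sampled depths

loss = depth.mean()                                     # placeholder downstream loss
loss.backward()
print(logits.grad.abs().sum() > 0)                      # gradients reach the predictor
```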

🎯They benchmark their method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where they outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field.

🏢Organization: @MIT , @SFU , @UofT

🧙Paper Authors: @DavidCharatan , @sizhe_lester_li , @taiyasaki ,@vincesitzmann

1️⃣Read the Full Paper here: [2312.12337] pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

2️⃣Project Page: pixelSplat: 3D Gaussian Splats from Image Pairs

3️⃣Code: GitHub - dcharatan/pixelsplat: [CVPR 2024 Oral, Best Paper Runner-Up] Code for "pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction" by David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann

4️⃣Pre-trained Models: checkpoints – Google Drive

🥳Heartfelt congratulations to all the talented authors! 🥳

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024



Tool preventing AI mimicry cracked; artists wonder what’s next​


Artists must wait weeks for Glaze defense against AI scraping amid TOS updates.​

ASHLEY BELANGER - 7/4/2024, 7:35 AM


For many artists, it's a precarious time to post art online. AI image generators keep getting better at cheaply replicating a wider range of unique styles, and basically every popular platform is rushing to update user terms to seize permissions to scrape as much data as possible for AI training.

Defenses against AI training exist—like Glaze, a tool that adds a small amount of imperceptible-to-humans noise to images to stop image generators from copying artists' styles. But they don't provide a permanent solution at a time when tech companies appear determined to chase profits by building ever-more-sophisticated AI models that increasingly threaten to dilute artists' brands and replace them in the market.

In one high-profile example just last month, the estate of Ansel Adams condemned Adobe for selling AI images stealing the famous photographer's style, Smithsonian reported. Adobe quickly responded and removed the AI copycats. But it's not just famous artists who risk being ripped off, and lesser-known artists may struggle to prove AI models are referencing their works. In this largely lawless world, every image uploaded risks contributing to an artist's downfall, potentially watering down demand for their own work each time they promote new pieces online.

Unsurprisingly, artists have increasingly sought protections to diminish or dodge these AI risks. As tech companies update their products' terms—like when Meta suddenly announced that it was training AI on a billion Facebook and Instagram user photos last December—artists frantically survey the landscape for new defenses. That's why, counting among those offering scarce AI protections available today, The Glaze Project recently reported a dramatic surge in requests for its free tools.

Designed to help prevent style mimicry and even poison AI models to discourage data scraping without an artist's consent or compensation, The Glaze Project's tools are now in higher demand than ever. University of Chicago professor Ben Zhao, who created the tools, told Ars that the backlog for approving a "skyrocketing" number of requests for access is "bad." And as he recently posted on X (formerly Twitter), an "explosion in demand" in June is only likely to be sustained as AI threats continue to evolve. For the foreseeable future, that means artists searching for protections against AI will have to wait.

Even if Zhao's team did nothing but approve requests for WebGlaze, its invite-only web-based version of Glaze, "we probably still won't keep up," Zhao said. He has warned artists on X to expect delays.

Compounding artists' struggles, at the same time as demand for Glaze is spiking, the tool has come under attack by security researchers who claimed it was not only possible but easy to bypass Glaze's protections. For security researchers and some artists, this attack calls into question whether Glaze can truly protect artists in these embattled times. But for thousands of artists joining the Glaze queue, the long-term future looks so bleak that any promise of protections against mimicry seems worth the wait.

Attack cracking Glaze sparks debate​

Millions have downloaded Glaze already, and many artists are waiting weeks or even months for access to WebGlaze, mostly submitting requests for invites on social media. The Glaze Project vets every request to verify that each user is human and ensure bad actors don't abuse the tools, so the process can take a while.

The team is currently struggling to approve hundreds of requests submitted daily through direct messages on Instagram and Twitter in the order they are received, and artists requesting access must be patient through prolonged delays. Because these platforms' inboxes aren’t designed to sort messages easily, any artist who follows up on a request gets bumped to the back of the line—as their message bounces to the top of the inbox and Zhao's team, largely volunteers, continues approving requests from the bottom up.

"This is obviously a problem," Zhao wrote on X while discouraging artists from sending any follow-ups unless they've already gotten an invite. "We might have to change the way we do invites and rethink the future of WebGlaze to keep it sustainable enough to support a large and growing user base."

Glaze interest is likely also spiking due to word of mouth. Reid Southen, a freelance concept artist for major movies, is advocating for all artists to use Glaze. Reid told Ars that WebGlaze is especially "nice" because it's "available for free for people who don't have the GPU power to run the program on their home machine."

"I would highly recommend artists use Glaze to protect their images," Southen told Ars. "There aren't many viable ways right now for artists to protect themselves from unauthorized scraping and training off their images and still keep their work online. Glaze is a great option, especially because it works on the pixel level, and the image can be uploaded anywhere."

But just as Glaze's userbase is spiking, a bigger priority for the Glaze Project has emerged: protecting users from attacks disabling Glaze's protections—including attack methods exposed in June by online security researchers in Zurich, Switzerland. In a paper published on Arxiv.org without peer review, the Zurich researchers, including Google DeepMind research scientist Nicholas Carlini, claimed that Glaze's protections could be "easily bypassed, leaving artists vulnerable to style mimicry."

Very quickly after the attack methods were exposed, Zhao's team responded by releasing an update that Zhao told Ars "doesn't completely address" the attack but makes it "much harder."

Tension then escalated after the Zurich team claimed that The Glaze Project's solution "missed the mark" and gave Glaze users a "false sense of security."

Another researcher on the Zurich team, Robert Hönig, told Ars that Glaze's protections are "not that bad," but an "art thief has all the time in the world to wait for some attack that will eventually break it," which he said puts any artist posting work online at a "disadvantage." In a blog post, Carlini wrote that the Glaze Project has a "noble goal," but "the damage has already been done for everyone who published their images with the first version of the defense," because once an artwork has been posted online, that version will inevitably remain available in an archive somewhere.
 

On the Glaze about page, Zhao's team makes clear that "Glaze is not a permanent solution against AI mimicry," as it relies on techniques that can always "be overcome by a future algorithm" and "possibly" render "previously protected art vulnerable."

Zhao told Ars that the impact of Hönig's team's attack appeared limited because it mostly targeted a prior version of Glaze, although Hönig told Ars that the newest version doesn't protect against every known robust mimicry attack detailed in his team's paper.

Zhao's team has since confirmed that updates will be posted to their website as they conduct further tests reimplementing Hönig's team's attack, with the expectation that any findings will be used to further strengthen Glaze protections against mimicry.

While both sides agree that Glaze's most recent update (v. 2.1) offers some protection for artists, they fundamentally disagree over how to best protect artists from looming threats of AI style mimicry. A debate has been sparked on social media, with one side arguing that artists urgently need tools like Glaze until more legal protections exist and the other insisting that these uncertain times call for artists to stop posting any work online if they don't want it to be copied by tomorrow's best image generator.

How Glaze protects artists​

For artists who have no choice but to continue promoting work online, tools like Glaze can feel indispensable.

Recent Statista data showed that online art sales in 2023 reached nearly $12 billion, roughly double what artists made selling art online in 2019. Nearly a fifth of all art sales globally happened online last year, Statista reported, and competition online will likely only increase as big-budget tech companies heavily promote any eye-popping leaps in the quality of AI image generators' outputs.

Tools that help prevent style mimicry—like Glaze, Mist, and AntiDreamBooth—give artists a way to wall off their works from AI without losing visibility online.

Glaze works by making small changes to images that distort what the AI sees—essentially tricking the AI models into seeing something "quite different" from what the artist created, Glaze's about page says, which helps prevent tools from copying artists' unique styles.

"At a high level, Glaze works by understanding the AI models that are training on human art and using machine learning algorithms, computing a set of minimal changes to artworks, such that it appears unchanged to human eyes but appears to AI models like a dramatically different art style," The Glaze Project webpage explains.

The Glaze Project also created Nightshade, which "can help deter model trainers who disregard copyrights, opt-out lists, and do-not-scrape/robots.txt directives" by transforming images into "poison" samples that scramble AI models. Instead of training on images without an artist's consent, AI models "learn unpredictable behaviors that deviate from expected norms." An example The Glaze Project gives is a tool fielding "a prompt that asks for an image of a cow flying in space" that might instead generate "an image of a handbag floating in space."

Zhao told Ars that demand for the tools is spiking globally, with users "in just about every country from South Africa to Indonesia to Norway" and more.

These surges often occur after artists have "a negative reaction to some of the new policy announcements by tech companies," Zhao said. After Meta updated Instagram's policy, for example, there was an exodus of artists to Cara, a social media and portfolio platform where artists can apply Glaze when uploading works online.

In addition to using Cara, Zhao's team recommends that artists use Glaze and Nightshade together. In the future, he hopes to integrate the tools so that they can be applied through a single process. But integrating the tools has proven more challenging than anticipated, Zhao told Ars, since the tools somewhat step on each other's toes; both want to use the same pixels to "accomplish their slightly different goals."

With demand spiking for tools and other competing research priorities to attend to—including studying how easy or hard it is to identify if images are AI-generated today—Zhao's lab is currently overextended, and integrating the tools is a low priority. While his team works through Glaze invite requests and runs more tests on the most recent attack, they're also trying to figure out how to extend protections to videos.

For Zhao, the priority remains protecting as many artists as possible right now. In addition to responding to the attack, the most recent Glaze update includes a version for Intel Macs, expanding protections to more systems after Zhao said that "a bunch" of "unhappy" Mac users complained that they didn't have access to Glaze 2.0.

"There's always some sort of intermixing of priorities," Zhao told Ars. "There are certain things on the tool side—for example, like this attack—that we have to always manage because it's sort of the promise we have made [to Glaze users]. But in addition to that, we also want to add new tools and add new protective measures to try to change the landscape of how we are dealing with AI and unauthorized training."

Southen told Ars that he has been impressed by the Glaze team's improvements to its tools.

"The very nature of machine learning and adversarial development means that no solution is likely to hold forever, which is why it's great that the Glaze team is on top of current developments and always testing and tuning things to better protect artists' work as we push for things like legislation, regulation, and, of course, litigation," Southen said.
 


How does the Glaze attack work?​

Before Hönig's team published their attack, they alerted Zhao's team to their findings, which provided an opportunity to study the attack and update Glaze. On his blog, Carlini explained that because none of the previously glazed images using earlier versions of the tool could be patched, his team decided not to wait for the Glaze update before posting details on how to execute the attack because it was "strictly better to publish the attack" on Glaze "as early as possible" to warn artists of the potential vulnerability.

Hönig told Ars that breaking Glaze was "simple." His team found that "low-effort and 'off-the-shelf' techniques"—such as image upscaling, "using a different finetuning script" when training AI on new data, or "adding Gaussian noise to the images before training"—"are sufficient to create robust mimicry methods that significantly degrade existing protections."

Sometimes, these attack techniques must be combined, but Hönig's team warned that a motivated, well-resourced art forger might try a variety of methods to break protections like Glaze. Hönig said that thieves could also just download glazed art and wait for new techniques to come along, then quietly break protections while leaving no way for the artist to intervene, even if an attack is widely known. This is why his team discourages uploading any art you want protected online.

Ultimately, Hönig's team's attack works by simply removing the adversarial noise that Glaze adds to images, making it once again possible to train an AI model on the art. They described four methods of attack that they claim worked to remove mimicry protections provided by popular tools, including Glaze, Mist, and Anti-DreamBooth. Three were considered "more accessible" because they don't require technical expertise. The fourth was more complex, leveraging algorithms to detect protections and purify the image so that AI can train on it.

This wasn't the first time Glaze was attacked, but it struck some artists as the most concerning, with Hönig's team boasting an apparent ability to successfully employ tactics previously proven ineffective at disabling mimicry defenses.

The Glaze team responded by reimplementing the attack, using a different code than Hönig's team, then updating Glaze to be more resistant to the attack, as they understood it from their own implementation. But Hönig told Ars that his team still gets different results using their code, finding Glaze to be only moderately resistant to attacks targeting different styles. Carlini wrote that after testing Glaze 2.1 on "our own denoiser implementation," the team found that "most of the claims made in the Glaze update don’t hold at all," remaining "effective" against some styles, such as cartoon style.

Perhaps more troubling to Carlini, however, was that the Glaze Project only tested the strongest attack method documented in his team's paper, seemingly not addressing other techniques that could leave artists vulnerable.

"In fact, we show that Glaze can be bypassed to various extent by a multitude of methods, including by doing nothing at all," Carlini wrote.

According to Carlini, his team's key finding is that "simply using a different fine-tuning script than the Glaze authors already weakens Glaze’s protections significantly." After pushback, the Glaze team decided to run more tests reimplementing the attack using Carlini's team's code. Zhao confirmed that Glaze's website will be updated to reflect the results of those tests.

In the meantime, Carlini concluded, "Glaze likely provides some form of protection, in the sense that by using it, artists are probably not worse-off than by not using it."

"But such 'better than nothing' security is a very low bar," Carlini wrote. "This could easily mislead artists into a false sense of security and deter them from seeking alternative forms of protection, e.g., the use of other (also imperfect) tools such as watermarks, or private releases of new art styles to trusted customers."

Debating how to best protect artists from AI​

The Glaze Project has talked to a wide range of artists, including those "whose styles are intentionally copied," who not only "see loss in commissions and basic income" but suffer when "low quality synthetic copies scattered online dilute their brand and reputation," their website said.

Zhao told Ars that tools like Glaze and Nightshade provide a way for artists to fight the power imbalance between them and well-funded AI companies accused of stealing and copying their works. His team considers Glaze to be the "strongest tool for artists to protect against style mimicry," Glaze's website said, and to keep it that way, he promises to "work to improve its robustness, updating it as necessary to protect it against new attacks."

Part of preserving artist protections, The Glaze Project site explained, is protecting Glaze code, deciding not to open-source the code to "raise the bar for adaptive attacks." Zhao apparently declined to share the code with Carlini's team, explaining that "right now, there are quite literally many thousands of human artists globally who are dealing with ramifications of generative AI’s disruption to the industry, their livelihood, and their mental well-being… IMO, literally everything else takes a back seat compared to the protection of these artists.”

However, Carlini's team contends that The Glaze Project declining to share the code with security researchers makes artists more vulnerable because artists can then be blindsided by or even oblivious to evolving attacks that the Glaze team might not even be aware of.

"We don’t disagree in the slightest that we should be trying to help artists," Carlini and a co-author wrote in a blog post following the Glaze team's response. "But let’s be clear: the best way to help artists is not to pitch them a tool while refusing security analysis of that tool. If there are flaws in the approach, then we should discover them early so they can be fixed. And that’s easiest to do by openly studying the tool that’s being used."

Battle lines drawn, this tense debate seemingly got personal when Zhao claimed in a Discord chat screenshot taken by Carlini that "Carlini doesn't give a shyt" about potential harms to artists from publishing his team's attack. Carlini's team responded by calling Glaze's response to the attack "misleading."

The security researchers have demanded that Glaze update its post detailing vulnerabilities to artists, and The Glaze Project has promised that updates will follow testing being conducted while the team juggles requests for invites and ongoing research priorities.

Artists still motivated to support Glaze​

Yet for some artists waiting for access to Glaze, the question isn't whether the tool is worth the wait; it's whether The Glaze Project can sustain the project on limited funding. Zhao told Ars that as requests for invites spike, his team has "been getting a lot of unsolicited emails about wanting to donate to Glaze."

The Glaze Project is funded by research grants and donations from various organizations, including the National Science Foundation, DARPA, Amazon AWS, and C3.ai. The team's goal is not to profit off the tools but to "make a strong impact" defending artists who "generally barely make a living" against looming generative AI threats potentially capable of "destroying the human artist community."

"We are not interested in profit," the project's website says. "There is no business model, no subscription, no hidden fees, no startup. We made Glaze free for anyone to use."

While a gift link will soon be created, Zhao insisted that artists should not direct limited funds to researchers who can always write grants or seek funding from better-resourced donors. Zhao said that he has been asked by so many artists where they can donate to support the project that he has come up with a standard reply.

"If you're an artist, you should keep your money," Zhao said.

Southen, who recently gave a talk at the Conference on Computer Vision and Pattern Recognition "about how machine learning researchers and developers can better interface with artists and respect our work and needs," hopes to see more tools like Glaze introduced, as well as "more ethical" AI tools that "artists would actually be happy to use that respect people's property and process."

"I think there are a lot of useful applications for AI in art that don't need to be generative in nature and don't have to violate people's rights or displace them, and it would be great to see developers lean in to helping and protecting artists rather than displacing and devaluing us," Southen told Ars.
 


China's AI model glut is a 'significant waste of resources' due to scarce real-world applications for 100+ LLMs says Baidu CEO​

News

By Jowi Morales

published yesterday

Business is unsustainable for the 100+ LLMs competing in China.



Robin Li Yanhong, the founder and CEO of Baidu, the biggest search engine in China, said that the country has too many large language models and too few practical applications. Li made these remarks during a recent panel discussion at the World Artificial Intelligence Conference (WAIC) held in Shanghai, as covered by the South China Morning Post.

"In 2023, intense competition among over 100 LLMs has emerged in China, resulting in a significant waste of resources, particularly computing power," said Li. "I've noticed that many people still primarily focus on foundational models. But I want to ask: How about real-world applications? Who has benefitted from them?"

The World Intellectual Property Organization (WIPO) reported last Friday that China has outpaced the U.S. in AI patents six-to-one over the past ten years. However, WIPO data also showed that the country is falling behind in terms of citations, with the Chinese Academy of Sciences being the only Chinese institution in the list of the top 20 organizations with the most research citations.

Publicly available LLMs in China must go through regulatory approval to ensure that the Chinese Communist Party (CCP) can effectively control the Chinese people. Over 200 AI firms had applied for a license as of March 2024, with 117 getting a nod from Beijing. Having this many LLMs means they are all fighting for a slice of the pie, and not everyone will win. Yan Junjie, CEO of AI startup MiniMax, said that he "expects major industry consolidation in the future, with LLMs being primarily developed by just five companies."

Many large firms have rushed in to capture the market that OpenAI will leave behind when its API becomes inaccessible in China on July 9. The largest tech companies — Tencent, Baidu, and Alibaba — have started offering discounts and plans to entice customers to adopt their products, something that smaller companies might not be able to sustain.

While competition is good for any market, too many options can also lead to decision fatigue for customers, who are simply overwhelmed by the number of services priced at similar levels. Bigger companies would likely be the winners here, as they have larger war chests that they can use either to acquire smaller competitors or to run them into the ground. As Bernard Leong, CEO of Singapore-based Dorje AI, said, "There's probably going to be a bloodbath of the large language models and I suspect that there's probably going to be very few players left."
 


Google claims new AI training tech is 13 times faster and 10 times more power efficient — DeepMind's new JEST optimizes training data for impressive gains​

Dallin Grimm

Sun, July 7, 2024 at 3:40 PM EDT·3 min read


Google DeepMind, Google's AI research lab, has published new research on training AI models that claims to greatly accelerate both training speed and energy efficiency by an order of magnitude, yielding 13 times more performance and ten times higher power efficiency than other methods. The new JEST training method comes in a timely fashion as conversations about the environmental impact of AI data centers are heating up.

DeepMind's method, dubbed JEST, or joint example selection, breaks from traditional AI model training techniques in a simple way. Typical training methods focus on individual data points for training and learning, while JEST trains on entire batches. The JEST method first creates a smaller AI model that grades data quality against extremely high-quality sources, ranking the batches by quality. It then compares that grading to a larger, lower-quality set. The small JEST model determines the batches most fit for training, and a large model is then trained on the batches the smaller model selects.
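A minimal sketch of the batch-scoring idea, under the commonly used "learnability" formulation: a batch is attractive if the learner still finds it hard (high learner loss) while a small reference model trained on well-curated data finds it easy (low reference loss), and the highest-scoring batches feed the next training step. The scoring functions and toy data below are assumptions for illustration, not DeepMind's implementation.

```python
# Hypothetical sketch of JEST-style batch selection by "learnability":
# score = learner loss - reference loss, computed per candidate batch (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def learner_loss(batch: np.ndarray) -> float:
    """Placeholder: loss of the big model currently being trained on this batch."""
    return float(batch.mean() + rng.normal(scale=0.05))

def reference_loss(batch: np.ndarray) -> float:
    """Placeholder: loss of a small model pretrained on a well-curated dataset."""
    return float(0.5 * batch.mean() + rng.normal(scale=0.05))

candidate_batches = [rng.random((32, 8)) for _ in range(100)]   # 100 candidate batches

scores = [learner_loss(b) - reference_loss(b) for b in candidate_batches]
keep = np.argsort(scores)[-10:]          # keep the 10 most "learnable" batches
print("selected batch indices:", sorted(int(i) for i in keep))
# The large model would then take its gradient steps only on these selected batches.
```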

The paper itself, available here, provides a more thorough explanation of the processes used in the study and the future of the research.

DeepMind researchers make it clear in their paper that this "ability to steer the data selection process towards the distribution of smaller, well-curated datasets" is essential to the success of the JEST method. Success is the correct word for this research; DeepMind claims that "our approach surpasses state-of-the-art models with up to 13× fewer iterations and 10× less computation."

Graphs displaying efficiency and speed gains over traditional AI training methods.

Of course, this system relies entirely on the quality of its training data, as the bootstrapping technique falls apart without a human-curated data set of the highest possible quality. Nowhere is the mantra "garbage in, garbage out" truer than with this method, which attempts to "skip ahead" in its training process. This makes the JEST method much more difficult for hobbyists or amateur AI developers to match than most others, as expert-level research skills are likely required to curate the initial highest-grade training data.

The JEST research comes not a moment too soon, as the tech industry and world governments are beginning discussions on artificial intelligence's extreme power demands. AI workloads took up about 4.3 GW in 2023, almost matching the annual power consumption of the nation of Cyprus. And things are definitely not slowing down: a single ChatGPT request costs 10x more than a Google search in power, and Arm's CEO estimates that AI will take up a quarter of the United States' power grid by 2030.

If and how JEST methods are adopted by major players in the AI space remains to be seen. GPT-4o reportedly cost $100 million to train, and future larger models may soon hit the billion-dollar mark, so firms are likely hunting for ways to save their wallets in this department. Hopefuls think that JEST methods will be used to keep current training productivity rates at much lower power draws, easing the costs of AI and helping the planet. However, much more likely is that the machine of capital will keep the pedal to the metal, using JEST methods to keep power draw at maximum for hyper-fast training output. Cost savings versus output scale, who will win?
 


AI models that cost $1 billion to train are underway, $100 billion models coming — largest current models take 'only' $100 million to train: Anthropic CEO​

News

By Jowi Morales

published yesterday

AI training costs are growing exponentially year after year.



Anthropic CEO Dario Amodei said in the In Good Company podcast that AI models in development today can cost up to $1 billion to train. Current models like GPT-4o cost only about $100 million, but he expects the cost of training these models to rise to $10 billion or even $100 billion in as little as three years from now.

"Right now, 100 million. There are models in training today that are more like a billion." Amodei also added, "I think if we go to ten or a hundred billion, and I think that will happen in 2025, 2026, maybe 2027, and the algorithmic improvements continue a pace, and the chip improvements continue a pace, then I think there is in my mind a good chance that by that time we'll be able to get models that are better than most humans at most things."

The Anthropic CEO mentioned these numbers when he discussed the development of AI from generative artificial intelligence (like ChatGPT) to artificial general intelligence (AGI). He said that there wouldn't be a single point where we suddenly reach AGI. Instead, it would be a gradual development where models build upon the developments of past models, much like how a human child learns.

So, if AI models grow ten times more powerful each year, we can rationally expect the hardware required to train them to be at least ten times more powerful, too. As such, hardware could be the biggest cost driver in AI training. Back in 2023, it was reported that ChatGPT would require more than 30,000 GPUs, with Sam Altman confirming that GPT-4 cost $100 million to train.

Last year, over 3.8 million GPUs were delivered to data centers. With Nvidia's latest B200 AI chip costing around $30,000 to $40,000, we can surmise that Dario's billion-dollar estimate is on track for 2024. If advancements in model/quantization research grow at the current exponential rate, then we expect hardware requirements to keep pace unless more efficient technologies like the Sohu AI chip become more prevalent.
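As a rough back-of-the-envelope check of that surmise, using only the figures quoted in this article (the cluster size is an assumption): tens of thousands of B200-class accelerators at $30,000 to $40,000 apiece already put the hardware bill around the billion-dollar mark, before power, networking, and staffing.

```python
# Back-of-the-envelope estimate using figures quoted above; cluster size is an assumption.
gpus = 30_000                            # assumed accelerator count for a frontier training run
price_low, price_high = 30_000, 40_000   # article's quoted B200 price range (USD)

low, high = gpus * price_low, gpus * price_high
print(f"hardware alone: ${low/1e9:.1f}B to ${high/1e9:.1f}B")   # roughly 0.9B to 1.2B USD
```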

We can already see this exponential growth happening. Elon Musk wants to purchase 300,000 B200 AI chips, while OpenAI and Microsoft are reportedly planning a $100 billion AI data center. With all this demand, we could see GPU data center deliveries next year balloon to 38 million if Nvidia and other suppliers can keep up with the market.

However, aside from the supply of the actual chip hardware, these AI firms need to be concerned with power supply and related infrastructure, too. The total estimated power consumption of all data center GPUs sold just last year could power 1.3 million homes. If data center power requirements continue to grow exponentially, we could run short of economically priced electricity. Furthermore, while these data centers need power plants, they also need an entirely upgraded grid that can handle all the electrons the power-hungry AI chips need to run. For this reason, many tech companies, including Microsoft, are now considering modular nuclear power for their data centers.

Artificial intelligence is quickly gathering steam, and hardware innovations seem to be keeping up. So, Anthropic's $100 billion estimate seems to be on track, especially if manufacturers like Nvidia, AMD, and Intel can deliver. However, as our AI technologies perform exponentially better every new generation, one big question still remains: how will it affect the future of our society?
 



1/11
Luma's start and end keyframes are a game changer. With a sequence of keyframes from the original film, we can seamlessly remaster stop motion classics like "Jason and the Argonauts" as modern single-take action scenes.

2/11
It's interesting how the uncanny movements of the original stop motion skeletons are preserved in traditional frame interpolation. Maybe it's the lack of motion blur on the skeletons?

3/11
I really, truly hope this is sarcasm.

Because there is no way a person with an even halfway functioning brain would look at this and think it’s in any way serviceable.

4/11
I thought it was obvious!

But I guess I haven't posted much in year(s), so people aren't familiar with my typical humor.

5/11
Hi Jonathan this looks like shyt thanks for sharing 👍

6/11
wow this looks like fukking garbage

7/11
Was the idea to make it actively worse?

8/11
seamlessly?

9/11
"remaster"
"Game changer"

10/11
Never do that again.

11/11
this is ass

