bnew

Veteran
Joined
Nov 1, 2015
Messages
57,369
Reputation
8,499
Daps
160,091


[Submitted on 5 Oct 2023]

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts
The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks. Unfortunately, existing LM pipelines are typically implemented using hard-coded "prompt templates", i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, i.e. imperative computational graphs where LMs are invoked through declarative modules. DSPy modules are parameterized, meaning they can learn (by creating and collecting demonstrations) how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. We design a compiler that will optimize any DSPy pipeline to maximize a given metric. We conduct two case studies, showing that succinct DSPy programs can express and optimize sophisticated LM pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and llama2-13b-chat to self-bootstrap pipelines that outperform standard few-shot prompting (generally by over 25% and 65%, respectively) and pipelines with expert-created demonstrations (by up to 5-46% and 16-40%, respectively). On top of that, DSPy programs compiled to open and relatively small LMs like 770M-parameter T5 and llama2-13b-chat are competitive with approaches that rely on expert-written prompt chains for proprietary GPT-3.5. DSPy is available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as: arXiv:2310.03714 [cs.CL]
(or arXiv:2310.03714v1 [cs.CL] for this version)

Submission history

From: Omar Khattab
[v1] Thu, 5 Oct 2023 17:37:25 UTC (77 KB)





[CL] DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
O Khattab, A Singhvi, P Maheshwari, Z Zhang, K Santhanam, S Vardhamanan… [Stanford University & UC Berkeley & Amazon Alexa AI] (2023)
arxiv.org/abs/2310.03714

- Introduces DSPy, a new programming model for designing AI systems using pipelines of pretrained language models (LMs) and other tools

- DSPy contributes three main abstractions: signatures, modules, and teleprompters

- Signatures abstract the input/output behavior of a module using natural language typed declarations

- Modules replace hand-prompting techniques and can be composed into pipelines

- Teleprompters are optimizers that improve modules via prompting or finetuning

- Case studies on math word problems and multi-hop QA show DSPy programs outperform hand-crafted prompts

- With DSPy, small LMs like Llama2-13b-chat can be competitive with large proprietary LMs using expert prompts

- DSPy offers a systematic way to explore complex LM pipelines without extensive prompt engineering
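
To make the three abstractions in the summary above a bit more concrete, here is a minimal pure-Python sketch. It is not the DSPy API: `parse_signature`, `Predict`, `bootstrap`, and `toy_lm` are hypothetical names made up for illustration, and the "LM" is a stub that just adds two integers. The idea it tries to show is that a signature declares inputs and outputs, a module turns that declaration plus stored demonstrations into a prompt, and a teleprompter-style optimizer collects demonstrations that pass a metric.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# NOTE: every name in this block is a hypothetical stand-in, not the DSPy API.


def parse_signature(signature: str) -> Tuple[List[str], List[str]]:
    """Split a declaration like 'question -> answer' into input/output field names."""
    inputs, outputs = signature.split("->")
    return ([name.strip() for name in inputs.split(",")],
            [name.strip() for name in outputs.split(",")])


@dataclass
class Predict:
    """Toy module: turns a signature plus stored demonstrations into a prompt."""
    signature: str
    lm: Callable[[str], str]
    demos: List[Dict[str, str]] = field(default_factory=list)

    def prompt(self, **inputs: str) -> str:
        in_fields, out_fields = parse_signature(self.signature)
        lines: List[str] = []
        for demo in self.demos:  # demonstrations come first, few-shot style
            lines += [f"{name}: {demo[name]}" for name in in_fields + out_fields]
            lines.append("")
        lines += [f"{name}: {inputs[name]}" for name in in_fields]
        lines.append(f"{out_fields[0]}:")  # leave the first output field open
        return "\n".join(lines)

    def __call__(self, **inputs: str) -> str:
        return self.lm(self.prompt(**inputs)).strip()


def bootstrap(module: Predict, trainset: List[Dict[str, str]],
              metric: Callable[[Dict[str, str], str], bool],
              max_demos: int = 2) -> Predict:
    """Toy 'teleprompter': keep predictions that pass the metric as demonstrations."""
    in_fields, out_fields = parse_signature(module.signature)
    demos: List[Dict[str, str]] = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        inputs = {name: example[name] for name in in_fields}
        prediction = module(**inputs)
        if metric(example, prediction):
            demos.append({**inputs, out_fields[0]: prediction})
    return Predict(module.signature, module.lm, demos)


def toy_lm(prompt: str) -> str:
    """Stub LM (assumption): adds the integers found in the last 'question:' line."""
    question = [line for line in prompt.splitlines() if line.startswith("question:")][-1]
    return str(sum(int(token) for token in question.split() if token.isdigit()))


trainset = [{"question": "2 + 3", "answer": "5"}, {"question": "10 + 7", "answer": "17"}]
exact_match = lambda example, prediction: example["answer"] == prediction
compiled = bootstrap(Predict("question -> answer", toy_lm), trainset, exact_match)
print(compiled.prompt(question="1 + 1"))
print(compiled(question="1 + 1"))
```

In this sketch the "compiled" module differs from the original only in its `demos` list, which is one simple way to read the abstract's statement that modules learn "by creating and collecting demonstrations".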
 

bnew


About​

Official implementation of "Separate Anything You Describe"

audio-agi.github.io/Separate-Anything-Y

Separate Anything You Describe


This repository contains the official implementation of "Separate Anything You Describe".

We introduce AudioSep, a foundation model for open-domain sound separation with natural language queries. AudioSep demonstrates strong separation performance and impressive zero-shot generalization ability on numerous tasks such as audio event separation, musical instrument separation, and speech enhancement. Check the separated audio examples in the Demo Page!




 

bnew









ⓍTTS is a voice generation model that lets you clone voices into different languages using just a quick 3-second audio clip. Built on Tortoise, ⓍTTS has important model changes that make cross-language voice cloning and multi-lingual speech generation super easy. There is no need for an excessive amount of training data that spans countless hours.

This is the same model that powers Coqui Studio and Coqui API; however, we apply a few tricks to make it faster and support streaming inference.

Features​

  • Supports 14 languages.
  • Voice cloning with just a 3-second audio clip.
  • Emotion and style transfer by cloning.
  • Cross-language voice cloning.
  • Multi-lingual speech generation.
  • 24 kHz sampling rate.

Languages​

As of now, XTTS-v1 (v1.1) supports 14 languages: English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese, and Japanese.

Stay tuned as we continue to add support for more languages. If you have any language requests, please feel free to reach out!

Code​

The current implementation only supports inference.




 

bnew



Research & Development

Chiba researchers simplify generation of 3D holographic displays...​

19 Oct 2023

...while a team at Tohoku manipulates the behavior of light “as if it were under the influence of gravity”.


Chiba approach uses neural networks to transform 2D images into 3D holograms.

Holograms have long held the promise of offering immersive 3D experiences, but the challenges involved in generating them have limited their widespread use. Leveraging recent developments in deep learning, researchers from Chiba University, Japan, have now developed what they describe as a “game-changing” approach that utilizes neural networks to transform 2D color images into 3D holograms.


This approach can simplify 3D hologram generation and can find applications in numerous fields, including healthcare and entertainment. Holograms that offer a 3D view of objects provide a level of detail that is unattainable by regular 2D images.

Holograms, which offer enormous potential for medical imaging, manufacturing, and virtual reality, are traditionally constructed by recording the 3D data of an object and the interactions of light with the object. However, this technique is highly computationally intensive, as it requires the use of a special camera to capture the 3D images. This makes the generation of holograms challenging and limits their widespread use.

Deep-learning methods have also been proposed for generating holograms. These can create holograms directly from the 3D data captured using RGB-D cameras that capture both color and depth information of an object. This approach circumvents many computational challenges associated with the conventional method and represents an easier approach for generating holograms.

The Chiba researchers, led by Professor Tomoyoshi Shimobaba of the Graduate School of Engineering, propose a novel approach based on deep learning that further streamlines hologram generation by producing 3D images directly from regular 2D color images captured using ordinary cameras. Yoshiyuki Ishii and Tomoyoshi Ito of the Graduate School of Engineering, Chiba University, also took part in this study, published in Optics and Lasers in Engineering.

Prof. Shimobaba commented, “There are several problems in realizing holographic displays, including the acquisition of 3D data, the computational cost of holograms, and the transformation of hologram images to match the characteristics of a holographic display device. We undertook this study because we believe that deep learning has developed rapidly in recent years and has the potential to solve these problems.”

Three neural networks

The Chiba approach employs three deep neural networks (DNNs) to transform a regular 2D color image into data that can be used to display a 3D scene or object as a hologram. The first DNN makes use of a color image captured using a regular camera as the input and then predicts the associated depth map, providing information about the 3D structure of the image.

Both the original RGB image and the depth map created by the first DNN are then used by the second DNN to generate a hologram. Finally, the third DNN refines the hologram generated by the second DNN, making it suitable for display on different devices. The researchers found that the proposed approach processed data and generated a hologram faster than a state-of-the-art graphics processing unit.

Prof. Shimobaba added, “Another noteworthy benefit of our approach is that the reproduced image of the final hologram can represent a natural 3D reproduced image. Moreover, since depth information is not used during hologram generation, this approach is inexpensive and does not require 3D imaging devices such as RGB-D cameras after training.”

In the near future, this approach could find applications in head-up and head-mounted displays for generating high-fidelity 3D displays. Likewise, it could revolutionize the generation of in-vehicle holographic head-up displays, which may be able to present the necessary information on people, roads, and signs to passengers in 3D.

Conceptual image of the distorted photonic crystal.

Photonic crystals bend light ‘like gravity’


A collaborative group of researchers at Tohoku University, Japan, has manipulated the behavior of light as if it were under the influence of gravity. The findings, which were published in Physical Review A, have significant implications for the world of optics and materials science, for example in the development of “6G” communications.

Einstein’s theory of relativity has long established that the trajectory of electromagnetic waves – including light and terahertz electromagnetic waves – can be deflected by gravitational fields. Scientists have recently theoretically predicted that replicating the effects of gravity - i.e., pseudogravity - is possible by deforming crystals in the lower normalized energy (or frequency) region.
“We set out to explore whether lattice distortion in photonic crystals can produce pseudogravity effects,” said Professor Kyoko Kitamura from Tohoku University’s Graduate School of Engineering.

Photonic crystals possess certain properties that enable scientists to manipulate and control the behavior of light, serving as traffic controllers for light within crystals. They are constructed by periodically arranging two or more different materials with varying abilities to interact with and slow down light in a regular, repeating pattern. Furthermore, pseudogravity effects due to adiabatic changes have been observed in photonic crystals.

Kitamura and her colleagues modified photonic crystals by introducing lattice distortion: gradual deformation of the regular spacing of elements, which disrupted the grid-like pattern of photonic crystals. This manipulated the photonic band structure of the crystals, resulting in a curved beam trajectory in-medium. Specifically, they employed a silicon distorted photonic crystal with a primal lattice constant of 200 micrometers and terahertz waves. Experiments successfully demonstrated the deflection of these waves.
“Much like gravity bends the trajectory of objects, we came up with a means to bend light within certain materials,” said Kitamura. Associate Professor Masayuki Fujita from Osaka University added, “Such in-plane beam steering within the terahertz range could be harnessed in 6G communication. Academically, the findings show that photonic crystals could harness gravitational effects, opening new pathways within the field of graviton physics.”
 

bnew


Idea2Img

Iterative Self-Refinement with GPT-4V(ision)
for Automatic Image Design and Generation​

Zhengyuan Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

Microsoft Azure AI
arXiv

Built upon GPT-4V(ision), Idea2Img is a multimodal iterative self-refinement system that enhances any T2I model for automatic image design and generation, enabling various new image creation functionalities together with better visual qualities.

"IDEA," "T2I," and "Idea2Img" are the input, baseline, and our results, respectively.​

Abstract​

We introduce “Idea to Image”, a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation. Humans can quickly identify the characteristics of different text-to-image (T2I) models via iterative explorations. This enables them to efficiently convert their high-level generation ideas into effective T2I prompts that can produce good images. We investigate if systems based on large multimodal models (LMMs) can develop analogous multimodal self-refinement abilities that enable exploring unknown models or environments via self-refining tries. Idea2Img cyclically generates revised T2I prompts to synthesize draft images, and provides directional feedback for prompt revision, both conditioned on its memory of the probed T2I model’s characteristics. The iterative self-refinement brings Idea2Img various advantages over base T2I models. Notably, Idea2Img can process input ideas with interleaved image-text sequences, follow ideas with design instructions, and generate images of better semantic and visual qualities. The user preference study validates the efficacy of multimodal iterative self-refinement on automatic image design and generation.

Idea2Img Design​

Idea2Img involves an LMM, GPT-4V(ision), interacting with a T2I model to probe its usage for automatic image design and generation. Idea2Img uses GPT-4V to improve, assess, and verify multimodal content.
  1. Revised Prompt Generation (Improving): Idea2Img generates N text prompts that correspond to the input multimodal user IDEA, conditioned on the previous text feedback and refinement history.
  2. Draft Image Selection (Assessing): Idea2Img carefully compares N draft images for the same IDEA and selects the most promising one.
  3. Feedback Reflection (Verifying): Idea2Img examines the discrepancy between the draft image and the IDEA. Idea2Img then provides feedback on what is incorrect, the plausible causes, and how T2I prompts may be revised to obtain a better image.

The Idea2Img framework enables LMMs to mimic humanlike exploration when using a T2I model, enabling the design and generation of an imagined image specified as a multimodal input IDEA.

Idea2Img's Execution Flow​

We give an overview of Idea2Img’s full execution flow below. More details can be found in our paper.

Idea2Img applies LMMs functioning in different roles to refine the T2I prompts. Specifically, they will (1) generate and revise text prompts for the T2I model, (2) select the best draft images, and (3) provide feedback on the errors and revision directions. Idea2Img is enhanced with a memory module that stores all prompt exploration histories, including previous draft images, text prompts, and feedback.
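
The three roles and the memory module described above can be sketched as a plain loop. This is only an illustration under stated assumptions: the callables `generate_prompts`, `t2i`, `select_best`, and `reflect` are hypothetical stand-ins for GPT-4V and the T2I model, "images" are plain strings, and stopping early when the feedback is empty is an assumption rather than something stated in the post.

```python
from typing import Callable, List, Optional, Tuple

# (selected prompt, selected draft image, feedback) per round
Memory = List[Tuple[str, str, Optional[List[str]]]]


def idea2img_loop(idea: str,
                  generate_prompts: Callable[[str, Memory, int], List[str]],
                  t2i: Callable[[str], str],
                  select_best: Callable[[str, List[Tuple[str, str]]], Tuple[str, str]],
                  reflect: Callable[[str, str], Optional[List[str]]],
                  rounds: int = 3, n: int = 2) -> Tuple[str, Memory]:
    """Iterate: revise prompts, draw drafts, pick the best draft, reflect on it."""
    memory: Memory = []
    image = ""
    for _ in range(rounds):
        prompts = generate_prompts(idea, memory, n)      # (1) improving
        drafts = [(prompt, t2i(prompt)) for prompt in prompts]
        prompt, image = select_best(idea, drafts)        # (2) assessing
        feedback = reflect(idea, image)                  # (3) verifying
        memory.append((prompt, image, feedback))
        if feedback is None:  # assumption: stop once nothing is reported wrong
            break
    return image, memory


# --- toy stand-ins (all hypothetical) so the loop can actually run ---
def coverage(idea: str, image: str) -> int:
    return sum(word in image.split() for word in idea.split())


def toy_generate_prompts(idea: str, memory: Memory, n: int) -> List[str]:
    if not memory:
        return [idea.split()[0]] * n  # deliberately poor first prompt
    last_prompt, _, missing = memory[-1]
    return [" ".join([last_prompt] + missing[: i + 1]) for i in range(n)]


def toy_t2i(prompt: str) -> str:
    return "image: " + prompt


def toy_select_best(idea: str, drafts: List[Tuple[str, str]]) -> Tuple[str, str]:
    return max(drafts, key=lambda draft: coverage(idea, draft[1]))


def toy_reflect(idea: str, image: str) -> Optional[List[str]]:
    missing = [word for word in idea.split() if word not in image.split()]
    return missing or None


final_image, history = idea2img_loop("red dog on beach", toy_generate_prompts,
                                     toy_t2i, toy_select_best, toy_reflect)
print(final_image)
for prompt, _, feedback in history:
    print(repr(prompt), "->", feedback)
```

The memory list is what lets later rounds condition on earlier prompts and feedback, which is the part the post credits for the refinement.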

Flow chart of Idea2Img’s full execution flow.​

Generation Results​







GPT-4V(ision) Outputs​




From left to right, for GPT-4V Feedback Reflection (Left), Revised Prompt Generation (Center), and Draft Image selection (Right).​




 


bnew




📐 The 🤗 Open ASR Leaderboard ranks and evaluates speech recognition models on the Hugging Face Hub.

We report the average word error rate (WER, ⬇️) and real-time factor (RTF, ⬇️); lower is better for both. Models are ranked by their Average WER, from lowest to highest. Check the 📈 Metrics tab to understand how the models are evaluated.
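
For anyone unfamiliar with the two metrics, here is a minimal sketch using their standard textbook definitions: WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, and RTF is processing time divided by audio duration. The function names are made up for illustration, and the sketch leaves out any text normalization or per-dataset averaging the leaderboard itself may apply.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-by-row Levenshtein distance over words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,                          # deletion
                current[j - 1] + 1,                       # insertion
                previous[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        previous = current
    return previous[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF below 1.0 means transcription runs faster than real time."""
    return processing_seconds / audio_seconds


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
rtf = real_time_factor(processing_seconds=2.0, audio_seconds=10.0)
print(f"WER = {wer:.3f}, RTF = {rtf:.2f}")
```

In the example, one substitution ("sat" to "sit") and one deletion ("the") against six reference words give a WER of 2/6.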

If you want results for a model that is not listed here, you can submit a request for it to be included ✉️✨.

The leaderboard currently focuses on English speech recognition, and will be expanded to multilingual evaluation in later versions.
 

bnew


Researchers unveil ‘3D-GPT’, an AI that can generate 3D worlds from simple text commands​

Michael Nuñez (@MichaelFNunez)

October 20, 2023 7:11 PM

Credit: arxiv.org





Researchers from the Australian National University, the University of Oxford, and the Beijing Academy of Artificial Intelligence have developed a new AI system called “3D-GPT” that can generate 3D models simply from text-based descriptions provided by a user.

The system, described in a paper published on arXiv, offers a more efficient and intuitive way to create 3D assets compared to traditional 3D modeling workflows.


3D-GPT is able to “dissect procedural 3D modeling tasks into accessible segments and appoint the apt agent for each task,” according to the paper. It utilizes multiple AI agents that each focus on a different part of understanding the text prompt and executing modeling functions.

credit: arxiv.org

“3D-GPT positions LLMs [large language models] as proficient problem solvers, dissecting the procedural 3D modeling tasks into accessible segments and appointing the apt agent for each task,” the researchers stated.


The key agents include a “task dispatch agent” that parses the text instructions, a “conceptualization agent” that adds details missing from the initial description, and a “modeling agent” that sets parameters and generates code to drive 3D software like Blender.

By breaking down the modeling process and assigning specialized AI agents, 3D-GPT is able to interpret text prompts, enhance the descriptions with extra detail, and ultimately generate 3D assets that match what the user envisioned.
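
A rough way to picture the three-agent split described above is a chain of functions, each standing in for one LLM agent. Everything here is hypothetical: the agents are keyword rules rather than LLM calls, and `add_flowers`, `add_trees`, and `add_fog` are made-up procedural function names, not Blender or 3D-GPT APIs. The output is just a list of call strings of the kind a modeling agent might hand to 3D software.

```python
from typing import Dict, List

# Hypothetical library of procedural functions with default parameters.
FUNCTION_LIBRARY: Dict[str, dict] = {
    "add_flowers": {"keywords": ["flower", "meadow"], "params": {"density": 0.5}},
    "add_trees": {"keywords": ["tree", "forest"], "params": {"count": 10}},
    "add_fog": {"keywords": ["mist", "fog"], "params": {"thickness": 0.3}},
}

# Hypothetical extra detail the conceptualization step would add.
DETAILS: Dict[str, str] = {
    "add_flowers": "dense wildflowers covered in dew",
    "add_trees": "young trees with fresh buds",
    "add_fog": "thin low-lying morning mist",
}


def task_dispatch_agent(instruction: str) -> List[str]:
    """Pick which procedural functions the instruction needs (keyword stub)."""
    text = instruction.lower()
    return [name for name, spec in FUNCTION_LIBRARY.items()
            if any(keyword in text for keyword in spec["keywords"])]


def conceptualization_agent(functions: List[str]) -> Dict[str, str]:
    """Attach richer descriptions that the short instruction left out."""
    return {name: DETAILS[name] for name in functions}


def modeling_agent(enriched: Dict[str, str]) -> List[str]:
    """Infer parameter values from the enriched text and emit call strings."""
    calls = []
    for name, description in enriched.items():
        params = dict(FUNCTION_LIBRARY[name]["params"])
        if "dense" in description and "density" in params:
            params["density"] = 0.9  # toy rule: 'dense' raises the density
        args = ", ".join(f"{key}={value!r}" for key, value in params.items())
        calls.append(f"{name}({args})")
    return calls


instruction = ("a misty spring morning, where dew-kissed flowers dot a lush "
               "meadow surrounded by budding trees.")
calls = modeling_agent(conceptualization_agent(task_dispatch_agent(instruction)))
print("\n".join(calls))
```

Because each stage only consumes the previous stage's output, any one of them could be swapped out on its own, which matches the article's point about the modular architecture.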

“It enhances concise initial scene descriptions, evolving them into detailed forms while dynamically adapting the text based on subsequent instructions,” the paper explained.

credit: arxiv.org

The system was tested on prompts like “a misty spring morning, where dew-kissed flowers dot a lush meadow surrounded by budding trees.” 3D-GPT was able to generate complete 3D scenes with realistic graphics that accurately reflected elements described in the text.

While the quality of the graphics is not yet photorealistic, the early results suggest this agent-based approach shows promise for simplifying 3D content creation. The modular architecture could also allow each agent component to be improved independently.
“Our empirical investigations confirm that 3D-GPT not only interprets and executes instructions, delivering reliable results but also collaborates effectively with human designers,” the researchers wrote.

credit: arxiv.org

By generating code to control existing 3D software instead of building models from scratch, 3D-GPT provides a flexible foundation to build on as modeling techniques continue to advance.

The researchers conclude that their system “highlights the potential of LLMs in 3D modeling, offering a basic framework for future advancements in scene generation and animation.”

This research could revolutionize the 3D modeling industry, making the process more efficient and accessible. As we move further into the metaverse era, with 3D content creation serving as a catalyst, tools like 3D-GPT could prove invaluable to creators and decision-makers in a range of industries, from gaming and virtual reality to cinema and multimedia experiences.

The 3D-GPT framework is still in its early stages and has some limitations, but its development marks a significant step forward in AI-driven 3D modeling and opens up exciting possibilities for future advancements.




Abstract​

The significance of 3D asset modeling is undeniable in the metaverse era. Traditional methods for 3D modeling of realistic synthetic scenes involve the painstaking tasks of complex design, refinement, and client communication.
To reduce workload, we introduce 3D-GPT, a framework utilizing large language models (LLMs) for instruction-driven 3D modeling. In this context, 3D-GPT empowers LLMs as adept problem-solvers, breaking down the 3D modeling task into manageable segments and determining the appropriate agent for each.
3D-GPT comprises three pivotal agents: task dispatch agent, conceptualization agent, and modeling agent. Together, they collaboratively pursue two essential goals. First, it systematically enhances concise initial scene descriptions, evolving them into intricate forms while dynamically adapting the text based on subsequent instructions. Second, it seamlessly integrates procedural generation, extracting parameter values from enriched text to effortlessly interface with 3D software for asset creation.
We show that 3D-GPT provides trustworthy results and collaborates effectively with human designers. Furthermore, it seamlessly integrates with Blender, unlocking expanded manipulation possibilities. Our work underscores the vast potential of LLMs in 3D modeling, laying the groundwork for future advancements in scene generation and animation.


3D-GPT: PROCEDURAL 3D MODELING WITH LARGE LANGUAGE MODELS


 

bnew







ARTIFICIAL INTELLIGENCE

This new data poisoning tool lets artists fight back against generative AI​

The tool, called Nightshade, messes up training data in ways that could cause serious damage to image-generating AI models.

By

October 23, 2023
poisoned fumes spread through a still life painting causing glitches

STEPHANIE ARNETT/MITTR | REIJKSMUSEUM, ENVATO

A new tool lets artists add invisible changes to the pixels in their art before they upload it online so that if it’s scraped into an AI training set, it can cause the resulting model to break in chaotic and unpredictable ways.

The tool, called Nightshade, is intended as a way to fight back against AI companies that use artists’ work to train their models without the creator’s permission. Using it to “poison” this training data could damage future iterations of image-generating AI models, such as DALL-E, Midjourney, and Stable Diffusion, by rendering some of their outputs useless—dogs become cats, cars become cows, and so forth. MIT Technology Review got an exclusive preview of the research, which has been submitted for peer review at computer security conference Usenix.

AI companies such as OpenAI, Meta, Google, and Stability AI are facing a slew of lawsuits from artists who claim that their copyrighted material and personal information was scraped without consent or compensation. Ben Zhao, a professor at the University of Chicago, who led the team that created Nightshade, says the hope is that it will help tip the power balance back from AI companies towards artists, by creating a powerful deterrent against disrespecting artists’ copyright and intellectual property. Meta, Google, Stability AI, and OpenAI did not respond to MIT Technology Review’s request for comment on how they might respond.

Zhao’s team also developed Glaze, a tool that allows artists to “mask” their own personal style to prevent it from being scraped by AI companies. It works in a similar way to Nightshade: by changing the pixels of images in subtle ways that are invisible to the human eye but manipulate machine-learning models to interpret the image as something different from what it actually shows.


The team intends to integrate Nightshade into Glaze, and artists can choose whether they want to use the data-poisoning tool or not. The team is also making Nightshade open source, which would allow others to tinker with it and make their own versions. The more people use it and make their own versions of it, the more powerful the tool becomes, Zhao says. The data sets for large AI models can consist of billions of images, so the more poisoned images can be scraped into the model, the more damage the technique will cause.

A targeted attack

Nightshade exploits a security vulnerability in generative AI models, one arising from the fact that they are trained on vast amounts of data—in this case, images that have been hoovered from the internet. Nightshade messes with those images.


Artists who want to upload their work online but don’t want their images to be scraped by AI companies can upload them to Glaze and choose to mask it with an art style different from theirs. They can then also opt to use Nightshade. Once AI developers scrape the internet to get more data to tweak an existing AI model or build a new one, these poisoned samples make their way into the model’s data set and cause it to malfunction.

Poisoned data samples can manipulate models into learning, for example, that images of hats are cakes, and images of handbags are toasters. The poisoned data is very difficult to remove, as it requires tech companies to painstakingly find and delete each corrupted sample.
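
As a toy illustration of why mislabeled training samples shift what a model learns, the sketch below treats a "concept" as nothing more than the average of a one-dimensional feature over samples carrying that label. This is not Nightshade's method (which, per the article, alters pixels in ways invisible to people), and the dataset sizes and feature values are arbitrary assumptions chosen to keep the arithmetic easy to follow.

```python
from statistics import mean
from typing import List, Tuple

Sample = Tuple[float, str]  # (feature value, text label)


def learned_concept(samples: List[Sample], label: str) -> float:
    """Toy 'model': a concept is the mean feature of all samples with that label."""
    return mean(feature for feature, sample_label in samples if sample_label == label)


def poison(clean: List[Sample], n_poison: int,
           label: str = "dog", feature: float = 1.0) -> List[Sample]:
    """Add samples that carry one label but the feature value of another concept."""
    return clean + [(feature, label)] * n_poison


# Assumed toy data: real dogs sit at 0.0 and real cats at 1.0 on the feature axis.
clean = [(0.0, "dog")] * 100 + [(1.0, "cat")] * 100

for n_poison in (0, 20, 100, 300):
    dog = learned_concept(poison(clean, n_poison), "dog")
    print(f"{n_poison:>3} poisoned samples -> learned 'dog' feature = {dog:.2f}")
```

In this toy the shift equals the poisoned fraction of the "dog" samples, so the same number of poisoned samples has less effect as the clean set grows.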

The researchers tested the attack on Stable Diffusion’s latest models and on an AI model they trained themselves from scratch. When they fed Stable Diffusion just 50 poisoned images of dogs and then prompted it to create images of dogs itself, the output started looking weird—creatures with too many limbs and cartoonish faces. With 300 poisoned samples, an attacker can manipulate Stable Diffusion to generate images of dogs to look like cats.
A table showing a grid of thumbnails of generated images of Hemlock attack-poisoned concepts from SD-XL models contrasted with images from the clean SD-XL model in increments of 50, 100, and 300 poisoned samples.

COURTESY OF THE RESEARCHERS

Generative AI models are excellent at making connections between words, which helps the poison spread. Nightshade infects not only the word “dog” but all similar concepts, such as “puppy,” “husky,” and “wolf.” The poison attack also works on tangentially related images. For example, if the model scraped a poisoned image for the prompt “fantasy art,” the prompts “dragon” and “a castle in The Lord of the Rings” would similarly be manipulated into something else.
a table contrasting the poisoned concept Fantasy art in the clean model and a poisoned model with the results of related prompts in clean and poisoned models, A painting by Michael Whelan, A dragon, and A castle in the Lord of the Rings

COURTESY OF THE RESEARCHERS

Zhao admits there is a risk that people might abuse the data poisoning technique for malicious uses. However, he says attackers would need thousands of poisoned samples to inflict real damage on larger, more powerful models, as they are trained on billions of data samples.
“We don’t yet know of robust defenses against these attacks. We haven’t yet seen poisoning attacks on modern [machine learning] models in the wild, but it could be just a matter of time,” says Vitaly Shmatikov, a professor at Cornell University who studies AI model security and was not involved in the research. “The time to work on defenses is now,” Shmatikov adds.

Gautam Kamath, an assistant professor at the University of Waterloo who researches data privacy and robustness in AI models and wasn’t involved in the study, says the work is “fantastic.”

The research shows that vulnerabilities “don’t magically go away for these new models, and in fact only become more serious,” Kamath says. “This is especially true as these models become more powerful and people place more trust in them, since the stakes only rise over time.”


A powerful deterrent

Junfeng Yang, a computer science professor at Columbia University, who has studied the security of deep-learning systems and wasn’t involved in the work, says Nightshade could have a big impact if it makes AI companies respect artists’ rights more—for example, by being more willing to pay out royalties.

AI companies that have developed generative text-to-image models, such as Stability AI and OpenAI, have offered to let artists opt out of having their images used to train future versions of the models. But artists say this is not enough. Eva Toorenent, an illustrator and artist who has used Glaze, says opt-out policies require artists to jump through hoops and still leave tech companies with all the power.

Toorenent hopes Nightshade will change the status quo.
“It is going to make [AI companies] think twice, because they have the possibility of destroying their entire model by taking our work without our consent,” she says.

Autumn Beverly, another artist, says tools like Nightshade and Glaze have given her the confidence to post her work online again. She previously removed it from the internet after discovering it had been scraped without her consent into the popular LAION image database.
“I’m just really grateful that we have a tool that can help return the power back to the artists for their own work,” she says.
 