bnew

Veteran
Joined
Nov 1, 2015
Messages
56,233
Reputation
8,271
Daps
157,972

Stability AI releases Stable Doodle, a sketch-to-image tool​

Kyle Wiggers@kyle_l_wiggers / 8:00 AM EDT•July 13, 2023
[Image: Stable Doodle. Credit: Stability AI]

Stability AI, the startup behind the image-generating model Stable Diffusion, is launching a new service that turns sketches into images.

The sketch-to-image service, Stable Doodle, leverages the latest Stable Diffusion model to analyze the outline of a sketch and generate a “visually pleasing” artistic rendition of it. It’s available starting today through ClipDrop, a platform Stability acquired in March through its purchase of Init ML, an AI startup founded by ex-Googlers.

“Stable Doodle is geared toward both professionals and novices, regardless of their familiarity with AI tools,” Stability AI writes in a blog post shared with TechCrunch via email. “With Stable Doodle, anyone with basic drawing skills and online access can generate high-quality original images in seconds.”

There are plenty of sketch-to-image AI tools out there, including open source projects and ad-supported apps. But Stable Doodle is unique in that it allows for more “precise” control over the image generation, Stability AI contends.

Under the hood, powering Stable Doodle is a Stable Diffusion model — Stable Diffusion XL — paired with a “conditional control solution” developed by one of Tencent’s R&D divisions, the Applied Research Center (ARC). Called T2I-Adapter, the control solution both allows Stable Diffusion XL to accept sketches as input and guides the model to enable better fine-tuning of the output artwork.

[Image: Stable Doodle output. Credit: Stability AI]

“T2I-Adapter enables Stable Doodle to understand the outlines of sketches and generate images based on prompts combined with the outlines defined by the model,” Stability AI explains in the blog post.

This writer didn’t have the opportunity to test Stable Doodle prior to its release. But the cherry-picked images Stability AI sent me looked quite good, at least in comparison to the doodle that inspired them.

In addition to a sketch, Stable Doodle accepts a prompt to guide the image generation process, such as “A comfy chair, ‘isometric’ style” or “Cat with a jeans jacket, ‘digital art’ style.” There’s a limit to the customization, though — at launch, Stable Doodle only supports 14 styles of art.
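Stability hasn't published Stable Doodle's own code, but the building blocks it names (SDXL plus Tencent ARC's T2I-Adapter) are available in the open-source Hugging Face diffusers library. Below is a minimal, hypothetical sketch of how the same sketch-plus-prompt pipeline can be wired up; the model IDs and parameter values are assumptions, not anything Stability has confirmed.

```python
# Minimal sketch of SDXL + a Tencent ARC T2I-Adapter for sketch-conditioned generation.
# This is NOT Stable Doodle's actual code; model IDs and parameters are assumptions.
import torch
from PIL import Image
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter

# Load the sketch-conditioned adapter and the SDXL base pipeline (assumed model IDs).
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-sketch-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")

# A rough black-on-white outline drawing stands in for the user's doodle.
sketch = Image.open("doodle.png").convert("RGB")

image = pipe(
    prompt="A comfy chair, isometric style",
    image=sketch,                        # the outline that guides the composition
    num_inference_steps=30,
    adapter_conditioning_scale=0.9,      # how strongly the sketch constrains the output
).images[0]
image.save("result.png")
```

The 14 preset styles presumably map to prompt phrasing, much like the "isometric" suffix in the example prompt above.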

Stability AI envisions Stable Doodle serving as a tool for designers, illustrators and other professionals to “free up valuable time” and “maximize efficiency” in their work. At the same time, the company cautions that the quality of output images is dependent on the detail of the initial drawing and the descriptiveness of the prompt, as well as the complexity of the scene being depicted.

“Ideas drawn as sketches can be immediately implemented into works to create designs for clients, material for presentation decks and websites or even create logos,” the company proposes. “Moving forward, Stable Doodle will enable users to import a sketch. Further, we will include use cases for specific verticals, including real estate applications, for example.”

[Image: Stable Doodle output. Credit: Stability AI]

With tools like Stable Doodle, Stability AI is chasing after new sources of revenue following a lull in its commercial endeavors. (Stable Doodle is free, but subject to limits.) In April, Semafor reported that Stability AI was burning through cash, leading to an executive hunt to help ramp up sales.

Last month, Stability AI raised $25 million through a convertible note (i.e. debt that converts to equity), bringing its total raised to over $125 million. But it hasn’t closed new funding at a higher valuation. The startup was last valued at $1 billion; reportedly, Stability was seeking to quadruple that within the next few months.





 

bnew


Meta to release commercial AI model in effort to catch rivals​

Microsoft-backed OpenAI and Google are surging ahead in Silicon Valley development race


[Image: Mark Zuckerberg. Meta released its own language model to researchers and academics earlier this year, but the new version will be more widely available and customisable by companies © FT montage/Bloomberg/Dreamstime]

Cristina Criddle and Madhumita Murgia in London, Hannah Murphy in San Francisco and Leila Abboud in Paris



Meta is poised to release a commercial version of its artificial intelligence model, allowing start-ups and businesses to build custom software on top of the technology.

The move will allow Meta to compete with Microsoft-backed OpenAI and Google, which are surging ahead in the race to develop generative AI. The software, which can create text, images and code, is powered by large language models (LLMs) that are trained on huge amounts of data and require vast computing power.

Meta released its own language model, known as LLaMA, to researchers and academics earlier this year, but the new version will be more widely available and customisable by companies, three people familiar with the plans said. The release is expected imminently, one of the people said.

Meta says its LLMs are “open-source”, by which it means details of the new model will be released publicly. This contrasts with the approach of competitors such as OpenAI, whose latest model GPT-4 is a so-called black box in which the data and code used to build the model are not available to third parties.

“The competitive landscape of AI is going to completely change in the coming months, in the coming weeks maybe, when there will be open source platforms that are actually as good as the ones that are not,” vice-president and chief AI scientist at Meta, Yann LeCun, said at a conference in Aix-en-Provence last Saturday.

Meta’s impending release comes as a race among Silicon Valley tech groups to establish themselves as dominant AI participants is heating up.

Writing in the Financial Times this week, Meta’s global affairs chief Nick Clegg extolled the virtues of an open source approach, saying “openness is the best antidote to the fears surrounding AI”. But the move also helps Meta in its attempts to catch up with rivals, as an open model would allow companies of all sizes to improve the technology and build applications on it.

Meta has been working on AI research and development for more than a decade but has appeared to be on the back foot after OpenAI’s ChatGPT, a conversational chatbot, was released in November, spurring other Big Tech groups to launch similar products.

“The goal is to diminish the current dominance of OpenAI,” said one person with knowledge of high-level strategy at Meta.

Meta declined to comment.

While Meta’s technology is open source and currently free, two people familiar with the matter said the company had been exploring charging enterprise customers for the ability to fine-tune the model to their needs by using their own proprietary data. One person said there were no current plans to charge and Meta would not do so in the upcoming release. Meta’s intention to release its AI model under a commercial licence was first reported by The Information.

Joelle Pineau, Meta’s vice-president of AI research, declined to comment on the development of a new AI model and how it might be monetised but said: “At the end of the day, because you release something [open source], you don’t completely give up on the intellectual property of that work.”

“We haven’t been shy about the fact that we do want to be using these models [in our] products,” she added.

In 2021, chief executive Mark Zuckerberg announced a pivot to build an avatar-filled digital world known as a metaverse and has spent more than $10bn a year on the project. That costly ambition has proven unpopular with investors and Meta has recently raced to increase its AI investment.

Earlier this year, the social networking giant set up a generative AI unit led by chief product officer Chris Cox. Pineau said Cox’s team straddled both AI research and product development, as it was “creating totally new businesses”.

Zuckerberg and other executives have hinted at a push towards creating multiple AI chatbots for individuals, advertisers and businesses across Meta platforms Instagram, WhatsApp and Facebook, powered by its LLMs.

The benefits of open source models include higher take-up by users, who then input more data for the AI to process. The more data an LLM has, the more powerful its capabilities can become.

Furthermore, open source models allow researchers and developers to spot and address bugs, improving the technology and security simultaneously — at a time when technology companies such as Meta have faced years of scrutiny over various privacy and misinformation scandals.

While providing software for free can seem antithetical to making money, experts believe corporations can also use this strategy to capture new markets.

“Meta realised they were behind on the current AI hype cycle, and this gives them a way to open up the ecosystem and seem like they are doing the right thing, being charitable and giving back to the community,” said one person familiar with the company’s thinking.

Still, there are clear risks with open source AI, which can be shaped and abused by bad actors. Child safety groups report a rise in child sexual abuse imagery generated by AI online, for instance.

Researchers also found that a previous Meta AI model, BlenderBot 2, released in 2021, was spreading misinformation. Meta said it made BlenderBot 3 more resistant to this content, although users still found it generated false information.

There are also regulatory and legal risks concerning intellectual property and copyright. On Monday, comedian and actor Sarah Silverman filed a lawsuit against Meta and OpenAI over claims her work was used to train models without her consent.

Meta released its open source model LLaMA to researchers in February. A month later, it leaked more widely via the online forum *****, prompting developers to build on top of it in breach of Meta’s licensing rules, which specify it should not be used in commercial products.

“This model is out there in ways that we wish it wasn’t,” Pineau said.

Other AI companies, such as French start-up Mistral, are also examining the potential of releasing open source versions of their technology. OpenAI, which has released open source AI models for speech and image recognition previously, said its team was looking into developing an open source LLM, provided they were able to reduce the risks of misuse below a minimum threshold.

“We have a choice between deciding that artificial intelligence is too dangerous a technology to remain open and putting it under lock and key and in the hands of a small number of companies that will control it,” Meta’s AI chief LeCun said. “Or, on the contrary, open source platforms that call for contributions . . . from all over the world.”
 

bnew



Exclusive: AP strikes news-sharing and tech deal with OpenAI​

Sara Fischer

Illustration: Sarah Grillo/Axios



The Associated Press on Thursday said it reached a two-year deal with OpenAI, the maker of ChatGPT, to share access to select news content and technology.

Why it matters: The deal marks one of the first official news-sharing agreements made between a major U.S. news company and an artificial intelligence firm.

Details: As part of the deal, OpenAI will license some of the AP’s text archive dating back to 1985 to help train its artificial intelligence algorithms.

  • The AP will get access to OpenAI’s technology and product expertise.
  • The two firms are still working through the technical details of how the sharing will work on the back end, a spokesperson said.
  • Brad Lightcap, OpenAI's chief operating officer, said AP's "feedback—along with access to their high-quality, factual text archive—will help to improve the capabilities and usefulness of OpenAI’s systems.”
Be smart: The AP was one of the first major national news organizations to use automation technology in its news report.

  • About a decade ago, it began automating corporate earnings reports before later using automation for its coverage of local sporting events.
  • It has since expanded its use of automation in other parts of the news-gathering and production processes, including helping partner newsrooms adopt automation for coverage of local public safety incidents, and translating weather alerts into Spanish.
  • Earlier this year, AP launched an AI-enabled search tool that makes it easier for its clients, which are primarily other newsrooms, to access its vast trove of photos and videos using descriptive language, rather than traditional metadata.
Yes, but: The company does not yet use generative AI in its news stories.

  • The partnership with OpenAI is meant to help the firm understand responsible use cases to potentially leverage generative AI in news products and services in the future.
The big picture: The news industry is grappling with ways to best leverage artificial intelligence to improve output, while also protecting its work from being used to train AI algorithms without permission or compensation.




  • In striking a deal with OpenAI, AP hopes to be an industry leader in developing standards and best practices around generative AI for other newsrooms.
  • "AP firmly supports a framework that will ensure intellectual property is protected and content creators are fairly compensated for their work," said Kristin Heitmann, AP senior vice president and chief revenue officer.
  • "News organizations must have a seat at the table to ensure this happens, so that newsrooms large and small can leverage this technology to benefit journalism,” she added.
What's next: Asked whether the AP is working to secure similar deals with other AI companies, like Google, a spokesperson said, "We have longstanding relationships with many technology companies and ongoing dialogue with each about new opportunities."

Go deeper... ChatGPT: Newsrooms reckon with AI following CNET saga
 

bnew


Now Google’s Bard AI chatbot can talk and respond to visual prompts​


It’s also now available in the EU.​

By Jay Peters, a news editor who writes about technology, video games, and virtual worlds. He’s submitted several accepted emoji proposals to the Unicode Consortium.
Jul 13, 2023, 3:01 AM EDT

[Illustration of the Google logo: The Verge]

Google is adding some new features to its Bard AI chatbot, including the ability for Bard to speak its answers to you and for it to respond to prompts that also include images. The chatbot is also now available in much of the world, including the EU.

In a blog post, Google is positioning Bard’s spoken responses as a helpful way to “correct pronunciation of a word or listen to a poem or script.” You’ll be able to hear spoken responses by entering a prompt and selecting the sound icon. Spoken responses will be available in more than 40 languages and are live now, according to Google.

The feature that lets you add images to prompts is something that Google first showed off at its I/O conference in May. In one example, Google suggested you could use this to ask for help writing a funny caption about a picture of two dogs. Google says the feature is now available in English and is expanding to new languages “soon.”

Google is introducing a few other new features, too, including the ability to pin and rename conversations, share responses with your friends, and change the tone and style of the responses you get back from Bard.

Google first opened up access to Bard in March, but at the time, it was available only in the US and the UK. The company has been rolling out the chatbot to many more countries since then, and that now includes “all countries in the EEA [European Economic Area] and Brazil,” Google spokesperson Jennifer Rodstrom tells The Verge. That expansion in Europe is a notable milestone; the company’s planned Bard launch in the EU was delayed due to privacy concerns.
 

bnew


What can you do with 16K tokens in LangChain? | OpenAI | LangChain Tutorial Series​


 

bnew


Reinforcement Learning From Human Feedback, RLHF. Overview of the Process. Strengths and Weaknesses.​




978 views Jun 21, 2023 #PPO #MachineLearning #ReinforcementLearning
Dive into the captivating world of Reinforcement Learning from Human Feedback (RLHF), one of the most sophisticated topics in fine-tuning large language models. This comprehensive guide offers an overview of crucial concepts, focusing on powerful techniques like Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO).

We begin with an exploration of reinforcement learning's overarching goal: alignment. Uncover the importance of developing models that are not just accurate but also well-behaved and user-friendly, and learn how this approach aids in curbing misleading or inappropriate responses.

Moving forward, we delve into key concepts integral to RLHF such as state and observation space, action space, policy space, trajectories, and reward functions. Discover how derivatives play a pivotal role in calculating gradients and updates for our weights, and grasp the significance of the Hessian matrix in gauging loss sensitivity.

As we unpack RLHF, we unravel the complexities of the PPO and TRPO algorithms. Learn how these techniques aim to modify the network's parameters to achieve desirable behavior, thereby ensuring the alignment of the model's responses with user expectations. We provide an easy-to-follow walkthrough of these algorithms, explaining the significance of their objective functions and their treatment of the KL divergence, a measure of the difference between two probability distributions.
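As a concrete anchor for the PPO part of the discussion, here is a minimal PyTorch sketch of the clipped surrogate objective PPO optimizes. This is illustrative only, not code from the video, and the tensor names are hypothetical.

```python
# Minimal PyTorch sketch of PPO's clipped surrogate loss (illustrative).
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO.

    logp_new:   log-probabilities of the taken actions under the current policy
    logp_old:   log-probabilities under the policy that collected the data
    advantages: estimated advantages for those actions
    """
    ratio = torch.exp(logp_new - logp_old)           # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Take the pessimistic (minimum) objective and negate it to get a loss to minimize.
    return -torch.min(unclipped, clipped).mean()

# In RLHF, `advantages` would typically be derived from a learned reward model's scores,
# optionally penalized by a KL term against the original, pre-RL model.
```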

Then, we guide you through the implementation of these principles into an RLHF pipeline, highlighting the key steps: initial training, collection of human feedback, and the iterative process of reinforcement learning. Understand the tangible benefits of this approach, such as enhanced performance, adaptability, continuous improvement, and safety, as well as the challenges it poses, namely scalability and subjectivity.
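For the human-feedback step specifically, the usual recipe is to train a reward model on pairwise preferences and then use its scores during the RL stage. Below is a minimal, assumed sketch of that ranking loss; again, it is not code from the video.

```python
# Minimal sketch of the reward-model step in an RLHF pipeline: train a scorer on
# human pairwise preferences (chosen vs. rejected responses). Illustrative only.
import torch
import torch.nn.functional as F

def reward_ranking_loss(reward_chosen, reward_rejected):
    # Bradley-Terry style objective: the preferred response should score higher.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Hypothetical scores a reward model assigned to paired responses for the same prompt.
reward_chosen = torch.tensor([1.3, 0.2, 2.1])
reward_rejected = torch.tensor([0.4, 0.9, 1.0])
print(reward_ranking_loss(reward_chosen, reward_rejected))  # lower is better
```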

Wrapping up, we introduce an exemplary PPO implementation using a library. Experiment, play, and learn in this interactive Google Colab, seeing firsthand the impact of different hyperparameters and data set changes.

This video offers an enlightening journey into the intricacies of RLHF, designed to give you a solid grasp of these complex concepts. Whether you're a professional or just intrigued by the potential of reinforcement learning, you're sure to find value here. Stay tuned for more content on large language models, fine-tuning validations, and much more! Please like, subscribe, and let us know what you'd like to learn next in the comments. Happy learning!
 

bnew


Large Language Models Process Explained. What Makes Them Tick and How They Work Under the Hood!​




1,049 views Jul 6, 2023 #Attention #MachineLearning #LargeLanguageModels
Explore the fascinating world of large language models in this comprehensive guide. We'll begin by laying a foundation with key concepts such as softmax, layer normalization, and feed forward layers. Next, we'll delve into the first step of these models, tokenization, explaining the difference between word-wise, character-wise, and sub-word tokenization, as well as how they impact the models' understanding and flexibility.
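To make the word-wise vs. sub-word distinction concrete, here is a small example (not from the video) using the GPT-2 BPE tokenizer from Hugging Face transformers; any sub-word tokenizer would illustrate the same point.

```python
# Quick illustration of sub-word (BPE) tokenization using GPT-2's tokenizer.
# Requires the Hugging Face `transformers` package.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

text = "Tokenization handles unseen words gracefully"
print(text.split())        # word-wise: ['Tokenization', 'handles', 'unseen', 'words', 'gracefully']
print(tok.tokenize(text))  # sub-word: rare words get split into smaller, reusable pieces
print(tok.encode(text))    # the integer IDs the model actually consumes
```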

After setting a strong base, we'll dive deeper into the fascinating process of embedding and positional encoding. Learn how these techniques translate human language into a language that models can understand, ultimately creating a space where tokens can relate to one another.

Then, prepare for the main event as we pull back the curtain on attention — the magic that enables our models to understand context when completing sentences. We'll decode complex concepts like the query, key, and value matrices, guiding you through their function in the model's computation process.
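For readers who want the query/key/value mechanics spelled out, here is a bare-bones, single-head version of scaled dot-product attention. It is illustrative only, with no masking or multi-head logic.

```python
# Bare-bones scaled dot-product attention (single head, no masking) to make
# the query/key/value mechanics concrete. Illustrative only.
import torch
import torch.nn.functional as F

def attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_head) projection matrices
    Q, K, V = x @ w_q, x @ w_k, x @ w_v
    scores = Q @ K.T / K.shape[-1] ** 0.5    # how much each token attends to every other token
    weights = F.softmax(scores, dim=-1)      # each row sums to 1
    return weights @ V                       # weighted mix of value vectors

seq_len, d_model, d_head = 5, 16, 8
x = torch.randn(seq_len, d_model)
out = attention(x, *(torch.randn(d_model, d_head) for _ in range(3)))
print(out.shape)  # torch.Size([5, 8])
```

Multi-head attention, covered next, simply runs several of these projections in parallel and concatenates the results.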

Finally, we'll dive into the concept of multi-attention heads and their role in improving a model's ability to handle complex inputs. This video provides an accessible understanding of large language models without drowning you in complex math, offering a perfect balance between clarity and depth. Whether you're a seasoned pro or just getting started in the field of AI, this video promises to enhance your understanding of these incredible computational tools.
 

bnew


SuperHOT, 8k and 16k Local Token Context! How Does It Work? What We Believed About LLMs Was Wrong.





2,217 views Jun 29, 2023 #SuperHot #LanguageModels
Hey everyone! Today, we're delving deep into SuperHOT, an innovative approach that dramatically extends the context length of the LLMs we're used to, from 8K to a whopping 16,000 tokens. You might wonder, how is this possible? Or what were the hurdles we had to overcome? In this video, we unravel these questions and discuss the strategies we developed to address the issue of context length.

We start off by clarifying a few critical concepts, such as the attention layer and its quadratic complexity, the implications of sinusoidal positional encoding, and the intricacies of previous techniques like ALiBi and Landmark. We also cover neuron activation with ReLU, the softmax function, and the dot product, all critical to understanding the changes made.

We then dive into the positional encoding problem and how we began to tackle it. You'll learn about different types of positional encoding, including sinusoidal, linear, and the revolutionary Rotary encodings. Discover why these methods were important in providing hints about what was actually happening with our models.

Towards the end, we delve into the power behind Rotary positional embeddings and how these play a pivotal role in relative and absolute encoding. We'll discuss why this should theoretically allow our models to extrapolate into higher token counts and the reality of why this doesn't always happen.

Finally, we introduce SuperHot, a game-changer in context length extension. We discuss its workings, how it memorizes the tokens and their positional encodings, and why repeating positional encodings allowed us to cheat the system and extend our token count dramatically.
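SuperHOT's actual patch isn't reproduced here, but the core idea described above, reusing (interpolating) the trained positional range rather than extrapolating past it, can be sketched by scaling the rotary position indices. The numbers below are assumptions for illustration.

```python
# Sketch of the "position interpolation" idea behind SuperHOT-style context extension:
# compress the rotary position indices so a longer sequence maps back into the
# position range the model was trained on. Illustrative, not SuperHOT's actual patch.
import torch

def rotary_angles(seq_len, dim, base=10000.0, scale=1.0):
    # scale < 1 squeezes positions: e.g. scale=0.25 makes 8192 positions
    # "look like" roughly 0..2048 to a model trained with a 2048-token context.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(seq_len).float() * scale
    return torch.outer(positions, inv_freq)   # (seq_len, dim/2) rotation angles

angles_normal = rotary_angles(2048, 128)               # the range seen in training
angles_interp = rotary_angles(8192, 128, scale=0.25)   # 4x longer, same angle range
print(angles_normal.max().item(), angles_interp.max().item())  # roughly equal
```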

So, if you're ready to learn about the next level of language models and how SuperHOT is paving the way for extending context length, this video is for you. Please remember to like, subscribe, and drop us a comment if you found this video helpful, and stay tuned for our next video where we talk about LLMs from start to finish. See you in the next one!
 

newarkhiphop


bnew


In-context Autoencoder for Context Compression in a Large Language Model​



Abstract

We propose the In-context Autoencoder (ICAE) for context compression in a large language model (LLM). The ICAE has two modules: a learnable encoder adapted with LoRA from an LLM for compressing a long context into a limited number of memory slots, and a fixed decoder which is the target LLM that can condition on the memory slots for various purposes. We first pretrain the ICAE using both autoencoding and language modeling objectives on massive text data, enabling it to generate memory slots that accurately and comprehensively represent the original context. Then, we fine-tune the pretrained ICAE on a small amount of instruct data to enhance its interaction with a variety of prompts for producing desirable responses. Our experimental results demonstrate that the ICAE learned with our proposed pretraining and fine-tuning paradigm can effectively produce memory slots with 4× context compression, which can be well conditioned on by the target LLM for various purposes. The promising results demonstrate significant implications of the ICAE for its novel approach to the long context problem and its potential to reduce computation and memory overheads for LLM inference in practice, suggesting further research effort in context management for an LLM.

Our code and data will be released shortly
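Since that code had not yet been released, the following is only a conceptual sketch of the mechanism the abstract describes: learnable memory-slot tokens are appended to the long context, a (LoRA-adapted) encoder processes the whole sequence, and only the slots' final hidden states are kept as the compressed memory the frozen decoder conditions on. All names, sizes and the 4x ratio here are illustrative assumptions, not the paper's implementation.

```python
# Conceptual sketch of ICAE-style context compression. The paper's code was not
# yet released; names, shapes and the 4x ratio here are assumptions for illustration.
import torch
import torch.nn as nn

class ToyICAEEncoder(nn.Module):
    def __init__(self, d_model=512, n_slots=128, n_layers=4, n_heads=8):
        super().__init__()
        # Learnable memory-slot embeddings appended after the context tokens.
        self.memory_slots = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)  # stand-in for a LoRA-adapted LLM

    def forward(self, context_embeds):
        # context_embeds: (batch, ctx_len, d_model); ctx_len = 4 * n_slots gives 4x compression
        batch = context_embeds.shape[0]
        slots = self.memory_slots.unsqueeze(0).expand(batch, -1, -1)
        hidden = self.encoder(torch.cat([context_embeds, slots], dim=1))
        # Keep only the slot positions: this is the compressed "memory" that a
        # frozen decoder LLM would condition on.
        return hidden[:, -slots.shape[1]:, :]

enc = ToyICAEEncoder()
memory = enc(torch.randn(2, 512, 512))   # 512 context tokens -> 128 memory slots (4x)
print(memory.shape)                       # torch.Size([2, 128, 512])
```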



Long context​

As artificial intelligence becomes an increasingly powerful force, some of the world’s biggest companies are worrying about how the technology will be used ethically, and how the public will perceive its spread. To combat these problems (among others), five tech companies — Google, Amazon, Microsoft, Facebook, and IBM — set up a research group called the Partnership on AI. … AI taking white collar jobs, eroding trust in public media, becoming embedded in public institutions like the courts and hospitals: these are the sorts of problems facing the industry in the future.
 

bnew


Meta Unveils CM3leon: A Breakthrough AI Model for Advanced Text-to-Image Generation and Image Understanding​


With its versatile capabilities and improved performance, CM3leon represents a significant step towards higher-fidelity image generation and understanding, paving the way for enhanced creativity and applications in the metaverse.

CHRIS MCKAY

JULY 14, 2023 • 3 MIN READ

[Image credit: Meta]


Today, Meta shared its latest research on CM3leon (pronounced “chameleon”), a transformer-based model that achieves state-of-the-art results on text-to-image generation and shows new capabilities for multimodal AI. CM3leon marks the first time an autoregressive model has matched the performance of leading generative diffusion models on key benchmarks.

In recent years, generative AI models capable of creating images from text prompts have progressed rapidly. Models like Midjourney, DALL-E 2 and Stable Diffusion can conjure photorealistic scenes and portraits from short text descriptions. These models use a technique called diffusion—a process that involves iteratively reducing noise in an image composed entirely of noise, and gradually bringing it closer to the desired target. While diffusion-based methods yield impressive results, their computational intensity poses challenges, as they can be expensive to run and often lack the speed required for real-time applications.

CM3leon takes a different approach. As a transformer-based model, it utilizes the power of attention mechanisms to weigh the relevance of input data, whether it's text or images. This architectural distinction allows CM3leon to achieve faster training speeds and better parallelization, making it more efficient than traditional diffusion-based methods.

CM3leon was trained efficiently on a dataset of licensed images using just a single TPU pod, and reaches an FID score of 4.88 on the MS-COCO dataset. Meta researchers say the model is over 5x more efficient than comparable transformer architectures.

But raw performance metrics don't tell the full story. Where CM3leon truly shines is in handling more complex prompts and image editing tasks. For example, CM3leon can accurately render an image from a prompt like "A small cactus wearing a straw hat and neon sunglasses in the Sahara desert."


The model also excels at making edits to existing images based on free-form text instructions, like changing the sky color or adding objects in specific locations. These capabilities far surpass what leading models like DALL-E 2 can currently achieve.

Text Guided Image Editing​


CM3leon's versatile architecture allows it to move fluidly between text, images, and compositional tasks. Beyond text-to-image generation, CM3leon can generate captions for images, answer questions about image content, and even create images based on textual descriptions of bounding boxes and segmentation maps. This combination of modalities into a single model is unprecedented among publicly revealed AI systems.

Object-to-image​

Given a text description of the bounding box segmentation of the image, CM3leon can generate an image.

Super-resolution results​

A separate super-resolution stage can be integrated with CM3leon output that significantly improves resolution and detail. Below are four example images for each of the prompts: (1) A steaming cup of coffee with mountains in the background. Resting during road trip. (2) Beautiful, majestic road during sunset. Aesthetic. (3) Small circular island in the middle of a lake. Forests surrounding the lake. High Contrast

CM3leon's success can be attributed to its unique architecture and training methods. The model employs a decoder-only transformer architecture, similar to established text-based models, but with the added capability of handling both text and images. Training involves retrieval augmentation, building upon recent work in the field, and instruction fine-tuning across various image and text generation tasks.

By applying a technique called supervised fine-tuning across modalities, Meta was able to significantly boost CM3leon's performance at image captioning, visual QA, and text-based editing. Despite being trained on just 3 billion text tokens, CM3leon matches or exceeds the results of other models trained on up to 100 billion tokens.
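CM3leon itself is not public, but the recipe the article describes (a single decoder-only transformer doing next-token prediction over a mixed vocabulary of text tokens and discrete image tokens) can be sketched in a few lines. Everything below, including vocabulary sizes, shapes, and layer counts, is an assumption for illustration, not Meta's implementation.

```python
# Toy sketch of the decoder-only, mixed text+image token setup the article describes.
# CM3leon is not public; vocabulary sizes and shapes here are made up.
import torch
import torch.nn as nn

TEXT_VOCAB, IMAGE_VOCAB, D_MODEL = 32000, 8192, 512

class ToyMultimodalLM(nn.Module):
    def __init__(self):
        super().__init__()
        # One shared vocabulary: text tokens first, then discrete image-codebook tokens
        # (as produced by an image tokenizer such as a VQ model).
        self.embed = nn.Embedding(TEXT_VOCAB + IMAGE_VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=4)
        self.lm_head = nn.Linear(D_MODEL, TEXT_VOCAB + IMAGE_VOCAB)

    def forward(self, tokens):
        # Causal mask so each position only attends to earlier text/image tokens.
        seq_len = tokens.shape[1]
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.lm_head(h)   # next-token logits over both text and image tokens

# A caption followed by image tokens: the same next-token loss covers text-to-image
# (predict image tokens after text) and captioning (predict text after image tokens).
tokens = torch.randint(0, TEXT_VOCAB + IMAGE_VOCAB, (1, 64))
logits = ToyMultimodalLM()(tokens)
print(logits.shape)   # torch.Size([1, 64, 40192])
```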

Meta has yet to announce plans to release CM3leon publicly. But the model defines a new bar for multimodal AI and shows the power of techniques like retrieval augmentation and supervised fine-tuning. It's a remarkable achievement that points to a future where AI systems can smoothly transition between understanding, editing, and generating across images, video, and text.
 