bnew

Veteran
Joined
Nov 1, 2015
Messages
56,112
Reputation
8,239
Daps
157,805

Meta is putting AI chatbots everywhere​

The Meta AI assistant is coming to WhatsApp, Messenger, and Instagram, along with dozens of AI characters based on celebrities like MrBeast and Charli D’Amelio.​

By Alex Heath, a deputy editor and author of the Command Line newsletter. He’s covered the tech industry for over a decade at The Information and other outlets.


Sep 27, 2023, 5:30 PM UTC


An example of what Meta’s AI assistant can do. Meta

Meta is officially entering the AI chatbot wars, starting with its own assistant and a slew of AI characters it’s releasing in WhatsApp, Instagram, and Messenger.

For anyone who has used OpenAI’s ChatGPT, or other chatbots like Anthropic’s Claude, Meta’s AI will immediately feel familiar. Meta sees it as a general-purpose assistant for everything from planning a trip with friends in a group chat to answering questions you’d normally ask a search engine. On that latter piece, Meta is announcing a partnership with Microsoft’s Bing to provide real-time web results, which sets Meta AI apart from a lot of the other free AIs out there that don’t have super recent information.

Another big aspect of the Meta AI is its ability to generate images like Midjourney or OpenAI’s DALL-E via the prompt “/imagine.” In my brief demo, it produced compelling high-res photos in a few seconds. Like all of Meta’s AI features being announced this week, this image generation is totally free to use.

Ahmad Al-Dahle, Meta’s VP of generative AI who has been leading the assistant’s development, wouldn’t tell me exactly what it’s trained on. He described it as a “custom-made” large language model that is “based on a lot of the core principles behind Llama 2,” Meta’s latest quasi-open source model that is being quickly adopted across various industries.

The rapid adoption of Llama 2 has helped Meta refine how its own assistant works, he says. “We just saw huge demand for the models, and then we saw an incredible amount of innovation happening on the models that really helped us understand their performance, understand their weaknesses, and help us iterate and leverage some of those components directly into product.”

In terms of how Meta AI differs from Llama 2, Al-Dahle says his team spent time “refining additional data sets for conversations so that we can create a tone that is conversational and friendly in the way that the assistant responds. A lot of existing AIs can be like robotic or bland.” Meta expanded the model’s context window, or the ability to leverage previous interactions to generate what the model produces next, “so that we can build a deeper, more capable back and forth” with users. He says Meta AI has also been tuned to give “very concise” answers.

Some of Meta’s AI characters are familiar faces. Image: Meta

Alongside Meta’s assistant, the company is beginning to roll out an initial roster of 28 AI characters across its messaging apps. Many of them are based on celebrities like Charli D’Amelio, Dwyane Wade, Kendall Jenner, MrBeast, Snoop Dogg, and Paris Hilton. Others are themed to specific use cases like a travel agent.

An interesting twist is an aspect of these characters that Al-Dahle calls “embodiments.” As you chat with one of them, their profile image subtly animates based on the conversation. The effect is more immersive than the 2D chatbots I’ve interacted with to date.

During my brief time trying Meta AI last week, I tried getting it to slip up and say something bad. It told me that covid vaccines are safe and that it can’t help me make a dirty bomb. It wouldn’t give me advice on how to break up with someone, which suggests that Meta has added a lot of safeguards to avoid as many PR disasters as it can. Al-Dahle says the company spent 6,000 hours red-teaming the model to find problematic use cases and that employees have been creating thousands of conversations with it daily in the run-up to release.


For now, Meta AI isn’t trained on public user data across Instagram and Facebook, though it sounds like that is coming. It’s easy to imagine asking it to “show me reels from the south of Italy” and that being a compelling use case that other chatbots can’t replicate. “We see a long roadmap for us to tie in some of our own social integrations as part of the assistant to make it even more useful,” says Al-Dahle.

“We see a long roadmap for us to tie in some of our own social integrations”

After talking with Al-Dahle and other Meta execs, it’s clear that the company sees its unrivaled distribution — billions of daily users across its messaging apps — as a key competitive edge against ChatGPT and others. The assistant is “right there inside of your chat context, and our chat applications are quite popular,” says Al-Dahle. “You don’t have to pull yourself out of context to interact or engage or get the assistant to help you.”

OpenAI may have kick-started the chatbot race, but given Meta’s immense scale through its social networks, its assistant may actually be the AI that most people use for the first time.
 

bnew


Meta

Introducing New AI Experiences Across Our Family of Apps and Devices​

September 27, 2023

[Video: https://about.fb.com/wp-content/uploads/2023/09/Social-Profiles-for-AI_Header-1.mp4]

Takeaways​

  • We’re starting to roll out AI stickers across our apps, and soon you’ll be able to edit your images or even co-create them with friends on Instagram using our new AI editing tools, restyle and backdrop.
  • We’re introducing Meta AI in beta, an advanced conversational assistant that’s available on WhatsApp, Messenger, and Instagram, and is coming to Ray-Ban Meta smart glasses and Quest 3. Meta AI can give you real-time information and generate photorealistic images from your text prompts in seconds to share with friends. (Available in the US only)
  • We’re also launching 28 more AIs in beta, with unique interests and personalities. Some are played by cultural icons and influencers, including Snoop Dogg, Tom Brady, Kendall Jenner, and Naomi Osaka.
  • Over time, we’re making AIs for businesses and creators available, and releasing our AI studio for people and developers to build their own AIs.
  • These new AI experiences also come with a new set of challenges for our industry. We’re rolling out our new AIs slowly and have built in safeguards.

AI is enabling new forms of connection and expression, thanks to the power of generative technologies. And today at Connect, we introduced you to new AI experiences and features that can enhance your connections with others – and give you the tools to be more creative, expressive, and productive.

AI Stickers​

Billions of stickers are sent across our platforms every month, adding another fun and creative way for people to communicate and express themselves. Today, we announced new AI stickers that enable you to effortlessly generate customized stickers for your chats and stories. Using technology from Llama 2 and our foundational model for image generation called Emu, our AI tool turns your text prompts into multiple unique, high-quality stickers in seconds. This new feature, which is rolling out to select English-language users over the next month in WhatsApp, Messenger, Instagram, and Facebook Stories, provides infinitely more options to convey how you’re feeling at any moment.
Animation showing AI-generated stickers

Image Editing With AI​

Soon, you’ll be able to transform your images or even co-create AI-generated images with friends. Restyle and backdrop – two new features that are coming soon to Instagram – use the technology from Emu. Backdrop also leverages learnings from our Segment Anything Model.

Restyle lets you reimagine your images by applying the visual styles you describe. Think of typing a descriptor like “watercolor” or a more detailed prompt like “collage from magazines and newspapers, torn edges” to describe the new look and feel of the image you want to create.
Animation showing Restyle tool

Backdrop changes the scene or background of your image. Prompts like “put me in front of a sublime aurora borealis” or “surrounded by puppies” will cue the tool to create an image of the primary subject in the foreground with the background you described.
Animation showing Backdrop tool

We know how important transparency is when it comes to the content AI generates, so images created with restyle and backdrop will indicate the use of AI to reduce the chances of people mistaking them for human-generated content. We’re also experimenting with forms of visible and invisible markers.

We want these experiences to be safe and trustworthy, while bringing new forms of creativity, entertainment, and expression into your day.

An Assistant That Spans Our Apps and Devices​

Meta AI is a new assistant you can interact with like a person, available on WhatsApp, Messenger, Instagram, and coming soon to Ray-Ban Meta smart glasses and Quest 3. It’s powered by a custom model that leverages technology from Llama 2 and our latest large language model (LLM) research. In text-based chats, Meta AI has access to real-time information through our search partnership with Bing and offers a tool for image generation.
Animation showing Meta AI

Here’s an example of how you might use Meta AI:

Imagine you and your friends are in a group chat discussing which trailhead to try in Santa Cruz. Meta AI surfaces options directly in the chat, so you can decide as a group which location to explore. What if after the hike you want a creative way to commemorate the day? Meta AI can help. Type “@MetaAI /imagine” followed by a descriptive text prompt like “create a button badge with a hiker and redwood trees,” and it will create a digital merit badge in the chat with your friends.

A Universe of Characters at Your Fingertips​

Our journey with AIs is just beginning, and it isn’t purely about building AIs that only answer questions. We’ve been creating AIs that have more personality, opinions, and interests, and are a bit more fun to interact with. Along with Meta AI, there are 28 more AIs that you can message on WhatsApp, Messenger, and Instagram. You can think of these AIs as a new cast of characters – all with unique backstories.

And because interacting with them should feel like talking to familiar people, we did something to build on this even further. We partnered with cultural icons and influencers to play and embody some of these AIs. They’ll each have profiles on Instagram and Facebook, so you can explore what they’re all about.
  • Charli D’Amelio as Coco, Dance enthusiast
  • Chris Paul as Perry, Pro golfer helping you perfect your stroke
  • Dwyane Wade as Victor, Ironman triathlete motivating you to be your best self
  • Izzy Adesanya as Luiz, Showy MMA prospect who can back up his trash talk
  • Kendall Jenner as Billie, No-BS, ride-or-die companion
  • LaurDIY as Dylan, Quirky DIY and Craft expert and companion for Gen Z
  • MrBeast as Zach, The big brother who will roast you — because he cares
  • Naomi Osaka as Tamika, Anime-obsessed Sailor Senshi in training
  • Paris Hilton as Amber, Detective partner for solving whodunnits
  • Raven Ross as Angie, Workout class queen who balances fitness with meditation
  • Roy Choi as Max, Seasoned sous chef for culinary tips and tricks
  • Sam Kerr as Sally, Free-spirited friend who’ll tell you when to take a deep breath
  • Snoop Dogg as Dungeon Master, Choose your own adventure with the Dungeon Master
  • Tom Brady as Bru, Wisecracking sports debater who pulls no punches

We’re going to start rolling these out in beta in the United States today. We’ll add new characters in the coming weeks played by Bear Grylls, Chloe Kim, and Josh Richards among others.
Animation showing chats with AIs

It’s still early days for our AIs. Right now, their knowledge base – with the exception of Meta AI, Bru, and Perry – is limited to information that largely existed prior to 2023, which means some responses may be dated. We aim to bring search to many more of our AIs in the coming months – like we have done with Meta AI – so that conversations can be timely and relevant too.

We are committed to building responsibly with safety in mind. We are continuing to test and evolve the capabilities of our AIs, and will improve the experience over time through what we learn from your interactions with them. Your direct feedback and the conversations you have with our AIs are core parts of what will help us improve our AI models, and ultimately enhance the experience at scale.

What’s Coming Next​

We introduced AI studio today, the platform that supports the creation of our AIs, and we plan to make it available to people outside of Meta – coders and non-coders alike – to build AIs. Developers will be able to build third-party AIs for our messaging services with our APIs in the coming weeks, starting on Messenger then expanding to WhatsApp.

Businesses will also be able to create AIs that reflect their brand’s values and improve customer service experiences. From small businesses looking to scale to large brands wanting to enhance communications, AIs can help businesses engage with their customers across our apps. We’re launching this in alpha and will scale it further next year.

And creators will be able to build AIs that extend their virtual presence across our apps. These AIs will have to be sanctioned by the creators and directly controlled by them.

We’re also building a sandbox that will be released in the coming year, enabling anyone to experiment with creating their own AI. As our universe of AIs continues to grow and evolve, we’ll bring this sandbox to the metaverse, giving you the chance to build AIs that adopt an even greater level of realism, embodiment, and connectedness.
 

bnew


ChatGPT users can now browse internet, OpenAI says​

Reuters

September 27, 2023, 4:40 PM EDT

OpenAI and ChatGPT logos are seen in this illustration taken February 3, 2023. REUTERS/Dado Ruvic/Illustration/File Photo

Sept 27 (Reuters) - ChatGPT users will now be able to surf the web, Microsoft-backed (MSFT.O) OpenAI said on Wednesday, expanding the data the viral chatbot can access beyond its earlier September 2021 cutoff.

The artificial intelligence startup said its latest browsing feature would allow websites to control how ChatGPT can interact with them.

"Browsing is available to Plus and Enterprise users today, and we'll expand to all users soon. To enable, choose Browse with Bing in the selector under GPT-4," OpenAI said in a post on social media platform X, formerly known as Twitter.


The startup also announced a major update earlier this week that would enable ChatGPT to have voice conversations with users and interact with them using images, moving it closer to popular AI assistants like Apple's (AAPL.O) Siri.

OpenAI had earlier tested a feature that allowed users to access the latest information through the Bing search engine within its premium ChatGPT Plus offering. But it later disabled it because of fears that it could allow users to bypass paywalls.


ChatGPT became the fastest-growing consumer application in history earlier this year, reaching 100 million monthly active users in January, before being supplanted by Meta's Threads app.

Its rise has driven up investor interest in OpenAI, with media including Reuters reporting on Tuesday that the startup is talking to shareholders about a possible sale of existing shares at a much higher valuation than a few months ago.

Reporting by Samrhitha Arunasalam in Bengaluru; Editing by Devika Syamnath and Pooja Desai




ChatGPT can now search the web in real time​


OpenAI promises up-to-date information with direct links to sources for subscribers only, but others will get the feature.​

By Wes Davis, a weekend editor who covers the latest in tech and entertainment. He has written news, reviews, and more as a tech journalist since 2020.

Sep 27, 2023, 4:38 PM EDT
Illustration: The Verge

OpenAI posted today that ChatGPT can once more trawl the web for current information, offering answers taken directly from “current and authoritative” sources, which it cites in its responses. The feature, called Browse with Bing, is only open to those with Plus and Enterprise subscriptions for now, but the company says it will roll it out “to all users soon.”

Microsoft’s Bing Chat on Windows, in the Edge browser, and in third-party browser plugins could already return live information from the web, and so can Google’s Bard in Chrome and other browsers. Both also offer links when searching, as ChatGPT’s Browse with Bing feature now does. Meta just announced at Meta Connect that it will also use Bing to power real-time web results in the Meta AI Assistant it’s adding to WhatsApp, Instagram, and Messenger.

It’s a little confusing to get ChatGPT to search the web for you. The company provides instructions for the browser version, but I didn’t find the same for the iOS app. I figured it out, though. Assuming you have a subscription, it’s: three dots menu > Settings > New Features > Browse with Bing. Then, start a new chat, tap GPT-4, and “Browse with Bing.” Then your searches should return information from current websites.

It’s a little slow, but it works. And when it answers a question for you, you can click the link to the site to compare the answers. Now I know that, according to MediaMass — a website I’ve never heard of — AC/DC might be working on a new album! Given AI bots’ tendency to hallucinate, being able to check them on their sources is a huge improvement that not only means you can actually verify they’re not lying to you, but also, it’s just nice to give credit where it’s due.


AC/DC may or may not be working on a new album, according to sources cited by ChatGPT. Screenshot: Wes Davis / The Verge

OpenAI added the ability to browse the internet within its ChatGPT iOS app in late June but quickly pulled it. Users had figured out they could coax the chatbot into giving them otherwise paywalled content by feeding a URL directly to it. Since then, OpenAI’s automated crawler that feeds information to the model powering ChatGPT has begun identifying itself with a user agent, so that sites can filter themselves out of its analysis with updates to their robots.txt file forbidding it.
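
OpenAI documents this crawler’s user agent as GPTBot, so a site that wants to opt out can add a rule like the following to its robots.txt (a minimal example):

User-agent: GPTBot
Disallow: /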


If you subscribe to one of OpenAI’s plans and want to try out the Browse with Bing feature, here are the company’s instructions:
  • Click on ‘Profile & Settings’
  • Select ‘Beta features’
  • Toggle on ‘Browse with Bing’
  • Choose Browse with Bing in the selector under GPT-4
 

bnew


Mistral AI makes its first large language model free for everyone​

Devin Coldewey (@techcrunch) / September 27, 2023, 2:41 PM EDT

Image Credits: Mistral AI

The most popular language models out there may be accessed via API, but open models — as far as that term can be taken seriously — are gaining ground. Mistral, a French AI startup that raised a huge seed round in June, has just taken the wraps off its first model, which it claims outperforms others of its size — and it’s totally free to use without restrictions.

The Mistral 7B model is available today for download by various means, including a 13.4-gigabyte torrent (with a few hundred seeders already). The company has also started a GitHub repository and Discord channel for collaboration and troubleshooting.

Most importantly, the model was released under the Apache 2.0 license, a highly permissive scheme that has no restrictions on use or reproduction beyond attribution. That means the model could be used by a hobbyist, a multi-billion-dollar corporation, or the Pentagon alike, as long as they have a system capable of running it locally or are willing to pay for the requisite cloud resources.


Mistral 7B is a further refinement of other “small” large language models like Llama 2, offering similar capabilities (according to some standard benchmarks) at a considerably smaller compute cost. Foundation models like GPT-4 can do much more, but are far more expensive and difficult to run, leading them to be made available solely through APIs or remote access.

“Our ambition is to become the leading supporter of the open generative AI community, and bring open models to state-of-the-art performance,” wrote Mistral’s team in a blog post accompanying the model’s release. “Mistral 7B’s performance demonstrates what small models can do with enough conviction. This is the result of three months of intense work, in which we assembled the Mistral AI team, rebuilt a top-performance MLops stack, and designed a most sophisticated data processing pipeline, from scratch.”

For some (perhaps most), that list may sound like more than three months’ work, but the founders had a head start in that they had worked on similar models at Meta and Google DeepMind. That doesn’t make it easy, exactly, but at least they knew what they were doing.

Of course, although it can be downloaded and used by everyone, that is very different from being “open source” or some variety of that term, as we discussed last week at Disrupt. Though the license is highly permissive, the model itself was developed privately, using private money, and the datasets and weights are likewise private.


And that is what appears to make up Mistral’s business model: The free model is free to use, but if you want to dig in, you’ll want their paid product. “[Our commercial offering] will be distributed as white-box solutions, making both weights and code sources available. We are actively working on hosted solutions and dedicated deployment for enterprises,” the blog post reads.

I’ve asked Mistral for clarification around some of the openness and their plans for releases in the future, and will update this post if I hear back from them.

bnew


Mistral 7B​

The best 7B model to date, Apache 2.0​

  • September 27, 2023
  • Mistral AI team

The Mistral AI team is proud to release Mistral 7B, the most powerful language model for its size to date.

Mistral 7B in short​

Mistral 7B is a 7.3B parameter model that:
  • Outperforms Llama 2 13B on all benchmarks
  • Outperforms Llama 1 34B on many benchmarks
  • Approaches CodeLlama 7B performance on code, while remaining good at English tasks
  • Uses Grouped-query attention (GQA) for faster inference (see the sketch after this list)
  • Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost
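
As a rough illustration of the GQA bullet, here is a minimal PyTorch sketch of grouped-query attention, with illustrative shapes (this is not Mistral's actual implementation): several query heads share each key/value head, which shrinks the KV cache and speeds up inference.

import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    # q: (batch, n_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)
    group = q.shape[1] // k.shape[1]
    # Each group of query heads shares one key/value head.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

# Example: 8 query heads sharing 2 key/value heads.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 8, 16, 64])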

We’re releasing Mistral 7B under the Apache 2.0 license; it can be used without restrictions.

Mistral 7B is easy to fine-tune on any task. As a demonstration, we’re providing a model fine-tuned for chat, which outperforms Llama 2 13B chat.

Performance in details​

We compared Mistral 7B to the Llama 2 family, and re-ran all model evaluations ourselves for a fair comparison.
Performance of Mistral 7B and different Llama models on a wide range of benchmarks. For all metrics, all models were re-evaluated with our evaluation pipeline for accurate comparison. Mistral 7B significantly outperforms Llama 2 13B on all metrics, and is on par with Llama 34B (since Llama 2 34B was not released, we report results on Llama 34B). It is also vastly superior in code and reasoning benchmarks.

The benchmarks are categorized by their themes:
  • Commonsense Reasoning: 0-shot average of Hellaswag, Winogrande, PIQA, SIQA, OpenbookQA, ARC-Easy, ARC-Challenge, and CommonsenseQA.
  • World Knowledge: 5-shot average of NaturalQuestions and TriviaQA.
  • Reading Comprehension: 0-shot average of BoolQ and QuAC.
  • Math: Average of 8-shot GSM8K with maj@8 and 4-shot MATH with maj@4
  • Code: Average of 0-shot Humaneval and 3-shot MBPP
  • Popular aggregated results: 5-shot MMLU, 3-shot BBH, and 3-5-shot AGI Eval (English multiple-choice questions only)

An interesting way to compare how models fare in the cost/performance plane is to compute “equivalent model sizes”. On reasoning, comprehension and STEM reasoning (MMLU), Mistral 7B performs equivalently to a Llama 2 model more than 3x its size, which translates directly into memory savings and throughput gains.
Results on MMLU, Commonsense Reasoning, World Knowledge and Reading Comprehension for Mistral 7B and Llama 2 (7B/13B/70B). Mistral 7B largely outperforms Llama 2 13B on all evaluations, except on knowledge benchmarks, where it is on par (this is likely due to its limited parameter count, which restricts the amount of knowledge it can compress).

Note: Important differences between our evaluation and the LLaMA2 paper’s:
  • For MBPP, we use the hand-verified subset
  • For TriviaQA, we do not provide Wikipedia contexts

Flash and Furious: Attention drift​

Mistral 7B uses a sliding window attention (SWA) mechanism (Child et al., Beltagy et al.), in which each layer attends to the previous 4,096 hidden states. The main improvement, and the reason this was initially investigated, is a linear compute cost of O(sliding_window × seq_len). In practice, changes made to FlashAttention and xFormers yield a 2x speed improvement for a sequence length of 16k with a window of 4k. A huge thanks to Tri Dao and Daniel Haziza for helping include these changes on a tight schedule.

Sliding window attention exploits the stacked layers of a transformer to attend to the past beyond the window size: a token i at layer k attends to tokens [i-sliding_window, i] at layer k-1, and those tokens themselves attended to tokens [i-2*sliding_window, i]. Higher layers therefore have access to information further in the past than the attention pattern seems to entail.
[Figure: local attention pattern]

Finally, a fixed attention span means we can limit our cache to a size of sliding_window tokens, using rotating buffers (read more in our reference implementation repo). This saves half of the cache memory for inference on sequence length of 8192, without impacting model quality.
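
Here is a minimal sketch of both ideas, with illustrative shapes (not the reference implementation): the mask allows each token to attend only to the previous window positions, and the rolling buffer overwrites cache slots modulo the window size.

import torch

def sliding_window_mask(seq_len, window):
    # True where attention is allowed: token i sees tokens (i - window, i].
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j > i - window)

class RollingKVCache:
    # Fixed-size key/value cache: position t lands in slot t % window,
    # overwriting the entry that just fell out of the attention span.
    def __init__(self, window, head_dim):
        self.window = window
        self.k = torch.zeros(window, head_dim)
        self.v = torch.zeros(window, head_dim)
        self.t = 0

    def append(self, k_t, v_t):
        slot = self.t % self.window
        self.k[slot] = k_t
        self.v[slot] = v_t
        self.t += 1

print(sliding_window_mask(6, 3).int())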

Fine-tuning Mistral 7B for chat​

To show the generalization capabilities of Mistral 7B, we fine-tuned it on instruction datasets publicly available on HuggingFace. No tricks, no proprietary data. The resulting model, Mistral 7B Instruct, outperforms all 7B models on MT-Bench, and is comparable to 13B chat models.
[Figure: MT-Bench comparison]

The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. We’re looking forward to engaging with the community on ways to make the models finally respect guardrails, allowing for deployment in environments requiring moderated outputs.

Acknowledgements​

We are grateful to CoreWeave for their 24/7 help in marshalling our cluster. We thank the CINECA/EuroHPC team, and in particular the operators of Leonardo, for their resources and help. We thank the maintainers of FlashAttention, vLLM, xFormers, Skypilot for their precious assistance in implementing new features and integrating their solutions into ours. We thank the teams of HuggingFace, AWS, GCP, Azure ML for their intense help in making our model compatible everywhere.
 

bnew


The new Mistral model can handle 100k+ token sequences with linear (ish) complexity.

How does it do it?

The answer is Sliding Window Attention.

In standard, decoder-only, causal LMs (like the GPT family), each token can "attend to" (i.e. "look at") every token that has come before it.

In Sliding Window Attention, earlier layers have a narrower view of history, and this progressively builds up the deeper you go into the model.

In the first layer of the model, each token can look at some number of previous tokens, known as the "window size". Mistral used a window size of 4096, so we'll go with that.

So, each token at the first layer can attend to the previous 4096 tokens. In the second layer, it's the exact same deal -- each token can look at the previous 4096 tokens.

So what's the big deal? Aren't we really just using some local attention mechanism and cutting off access as we get deeper into the sequence?

Not quite! You see, in the second layer, the model is effectively viewing 4096*2 tokens, since the previous 4096 tokens themselves attended to the 4096 tokens prior to that.

By the time you get to the end of their 32-layer model, the final layer has effectively attended to over 131k tokens. And, since the attention matrices never grow beyond the 4096 dimension, we essentially get increased context lengths >4096 "for free" (sorta).

It's somewhat analogous to dilated convolutions in networks like WaveNet, except rather than allowing each layer to view every 2^N previous tokens, we're allowing it to view the most recent chunks.

This is already implemented in Flash Attention, and Mistral claims that gives a 2x speedup in addition to all of these other efficiency gains. Incredible!

Mistral 7B base and instruct release post
Read some tips, summaries, and resources👀

Experiment with it
I wrote a notebook showing how to use the proper prompt formatting + free Inference API + demo for anyone to try it out!

Notebook: huggingface.co/spaces/osanse…

Why is Mistral 7B interesting?
🏆Strongest <20B pretrained model
🤏On par with many models of 30B params
Apache 2.0 license
🖥️Does decently in code tasks (code llama 7B quality)
📏 Thanks to windowed attention, you can use up to 200K tokens of context (using RoPE, on 4 A10G GPUs)
🤗 Integrated into transformers, vllm, tgi, and more!
🥔 Thanks to the small size, you can host on a server or locally

Link to leaderboard: Open LLM Leaderboard - a Hugging Face Space by HuggingFaceH4



 

bnew


UPDATE

Open-sourcing SQLCoder: a state-of-the-art LLM for SQL generation​

We are thrilled to open-source Defog SQLCoder: a 15B parameter LLM that outperforms gpt-3.5-turbo on text-to-SQL tasks.

Aug 20, 2023

Rishabh Srivastava & Wendy Aw, Co-Founder & Machine Learning Engineer


Introducing SQLCoder​

We are thrilled to open source Defog SQLCoder – a state-of-the-art LLM for converting natural language questions to SQL queries. SQLCoder significantly outperforms all major open-source models and slightly outperforms gpt-3.5-turbo and text-davinci-003 (models that are 10 times its size) on our open-source evaluation framework.

You can explore SQLCoder in our interactive demo at https://defog.ai/sqlcoder-demo



SQLCoder is a 15B parameter LLM, and a fine-tuned implementation of StarCoder. SQLCoder has been fine-tuned on hand-crafted SQL queries in increasing orders of difficulty. When fine-tuned on an individual database schema, it matches or outperforms GPT-4 performance.


You can find our Github repo here, and our model weights on Huggingface here. You can also use our interactive demo here to explore our model in the browser.

Motivation​

In the last 3 months, we have deployed SQLCoder with enterprise customers in healthcare, finance, and government. These customers often have sensitive data that they do not want going out of their servers, and using self-hosted models has been the only way for them while using LLMs.

We were able to build a SOTA model that was competitive with closed source models. We did this while standing on the shoulders of giants like StarCoder, and open-sourcing the model weights is our attempt at giving back to the community.

Approach​

Dataset Creation​

We created a hand-curated dataset of prompt-completion pairs focused on text-to-SQL tasks. This dataset was created from 10 different schemas, with questions of varying levels of difficulty. We also created an evaluation dataset of 175 questions from 7 new schemas that were not part of the 10 schemas in our training data.

We made sure that we selected complex schemas with 4-20 tables in both our training and evaluation datasets. This is because schemas with 1 or 2 tables tend to only result in simple, straightforward queries due to limited relations.

Question Classification​

Once the dataset was created, we classified each question in the dataset into “easy”, “medium”, “hard”, and “extra-hard” categories. This classification was done by adapting the rubric utilized by the Spider dataset to gauge SQL hardness.

Finally, we divided the dataset into two distinct subparts – one with easy and medium questions, and the other with hard and extra-hard questions.

Fine-tuning​

We fine-tuned the model in two stages. First, we fine-tuned the base StarCoder model on just our easy and medium questions. Then, we fine-tuned the resulting model (codenamed defog-easy) on hard and extra-hard questions to get SQLCoder.

Evaluation​

We evaluated our model on a custom dataset we created. Evaluating the “correctness” of SQL queries is difficult. We considered using GPT-4 as a “judge”, but ran into multiple issues with that. We also realized that two different SQL queries can both be "correct". For the question “who are the 10 most recent users from Toronto”, both of the following are correct in their own ways:


-- query 1
SELECT userid, username, created_at
FROM users
WHERE city = 'Toronto'
ORDER BY created_at DESC
LIMIT 10;

-- query 2
SELECT userid, firstname || ' ' || lastname, created_at
FROM users
WHERE city = 'Toronto'
ORDER BY created_at DESC
LIMIT 10;

With this in mind, we had to build a custom framework to evaluate query correctness. You can read more about the framework here. In addition to open-sourcing our model weights, we have also open-sourced our evaluation framework and evaluation dataset.
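
For intuition, here is a naive execution-based check, a sketch and not Defog's actual framework: it runs both queries against a SQLite database and compares result rows. Note that it would flag query 2 above as "wrong" because its second column differs, which is exactly why a more nuanced notion of correctness was needed.

import sqlite3

def naive_results_match(db_path, query_a, query_b):
    con = sqlite3.connect(db_path)
    try:
        rows_a = con.execute(query_a).fetchall()
        rows_b = con.execute(query_b).fetchall()
    finally:
        con.close()
    # Exact row-by-row comparison; both example queries use ORDER BY,
    # so comparing ordered lists is meaningful here.
    return rows_a == rows_b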

Results​

Defog SQLCoder outperforms all major models except GPT-4 on our evaluation framework. In particular, it outperforms gpt-3.5-turbo and text-davinci-003, which are models more than 10x its size.



These results are for generic SQL databases, and do not reflect SQLCoder’s performance on individual database schemas. When fine-tuned on individual database schemas, SQLCoder has the same or better performance as OpenAI’s GPT-4, with lower latency (on an A100 80GB).

Future direction​

We will make the following updates to Defog in the coming weeks:
  • Training the model on more hand-curated data, along with broader questions
  • Tuning the model further with Reward Modeling and RLHF
  • Pretraining a model from scratch that specializes in data analysis (SQL + Python)

Explore the model​

You can explore our model at https://defog.ai/sqlcoder-demo

About the authors​

Rishabh Srivastava is a co-founder of Defog. Before starting Defog, he founded Loki.ai – serving more than 5 billion API requests for Asian enterprises.

Wendy Aw is a Machine Learning Engineer at Defog, working on model fine-tuning and dataset curation. Before joining Defog, Wendy spent most of the last decade as a copywriter, where she helped build some of the world’s biggest brands.





 

bnew


MetaMath:

Bootstrap Your Own Mathematical Questions for Large Language Models​

Longhui Yu1,*, Weisen Jiang2,3,*, Han Shi4,‡, Jincheng Yu3,4, Zhengying Liu4,
Yu Zhang2, James T. Kwok3, Zhenguo Li4, Adrian Weller5,6, Weiyang Liu5,7,‡

1Peking University, 2Southern University of Science and Technology, 3Hong Kong University of Science and Technology, 4Huawei Noah's Ark Lab, 5University of Cambridge, 6The Alan Turing Institute, 7Max Planck Institute for Intelligent Systems
*Equal Contribution, ‡Corresponding author
longhuiyu98@gmail.com, shi.han@huawei.com, wl396@cam.ac.uk
🤗Models 🤗Dataset Code arXiv

Abstract​

Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite this great success, most existing open-source LLMs (e.g., LLaMA-2) are still far from satisfactory at solving mathematical problems due to the complex reasoning procedures involved. To bridge this gap, we propose MetaMath, a fine-tuned language model that specializes in mathematical reasoning. Specifically, we start by bootstrapping mathematical questions by rewriting the question from multiple perspectives, which results in a new dataset called MetaMathQA. Then we fine-tune the LLaMA-2 models on MetaMathQA. Experimental results on two popular benchmarks for mathematical reasoning (i.e., GSM8K and MATH) demonstrate that MetaMath outperforms all open-source LLMs by a significant margin. Our MetaMath-7B model achieves 66.5% on GSM8K and 19.8% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%. Notably, MetaMath-70B achieves an accuracy of 82.3% on GSM8K, slightly better than GPT-3.5-Turbo. We release the MetaMathQA dataset, the pretrained MetaMath models in different model sizes, and the training code for public use.

Overview​


Figure 1: Overview of the MetaMathQA data and MetaMath.​

MetaMathQA​

Answer Augmentation (AnsAug)

Generated Answer: James buys 5 packs of beef that are 4 pounds each, so he buys a total of 5 * 4 = 20 pounds of beef. The price of beef is $5.50 per pound, so he pays 20 * $5.50 = $110. The answer is: 110

Rephrasing Question

Question: What is the total amount that James paid when he purchased 5 packs of beef, each weighing 4 pounds, at a price of $5.50 per pound?

Rephrase the above question: Each pack of beef weighs 4 pounds, so 5 packs weigh 4 * 5 = 20 pounds in total. The price per pound of beef is $5.50, so the total cost for 20 pounds is 20 * $5.50 = $110. Therefore, James paid a total of $110. The answer is: 110.

Self-Verification Question

Question: James buys x packs of beef that are 4 pounds each. The price of beef is $5.50 per pound. He paid 110. What is the value of unknown variable x?

Answer: To solve this problem, we need to determine the value of x, which represents the number of packs of beef that James bought. Each pack of beef weighs 4 pounds and costs $5.50 per pound. The total amount James paid is $110. We can set up the equation as follows: Number of packs of beef * Weight per pack * Price per pound = Total amount paid; x * 4 * $5.50 = $110; Let's simplify and solve for x: 4x * $5.50 = $110; 22x = $110; To isolate x, we divide both sides of the equation by 22: 22x / 22 = $110 / 22; x = $5; The value of x is 5.

FOBAR Question

Question: James buys x packs of beef that are 4 pounds each. The price of beef is $5.50 per pound. How much did he pay? If we know the answer to the above question is 110, what is the value of unknown variable x?

Answer: James buys x packs of beef that are 4 pounds each, so he buys a total of 4x pounds of beef. The price of beef is $5.50 per pound, so the total cost of the beef is 5.50 * 4x = 22x. We are given that the total cost is $110, so we can write: 22x = 110. Dividing both sides by 22, we get: x = 5. The value of x is 5.
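
For illustration, the FOBAR rewrite above can be mechanized with a simple template: mask one number in the original question with x, then append the known answer. This is a sketch inferred from the example shown, not the authors' released code.

def fobar_question(question, masked_value, answer):
    # Mask the first occurrence of the chosen number with the unknown x.
    masked = question.replace(masked_value, "x", 1)
    return (masked + " If we know the answer to the above question is "
            + answer + ", what is the value of unknown variable x?")

q = ("James buys 5 packs of beef that are 4 pounds each. "
     "The price of beef is $5.50 per pound. How much did he pay?")
print(fobar_question(q, "5", "110"))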

Comprehensive Results​

| Model | #params | GSM8K | MATH |
| --- | --- | --- | --- |
| Closed-source models | | | |
| GPT-4 | - | 92.0 | 42.5 |
| GPT-3.5-Turbo | - | 80.8 | 34.1 |
| PaLM | 8B | 4.1 | 1.5 |
| PaLM | 62B | 33.0 | 4.4 |
| PaLM | 540B | 56.5 | 8.8 |
| PaLM-2 | 540B | 80.7 | 34.3 |
| Flan-PaLM 2 | 540B | 84.7 | 33.2 |
| Minerva | 8B | 16.2 | 14.1 |
| Minerva | 62B | 52.4 | 27.6 |
| Minerva | 540B | 58.8 | 33.6 |
| Open-source models (1-10B) | | | |
| LLaMA-1 | 7B | 11.0 | 2.9 |
| LLaMA-2 | 7B | 14.6 | 2.5 |
| MPT | 7B | 6.8 | 3.0 |
| Falcon | 7B | 6.8 | 2.3 |
| InternLM | 7B | 31.2 | - |
| GPT-J | 6B | 34.9 | - |
| ChatGLM 2 | 6B | 32.4 | - |
| Qwen | 7B | 51.6 | - |
| Baichuan-2 | 7B | 24.5 | 5.6 |
| SFT | 7B | 41.6 | - |
| RFT | 7B | 50.3 | - |
| WizardMath | 7B | 54.9 | 10.7 |
| MetaMath (ours) | 7B | 66.5 | 19.8 |
| Open-source models (11-50B) | | | |
| LLaMA-1 | 13B | 17.8 | 3.9 |
| LLaMA-1 | 33B | 35.6 | 7.1 |
| LLaMA-2 | 13B | 28.7 | 3.9 |
| LLaMA-2 | 34B | 42.2 | 6.2 |
| MPT | 30B | 15.2 | 3.1 |
| Falcon | 40B | 19.6 | 2.5 |
| GAL | 30B | - | 12.7 |
| Vicuna | 13B | 27.6 | - |
| Baichuan-2 | 13B | 52.8 | 10.1 |
| SFT | 13B | 50.0 | - |
| RFT | 13B | 54.8 | - |
| WizardMath | 13B | 63.9 | 14.0 |
| MetaMath (ours) | 13B | 72.3 | 22.4 |
| Open-source models (50-70B) | | | |
| LLaMA-1 | 65B | 50.9 | 10.6 |
| LLaMA-2 | 70B | 56.8 | 13.5 |
| RFT | 70B | 64.8 | - |
| WizardMath | 70B | 81.6 | 22.7 |
| MetaMath (ours)‡ | 70B | 82.3 | 26.6 |
Table 1: Comparison of testing accuracy to existing LLMs on GSM8K and MATH. ‡Due to the computing resource limitation, we finetune MetaMath-70B using QLoRA.​

BibTeX​

@misc{yu2023metamath,
  title={MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models},
  author={Longhui Yu and Weisen Jiang and Han Shi and Jincheng Yu and Zhengying Liu and Yu Zhang and James T. Kwok and Zhenguo Li and Adrian Weller and Weiyang Liu},
  year={2023},
  eprint={2309.12284},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
 

bnew


Meta just dropped a banger:

LLaMA 2 Long.

- Continued pretraining LLaMA on long context and studied the effects of pretraining text lengths.

- Apparently having abundant long texts in the pretraining dataset is not the key to achieving strong performance.

- They also perform a large experiment session comparing different length scaling techniques.

- Surpasses gpt-3.5-turbo-16k on multiple long-context tasks.

- They also study the effect of instruction tuning with RL + SFT and all combinations between the two.

The model weights are not out yet.
Hopefully Soon! 🙏









Meta quietly unveils Llama 2 Long AI that beats GPT-3.5 Turbo and Claude 2 on some tasks​

Carl Franzen (@carlfranzen)

September 29, 2023 11:00 AM


Credit: VentureBeat made with Midjourney




Meta Platforms showed off a bevy of new AI features for its consumer-facing services Facebook, Instagram and WhatsApp at its annual Meta Connect conference in Menlo Park, California, this week.

But the biggest news from Mark Zuckerberg’s company may have actually come in the form of a computer science paper published without fanfare by Meta researchers on the open access and non-peer reviewed website arXiv.org.


The paper introduces Llama 2 Long, a new AI model based on Meta’s open source Llama 2 released in the summer, but that has undergone “continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts are upsampled,” according to the researcher-authors of the paper.

As a result of this, Meta’s newly elongated AI model outperforms some of the leading competition in generating responses to long (higher character count) user prompts, including OpenAI’s GPT-3.5 Turbo with its 16,000-token context window, as well as Claude 2 with its 100,000-token context window.



How LLama 2 Long came to be​

Meta researchers took the original Llama 2 available in its different training parameter sizes — the values of data and information the algorithm can change on its own as it learns, which in the case of Llama 2 come in 7 billion, 13 billion, 34 billion, and 70 billion variants — and included longer text data sources than the original Llama 2 training dataset. Another 400 billion tokens’ worth, to be exact.

Then, the researchers kept the original Llama 2’s architecture the same, and only made a “necessary modification to the positional encoding that is crucial for the model to attend longer.”

That modification was to the Rotary Positional Embedding (RoPE) encoding, a method of programming the transformer model underlying LLMs such as Llama 2 (and LLama 2 Long), which essentially maps their token embeddings (the numbers used to represent words, concepts, and ideas) onto a 3D graph that shows their positions relative to other tokens, even when rotated. This allows a model to produce accurate and helpful responses, with less information (and thus, less computing storage taken up) than other approaches.

The Meta researchers “decreased the rotation angle” of its RoPE encoding from Llama 2 to Llama 2 Long, which enabled them to ensure more “distant tokens,” those occurring more rarely or with fewer other relationships to other pieces of information, were still included in the model’s knowledge base.
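
The angle change is easy to picture in code. In standard RoPE, position m is rotated by m · base^(-2i/d) in each frequency band, so raising the base shrinks every rotation angle and distant positions drift apart more slowly. The snippet below is an illustrative sketch of that knob, not Meta's code; the paper reports raising the base well above Llama 2's default of 10,000 (500,000 is the value given).

import numpy as np

def rope_angles(position, head_dim=128, base=10000.0):
    # One rotation angle per two-dimensional frequency band of the head.
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    return position * inv_freq

print(rope_angles(4096, base=10000.0)[:4])   # Llama 2's default base
print(rope_angles(4096, base=500000.0)[:4])  # larger base -> smaller angles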

Using reinforcement learning from human feedback (RLHF), a common AI model training method in which the model is rewarded for correct answers under human oversight, and synthetic data generated by Llama 2 chat itself, the researchers were able to improve its performance on common LLM tasks including coding, math, language understanding, common sense reasoning, and answering a human user’s prompted questions.

Graph of Llama 2 Long results taken from the paper “Effective Long-Context Scaling of Foundation Models,” dated September 27, 2023.

With such impressive results relative to both Llama 2 regular and Anthropic’s Claude 2 and OpenAI’s GPT-3.5 Turbo, it’s little wonder the open-source AI community on Reddit, Twitter, and Hacker News has been expressing admiration and excitement about Llama 2 Long since the paper’s release earlier this week — it’s a big validation of Meta’s “open source” approach toward generative AI, and indicates that open source can compete with the closed source, “pay to play” models offered by well-funded startups.



 

bnew


Drive Like a Human: Rethinking Autonomous Driving with Large Language Models​

Daocheng Fu, Xin Li, Licheng Wen, Min Dou, Pinlong Cai, Botian Shi, Yu Qiao
In this paper, we explore the potential of using a large language model (LLM) to understand the driving environment in a human-like manner, and analyze its ability to reason, interpret, and memorize when facing complex scenarios. We argue that traditional optimization-based and modular autonomous driving (AD) systems face inherent performance limitations when dealing with long-tail corner cases. To address this problem, we propose that an ideal AD system should drive like a human, accumulating experience through continuous driving and using common sense to solve problems. To achieve this goal, we identify three key abilities necessary for an AD system: reasoning, interpretation, and memorization. We demonstrate the feasibility of employing an LLM in driving scenarios by building a closed-loop system to showcase its comprehension and environment-interaction abilities. Our extensive experiments show that the LLM exhibits the impressive ability to reason and solve long-tailed cases, providing valuable insights for the development of human-like autonomous driving. The related code is available at this https URL.
Subjects: Robotics (cs.RO); Computation and Language (cs.CL)
Cite as: arXiv:2307.07162 [cs.RO]
(or arXiv:2307.07162v1 [cs.RO] for this version)
[2307.07162] Drive Like a Human: Rethinking Autonomous Driving with Large Language Models


Submission history​

From: Licheng Wen [view email]

[v1] Fri, 14 Jul 2023 05:18:34 UTC (2,012 KB)



 

bnew


RealFill​

Reference-Driven Generation for Authentic Image Completion​

Luming Tang1,2, Nataniel Ruiz1, Qinghao Chu1, Yuanzhen Li1, Aleksander Holynski1, David E. Jacobs1,
Bharath Hariharan2, Yael Pritch1, Neal Wadhwa1, Kfir Aberman1, Michael Rubinstein1

1Google Research, 2Cornell University
arXiv


RealFill is able to complete the image with what should have been there.​

Abstract​

Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions, but the content these models hallucinate is necessarily inauthentic, since the models lack sufficient context about the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of an image with the content that should have been there. RealFill is a generative inpainting model that is personalized using only a few reference images of a scene. These reference images do not have to be aligned with the target image, and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles. Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene. We evaluate RealFill on a new image completion benchmark that covers a set of diverse and challenging scenarios, and find that it outperforms existing approaches by a large margin.

Method​

Authentic Image Completion: Given a few reference images (up to five) and one target image that captures roughly the same scene (but in a different arrangement or appearance), we aim to fill missing regions of the target image with high-quality image content that is faithful to the originally captured scene. Note that for the sake of practical benefit, we focus particularly on the more challenging, unconstrained setting in which the target and reference images may have very different viewpoints, environmental conditions, camera apertures, image styles, or even moving objects.

RealFill: For a given scene, we first create a personalized generative model by fine-tuning a pre-trained inpainting diffusion model on the reference and target images. This fine-tuning process is designed such that the adapted model not only maintains a good image prior, but also learns the contents, lighting, and style of the scene in the input images. We then use this fine-tuned model to fill the missing regions in the target image through a standard diffusion sampling process.
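
As a rough sketch of the second phase only (sampling from a personalized inpainting model) using the diffusers library; the checkpoint name, file paths, and prompt token are hypothetical, and the personalization fine-tune itself is not shown:

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Load a base inpainting diffusion model (a stand-in for RealFill's base prior).
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

# Hypothetical: weights from a LoRA/DreamBooth-style personalization pass over
# the reference images would be loaded here, e.g.
# pipe.load_lora_weights("my-realfill-scene-lora")

target = Image.open("target.png").convert("RGB")
mask = Image.open("mask.png").convert("L")  # white marks the region to fill

result = pipe(
    prompt="a photo of sks scene",  # rare-token prompt bound during fine-tuning
    image=target,
    mask_image=mask,
).images[0]
result.save("completed.png")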

Results​

Given the reference images on the left, RealFill is able to either uncrop or inpaint the target image on the right, resulting in high-quality images that are both visually compelling and also faithful to the references, even when there are large differences between references and targets including viewpoint, aperture, lighting, image style, and object motion.

Comparison with Baselines​

A comparison of RealFill and baseline methods. Transparent white masks are overlayed on the unaltered known regions of the target images.
  • Paint-by-Example does not achieve high scene fidelity because it relies on CLIP embeddings, which only capture high-level semantic information.
  • Stable Diffusion Inpainting produces plausible results, but they are inconsistent with the reference images because prompts have limited expressiveness.

In contrast, RealFill generates high-quality results that have high fidelity with respect to the reference images.



Limitations​

  • RealFill needs to go through a gradient-based fine-tuning process on input images, rendering it relatively slow.
  • When viewpoint change between reference and target images is very large, RealFill tends to fail at recovering the 3D scene, especially when there is only a single reference image.
  • Because RealFill mainly relies on the image prior inherited from the base pre-trained model, it also fails to handle cases that are challenging for the base model, such as rendering text with Stable Diffusion.


Acknowledgements​

We would like to thank Rundi Wu, Qianqian Wang, Viraj Shah, Ethan Weber, Zhengqi Li, Kyle Genova, Boyang Deng, Maya Goldenberg, Noah Snavely, Ben Poole, Ben Mildenhall, Alex Rav-Acha, Pratul Srinivasan, Dor Verbin and Jon Barron for their valuable discussion and feedback, and thank Zeya Peng, Rundi Wu, Shan Nan for their contribution to the evaluation dataset. A special thanks to Jason Baldridge, Kihyuk Sohn, Kathy Meier-Hellstern, and Nicole Brichtova for their feedback and support for the project.



 

bnew


Fake News Detectors are Biased against Texts Generated by Large Language Models​

Jinyan Su, Terry Yue Zhuo, Jonibek Mansurov, Di Wang, Preslav Nakov
The spread of fake news has emerged as a critical challenge, undermining trust and posing threats to society. In the era of Large Language Models (LLMs), the capability to generate believable fake content has intensified these concerns. In this study, we present a novel paradigm to evaluate fake news detectors in scenarios involving both human-written and LLM-generated misinformation. Intriguingly, our findings reveal a significant bias in many existing detectors: they are more prone to flagging LLM-generated content as fake news while often misclassifying human-written fake news as genuine. This unexpected bias appears to arise from distinct linguistic patterns inherent to LLM outputs. To address this, we introduce a mitigation strategy that leverages adversarial training with LLM-paraphrased genuine news. The resulting model yielded marked improvements in detection accuracy for both human- and LLM-generated news. To further catalyze research in this domain, we release two comprehensive datasets, GossipCop++ and PolitiFact++, thus amalgamating human-validated articles with LLM-generated fake and real news.
Comments: The first two authors contributed equally
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2309.08674 [cs.CL]
(or arXiv:2309.08674v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2309.08674

Submission history​

From: Terry Yue Zhuo [view email]
[v1] Fri, 15 Sep 2023 18:04:40 UTC (9,845 KB)

 