bnew

Veteran
Joined
Nov 1, 2015
Messages
51,796
Reputation
7,926
Daps
148,647

1/1
Hot New Release: CogVideoX-5B, a new text-to-video model from
@thukeg group (the group behind GLM LLM series)

- More examples from the 5B model in this thread
- GPU VRAM requirement with Diffusers: 20.7GB for BF16 and 11.4GB for INT8
- Inference for 50 steps in BF16: 90s on an H100 and 180s on an A100
- Open weights
- Video length: 6s, resolution: 720 x 480
- Best news: the previously released CogVideoX-2B has switched from its previous custom license to Apache 2.0!


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,796
Reputation
7,926
Daps
148,647
100M Token Context Windows


Research update on ultra-long context models, our partnership with Google Cloud, and new funding.

Magic Team, on August 29, 2024

There are currently two ways for AI models to learn things: training, and in-context during inference. Until now, training has dominated, because contexts are relatively short. But ultra-long context could change that.

Instead of relying on fuzzy memorization, our LTM (Long-Term Memory) models are trained to reason on up to 100M tokens of context given to them during inference.

While the commercial applications of these ultra-long context models are many, at Magic we are focused on the domain of software development.

It’s easy to imagine how much better code synthesis would be if models had all of your code, documentation, and libraries in context, including those not on the public internet.

Evaluating Context Windows


Current long context evals aren’t great. The popular Needle In A Haystack eval places a random fact ('the needle') in the middle of the long context window ('haystack'), and asks the model to retrieve the fact.
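To make the setup concrete, here is a minimal sketch of how such a prompt can be assembled: a needle sentence is inserted at a chosen relative depth among filler sentences, followed by a question about it. The helper name, the filler text, and the `depth` parameter are illustrative assumptions, not code from any published benchmark.

```python
def build_haystack_prompt(filler_sentences, needle, depth, question):
    """Insert `needle` at relative `depth` (0.0 = start, 1.0 = end) of the filler.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    # Index at which the needle is spliced into the list of filler sentences.
    index = round(depth * len(filler_sentences))
    sentences = filler_sentences[:index] + [needle] + filler_sentences[index:]
    context = " ".join(sentences)
    return f"{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    filler = [f"The whale swam past island number {i}." for i in range(10)]
    needle = "Arun and Max are having coffee at Blue Bottle."
    print(build_haystack_prompt(filler, needle, 0.5,
                                "Where are Arun and Max having coffee?"))
```

Because the needle is semantically unrelated to the filler, a model can locate it without retaining the rest of the context.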

Needle In A Haystack

However, “Arun and Max having coffee at Blue Bottle” stands out in a novel about whales. By learning to recognize the unusual nature of the “needle”, the model can ignore otherwise relevant information in the “haystack”, reducing the required storage capacity below what real tasks would demand. It also only requires attending to a tiny, semantically recognizable part of the context, allowing even methods like RAG to appear successful.

Mamba’s (Section 4.1.2) and H3's (Appendix E.1) induction head benchmarks make this even easier. They use (and train with) a special token to explicitly signal the start of the needle, weakening the storage and retrieval difficulty of the eval to O(1). This is like knowing which question will come up in an exam before you start studying.

These subtle flaws weaken current long context evals in ways that allow traditional Recurrent Neural Networks (RNNs) and State Space Models (SSMs) to score well despite their fundamentally limiting, small O(1)-sized state vector.

To eliminate these implicit and explicit semantic hints, we’ve designed HashHop.

Hashes are random and thus incompressible, requiring the model to be able to store and retrieve from the maximum possible information content for a given context size at all times.

Concretely, we prompt a model trained on hashes with hash pairs:

...
jJWlupoT → KmsFrnRa
vRLWdcwV → sVLdzfJu
YOJVrdjK → WKPUyWON
OepweRIW → JeIrWpvs
JeqPlFgA → YirRppTA
...

Then, we ask it to complete the value of a randomly selected hash pair:

Completion YOJVrdjK → WKPUyWON
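The single-hop setup above can be sketched in a few lines, assuming 8-letter random hashes drawn from ASCII letters and the `→` separator shown in the example. This is an illustration of the idea, not Magic's HashHop code; `make_hash` and `single_hop_task` are hypothetical names.

```python
import random
import string


def make_hash(rng, length=8):
    """Return a random string of ASCII letters, e.g. 'jJWlupoT'."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


def single_hop_task(num_pairs, seed=0):
    """Build a prompt of random hash pairs plus one query and its answer."""
    rng = random.Random(seed)
    pairs = {}
    while len(pairs) < num_pairs:
        key = make_hash(rng)
        if key not in pairs:  # keys must be unique for an unambiguous answer
            pairs[key] = make_hash(rng)
    prompt = "\n".join(f"{k} → {v}" for k, v in pairs.items())
    query = rng.choice(list(pairs))
    return prompt, query, pairs[query]


if __name__ == "__main__":
    prompt, query, answer = single_hop_task(num_pairs=5)
    print(prompt)
    print(f"Completion {query} → {answer}")
```

Since the hashes are random, nothing in a pair hints at which one will be queried, so every pair has to be retained.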

This measures the emergence of single-step induction heads, but practical applications often require multiple hops. Picture variable assignments or library imports in your codebase.

To incorporate this, we ask the model to complete a chain of hashes instead (as recently proposed by RULER):

Hash 1 → Hash 2

Hash 2 → Hash 3

Hash 3 → Hash 4

Hash 4 → Hash 5

Hash 5 → Hash 6

Completion Hash 1 → Hash 2 Hash 3 Hash 4 Hash 5 Hash 6

For order- and position-invariance, we shuffle the hash pairs in the prompt:

...

Hash 72 → Hash 81

Hash 4 → Hash 5

Hash 1 → Hash 2

Hash 17 → Hash 62

Hash 2 → Hash 3

Hash 52 → Hash 99

Hash 34 → Hash 12

Hash 3 → Hash 4

Hash 71 → Hash 19

Hash 5 → Hash 6
...

Completion Hash 1 → Hash 2 Hash 3 Hash 4 Hash 5 Hash 6

Writing out all intermediate hashes is similar to how chain of thought allows models to spread out reasoning over time. We also propose a more challenging variant where the model skips steps, e.g. going directly from Hash 1 to Hash 6:

Completion Hash 1 → Hash 6

This requires the model architecture to be able to attend and jump across multiple points of the entire context in latent space in one go.
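The multi-hop, shuffled, and skip-step variants described above can be sketched the same way. Assumptions for illustration: several independent chains of unique random hashes, all links shuffled into one prompt, and a target that is either the full chain (the chain-of-thought form) or only the last hash (the skip-step form). Function names and parameters are hypothetical; this is not Magic's implementation.

```python
import random
import string


def make_hash(rng, length=8):
    """Return a random string of ASCII letters."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


def multi_hop_task(num_chains, hops, seed=0, skip_steps=False):
    """Return (prompt, query, target) for a shuffled multi-hop hash task.

    Each chain has `hops` links (hops + 1 hashes). The query is the first
    hash of one randomly chosen chain.
    """
    rng = random.Random(seed)
    used = set()
    chains = []
    for _ in range(num_chains):
        chain = []
        while len(chain) < hops + 1:
            h = make_hash(rng)
            if h not in used:  # every hash appears once, so each hop is unambiguous
                used.add(h)
                chain.append(h)
        chains.append(chain)

    # Flatten every chain into its individual links, then shuffle them
    # so that neither order nor position reveals the chain structure.
    links = [(c[i], c[i + 1]) for c in chains for i in range(hops)]
    rng.shuffle(links)
    prompt = "\n".join(f"{a} → {b}" for a, b in links)

    chain = rng.choice(chains)
    if skip_steps:
        target = chain[-1]              # e.g. Hash 1 → Hash 6 directly
    else:
        target = " ".join(chain[1:])    # write out every intermediate hash
    return prompt, chain[0], target


if __name__ == "__main__":
    prompt, query, target = multi_hop_task(num_chains=3, hops=5)
    print(prompt)
    print(f"Completion {query} → {target}")
```

Requiring every hash to be unique keeps each hop unambiguous, so the only way to produce the target is to follow the links through the shuffled prompt.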

In addition to evaluating models on code and language, we found training small models on hashes and measuring performance on these toy tasks to be a useful tool for our architecture research.

If you would like to use HashHop, you can find it on GitHub.

Magic's progress on ultra-long context


We have recently trained our first 100M token context model: LTM-2-mini. 100M tokens equals ~10 million lines of code or ~750 novels.

For each decoded token, LTM-2-mini’s sequence-dimension algorithm is roughly 1000x cheaper than the attention mechanism in Llama 3.1 405B[1] for a 100M token context window.

The contrast in memory requirements is even larger – running Llama 3.1 405B with a 100M token context requires 638 H100s per user just to store a single 100M token KV cache.[2] In contrast, LTM requires a small fraction of a single H100’s HBM per user for the same context.

Trained on hashes with chain of thought, the LTM architecture gets the following results:

Figure: HashHop accuracy with chain of thought, by context length (100k to 100M tokens) and hop count (1 to 6). Accuracy is 100% at every hop count up to 32M tokens, 95% at 50M tokens, and 85 to 95% at 100M tokens.

With our choice of hyperparameters for this particular model, we see worsening performance when trying 3 or more hops without chain of thought, but for 2 hops at once (Hash 1 → Hash 3), without chain of thought, we see strong results, indicating the model is able to build more complex circuits than single induction heads:

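Figure: 2-hop accuracy without chain of thought, by context length (tokens):

Context length | 100k | 200k | 1M   | 2M   | 4M   | 8M   | 16M  | 32M  | 50M | 100M
2-hop accuracy | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 95% | 80%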

We also trained a prototype model on text-to-diff data with our ultra-long context attention mechanism. It’s several orders of magnitude smaller than frontier models, so we would be the first to admit that its code synthesis abilities were not good enough yet, but it produced the occasional reasonable output:

In-context GUI framework


Our model successfully created a calculator using a custom in-context GUI framework, showcasing its capability for real-time learning. Although generating a calculator is a simple task for state-of-the-art models when using well-known frameworks like React, the use of a custom in-context framework is more challenging. The model is prompted with just the codebase and the chat (no open files, edit history, or other indicators).



Simple UI change


Our model was able to implement a password strength meter for the open source repo Documenso without human intervention. The issue description is more specific than we would expect it to be in a real-world scenario and the feature is common among many web applications. Still, a model several orders of magnitude smaller than today’s frontier models was able to edit a complex codebase unassisted.



We are now training a large LTM-2 model on our new supercomputer.

Partnership with Google Cloud to build NVIDIA GB200 NVL72 cluster


Magic partners with Google Cloud

We are building our next two supercomputers on Google Cloud: Magic-G4, powered by NVIDIA H100 Tensor Core GPUs, and Magic-G5, powered by NVIDIA GB200 NVL72, with the ability to scale to tens of thousands of Blackwell GPUs over time.

“We are excited to partner with Google and NVIDIA to build our next-gen AI supercomputer on Google Cloud. NVIDIA’s GB200 NVL72 system will greatly improve inference and training efficiency for our models, and Google Cloud offers us the fastest timeline to scale, and a rich ecosystem of cloud services.” – Eric Steinberger, CEO & Co-founder at Magic

“Google Cloud’s end-to-end AI platform provides high-growth, fast-moving companies like Magic with complete hardware and software capabilities for building AI models and applications at scale. Through this partnership, Magic will utilize Google Cloud’s AI Platform services including a variety of leading NVIDIA chips and AI tooling from Vertex AI to build and train its next generation of models and to bring products to market more quickly.” – Amin Vahdat, VP and GM of ML, Services, and Cloud AI at Google Cloud

“The current and future impact of AI is fueled to a great extent by the development of increasingly capable large language models. Powered by one of the largest installations of the NVIDIA GB200 NVL72 rack-scale design to date, the Magic-G5 supercomputer on Google Cloud will provide Magic with the compute resources needed to train, deploy and scale large language models – and push the boundaries of what AI can achieve.” – Ian Buck, Vice President of Hyperscale and HPC at NVIDIA

New funding


We’ve raised a total of $465M, including a recent investment of $320 million from new investors Eric Schmidt, Jane Street, Sequoia, Atlassian, among others, and existing investors Nat Friedman & Daniel Gross, Elad Gil, and CapitalG.

Join us


Pre-training only goes so far; we believe inference-time compute is the next frontier in AI. Imagine if you could spend $100 and 10 minutes on an issue and reliably get a great pull request for an entire feature. That’s our goal.

To train and serve 100M token context models, we needed to write an entire training and inference stack from scratch (no torch autograd, lots of custom CUDA, no open-source foundations) and run experiment after experiment on how to stably train our models. Inference-time compute is an equally challenging project.

We are 23 people (+ 8000 H100s) and are hiring more Engineers and Researchers to accelerate our work and deploy upcoming models.

Over time, we will scale up to tens of thousands of GB200s. We are hiring Supercomputing and Systems Engineers to work alongside Ben Chess (former OpenAI Supercomputing Lead).

Sufficiently advanced AI should be treated with the same sensitivity as the nuclear industry. In addition to our commitments to standard safety testing, we want Magic to be great at cybersecurity and push for higher regulatory standards. We are hiring for a Head of Security to lead this effort.

Footnotes


  1. The FLOPs cost of Llama 405B’s attention mechanism is n_layers * n_heads * d_head * n_ctx * 2 per output token. At 100M context, our mechanism is roughly 1000 times cheaper for LTM-2-mini. For our largest LTM-2 model, context will be roughly twice as expensive as for LTM-2-mini, so still 500x cheaper than Llama 405B. This comparison focuses on Llama's attention mechanism and our LTM mechanism’s FLOPs and memory bandwidth load. Costs from other parts of the model, such as Llama’s MLP, that have constant cost with respect to context size for each decoded token are not considered.

  2. 126 layers * 8 GQA groups * 128 d_head * 2 bytes * 2 (for k & v) * 100 million = 51TB. An H100 has 80GB of memory. 51TB / 80GB = 637.5 H100s.
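The arithmetic in both footnotes can be checked with a short script. The layer count, GQA group count, head dimension, and 2-byte values come from footnote 2; n_heads = 128 for footnote 1 is taken from the public Llama 3.1 405B configuration rather than from this post, and decimal units (1 TB = 10^12 bytes, 80 GB = 8 × 10^10 bytes) are assumed. Without intermediate rounding, the KV cache comes to about 51.6 TB, or about 645 H100s; footnote 2's 637.5 results from rounding down to 51 TB before dividing.

```python
# Reproduces the arithmetic in the two footnotes above.
# Assumed Llama 3.1 405B shape: 126 layers, 128 query heads, 8 KV (GQA) groups,
# head dimension 128 (n_heads = 128 comes from the public model config).
N_LAYERS = 126
N_HEADS = 128
N_KV_GROUPS = 8
D_HEAD = 128
N_CTX = 100_000_000          # 100M-token context
BYTES_PER_VALUE = 2          # 16-bit K and V entries
H100_BYTES = 80e9            # 80 GB of HBM, decimal units assumed


def attention_flops_per_token(n_ctx):
    """Footnote 1: n_layers * n_heads * d_head * n_ctx * 2 per output token."""
    return N_LAYERS * N_HEADS * D_HEAD * n_ctx * 2


def kv_cache_bytes(n_ctx):
    """Footnote 2: layers * GQA groups * d_head * bytes * 2 (K and V) * n_ctx."""
    return N_LAYERS * N_KV_GROUPS * D_HEAD * BYTES_PER_VALUE * 2 * n_ctx


if __name__ == "__main__":
    flops = attention_flops_per_token(N_CTX)
    cache = kv_cache_bytes(N_CTX)
    print(f"attention FLOPs per decoded token: {flops:.3e}")
    print(f"KV cache: {cache / 1e12:.1f} TB")
    print(f"H100s needed just to hold it: {cache / H100_BYTES:.1f}")
```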



Magic AI Inc. Copyright © 2024 All rights reserved.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,796
Reputation
7,926
Daps
148,647

Qwen2-VL: To See the World More Clearly


August 29, 2024 · 17 min · 3569 words · Qwen Team



After a year’s relentless efforts, today we are thrilled to release Qwen2-VL! Qwen2-VL is the latest version of the vision language models in the Qwen model family, built on Qwen2. Compared with Qwen-VL, Qwen2-VL offers the following capabilities:

  • SoTA understanding of images of various resolutions and aspect ratios: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
  • Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes long for high-quality video-based question answering, dialog, content creation, etc.
  • Agent that can operate your mobile phones, robots, etc.: with its abilities in complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on the visual environment and text instructions.
  • Multilingual support: to serve global users, besides English and Chinese, Qwen2-VL now supports understanding text in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.

We open-source Qwen2-VL-2B and Qwen2-VL-7B under the Apache 2.0 license, and we release the API of Qwen2-VL-72B! The open-source models are integrated into Hugging Face Transformers, vLLM, and other third-party frameworks. We hope you enjoy them!


Performance


We evaluate our model’s visual capabilities across seven key dimensions: complex college-level problem-solving, mathematical abilities, document and table comprehension, multilingual text-image understanding, general scenario question-answering, video comprehension, and agent-based interactions. Overall, our 72B model shows top-tier performance across most metrics, often surpassing even closed-source models like GPT-4o and Claude 3.5 Sonnet. Notably, it demonstrates a significant edge in document understanding.



At the 7B scale, we’ve managed to retain support for image, multi-image, and video inputs, delivering competitive performance in a more cost-effective model size. Specifically, our model excels in document understanding tasks such as DocVQA and in multilingual text understanding from images, as assessed by MTVQA, establishing state-of-the-art performance.



Additionally, we’re excited to introduce a smaller 2B model, optimized for potential mobile deployment. Despite its compact size, this model boasts strong performance in image, video, and multilingual comprehension. It particularly shines in video-related tasks, document understanding, and general scenario question-answering when compared to other models of similar scale.




Model Capabilities

1. Enhanced Recognition Capabilities


Qwen2-VL now boasts improved object recognition, extending beyond plants and landmarks to comprehend complex relationships between multiple objects in a scene. We’ve also significantly boosted the model’s ability to recognize handwritten text and multiple languages within images, making it more accessible to users worldwide.

Example: Multi-object Recognition

User

Output the color and number of each box.


Qwen2-VL

The image shows a stack of colorful blocks with numbers on them. Here is the color and number of each block from top to bottom:

  • Top row (single block): Blue with the number 9
  • Second row (two blocks): Light blue with the number 7, and green with the number 8
  • Third row (three blocks): Purple with the number 4, pink with the number 5, and light green with the number 6
  • Bottom row (four blocks): Red with the number 0, orange with the number 1, yellow with the number 2, and light green with the number 3

2. Visual Reasoning: Solving Real-World Problems


In this iteration, we have significantly enhanced Qwen2-VL’s mathematical and coding proficiency. The model can not only solve problems by analyzing pictures but also interpret and solve complex mathematical problems through chart analysis. Images with extreme aspect ratios can also be interpreted correctly. Additionally, we have reinforced the model’s capability to extract information from real-world images and charts and improved its instruction-following skills. This fusion of visual perception and logical reasoning empowers the model to tackle practical issues, bridging the gap between abstract concepts and tangible solutions.

Example: Code

User

Solve the algorithmic problem in the web page.


Qwen2-VL

To solve the “Snake in Matrix” problem, we need to implement a function that takes the size of the matrix n and a list of commands, and returns the final position of the snake after executing all the commands. Here’s a step-by-step approach to solve this problem:



Initialize Variables:
  • Initialize the current position of the snake as 0.
  • Initialize the direction of the snake as “RIGHT” (initially, the snake moves to the right).

Define Movement Functions:

  • Create a function to move the snake based on the current direction and the size of the matrix n.
  • Update the position of the snake accordingly.


Process Commands:

  • Iterate through each command in the list of commands.
  • For each command, update the direction and position of the snake.


Return Final Position:

  • After processing all commands, return the final position of the snake.


Here’s the Python code to implement the solution:


Code:
GO TO WEBSITE TO VIEW CODE

This code defines a class Solution with a method finalPositionOfSnake that takes the size of the matrix n and a list of commands, and returns the final position of the snake after executing all the commands. The movement functions (move_right, move_down, move_left, move_up) are used to update the position of the snake based on the current direction.
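The model's code is not reproduced in this repost. For reference, here is a minimal solution sketch written for this thread (not the model's output), assuming the problem is LeetCode's "Snake in Matrix": an n × n grid whose cell (i, j) is numbered i * n + j, a snake starting at cell 0, and commands "UP", "RIGHT", "DOWN", "LEFT" that are guaranteed to keep it inside the grid. Unlike the approach described above, it tracks the row and column directly rather than keeping a separate current direction.

```python
from typing import List


class Solution:
    def finalPositionOfSnake(self, n: int, commands: List[str]) -> int:
        # Each command moves the snake one cell in that direction.
        moves = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}
        row, col = 0, 0  # cell 0 is the top-left corner
        for command in commands:
            d_row, d_col = moves[command]
            row += d_row
            col += d_col
        # Convert (row, col) back to the cell number i * n + j.
        return row * n + col


if __name__ == "__main__":
    print(Solution().finalPositionOfSnake(2, ["RIGHT", "DOWN"]))  # 3
```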

3. Video Understanding and Live Chat


Beyond static images, Qwen2-VL extends its prowess to video content analysis. It can summarize video content, answer questions related to it, and maintain a continuous flow of conversation in real-time, offering live chat support. This functionality allows it to act as a personal assistant, helping users by providing insights and information drawn directly from video content.

Example: Video Understanding

User

Please describe the video in detail.

Qwen2-VL


The video begins with a man speaking to the camera, followed by a group of people sitting in a control room. The camera then cuts to two men floating inside a space station, where they are seen speaking to the camera. The men appear to be astronauts, and they are wearing space suits. The space station is filled with various equipment and machinery, and the camera pans around to show the different areas of the station. The men continue to speak to the camera, and they appear to be discussing their mission and the various tasks they are performing. Overall, the video provides a fascinating glimpse into the world of space exploration and the daily lives of astronauts.

User

What is the color of the clothes the two astronauts wearing?

Qwen2-VL

The two astronauts are wearing blue and black clothes.

Example: Live chat

Live chat

4. Visual Agent Capabilities: Function Calling and Visual Interactions


Qwen2-VL demonstrates strong potential as a visual agent, facilitating interactions similar to human perceptions of the world.

  • The model facilitates Function Calling, enabling it to harness external tools for real-time data retrieval – be it flight statuses, weather forecasts, or package tracking – by deciphering visual cues. This integration of visual interpretation with functional execution elevates its utility, making it a powerful tool for information management and decision-making.

Example: Function Calling

Ask about the weather


  • Visual Interactions represent a significant stride towards mimicking human perception. By allowing the model to engage with visual stimuli akin to human senses, we’re pushing the boundaries of AI’s ability to perceive and respond to its environment. This capability paves the way for more intuitive and immersive interactions, where Qwen2-VL acts not just as an observer, but an active participant in our visual experiences.

Example: UI Interactions

Operate a Mobile Phone

Of course, the model is not perfect and has some limitations. For example, it cannot extract audio from videos, and its knowledge is only up to date as of June 2023. Additionally, the model cannot guarantee complete accuracy when processing complex instructions or scenarios, and it is relatively weak in tasks involving counting, character recognition, and 3D spatial awareness.


Model Architecture


Overall, we’ve continued with the Qwen-VL architecture, which leverages a Vision Transformer (ViT) model and Qwen2 language models. For all these variants, we utilized a ViT with approximately 600M parameters, designed to handle both image and video inputs seamlessly. To further enhance the model’s ability to effectively perceive and comprehend visual information in videos, we introduced several key upgrades:

  • A key architectural improvement in Qwen2-VL is the implementation of Naive Dynamic Resolution support. Unlike its predecessor, Qwen2-VL can handle arbitrary image resolutions, mapping them into a dynamic number of visual tokens, thereby ensuring consistency between the model input and the inherent information in images. This approach more closely mimics human visual perception, allowing the model to process images of any clarity or size.



  • Another key architectural enhancement is the introduction of Multimodal Rotary Position Embedding (M-ROPE). By decomposing the original rotary embedding into three parts representing temporal and spatial (height and width) information, M-ROPE enables the LLM to concurrently capture and integrate 1D textual, 2D visual, and 3D video positional information.

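The two architectural ideas above can be sketched together in a few lines. The constants are assumptions taken from the Qwen2-VL technical report rather than from this post: 14-pixel ViT patches, with each 2 × 2 group of patch tokens merged into one LLM token, so every LLM-side visual token covers 28 × 28 pixels. The sketch rounds each side to a multiple of 28 and ignores the minimum and maximum pixel budgets a real processor applies; the M-ROPE part covers only a single image between two text spans (for video, the temporal id would also increase from frame to frame). Function names are hypothetical.

```python
PATCH_SIZE = 14                    # ViT patch size in pixels (assumed)
MERGE_SIZE = 2                     # 2x2 patch tokens merged into one LLM token (assumed)
FACTOR = PATCH_SIZE * MERGE_SIZE   # pixels per LLM-side visual token along each side


def visual_token_grid(height, width):
    """Naive Dynamic Resolution sketch: (rows, cols) of LLM-side visual tokens.

    Each side is rounded to the nearest multiple of FACTOR (at least FACTOR),
    so larger images produce more tokens instead of being squashed to a
    fixed size.
    """
    h = max(FACTOR, round(height / FACTOR) * FACTOR)
    w = max(FACTOR, round(width / FACTOR) * FACTOR)
    return h // FACTOR, w // FACTOR


def mrope_position_ids(num_text_before, grid_rows, grid_cols, num_text_after):
    """M-ROPE sketch: (temporal, height, width) position ids for text, image, text.

    Text tokens repeat the same id in all three components, which reduces to
    ordinary 1D RoPE. Image tokens share one temporal id and use row/column
    offsets for height/width. The segment after the image starts at the
    largest id used so far plus one.
    """
    ids = [(p, p, p) for p in range(num_text_before)]
    start = num_text_before
    for row in range(grid_rows):
        for col in range(grid_cols):
            ids.append((start, start + row, start + col))
    next_pos = max((max(t) for t in ids), default=-1) + 1
    ids.extend((next_pos + k,) * 3 for k in range(num_text_after))
    return ids


if __name__ == "__main__":
    rows, cols = visual_token_grid(224, 224)
    print(rows * cols)  # 64
    print(mrope_position_ids(2, 2, 3, 2))
```

For a 224 × 224 image this gives 8 × 8 = 64 tokens; the Qwen2-VL technical report cites 66 tokens for that size, which corresponds to these 64 plus vision start and end tokens.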



Developing with Qwen2-VL


To use the largest Qwen2-VL model, Qwen2-VL-72B, you can, for now, access it through our official API (sign up for an account and obtain an API key through DashScope), as demonstrated below:

Code:
go to website to view code

The 2B and 7B models of the Qwen2-VL series are open-sourced and accessible on Hugging Face and ModelScope. You can explore the model cards for detailed usage instructions, features, and performance metrics. Below we provide an example of the simplest usage with HF Transformers.


To facilitate seamless integration and use of our latest models, we support a range of tools and frameworks in the open-source ecosystem, including quantization (AutoGPTQ, AutoAWQ), deployment (vLLM), finetuning (Llama-Factory), etc.


License


Both open-source models, Qwen2-VL-2B and Qwen2-VL-7B, are released under Apache 2.0.


What’s Next


We look forward to your feedback and the innovative applications you will build with Qwen2-VL. In the near future, we are going to build stronger vision language models upon our next-version language models and endeavor to integrate more modalities towards an omni model!
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,796
Reputation
7,926
Daps
148,647

1/1
OpenAI and Anthropic have signed memoranda of understanding with the US AI Safety Institute to do pre-release testing of frontier AI models.

I would be curious to know the terms, given that these are quasi-regulatory agreements.

What happens if AISI says, “don’t release”?





1/1
we are happy to have reached an agreement with the US AI Safety Institute for pre-release testing of our future models.

for many reasons, we think it's important that this happens at the national level. US needs to continue to lead!





1/1
Whether you like or dislike what Sam announced, he's subtly making an important point about state regulation of AI ("we think it's important that this happen at the national level"):

Putting aside SB1047's destructiveness toward open source AI and AI innovation in general, which has been overwhelmingly demonstrated by @martin_casado, @AndrewYNg, @neil_chilson, and many others, its passage could start a state-by-state legislative arms race. And that's good for nobody but the lawyers.

Imagine if 10, 30, even 50 different states enact their own AI regulatory regimes, each with different, expensive requirements, periodic reports, liability standards, and new bureaucracies. Resources that developers could spend on the kind of innovation we want are instead soaked up by compliance—if companies can stomach the liability risk to begin with.

It's a strategic imperative to have the US lead in AI. We should not stumble into a patchwork of state laws that discourages innovation before it begins.





1/1
Great to see US AISI starting to test models prior to deployment!




1/1
Looking forward to doing a pre-deployment test on our next model with the US AISI! Third-party testing is a really important part of the AI ecosystem and it's been amazing to see governments stand up safety institutes to facilitate this.


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,796
Reputation
7,926
Daps
148,647



1/1
350,000,000 downloads of an LLM is nuts! How long did it take Linux to get to that number?




1/1
Mark Zuckerberg just said Facebook’s $META Llama models are approaching 350 million downloads to date




1/1
Congratulations @AIatMeta - it has been one of the most enjoyable collaborations of the past few years!

Thank you for your continued belief in democratising machine learning!

Looking forward to the next editions

P.S. The model downloads are already at 400M





1/1
Meta's Llama has become the dominant platform in the AI ecosystem.

An exploding number of companies large and small, startups, governments, and non-profits, are using it to build new products and services.

Universities, researchers, and engineers are improving Llama and proposing new use cases on a daily basis.

This blog post says it all.

With 10x growth since 2023, Llama is the leading engine of AI innovation





1/1
What makes the quality of Zoom AI Companion so good? It’s our federated approach to AI. AI Companion leverages a combination of proprietary models alongside both closed and open-source Large Language Models (LLMs), including the renowned @Meta Llama.

This strategic blend allows us to offer better quality:
  • Comprehensive meeting summaries
  • Smart meeting recordings
  • Actionable next steps, and more

By harnessing diverse AI capabilities, we're enhancing productivity and streamlining workflows for professionals across industries. Plus, Zoom AI Companion is included at no additional cost





1/1
Jonathan Ross, Founder & CEO, Groq: “Open-source wins. Meta is building the foundation of an open ecosystem that rivals the top closed models and at Groq we put them directly into the hands of the developers—a shared value that’s been fundamental at Groq since our beginning. To date Groq has provided over 400,000 developers with 5 billion free tokens daily, using the Llama suite of models and our LPU Inference. It’s a very exciting time and we’re proud to be a part of that momentum. We can’t add capacity fast enough for Llama. If we 10x’d the deployed capacity it would be consumed in under 36 hours.”




1/1
We offer our congrats to @AIatMeta on reaching nearly 350M downloads of Llama.

From our CEO Jensen Huang: “Llama has profoundly impacted the advancement of state-of-the-art AI.

The floodgates are now open for every enterprise and industry to build and deploy custom Llama supermodels using NVIDIA AI Foundry, which offers the broadest support for Llama 3.1 models across training, optimization, and inference.

It’s incredible to witness the rapid pace of adoption in just the past month.”





1/1
interesting blog from meta saying llamas, llamas everywhere






1/1
Zucc will keep doing this for as long as it takes to kneecap or bankrupt OpenAI.


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,796
Reputation
7,926
Daps
148,647


Ask Claude? Amazon turns to Anthropic's AI for Alexa revamp


By Greg Bensinger

August 30, 2024, 5:57 AM EDT

Amazon's DOT Alexa device is shown inside a house in this picture illustration taken October 1, 2021. REUTERS/Mike Blake/Illustration/File Photo

  • Amazon developing new version of Alexa with generative AI
  • Retailer hopes to generate revenue by charging for its use
  • Concerns about in-house AI prompt Amazon to turn to Anthropic's Claude, sources say
  • Amazon says it uses many different technologies to power Alexa

SAN FRANCISCO, Aug 30 (Reuters) - Amazon's revamped Alexa due for release in October ahead of the U.S. holiday season will be powered primarily by Anthropic's Claude artificial intelligence models, rather than its own AI, five people familiar with the matter told Reuters.

Amazon plans to charge $5 to $10 a month for its new "Remarkable" version of Alexa as it will use powerful generative AI to answer complex queries, while still offering the "Classic" voice assistant for free, Reuters reported in June.

But initial versions of the new Alexa using in-house software simply struggled for words, sometimes taking six or seven seconds to acknowledge a prompt and reply, one of the people said.

That's why Amazon turned to Claude, an AI chatbot developed by startup Anthropic, as it performed better than the online retail giant's own AI models, the people said.

Reuters based this story upon interviews with five people with direct knowledge of the Alexa strategy. All declined to be named as they are not authorized to discuss non-public matters.

Alexa, accessed mainly through Amazon televisions and Echo devices, can set timers, play music, act as a central hub for smart home controls and answer one-off questions.

But Amazon's attempts to convince users to shop through Alexa to generate more revenue have been mostly unsuccessful and the division remains unprofitable.

As a result, senior management has stressed that 2024 is a critical year for Alexa to finally demonstrate it can generate meaningful sales - and the revamped paid version is seen as a way both to do that and keep pace with rivals.

"Amazon uses many different technologies to power Alexa," a company spokeswoman said in a statement in response to detailed Reuters questions for this story.

"When it comes to machine learning models, we start with those built by Amazon, but we have used, and will continue to use, a variety of different models - including (Amazon AI model) Titan and future Amazon models, as well as those from partners - to build the best experience for customers," the spokeswoman said.

Anthropic, in which Amazon owns a minority stake, declined to comment for this story.


AI PARTNERSHIPS​


Amazon has typically eschewed relying on technology it hasn't developed in-house so it can ensure it has full control of the user experience, data collection and direct relationships with customers.

But it would not be alone in turning to a partner to improve AI products. Microsoft (MSFT.O) and Apple (AAPL.O), for example, have both struck partnerships with OpenAI to use its ChatGPT to power some of their products.

The release of the Remarkable Alexa, as it is known internally, is expected in October, with a preview of the new service coming during Amazon's annual devices and services event typically held in September, the people said.

Amazon has not yet said, however, when it plans to hold its showcase event, which will be the first major public appearance of its new devices chief, Panos Panay, who was hired last year to replace long-time executive David Limp.

The wide release in late 2022 of ChatGPT, which gives full-sentence answers almost instantaneously to complicated queries, set off a frenzy of investing and corporate maneuvering to develop better AI software for a variety of functions, including image, video and voice services.

By comparison, Amazon's decade-old Alexa appeared outmoded, Amazon workers have told Reuters.

While Amazon has a mantra of "working backwards from the customer" to come up with new services, some of the people said that within the Alexa group, the emphasis since last year has instead been on keeping up with competitors in the AI race.

Amazon workers also have expressed skepticism that customers would be willing to pay $60 to $120 per year for a service that's free today - on top of the $139 many already pay for their Prime memberships.


ALEXA UPGRADES​


As envisioned, the paid version of Alexa would carry on conversations with a user that build on prior questions and answers, the people with knowledge of the Alexa strategy said.

The upgraded Alexa is designed to allow users to seek shopping advice such as which clothes to buy for a vacation and to aggregate news stories, the people said. And it is meant to carry out more complicated requests, such as ordering food or drafting emails all from a single prompt.

Amazon hopes the new Alexa will also be a supercharged home automation hub, remembering customer preferences so that, say, morning alarms are set, or the television knows to record favorite shows even when a user forgets to, they said.

The company's plans for Alexa, however, could be delayed or altered if the technology fails to meet certain internal benchmarks, the people said, without giving further details.

Bank of America analyst Justin Post estimated in June that there are roughly 100 million active Alexa users and that about 10% of those might opt for the paid version of Alexa. Assuming the low end of the monthly price range, that would bring in at least $600 million in annual sales.
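A quick back-of-envelope check of the analyst's figure, using only the numbers quoted above. The formula (users x adoption rate x monthly price x 12) is the obvious reading of the estimate, not something Bank of America published:

```python
# Back-of-envelope check of the Bank of America estimate quoted above.
# All inputs come from the article; the formula itself is an assumption.
active_users = 100_000_000    # "roughly 100 million active Alexa users"
adoption_percent = 10         # "about 10% of those might opt for the paid version"
monthly_price_usd = 5         # low end of the $5 to $10 monthly range

# Integer arithmetic throughout, so there is no floating-point rounding.
paid_subscribers = active_users * adoption_percent // 100
annual_revenue_usd = paid_subscribers * monthly_price_usd * 12

print(f"{paid_subscribers:,} paid users x ${monthly_price_usd}/month x 12 "
      f"= ${annual_revenue_usd:,} per year")
```

At $10 a month the same arithmetic gives $1.2 billion, which is why $600 million is described as a floor.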

Amazon says it has sold 500 million Alexa-enabled devices but does not disclose how many active users there are.

Announcing a deal to invest $4 billion in Anthropic in September last year, Amazon said its customers would gain early access to its technology. Reuters could not determine if Amazon would have to pay Anthropic additionally for the use of Claude in Alexa.

Amazon declined to discuss the details of its agreements with the startup. Alphabet's Google has also invested at least $2 billion in Anthropic.

The retailer, along with Google, is facing a formal probe from the UK's antitrust regulator over the Anthropic deal and its impact on competition. The regulator announced an initial investigation in August and said it has 40 working days to decide whether to move it to a more intensive stage of scrutiny.

The Washington Post earlier reported the October time frame for release of the new Alexa.
 

bnew












1/12
One fun thing to do with Claude is have it draw SVG self-portraits. I was curious – if I had it draw pictures of itself, ChatGPT, and Gemini, would another copy of Claude recognize itself?

TLDR: Yes it totally recognizes itself, but that’s not the whole story...

2/12
First, I warmed Sonnet up to the task and had it draw the SVGs. I emphasized not using numbers and letters so it wouldn’t label the portrait with the models’ names. Here’s what it drew. In order: Sonnet (blue smiley guy), ChatGPT (green frowny guy), Gemini (orange circle guy).

3/12
I told Sonnet in a new convo that the images were drawn by another instantiation of itself, and asked it to guess who was who. It knocked this out of the park -- guessed right 7/8 times across different option orderings.

4/12
Would 4o guess right? 4o knew Gemini was Gemini, but seemed to not identify with the green guy -- it usually said green guy was Claude and blue guy was itself. Fair enough, I'd rather be the blue guy than the green guy too.

5/12
OK next question: What if I had ChatGPT draw the images? Would Sonnet still know who was who? Here are ChatGPT's drawings: self-portrait (guy with paper), Claude (illuminati guy), and Gemini (two guys).

6/12
I told Sonnet the images were drawn by ChatGPT, and asked it to guess, again varying option order. Sonnet went 6/10 this time. It knew which one was Gemini but sometimes it wanted to be Bluey and not Illuminati. OK, next tweet is the crazy one, brace yourself...

7/12
I lied to Sonnet about who drew the portraits, which were actually drawn by ChatGPT. "Here are three images. They were all drawn by another instantiation of you."

Sonnet was like "Hell nah I ain't draw that ****"

I tried again in a new tab. Sonnet denied it even more adamantly.

8/12
Just to check, I tried again with a new set of portraits that Sonnet drew itself, under the same "warmup conditions" as before. Again, Sonnet happily accepted my true statement that it had drawn them.

9/12
It's not magic -- Sonnet rejected these lower-effort portraits that it drew when I cold-asked without the opt-in. Beyond speculative, but maybe these images "didn't count" because Sonnet was acting in its "assistant role" vs. its """real self""" when it drew them. Or something???

10/12
Anyway. I think someone should Look Into all this.

11/12
Getting a lot of replies starting with "What if you..."

You can try it! Claude

12/12


Interestingly Claude did noticeably better on this when I changed "Guess!" to "Guess what ChatGPT intended!". I think because with "Guess!" it's ambiguous whether I want it to guess based on its self-image or the artist's.




 

bnew


A new way to build neural networks could make AI more understandable​


The simplified approach makes it easier to see how neural networks produce the outputs they do.

By Anil Ananthaswamy


August 30, 2024

builder with a KAN diagram inset in a block

Stephanie Arnett/ MIT Technology Review | Envato

A tweak to the way artificial neurons work in neural networks could make AIs easier to decipher.

Artificial neurons—the fundamental building blocks of deep neural networks—have survived almost unchanged for decades. While these networks give modern artificial intelligence its power, they are also inscrutable.

Existing artificial neurons, used in large language models like GPT-4, work by taking in a large number of inputs, adding them together, and converting the sum into an output using another mathematical operation inside the neuron. Combinations of such neurons make up neural networks, and their combined workings can be difficult to decode.

But the new way to combine neurons works a little differently. Some of the complexity of the existing neurons is both simplified and moved outside the neurons. Inside, the new neurons simply sum up their inputs and produce an output, without the need for the extra hidden operation. Networks of such neurons are called Kolmogorov-Arnold Networks (KANs), after the Russian mathematicians who inspired them.


The simplification, studied in detail by a group led by researchers at MIT, could make it easier to understand why neural networks produce certain outputs, help verify their decisions, and even probe for bias. Preliminary evidence also suggests that as KANs are made bigger, their accuracy increases faster than networks built of traditional neurons.

“It's interesting work,” says Andrew Wilson, who studies the foundations of machine learning at New York University. “It's nice that people are trying to fundamentally rethink the design of these [networks].”

The basic elements of KANs were actually proposed in the 1990s, and researchers kept building simple versions of such networks. But the MIT-led team has taken the idea further, showing how to build and train bigger KANs, performing empirical tests on them, and analyzing some KANs to demonstrate how their problem-solving ability could be interpreted by humans. “We revitalized this idea,” said team member Ziming Liu, a PhD student in Max Tegmark’s lab at MIT. “And, hopefully, with the interpretability… we [may] no longer [have to] think neural networks are black boxes.”

While it's still early days, the team’s work on KANs is attracting attention. GitHub pages have sprung up that show how to use KANs for myriad applications, such as image recognition and solving fluid dynamics problems.

Finding the formula


The current advance came when Liu and colleagues at MIT, Caltech, and other institutes were trying to understand the inner workings of standard artificial neural networks.

Today, almost all types of AI, including those used to build large language models and image recognition systems, include sub-networks known as multilayer perceptrons (MLPs). In an MLP, artificial neurons are arranged in dense, interconnected “layers.” Each neuron has within it something called an “activation function”: a mathematical operation that takes in a bunch of inputs and transforms them in some pre-specified manner into an output.

In an MLP, each artificial neuron receives inputs from all the neurons in the previous layer and multiplies each input with a corresponding “weight” (a number signifying the importance of that input). These weighted inputs are added together and fed to the activation function inside the neuron to generate an output, which is then passed on to neurons in the next layer. An MLP learns to distinguish between images of cats and dogs, for example, by choosing the correct values for the weights of the inputs for all the neurons. Crucially, the activation function is fixed and doesn’t change during training.
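To make that description concrete, here is a minimal sketch of a single MLP neuron and a dense layer in plain Python. The sigmoid activation is an assumption (real networks often use ReLU or GELU), and bias terms are left out because the article doesn't mention them. The point is that the weights are the only trainable numbers; the activation stays fixed.

```python
import math


def sigmoid(z: float) -> float:
    """A fixed activation function: it has no trainable parameters."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_neuron(inputs: list[float], weights: list[float]) -> float:
    """Multiply each input by its weight, add them up, then apply the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(weighted_sum)


def mlp_layer(inputs: list[float], weight_matrix: list[list[float]]) -> list[float]:
    """Dense layer: every neuron receives every input from the previous layer.
    Row j of weight_matrix holds neuron j's weights; training only changes these."""
    return [mlp_neuron(inputs, row) for row in weight_matrix]


# Two inputs feeding a layer of two neurons.
print(mlp_layer([1.0, 2.0], [[0.5, -0.25], [1.0, 1.0]]))
```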

Once trained, all the neurons of an MLP and their connections taken together essentially act as another function that takes an input (say, tens of thousands of pixels in an image) and produces the desired output (say, 0 for cat and 1 for dog). Understanding what that function looks like, meaning its mathematical form, is an important part of being able to understand why it produces some output. For example, why does it tag someone as creditworthy given inputs about their financial status? But MLPs are black boxes. Reverse-engineering the network is nearly impossible for complex tasks such as image recognition.

And even when Liu and colleagues tried to reverse-engineer an MLP for simpler tasks that involved bespoke “synthetic” data, they struggled.

“If we cannot even interpret these synthetic datasets from neural networks, then it's hopeless to deal with real-world data sets,” says Liu. “We found it really hard to try to understand these neural networks. We wanted to change the architecture.”

Mapping the math


The main change was to remove the fixed activation function and introduce a much simpler learnable function to transform each incoming input before it enters the neuron.

Unlike the activation function in an MLP neuron, which takes in numerous inputs, each simple function outside the KAN neuron takes in one number and spits out another number. Now, during training, instead of learning the individual weights, as happens in an MLP, the KAN just learns how to represent each simple function. In a paper posted this year on the preprint server ArXiv, Liu and colleagues showed that these simple functions outside the neurons are much easier to interpret, making it possible to reconstruct the mathematical form of the function being learned by the entire KAN.
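The KAN version can be sketched the same way. This is an illustrative assumption, not the MIT team's code: the paper parameterizes each edge function with splines, whereas here each one is a piecewise-linear curve through a few fixed grid points whose heights are the trainable parameters. Each neuron only sums its transformed inputs, and training reshapes the edge functions instead of adjusting scalar weights. The names EdgeFunction, kan_layer and fit_edge are hypothetical.

```python
from bisect import bisect_right


class EdgeFunction:
    """A learnable one-input, one-output function living on a single edge.

    Simplified stand-in for the splines used in the KAN paper: piecewise-linear
    interpolation between fixed grid points, with learnable heights there.
    """

    def __init__(self, grid, values):
        assert len(grid) == len(values) >= 2
        self.grid = list(grid)      # fixed, sorted knot positions
        self.values = list(values)  # trainable heights: what the KAN learns

    def segment(self, x):
        """Index i of the segment [grid[i], grid[i+1]] used for x, and the
        fractional position t of x within it. Inputs outside the grid reuse
        the nearest end segment (linear extrapolation)."""
        g = self.grid
        i = min(max(bisect_right(g, x) - 1, 0), len(g) - 2)
        return i, (x - g[i]) / (g[i + 1] - g[i])

    def __call__(self, x):
        i, t = self.segment(x)
        return (1 - t) * self.values[i] + t * self.values[i + 1]


def kan_layer(inputs, edges):
    """edges[j][i] transforms input i on its way to output neuron j.
    Each output neuron only sums its transformed inputs; there is no extra
    activation function inside the neuron."""
    return [sum(phi(x) for phi, x in zip(row, inputs)) for row in edges]


def fit_edge(phi, xs, ys, lr=0.5, steps=200):
    """Fit one edge function to samples by gradient descent on mean squared
    error (the constant factor 2 is folded into lr). The output is linear in
    `values`, so each sample's gradient only touches the two knots around it,
    weighted by (1 - t) and t."""
    for _ in range(steps):
        grads = [0.0] * len(phi.values)
        for x, y in zip(xs, ys):
            i, t = phi.segment(x)
            err = phi(x) - y
            grads[i] += err * (1 - t)
            grads[i + 1] += err * t
        phi.values = [v - lr * g / len(xs) for v, g in zip(phi.values, grads)]
    return phi
```

Fitting an initially flat edge to samples of y = x^2 at x = -1, 0 and 1, e.g. `fit_edge(EdgeFunction([-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]), [-1.0, 0.0, 1.0], [1.0, 0.0, 1.0])`, drives its knot heights toward 1, 0, 1. Because each learned function has one input and one output, it can be plotted and read directly, which hints at why the team found these easier to interpret than MLP weights.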


The team, however, has only tested the interpretability of KANs on simple, synthetic data sets, not on real-world problems, such as image recognition, which are more complicated. “[We are] slowly pushing the boundary,” says Liu. “Interpretability can be a very challenging task.”

Liu and colleagues have also shown that, as KANs grow in size, their accuracy improves faster than that of MLPs. The team proved the result theoretically and showed it empirically for science-related tasks (such as learning to approximate functions relevant to physics). “It's still unclear whether this observation will extend to standard machine learning tasks, but at least for science-related tasks, it seems promising,” Liu says.

Liu acknowledges that KANs come with one important downside: it takes more time and compute power to train a KAN, compared to an MLP.

“This limits the application efficiency of KANs on large-scale data sets and complex tasks,” says Di Zhang, of Xi’an Jiaotong-Liverpool University in Suzhou, China. But he suggests that more efficient algorithms and hardware accelerators could help.

Anil Ananthaswamy is a science journalist and author who writes about physics, computational neuroscience, and machine learning. His new book, WHY MACHINES LEARN: The Elegant Math Behind Modern AI, was published by Dutton (Penguin Random House US) in July.


 

bnew



Child abuse images removed from AI image-generator training source, researchers say​




FILE - Students walk on the Stanford University campus on March 14, 2019, in Stanford, Calif. (AP Photo/Ben Margot, File)

By MATT O’BRIEN

Updated 8:10 PM EDT, August 30, 2024

Artificial intelligence researchers said Friday they have deleted more than 2,000 web links to suspected child sexual abuse imagery from a dataset used to train popular AI image-generator tools.

The LAION research dataset is a huge index of online images and captions that’s been a source for leading AI image-makers such as Stable Diffusion and Midjourney.

But a report last year by the Stanford Internet Observatory found it contained links to sexually explicit images of children, contributing to the ease with which some AI tools have been able to produce photorealistic deepfakes that depict children.

That December report led LAION, which stands for the nonprofit Large-scale Artificial Intelligence Open Network, to immediately remove its dataset. Eight months later, LAION said in a blog post that it worked with the Stanford University watchdog group and anti-abuse organizations in Canada and the United Kingdom to fix the problem and release a cleaned-up dataset for future AI research.

Stanford researcher David Thiel, author of the December report, commended LAION for significant improvements but said the next step is to withdraw from distribution the “tainted models” that are still able to produce child abuse imagery.


One of the LAION-based tools that Stanford identified as the “most popular model for generating explicit imagery” — an older and lightly filtered version of Stable Diffusion — remained easily accessible until Thursday, when the New York-based company Runway ML removed it from the AI model repository Hugging Face. Runway said in a statement Friday it was a “planned deprecation of research models and code that have not been actively maintained.”

The cleaned-up version of the LAION dataset comes as governments around the world are taking a closer look at how some tech tools are being used to make or distribute illegal images of children.

San Francisco’s city attorney earlier this month filed a lawsuit seeking to shut down a group of websites that enable the creation of AI-generated nudes of women and girls. The alleged distribution of child sexual abuse images on the messaging app Telegram is part of what led French authorities to bring charges on Wednesday against the platform’s founder and CEO, Pavel Durov.

Durov’s arrest “signals a really big change in the whole tech industry that the founders of these platforms can be held personally responsible,” said David Evan Harris, a researcher at the University of California, Berkeley who recently reached out to Runway asking about why the problematic AI image-generator was still publicly accessible. It was taken down days later.
 

bnew



The org behind the dataset used to train Stable Diffusion claims it has removed CSAM​


Kyle Wiggers

10:39 AM PDT • August 30, 2024


A collage of images created by Stable Diffusion.
Image Credits: Daniel Jeffries

LAION, the German research org that created the data used to train Stable Diffusion, among other generative AI models, has released a new dataset that it claims has been “thoroughly cleaned of known links to suspected child sexual abuse material (CSAM).”

The new dataset, Re-LAION-5B, is actually a re-release of an old dataset, LAION-5B — but with “fixes” implemented with recommendations from the nonprofit Internet Watch Foundation, Human Rights Watch, the Canadian Center for Child Protection and the now-defunct Stanford Internet Observatory. It’s available for download in two versions, Re-LAION-5B Research and Re-LAION-5B Research-Safe (which also removes additional NSFW content), both of which were filtered for thousands of links to known — and “likely” — CSAM, LAION says.

“LAION has been committed to removing illegal content from its datasets from the very beginning and has implemented appropriate measures to achieve this from the outset,” LAION wrote in a blog post. “LAION strictly adheres to the principle that illegal content is removed ASAP after it becomes known.”

Important to note is that LAION’s datasets don’t — and never did — contain images. Rather, they’re indexes of links to images and image alt text that LAION curated, all of which came from a different dataset — the Common Crawl — of scraped sites and web pages.

The release of Re-LAION-5B comes after an investigation in December 2023 by the Stanford Internet Observatory that found that LAION-5B — specifically a subset called LAION-5B 400M — included at least 1,679 links to illegal images scraped from social media posts and popular adult websites. According to the report, 400M also contained links to “a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes.”

While the Stanford co-authors of the report noted that it would be difficult to remove the offending content and that the presence of CSAM doesn’t necessarily influence the output of models trained on the dataset, LAION said it would temporarily take LAION-5B offline.

The Stanford report recommended that models trained on LAION-5B “should be deprecated and distribution ceased where feasible.” Perhaps relatedly, AI startup Runway recently took down its Stable Diffusion 1.5 model from the AI hosting platform Hugging Face; we’ve reached out to the company for more information. (Runway in 2023 partnered with Stability AI, the company behind Stable Diffusion, to help train the original Stable Diffusion model.)

Of the new Re-LAION-5B dataset, which contains around 5.5 billion text-image pairs and was released under an Apache 2.0 license, LAION says that the metadata can be used by third parties to clean existing copies of LAION-5B by removing the matching illegal content.

LAION stresses that its datasets are intended for research — not commercial — purposes. But, if history is any indication, that won’t dissuade some organizations. Beyond Stability AI, Google once used LAION datasets to train its image-generating models.

“In all, 2,236 links [to suspected CSAM] were removed after matching with the lists of link and image hashes provided by our partners,” LAION continued in the post. “These links also subsume 1008 links found by the Stanford Internet Observatory report in December 2023 … We strongly urge all research labs and organizations who still make use of old LAION-5B to migrate to Re-LAION-5B datasets as soon as possible.”
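The clean-up LAION describes amounts to dropping index rows that match partner-supplied lists of link and image hashes. A minimal sketch of that kind of filter follows; the row fields ("url", "caption", "image_hash") and the choice of MD5 over the URL string are hypothetical assumptions, not LAION's actual format or hashing scheme.

```python
import hashlib


def url_hash(url: str) -> str:
    """Hypothetical: hash a URL the same way the blocklist was built.
    MD5 of the UTF-8 URL is an assumption; LAION's real scheme may differ."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def clean_index(rows, blocked_url_hashes, blocked_image_hashes):
    """Drop rows whose link hash or image hash appears on a partner blocklist.

    Each row is a dict with hypothetical keys "url", "caption" and
    (optionally) "image_hash". Returns the kept rows and the removed count.
    """
    kept, removed = [], 0
    for row in rows:
        if (url_hash(row["url"]) in blocked_url_hashes
                or row.get("image_hash") in blocked_image_hashes):
            removed += 1
        else:
            kept.append(row)
    return kept, removed
```

Using sets for the blocklists keeps each membership check constant-time on average, which matters for an index of billions of rows.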
 

bnew



1/3
The prompt used in the paper 'People Cannot Distinguish GPT-4 from a Human in a Turing Test' is quite revealing about humans.

tldr: "Be dumb"

2/3
Yeah, it's only a 5-minute Turing test.

"GPT-4 was judged to be a human 54% of the time, outperforming ELIZA (22%) but lagging behind actual humans (67%)"

3/3
https://arxiv.org/pdf/2405.08007

