The A.I Megathread (LLM , GPT , Development)

bnew · Feb 21, 2024

Google Gemma: because Google doesn’t want to give away Gemini yet

Gemma could be faster and more cost-efficient.

www.theverge.com

Google Gemma: because Google doesn’t want to give away Gemini yet

Gemma 2B and Gemma 7B are smaller open-source AI models for language tasks in English.

By Emilia David, a reporter who covers AI. Prior to joining The Verge, she covered the intersection between technology, finance, and the economy.

Feb 21, 2024, 8:00 AM EST

Google’s new model Gemma.
Image: Google

Google has released Gemma 2B and 7B, a pair of open-source AI models that let developers use the research that went into its flagship Gemini more freely. While Gemini is a big closed AI model that directly competes with (and is nearly as powerful as) OpenAI’s ChatGPT, the lightweight Gemma will likely be suitable for smaller tasks like simple chatbots or summarizations.

But what these models lack in complication, they may make up for in speed and cost of use. Despite their smaller size, Google claims Gemma models “surpass significantly larger models on key benchmarks” and are “capable of running directly on a developer laptop or desktop computer.” They will be available via Kaggle, Hugging Face, Nvidia’s NeMo, and Google’s Vertex AI.

Gemma’s release into the open-source ecosystem is starkly different from how Gemini was released. While developers can build on Gemini, they do that either through APIs or by working on Google’s Vertex AI platform. Gemini is considered a closed AI model. By making Gemma open source, more people can experiment with Google’s AI rather than turn to competitors that offer better access.

Both model sizes will be available with a commercial license regardless of organization size, number of users, and the type of project. However, Google — like other companies — often prohibits its models from being used for specific tasks such as weapons development programs.

Gemma will also ship with “responsible AI toolkits,” as open models can be harder to place guardrails in than more closed systems like Gemini. Tris Warkentin, product management director at Google DeepMind, said the company undertook “more extensive red-teaming to Gemma because of the inherent risks involved with open models.”

The responsible AI toolkit will allow developers to create their own guidelines or a banned word list when deploying Gemma to their projects. It also includes a model debugging tool that lets users investigate Gemma’s behavior and correct issues.

The models work best for language-related tasks in English for now, according to Warkentin. “We hope we can build with the community to address market needs outside of English-language tasks,” he told reporters.

Developers can use Gemma for free in Kaggle, and first-time Google Cloud users get $300 in credits to use the models. The company said researchers can apply for up to $500,000 in cloud credits.

While it’s not clear how much of a demand there is for smaller models like Gemma, other AI companies have released lighter-weight versions of their flagship foundation models, too. Meta put out Llama 2 7B, the smallest iteration of Llama 2, last year. Gemini itself comes in several weights, including Gemini Nano, Gemini Pro, and Gemini Ultra, and Google recently announced a faster Gemini 1.5 — again, for business users and developers for now.

Gemma, by the way, means precious stone.

bnew · Feb 21, 2024

Gemma: Introducing new state-of-the-art open models

Gemma is a family of lightweight, state\u002Dof\u002Dthe art open models built from the same research and technology used to create the Gemini models.

blog.google

DEVELOPERS

Gemma: Introducing new state-of-the-art open models

Feb 21, 2024
3 min read

Gemma is built for responsible AI development from the same research and technology used to create Gemini models.

Jeanine Banks
VP & GM, Developer X and DevRel

Tris Warkentin
Director, Google DeepMind

The word “Gemma” and a spark icon with blueprint styling appears in a blue gradient against a black background.

Listen to article7 minutes

At Google, we believe in making AI helpful for everyone. We have a long history of contributing innovations to the open community, such as with Transformers, TensorFlow, BERT, T5, JAX, AlphaFold, and AlphaCode. Today, we’re excited to introduce a new generation of open models from Google to assist developers and researchers in building AI responsibly.

Gemma open models

Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Developed by Google DeepMind and other teams across Google, Gemma is inspired by Gemini, and the name reflects the Latin gemma, meaning “precious stone.” Accompanying our model weights, we’re also releasing tools to support developer innovation, foster collaboration, and guide responsible use of Gemma models.

Gemma is available worldwide, starting today. Here are the key details to know:

We’re releasing model weights in two sizes: Gemma 2B and Gemma 7B. Each size is released with pre-trained and instruction-tuned variants.
A new Responsible Generative AI Toolkit provides guidance and essential tools for creating safer AI applications with Gemma.
We’re providing toolchains for inference and supervised fine-tuning (SFT) across all major frameworks: JAX, PyTorch, and TensorFlow through native Keras 3.0.
Ready-to-use Colab and Kaggle notebooks, alongside integration with popular tools such as Hugging Face, MaxText, NVIDIA NeMo and TensorRT-LLM, make it easy to get started with Gemma.
Pre-trained and instruction-tuned Gemma models can run on your laptop, workstation, or Google Cloud with easy deployment on Vertex AI and Google Kubernetes Engine (GKE).
Optimization across multiple AI hardware platforms ensures industry-leading performance, including NVIDIA GPUs and Google Cloud TPUs.
Terms of use permit responsible commercial usage and distribution for all organizations, regardless of size.

State-of-the-art performance at size

Gemma models share technical and infrastructure components with Gemini, our largest and most capable AI model widely available today. This enables Gemma 2B and 7B to achieve best-in-class performance for their sizes compared to other open models. And Gemma models are capable of running directly on a developer laptop or desktop computer. Notably, Gemma surpasses significantly larger models on key benchmarks while adhering to our rigorous standards for safe and responsible outputs. See the technical report for details on performance, dataset composition, and modeling methodologies.

A chart showing Gemma performance on common benchmarks, compared to Llama-2 7B and 13B

bnew · Feb 21, 2024

Responsible by design

Gemma is designed with our AI Principles at the forefront. As part of making Gemma pre-trained models safe and reliable, we used automated techniques to filter out certain personal information and other sensitive data from training sets. Additionally, we used extensive fine-tuning and reinforcement learning from human feedback (RLHF) to align our instruction-tuned models with responsible behaviors. To understand and reduce the risk profile for Gemma models, we conducted robust evaluations including manual red-teaming, automated adversarial testing, and assessments of model capabilities for dangerous activities. These evaluations are outlined in our Model Card.

1

We’re also releasing a new Responsible Generative AI Toolkit together with Gemma to help developers and researchers prioritize building safe and responsible AI applications. The toolkit includes:

Safety classification: We provide a novel methodology for building robust safety classifiers with minimal examples.
Debugging: A model debugging tool helps you investigate Gemma's behavior and address potential issues.
Guidance: You can access best practices for model builders based on Google’s experience in developing and deploying large language models.

Optimized across frameworks, tools and hardware

You can fine-tune Gemma models on your own data to adapt to specific application needs, such as summarization or retrieval-augmented generation (RAG). Gemma supports a wide variety of tools and systems:

Multi-framework tools: Bring your favorite framework, with reference implementations for inference and fine-tuning across multi-framework Keras 3.0, native PyTorch, JAX, and Hugging Face Transformers.
Cross-device compatibility: Gemma models run across popular device types, including laptop, desktop, IoT, mobile and cloud, enabling broadly accessible AI capabilities.
Cutting-edge hardware platforms: We’ve partnered with NVIDIA to optimize Gemma for NVIDIA GPUs, from data center to the cloud to local RTX AI PCs, ensuring industry-leading performance and integration with cutting-edge technology.
Optimized for Google Cloud: Vertex AI provides a broad MLOps toolset with a range of tuning options and one-click deployment using built-in inference optimizations. Advanced customization is available with fully-managed Vertex AI tools or with self-managed GKE, including deployment to cost-efficient infrastructure across GPU, TPU, and CPU from either platform.

Free credits for research and development

Gemma is built for the open community of developers and researchers powering AI innovation. You can start working with Gemma today using free access in Kaggle, a free tier for Colab notebooks, and $300 in credits for first-time Google Cloud users. Researchers can also apply for Google Cloud credits of up to $500,000 to accelerate their projects.

Getting started

You can explore more about Gemma and access quickstart guides on ai.google.dev/gemma.

As we continue to expand the Gemma model family, we look forward to introducing new variants for diverse applications. Stay tuned for events and opportunities in the coming weeks to connect, learn and build with Gemma.

We’re excited to see what you create!

bnew · Feb 21, 2024

[2402.10171] Data Engineering for Scaling Language Models to 128K Context

Computer Science > Computation and Language

[Submitted on 15 Feb 2024]

Data Engineering for Scaling Language Models to 128K Context

Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, Hao Peng

We study the continual pretraining recipe for scaling language models' context lengths to 128K, with a focus on data engineering. We hypothesize that long context modeling, in particular \textit{the ability to utilize information at arbitrary input locations}, is a capability that is mostly already acquired through large-scale pretraining, and that this capability can be readily extended to contexts substantially longer than seen during training~(e.g., 4K to 128K) through lightweight continual pretraining on appropriate data mixture. We investigate the \textit{quantity} and \textit{quality} of the data for continual pretraining: (1) for quantity, we show that 500 million to 5 billion tokens are enough to enable the model to retrieve information anywhere within the 128K context; (2) for quality, our results equally emphasize \textit{domain balance} and \textit{length upsampling}. Concretely, we find that naively upsampling longer data on certain domains like books, a common practice of existing work, gives suboptimal performance, and that a balanced domain mixture is important. We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K. Our recipe outperforms strong open-source long-context models and closes the gap to frontier models like GPT-4 128K.

Comments:	Code at this https URL
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2402.10171 [cs.CL]
	(or arXiv:2402.10171v1 [cs.CL] for this version)
	[2402.10171] Data Engineering for Scaling Language Models to 128K Context Focus to learn more

Submission history

From: Yao Fu [view email]
[v1] Thu, 15 Feb 2024 18:19:16 UTC (1,657 KB)

https://arxiv.org/pdf/2402.10171.pdf

GitHub - FranxYao/Long-Context-Data-Engineering: Implementation of paper Data Engineering for Scaling Language Models to 128K Context

Implementation of paper Data Engineering for Scaling Language Models to 128K Context - FranxYao/Long-Context-Data-Engineering

github.com

About

Implementation of paper Data Engineering for Scaling Language Models to 128K Context

WIA20XX · Feb 21, 2024

bnew · Feb 21, 2024

bnew · Feb 21, 2024

https://archive.is/Cs6Q0

SDXL Lightning - a Hugging Face Space by AP123

Super-fast image generation on SDX

huggingface.co

Text-to-Image with SDXL Lightning

This demo utilizes the SDXL-Lightning model by ByteDance, which is a fast text-to-image generative model capable of producing high-quality images in 4 steps. As a community effort, this demo was put together by AngryPenguin. Link to model: https://huggingface.co/ByteDance/SDXL-Lightning

ByteDance/SDXL-Lightning · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

bnew · Feb 21, 2024

In today’s video I want to show you how to train your own Lora. What is Lora first of all, and why do you need to train it.

Google Colaboratory

colab.research.google.com

Google Colaboratory

colab.research.google.com

https://www.youtube.com/channel/UCguoGWHf8uliUg6GQmL05ZQ

Annie Richter – Medium

Read writing from Annie Richter on Medium. Every day, Annie Richter and thousands of other voices read, write, and share important stories on Medium.

medium.com

bnew · Feb 22, 2024

text said:
Mind officially blown:

I recorded a screen capture of a task (looking for an apartment on Zillow). Gemini was able to generate Selenium code to replicate that task, and described everything I did step-by-step.

It even caught that my threshold was set to $3K, even though I didn't explicitly select it.

"This code will open a Chrome browser, navigate to Zillow, enter "Cupertino, CA" in the search bar, click on the "For Rent" tab, set the price range to "Up to $3K", set the number of bedrooms to "2+", select the "Apartments/Condos/Co-ops" checkbox, click on the "Apply" button, wait for the results to load, print the results, and close the browser."

bnew · Feb 22, 2024

Stable Diffusion 3 — Stability AI

Announcing Stable Diffusion 3 in early preview, our most capable text-to-image model with greatly improved performance in multi-subject prompts, image quality, and spelling abilities.

stability.ai

Stable Diffusion 3

22 Feb

Prompt: Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says "Stable Diffusion 3" made out of colorful energy

Announcing Stable Diffusion 3 in early preview, our most capable text-to-image model with greatly improved performance in multi-subject prompts, image quality, and spelling abilities.

While the model is not yet broadly available, today, we are opening the waitlist for an early preview. This preview phase, as with previous models, is crucial for gathering insights to improve its performance and safety ahead of an open release. You can sign up to join the waitlist here.

The Stable Diffusion 3 suite of models currently range from 800M to 8B parameters. This approach aims to align with our core values and democratize access, providing users with a variety of options for scalability and quality to best meet their creative needs. Stable Diffusion 3 combines a diffusion transformer architecture and flow matching. We will publish a detailed technical report soon.

We believe in safe, responsible AI practices. This means we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors. Safety starts when we begin training our model and continues throughout the testing, evaluation, and deployment. In preparation for this early preview, we’ve introduced numerous safeguards. By continually collaborating with researchers, experts, and our community, we expect to innovate further with integrity as we approach the model’s public release.

Our commitment to ensuring generative AI is open, safe, and universally accessible remains steadfast. With Stable Diffusion 3, we strive to offer adaptable solutions that enable individuals, developers, and enterprises to unleash their creativity, aligning with our mission to activate humanity’s potential.

If you’d like to explore using one of our other image models for commercial use prior to the Stable Diffusion 3 release, please visit our Stability AI Membership page to self host or our Developer Platform to access our API.

To stay updated on our progress follow us on Twitter, Instagram, LinkedIn, and join our Discord Community.

bnew · Feb 22, 2024

‘Slow Horses’ & ‘One Life’ Director Predicts A Show Made Entirely By Generative AI Is Only Three-To-Five Years Away

'Slow Horses' director James Hawes has talked Apple TV+ hit and generative AI.

deadline.com

‘Slow Horses’ & ‘One Life’ Director Predicts A Show Made Entirely By Generative AI Is Only Three-To-Five Years Away

By Max Goldbart

Max Goldbart

International TV Co-Editor
@Goldbart1

Microsoft develops its own networking gear for AI datacenters: Report

Juniper Networks and Fungible founder spearheads development.

www.tomshardware.com

Microsoft develops its own networking gear for AI datacenters: Report

News

By Anton Shilov

published 1 day ago

Juniper Networks and Fungible founder spearheads development.

(Image credit: Shutterstock)

After revealing its own 128-core datacenter CPU and Maia 100 GPU for artificial intelligence workloads, Microsoft has begun development of its own networking card in a bid to decrease its reliance on Nvidia's hardware and speed up its datacenters, reports The Information. If the company succeeds, it could then proceed to optimize its Azure infrastructure and diversify its technology stack. Interestingly, the company has indirectly confirmed the effort.

Microsoft acquired Fungible, a developer of data processing units (DPUs) that competed against AMD's Pensando and Nvidia's Mellanox divisions, about a year ago. That means the company clearly has the networking technologies and IP that it needs to design datacenter-grade networking gear suitable for bandwidth-hungry AI training workloads. Pradeep Sindhu, a co-founder of Juniper Networks and founder of Fungible who has a wealth of experience in networking gear, now works at Microsoft and is heading the development of the company's datacenter networking processors.

The new networking card is expected to improve of the performance and efficiency of Microsoft's Azure servers, which currently run Intel CPUs and Nvidia GPUs, but will eventually also adopt Microsoft's own CPUs and GPUs. The Information claims that the project is important for Microsoft, which is why Satya Nadella, the head of the company, appointed Sindhu to the project himself.

"As part of our systems approach to Azure infrastructure, we are focused on optimizing every layer of our stack," a Microsoft spokesperson told The Information. "We routinely develop new technologies to meet the needs of our customers, including networking chips."

High-performance networking gear is crucial for datacenters, especially when handling the massive amount of data required for AI training by clients like OpenAI. By alleviating network traffic jams, the new server component could accelerate the development of AI models, making the process faster and more cost-effective.

Microsoft's move is in line with the industry trend toward custom silicon, as other cloud providers including Amazon Web Services (AWS) and Google are also developing their own AI and general-purpose processors and datacenter networking gear.

The introduction of Microsoft's networking card could potentially impact Nvidia's sales of server networking gear, which is projected to generate over $10 billion per year. If successful, the card could significantly improve the efficiency of Azure datacenters in general and OpenAI's model training in particular, as well as reduce the time and costs associated with AI development, the report claims.

Custom silicon can take a significant amount of time to design and manufacture, which means the initial results of this endeavor could still be years away. In the short term, Microsoft will continue to rely on hardware from other vendors, but that may change in the coming years.

↓R↑LYB · Feb 22, 2024

@bnew props for keeping this thread going breh, you're the only one holding it down :salute:

I don't think people truly understand what's about to happen in the next few years with AI. I'm trying to really get caught up to speed on how to utilize these tools because this shyt appears to be more game changing than the internet.

When you include emerging technologies like Nvidia Omniverse, self operating computers, and foundation agents humans are about to enter a new era of productivity.

bnew · Feb 23, 2024

https://archive.is/DYnYW

Magic AI Secures $117 Million to Build an AI Software Engineer

The startup is carving out a niche by focusing on developing an AI software engineer capable of assisting with complex coding tasks and that will act more as a coworker than merely a "copilot" tool.

www.maginative.com

Magic AI Secures $117 Million to Build an AI Software Engineer

The startup is carving out a niche by focusing on developing an AI software engineer capable of assisting with complex coding tasks and that will act more as a coworker than merely a "copilot" tool.

[URL='https://www.maginative.com/author/chris/']CHRIS MCKAY

FEBRUARY 16, 2024 • 2 MIN READ

Magic AI Secures $117 Million to Build an AI Software Engineer

San Francisco-based startup Magic AI has raised $117 million in Series B funding to further develop its advanced AI system aimed at automating software development.

The round was led by Nat Friedman and Daniel Gross’s NFDG Ventures, with additional participation from CapitalG and Elad Gil. This brings Magic’s total funding to date to over $145 million.

Founded in 2022 by Eric Steinberger and Sebastian De Ro, the startup is carving out a niche by focusing on developing an AI software engineer capable of assisting with complex coding tasks and that will act more as a coworker than merely a "copilot" tool.

The founders believe that in addition to boosting practical coding productivity, advancing intelligent code generation tools can also provide a path toward more expansive artificial general intelligence. Their vision even extends to the creation of broadly capable AGI systems that align with human values - ones able to accelerate global progress by assisting with humanity's most complex challenges. Their $23 million Series A round last summer was a major step towards this ambitious mission.

Central to Magic's technical strategy is handling exceptionally large context windows. Last year, they unveiled their Long-term Memory Network (LTM Net) architecture and corresponding LTM-1 model with a 5 million context window.

For perspective, most language models operate on far more limited contexts, commonly less than 32k tokens. OpenAI's powerful GPT-4 Turbo model is 128k tokens and Anthropic's Claude 2.1 is 200k.

However, models with much larger context windows are on the horizon. Just yesterday, Google announced that their new Gemini 1.5 model will have a 1 million context window, and shared that they have tested up to 10 million context lengths in research.

The substantially larger context capacities allow for more nuanced code comprehension - enabling Magic's model to reason over entire repositories and dependency trees to boost usefulness.

The startup continues to operate in stealth mode with few public demos, but claims to already have thousands of GPUs deployed toward training its next generation models.

Now, with fresh funding in hand, talent recruitment and retention is clearly top of mind for Magic AI’s leadership. The startup is actively seeking talented individuals who share its vision of integrity and innovation and is placing a strong emphasis on cultivating a supportive culture rooted in passion.

With towering investor confidence and transformative ambitions, Magic AI remains one of the most intriguing AI startups amid a landscape filled with heated competition. Still, Magic believes its technical approach focused on extreme model scale and novel neural architectures sets it apart.

The A.I Megathread (LLM , GPT , Development)

Veteran

Google Gemma: because Google doesn’t want to give away Gemini yet​

Gemma 2B and Gemma 7B are smaller open-source AI models for language tasks in English.​

Veteran

Gemma: Introducing new state-of-the-art open models​

Gemma open models​

State-of-the-art performance at size​

Veteran

Responsible by design​

Optimized across frameworks, tools and hardware​

Free credits for research and development​

Getting started​

Veteran

Computer Science > Computation and Language​

Data Engineering for Scaling Language Models to 128K Context​

Submission history​

About​

Superstar

Veteran

Veteran

Text-to-Image with SDXL Lightning ​

Veteran

Veteran

Veteran

Stable Diffusion 3​

Veteran

‘Slow Horses’ & ‘One Life’ Director Predicts A Show Made Entirely By Generative AI Is Only Three-To-Five Years Away​

Max Goldbart​

More Stories By Max​

I trained Sheng Long and Shonuff

Veteran

Microsoft develops its own networking gear for AI datacenters: Report​

I trained Sheng Long and Shonuff

Veteran

Magic AI Secures $117 Million to Build an AI Software Engineer​

[URL='https://www.maginative.com/author/chris/']CHRIS MCKAY​

​

Google Gemma: because Google doesn’t want to give away Gemini yet

Gemma 2B and Gemma 7B are smaller open-source AI models for language tasks in English.

Gemma: Introducing new state-of-the-art open models

Gemma open models

State-of-the-art performance at size

Responsible by design

Optimized across frameworks, tools and hardware

Free credits for research and development

Getting started

Computer Science > Computation and Language

Data Engineering for Scaling Language Models to 128K Context

Submission history

About

Text-to-Image with SDXL Lightning

Stable Diffusion 3

‘Slow Horses’ & ‘One Life’ Director Predicts A Show Made Entirely By Generative AI Is Only Three-To-Five Years Away

Max Goldbart

More Stories By Max

Microsoft develops its own networking gear for AI datacenters: Report

Magic AI Secures $117 Million to Build an AI Software Engineer

[URL='https://www.maginative.com/author/chris/']CHRIS MCKAY