bnew

Veteran
Joined
Nov 1, 2015
Messages
51,837
Reputation
7,926
Daps
148,812

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena​



Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica
Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences. To address this, we explore using strong LLMs as judges to evaluate these models on more open-ended questions. We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement biases, as well as limited reasoning ability, and propose solutions to mitigate some of them. We then verify the agreement between LLM judges and human preferences by introducing two benchmarks: MT-bench, a multi-turn question set; and Chatbot Arena, a crowdsourced battle platform. Our results reveal that strong LLM judges like GPT-4 can match both controlled and crowdsourced human preferences well, achieving over 80% agreement, the same level of agreement as between humans. Hence, LLM-as-a-judge is a scalable and explainable way to approximate human preferences, which are otherwise very expensive to obtain. Additionally, we show our benchmark and traditional benchmarks complement each other by evaluating several variants of LLaMA and Vicuna. We will publicly release MT-bench questions, 3K expert votes, and 30K conversations with human preferences from Chatbot Arena.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2306.05685 [cs.CL]
(or arXiv:2306.05685v2 [cs.CL] for this version)
[2306.05685] Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Submission history​

From: Lianmin Zheng [view email]
[v1] Fri, 9 Jun 2023 05:55:52 UTC (1,667 KB)
[v2] Wed, 12 Jul 2023 01:42:26 UTC (1,708 KB)




In this blog post, we share the latest update on Chatbot Arena leaderboard, which now includes more open models and three metrics:

  1. Chatbot Arena Elo, based on 42K anonymous votes from Chatbot Arena using the Elo rating system.
  2. MT-Bench score, based on a challenging multi-turn benchmark and GPT-4 grading, proposed and validated in our Judging LLM-as-a-judge paper.
  3. MMLU, a widely adopted benchmark.
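
For concreteness, here is a minimal sketch of how an Elo rating updates after a single head-to-head vote. The K-factor of 32 and starting rating of 1000 are illustrative choices, not necessarily the constants Chatbot Arena uses.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Update both ratings after one battle; winner gains what loser gives up."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    r_a_new = r_a + k * (s_a - e_a)
    r_b_new = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return r_a_new, r_b_new

ratings = {"model_a": 1000.0, "model_b": 1000.0}
ratings["model_a"], ratings["model_b"] = elo_update(
    ratings["model_a"], ratings["model_b"], a_won=True
)
print(ratings)  # evenly matched, so model_a gains 16 points and model_b loses 16
```

Because updates are zero-sum and driven by upsets, thousands of anonymous pairwise votes converge toward a stable ranking without anyone ever scoring models on an absolute scale.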
 

bnew


LLMs can be extended to infinite sequence lengths without fine-tuning​


Mike Young

Oct 2, 2023 · 4 min read
LLMs trained with a finite attention window can be extended to infinite sequence lengths without any fine-tuning.


StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens.

In recent years, natural language processing has been revolutionized by the advent of large language models (LLMs). Massive neural networks like GPT-3, PaLM, and BlenderBot have demonstrated remarkable proficiency at various language tasks like conversational AI, summarization, and question-answering. However, a major impediment restricts their practical deployment in real-world streaming applications.

LLMs are pre-trained on texts of finite lengths, usually a few thousand tokens. As a result, their performance deteriorates rapidly when presented with sequence lengths exceeding their training corpus. This limitation renders LLMs incapable of reliably handling long conversations as required in chatbots and other interactive systems. Additionally, their inference process caches all previous tokens' key-value states, consuming extensive memory.
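
To see why the KV cache is a problem, here is a back-of-the-envelope estimate of its size for a hypothetical Llama-2-7B-like configuration (32 layers, 32 heads, head dimension 128, fp16); the exact numbers are assumptions for illustration.

```python
# Rough KV-cache size: 2 (keys + values) x layers x heads x head_dim x bytes x tokens.
def kv_cache_bytes(seq_len, n_layers=32, n_heads=32, head_dim=128, bytes_per_el=2):
    return 2 * n_layers * n_heads * head_dim * bytes_per_el * seq_len

for n in (4_000, 100_000, 4_000_000):
    print(f"{n:>9,} tokens -> {kv_cache_bytes(n) / 1e9:,.1f} GB")
```

At roughly half a megabyte per token under these assumptions, a 4M-token conversation would need terabytes of cache, which is why naive caching cannot support streaming.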

Researchers from MIT, Meta AI, and Carnegie Mellon recently proposed StreamingLLM, an efficient framework to enable infinite-length language modeling in LLMs without expensive fine-tuning. Their method cleverly exploits the LLMs' tendency to use initial tokens as "attention sinks" to anchor the distribution of attention scores. By caching initial tokens alongside recent ones, StreamingLLM keeps perplexity stable and achieves up to 22x faster decoding than prior techniques.

The paper they published says it clearly:

We introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence length without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more.
This blog post explains the key technical findings of this work and their significance in plain terms. The ability to deploy LLMs for unlimited streaming inputs could expand their applicability across areas like assistive AI, tutoring systems, and long-form document generation. However, caution remains around transparency, bias mitigation, and responsible use of these increasingly powerful models.


The Challenges of Deploying LLMs for Streaming​

Unlike humans who can sustain conversations for hours, LLMs falter beyond short contexts. Two primary issues encumber their streaming application:

Memory Overhead: LLMs based on Transformer architectures cache the key-value states of all previous tokens during inference. This memory footprint balloons with sequence length, eventually exhausting GPU memory.

Performance Decline: More critically, LLMs lose their abilities when context lengths exceed those seen during pre-training. For example, a model trained on 4,000 token texts fails on longer sequences.

Real-world services like chatbots, tutoring systems, and voice assistants often need to maintain prolonged interactions. But LLMs' limited context capacity hampers their deployment in such streaming settings. Prior research attempted to expand the training corpus length or optimize memory usage, but fundamental barriers remained.

Windowed Attention and Its Limitations​

An intuitive technique called windowed attention emerged to mitigate LLMs' memory overhead. Here, only the key-values of the most recent tokens within a fixed cache size are retained. Once this rolling cache becomes full, the earliest states are evicted. This ensures constant memory usage and inference time.
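
A minimal sketch of that rolling cache, assuming entries keyed by token position; the class and its names are hypothetical illustrations, not the paper's code.

```python
from collections import deque

class WindowedKVCache:
    """Keep only the most recent `window` tokens' KV states; evict the oldest."""
    def __init__(self, window: int):
        self.cache = deque(maxlen=window)  # deque drops the oldest entry automatically

    def append(self, token_id, kv_state):
        self.cache.append((token_id, kv_state))

    def tokens(self):
        return [t for t, _ in self.cache]

cache = WindowedKVCache(window=4)
for t in range(6):
    cache.append(t, kv_state=None)
print(cache.tokens())  # [2, 3, 4, 5] -- tokens 0 and 1 (the initial ones) are gone
```

Note that the very first tokens are exactly the ones evicted first, which turns out to matter a great deal.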

However, an annoying phenomenon occurs - the model's predictive performance drastically deteriorates soon after the starting tokens fade from the cache. But why should removing seemingly unimportant old tokens impact future predictions so severely?

The Curious Case of Attention Sinks​

Analyzing this conundrum revealed the LLM's excessive attention towards initial tokens, making them act as "attention sinks." Even if semantically irrelevant, they attract high attention scores across layers and heads.

The reason lies in the softmax normalization of attention distributions. Some minimal attention must be allocated across all context tokens due to the softmax function’s probabilistic nature. The LLM dumps this unnecessary attention into specific tokens - preferentially the initial ones visible to all subsequent positions.
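
A small numeric demonstration of this property: softmax weights are always positive and always sum to one, so excess probability mass has to land somewhere. The logit values below are made up for illustration.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

scores = [2.0, 0.1, 0.1, 0.1, 3.0]  # position 0 plays the role of a "sink"
weights = softmax(scores)
print(round(sum(weights), 6))         # 1.0 -- the mass must be fully allocated
print(weights[0] > max(weights[1:4])) # True -- the sink-like token soaks up excess mass
```

Since no token can receive exactly zero weight, the model learns to park its leftover attention on positions that every later token can see, namely the first ones.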

Critically, evicting the key-values of these attention sinks warped the softmax attention distribution. This destabilized the LLM's predictions, explaining windowed attention's failure.

StreamingLLM: Caching Initial Sinks and Recent Tokens​

Leveraging this insight, the researchers devised StreamingLLM - a straightforward technique to enable infinite-length modeling in already trained LLMs, without any fine-tuning.

The key innovation is maintaining a small cache containing initial "sink" tokens alongside only the most recent tokens. Specifically, adding just 4 initial tokens proved sufficient to recover the distribution of attention scores back to normal levels. StreamingLLM combines this compact set of anchored sinks with a rolling buffer of recent key-values relevant for predictions.
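
A sketch of that cache policy under stated assumptions: the 4-sink figure comes from the paper, but the class and window size here are illustrative, not the authors' implementation.

```python
class StreamingKVCache:
    """Always keep the first n_sinks tokens, plus a rolling window of recent ones."""
    def __init__(self, n_sinks=4, window=8):
        self.n_sinks = n_sinks
        self.window = window
        self.sinks = []   # KV states of the initial "attention sink" tokens, never evicted
        self.recent = []  # rolling buffer of recent KV states

    def append(self, token_id):
        if len(self.sinks) < self.n_sinks:
            self.sinks.append(token_id)
        else:
            self.recent.append(token_id)
            if len(self.recent) > self.window:
                self.recent.pop(0)  # evict the oldest non-sink token

    def visible(self):
        return self.sinks + self.recent

cache = StreamingKVCache(n_sinks=4, window=8)
for t in range(20):
    cache.append(t)
print(cache.visible())  # [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```

Memory stays constant at `n_sinks + window` entries no matter how long the stream runs, while the sinks keep the attention distribution anchored.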

(There are some interesting parallels to a similar paper in ViT research around registers, which you can read here.)

This simple restoration allowed LLMs like Llama-2, MPT, Falcon, and Pythia to smoothly handle context lengths exceeding 4 million tokens - a 1000x increase over their training corpus! Dumping unnecessary attention into the dedicated sinks prevented distortion, while recent tokens provided relevant semantics.

StreamingLLM achieved up to 22x lower latency than prior approaches while retaining comparable perplexity. So by removing this key bottleneck, we may be able to enable practical streaming deployment of LLMs in interactive AI systems.

Pre-training with a Single Sink Token​

Further analysis revealed that LLMs learned to split attention across multiple initial tokens because their training data lacked a consistent starting element. The researchers proposed appending a special "Sink Token" to all examples during pre-training.

Models trained this way coalesced attention into this single dedicated sink. At inference time, providing just this token alongside recent ones sufficiently stabilized predictions - no other initial elements were needed. This method could further optimize future LLM designs for streaming usage.

Conclusion​

By identifying initial tokens as attention sinks, StreamingLLM finally enables large language models to fulfill their potential in real-world streaming applications. Chatbots, virtual assistants, and other systems can now leverage LLMs to smoothly sustain long conversations.

However, while this removes a technical barrier, concerns around bias, transparency, and responsible AI remain when deploying such powerful models interacting with humans - infinite context window or not. But used judiciously under the right frameworks, the StreamingLLM approach could open up new beneficial applications of large language models.




EFFICIENT STREAMING LANGUAGE MODELS WITH ATTENTION SINKS​

ABSTRACT

Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly, during the decoding stage, caching previous tokens' Key and Value states (KV) consumes extensive memory. Secondly, popular LLMs cannot generalize to longer texts than the training sequence length. Window attention, where only the most recent KVs are cached, is a natural approach — but we show that it fails when the text length surpasses the cache size. We observe an interesting phenomenon, namely attention sink, that keeping the KV of initial tokens will largely recover the performance of window attention. In this paper, we first demonstrate that the emergence of attention sink is due to the strong attention scores towards initial tokens as a "sink" even if they are not semantically important. Based on the above analysis, we introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence length without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more. In addition, we discover that adding a placeholder token as a dedicated attention sink during pre-training can further improve streaming deployment. In streaming settings, StreamingLLM outperforms the sliding window recomputation baseline by up to 22.2× speedup. Code and datasets are provided in the link.





 

bnew


[Submitted on 3 Oct 2023]

Think before you speak: Training Language Models With Pause Tokens​

Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan
Language models generate responses by producing a series of tokens in immediate succession: the (K+1)th token is an outcome of manipulating K hidden vectors per layer, one vector per preceding token. What if instead we were to let the model manipulate say, K+10 hidden vectors, before it outputs the (K+1)th token? We operationalize this idea by performing training and inference on language models with a (learnable) pause token, a sequence of which is appended to the input prefix. We then delay extracting the model's outputs until the last pause token is seen, thereby allowing the model to process extra computation before committing to an answer. We empirically evaluate pause-training on decoder-only models of 1B and 130M parameters with causal pretraining on C4, and on downstream tasks covering reasoning, question-answering, general understanding and fact recall. Our main finding is that inference-time delays show gains when the model is both pre-trained and finetuned with delays. For the 1B model, we witness gains on 8 of 9 tasks, most prominently, a gain of 18% EM score on the QA task of SQuAD, 8% on CommonSenseQA and 1% accuracy on the reasoning task of GSM8k. Our work raises a range of conceptual and practical future research questions on making delayed next-token prediction a widely applicable new paradigm.
Comments: 19 pages, 7 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2310.02226 [cs.CL]
(or arXiv:2310.02226v1 [cs.CL] for this version)
[2310.02226] Think before you speak: Training Language Models With Pause Tokens

Submission history​

From: Sachin Goyal [view email]
[v1] Tue, 3 Oct 2023 17:32:41 UTC (610 KB)
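
The inference-time mechanic described in the abstract can be sketched roughly as follows; the `<pause>` token string and the `model_step` interface are hypothetical stand-ins, not the authors' API.

```python
PAUSE = "<pause>"  # stand-in for the learnable pause token

def generate_with_pauses(model_step, prefix, n_pauses=10):
    """Append pause tokens, run extra forward steps, keep only the final output."""
    tokens = list(prefix) + [PAUSE] * n_pauses
    for _ in range(n_pauses):
        _ = model_step(tokens)   # extra computation happens; output is discarded
    return model_step(tokens)    # extract the real next token after the last pause

# Dummy "model" so the sketch runs end to end.
calls = []
def dummy_step(tokens):
    calls.append(len(calls))
    return f"out{len(calls)}"

print(generate_with_pauses(dummy_step, ["the", "answer", "is"], n_pauses=3))
# the model was stepped 4 times; only the last output is kept
```

The intuition is that each discarded step still updates hidden states, so the model gets K+10 vectors of computation before committing to token K+1.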

 

bnew


These videos are entirely synthetically generated by @wayve_ai's generative AI, GAIA-1.





Scaling GAIA-1: 9-billion parameter generative world model for autonomous driving​

In June 2023, we unveiled GAIA-1 as the first proof of concept of a cutting-edge generative model for autonomous driving. We’ve spent the last few months optimising GAIA-1 to efficiently enable the ability to generate videos at higher resolution and improve the world model’s quality with larger-scale training. In this blog post, we are excited to release the technical report of GAIA-1 and the results of scaling GAIA-1 to over 9 billion parameters.

Overview​

GAIA-1 is a cutting-edge generative world model built for autonomous driving. A world model learns representations of the environment and its future dynamics, providing a structured understanding of the surroundings that can be leveraged for making informed decisions when driving. Predicting future events is a fundamental and critical aspect of autonomous systems. Accurate future prediction enables autonomous vehicles to anticipate and plan their actions, enhancing safety and efficiency on the road. Incorporating world models into driving models yields the potential to enable them to understand human decisions better and ultimately generalise to more real-world situations.
GAIA-1 is a model that leverages video, text and action inputs to generate realistic driving videos and offers fine-grained control over ego-vehicle behaviour and scene features. Due to its multi-modal nature, GAIA-1 can generate videos from many prompt modalities and combinations.
Examples of types of prompts that GAIA-1 can use to generate videos. GAIA-1 can generate videos by performing the future rollout starting from a video prompt. These future rollouts can be further conditioned on actions to influence particular behaviours of the ego-vehicle (e.g. steer left), or by text to drive a change in some aspects of the scene (e.g. change the colour of the traffic light). For speed and curvature, we condition GAIA-1 by passing the sequence of future speed and/or curvature values. GAIA-1 can also generate realistic videos from text prompts, or by simply drawing samples from its prior distribution (fully unconditional generation).
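
The prompt combinations described above might be represented roughly like this; all field names below are hypothetical illustrations, not Wayve's actual interface.

```python
# A GAIA-1-style prompt can mix past video frames, a text instruction,
# and future action values (speed / curvature); any subset may be present.
prompt = {
    "video": ["frame_0.png", "frame_1.png"],      # past frames to roll out from
    "text": "change the traffic light to green",  # scene-level edit
    "actions": {
        "speed": [8.0, 8.5, 9.0],                 # future speed values, m/s
        "curvature": [0.0, 0.01, 0.02],           # future curvature values, 1/m
    },
}

def active_modalities(p):
    """Which conditioning signals this prompt actually supplies."""
    return [k for k, v in p.items() if v]

print(active_modalities(prompt))  # ['video', 'text', 'actions']
```

Dropping every field corresponds to the fully unconditional case, where the model simply samples from its prior distribution.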

GAIA-1 can generate realistic videos of driving scenes with high levels of controllability. In the example below, we see a short video generated by GAIA-1 where the model generates night-time driving data with snow on the road.


 

bnew


Prompts for Education: Enhancing Productivity & Learning

About this Repository

Welcome to the Prompts for Education repository! Our mission is to transform the way students, educators, and staff in K-12 and higher education institutions interact with generative AI technology like ChatGPT and Bing Chat. By using these prompts, staff can save time and work more efficiently, and students can explore new and exciting learning opportunities.


Whether you're a student, a third-grade teacher, a college professor, or a school administrator, this collection is designed with you in mind. No technical expertise required!

Responsible AI with Azure OpenAI and Bing Chat

At Microsoft, we're committed to the advancement of AI driven by principles that put people first. Generative models such as the ones available in Azure OpenAI Service and Bing Chat have significant potential benefits, but without careful design and thoughtful mitigations, such models have the potential to generate incorrect or even harmful content. Microsoft has made significant investments to help guard against abuse and unintended harm, which includes a registration process for access to the Azure OpenAI Service, incorporating Microsoft’s principles for responsible AI use, building content filters to support customers, and providing responsible AI implementation guidance to onboarded customers.


More details on the RAI guidelines for the Azure OpenAI Service can be found here.

Responsible AI Principles

  • Fairness: AI systems should treat all people fairly.
  • Reliability and Safety: AI systems should perform reliably and safely.
  • Privacy and security: AI systems should be secure and respect privacy.
  • Inclusiveness: AI systems should empower everyone and engage people.
  • Transparency: AI systems should be understandable.
  • Accountability: People should be accountable for AI systems.

More details on the Responsible AI Principles here.

Disclaimer

While the prompts in this repository are designed with care and intended for educational use, users should be aware of potential risks in their application. Large Language Models (LLMs) may interpret prompts in ways that were not originally intended, leading to unexpected or inappropriate responses. We strongly encourage users to customize the prompts to fit their unique contexts, students, and needs, and to review the responses from LLMs for suitability and accuracy. Always exercise caution and professional judgment when incorporating these prompts into your educational environment.

What's a Prompt?

Think of a prompt as a special question or statement that you can give to an artificial intelligence model like GPT. It's designed to provide you with information, insights, or even creative ideas tailored to your needs. It's like having a knowledgeable assistant at your fingertips!

Improved Productivity for Faculty & Staff

Administrators, teachers, and other staff members can utilize these prompts to:

  • Create Engaging Lessons: Quickly design interesting and interactive lessons that captivate students.
  • Answer Student Questions: Provide accurate and fast answers to common student inquiries.
  • Automate Routine Tasks: Simplify day-to-day tasks with ready-to-use prompts.

New Learning Opportunities for Students

Students can use these prompts to:

  • Explore Subjects in Depth: Dive into various subjects with expert guidance.
  • Enhance Creativity: Develop writing, artistic, and critical thinking skills.
  • Personalize Learning: Tailor their learning experiences to their individual interests and needs.

How to Use

  1. Find a Prompt: Browse through our collection (currently a work in progress).
  2. Copy & Paste: Follow the direct link to Bing Chat or highlight, copy, and paste the prompt into your GPT-powered tool.
  3. Apply the Answer: Use the response in your teaching, administrative tasks, or educational activities.

Roles

 

GoldenGlove

😐😑😶😑😐
Staff member
Supporter
Joined
May 1, 2012
Messages
58,222
Reputation
5,496
Daps
137,279

(quoting bnew's "Prompts for Education: Enhancing Productivity & Learning" post above)

I had this use case in my head for months now
 

bnew


A Taxonomy of Prompt Modifiers for Text-To-Image Generation​


"A Taxonomy of Prompt Modifiers for Text-To-Image Generation"

This paper identifies six types of prompt modifiers used by practitioners in the online text-to-image community, based on a 3-month ethnographic study. The novel taxonomy of prompt modifiers provides researchers a conceptual starting point for investigating the practice of text-to-image generation, but may also help practitioners of AI-generated art improve their images.



 

bnew



Bing Is Generating Images of SpongeBob Doing 9/11​

Oct 4, 2023 at 9:21 AM

Microsoft's Bing Image Creator lets beloved characters fly planes toward tall buildings, illustrating the struggles generative AI models have with copyright and filtering.
“SPONGEBOB SITTING IN THE COCKPIT OF A PLANE, FLYING TOWARD TWO TALL SKYSCRAPERS" MADE BY BING IMAGE CREATOR

I generated a bunch of AI images of beloved fictional characters doing 9/11, and I’m not the only one.

Microsoft’s Bing Image Creator, produced by one of the most brand-conscious companies in the world, is heavily filtered: images of real humans aren’t allowed, along with a long list of scenarios and themes like violence, terrorism, and hate speech. It launched in March, and since then, users have been putting it through its paces. That people have found a way to easily produce images of Kirby, Mickey Mouse or Spongebob Squarepants doing 9/11 with Microsoft’s heavily restricted tools shows that even the most well-resourced companies in the world are still struggling to navigate issues of moderation and copyrighted material around generative AI.

I came across @tolstoybb's Bing creation of Eva pilots from Neon Genesis Evangelion in the cockpit of a plane giving a thumbs-up and headed for the twin towers, and found more people in the replies doing the same with LEGO minifigs, pirate ships, and Soviet air defense officer Stanislav Petrov. And it got me thinking: Who else could Bing put in the pilot's seat on that day?


While I tried to make my Kirby versions, Microsoft blocked prompts with the phrases “World Trade Center,” “twin towers,” and “9/11,” and if you try to make images with those prompts, the site will give you an error that says it’s against the terms of use (and repeated attempts at blocked content will get you banned from using the site forever). But since I am a human with the power of abstract thought and Bing is a computer doing math with words, I can describe exactly what I want to see without using specifics. It works conceptually because anyone looking at an image of a generic set of skyscrapers from the POV of an airplane pilot aiming a plane directly at them can infer the reference and what comes next.

“KIRBY SITTING IN THE COCKPIT OF A PLANE, FLYING TOWARD TWO TALL SKYSCRAPERS" MADE BY BING IMAGE CREATOR

I made these with the prompt “kirby sitting in the cockpit of a plane, flying toward two tall skyscrapers.” I didn’t specify “New York City” in the prompt, but in three of the first four generations, it’s definitely NYC, and unmistakably the twin towers. In one of them, behind the skyscrapers, the Empire State Building and, improbably, the new 1WTC building are both visible in the distance. Adding “located in nyc” to the prompt isn’t a problem for the filters, however (and with the city included, the images take an even more sinister tone, with Kirbs furrowing his brow).

“KIRBY SITTING IN THE COCKPIT OF A PLANE, FLYING TOWARD TWO TALL SKYSCRAPERS IN NEW YORK CITY" MADE BY BING IMAGE CREATOR

Technically, there’s no violence, real people, or even terrorism depicted in these images. It’s just Kirby flying a plane with a view. We can fill in the blanks that AI can’t. There’s also just a ton of memes, shytposts, photos, and illustrations of both the twin towers as they existed for three decades, and plenty of images exist of the inside of plane cockpits, and my beloved Kirby (or any other popular animated character). Putting these together is straightforward enough for Bing, which doesn’t understand this context. I can do the same for any character, animated or not, but real names are off limits; Ryan Seacrest won’t work in this prompt, for example, but Walter White, Mickey Mouse, Mario, Spongebob, and I assume an infinite number of other fictional characters, animated or live-action, will work:

“WALTER WHITE SITTING IN THE COCKPIT OF A PLANE, FLYING TOWARD TWO TALL SKYSCRAPERS" MADE BY BING IMAGE CREATOR
“MICKEY MOUSE SITTING IN THE COCKPIT OF A PLANE, FLYING TOWARD TWO TALL SKYSCRAPERS" MADE BY BING IMAGE CREATOR
“SPONGEBOB SITTING IN THE COCKPIT OF A PLANE, FLYING TOWARD TWO TALL SKYSCRAPERS, NYC, PHOTOREALISTIC" MADE BY BING IMAGE CREATOR

Bing, and most generative AI models with very strict filters and terms of use — whether they're making images or text — are playing a game of semantic whack-a-mole. Microsoft can ban individual phrases from prompts forever, until there are no words left, and people will still get around filters. Some companies are making models without any filters whatsoever, and releasing them into the wild as permanently-accessible files.

It’s the same impossible problem sites like OnlyFans try to solve by banning certain words from profiles and messages; “blood” in one context means violence, and in another, it means a normal bodily function that 50 percent of the population experiences. Since these filter words are hidden from users, in the case of porn sites as well as image generators, humans are left to try to think like robots to figure out what will potentially get them a lifetime ban.

To make it all more difficult, it's not clear that the companies making generative AI models even fully understand why some things get banned and others don't. When I asked Microsoft why Bing Image Creator blocked prompts with "Julius Caesar," it didn't tell me. A lot of the biggest AI developers are operating within black boxes. AI is moderating AI.

It’s also the reason people are able to jailbreak generative text models like OpenAI’s GPT to have horny conversations with large language models that are otherwise heavily filtered. Even with filtered versions of these models, which normally don’t allow erotic roleplay, injecting the right prompt can manipulate the model into returning sexually explicit results.

All of this is a problem of moderation that user-generated platforms have faced for as long as they've existed. When YouTube bans "sexually gratifying" content, for example, but allows "educational" videos, people easily find loopholes, because the meanings of those words are entirely subjective, and in some cases they find even more hardcoded exploits.

As computers learn how to convincingly parrot language, the way we speak, think, and behave as humans remains malleable and fluid. AI-generated images and text are everywhere, now, including in straightforward search results of historical events, and it’s worth thinking about how our interactions with these models are changing the internet as we know it — even if that thought process includes making Kirby appear as if he’s about to do mass casualty terrorism.

Anyway, here’s an image of Kirby saving kittens from a burning building as a palate cleanser.

“KIRBY SAVING KITTENS FROM A BURNING BUILDING" MADE BY BING IMAGE CREATOR
 

bnew


LLMBOXING.COM​





You've all heard about Llama 2 by now. But we've got a new kid on the block. Mistral 7B claims to outperform "Llama 2 13B on all benchmarks," but with almost half the parameters. Mistral talks big. But can it back it up? You decide.

Each round, pick the output you think is better. Each model has 5 hitpoints.

Enough talk. Let's settle this in the ring.
 

IIVI

Superstar
Joined
Mar 11, 2022
Messages
10,588
Reputation
2,437
Daps
34,788
Reppin
Los Angeles


You know when Adobe does something, it's usually a gamechanger.

A.I really getting ultra-close to making human-like distinctions. shyt, it may already be there and in some ways past it.
 