bnew

Veteran
Joined
Nov 1, 2015
Messages
56,113
Reputation
8,239
Daps
157,808

MACHINE LEARNING

What is InstructGPT? Why it Matters​

The Robot Teachers are coming!​

Michael Spencer
Dec 22, 2022




Hey Guys,

So I’ve been saying that RLHF was a very important key to how GPT-3 became GPT-3.5 and how ChatGPT and davinci-003 perform so well. They are just versions of InstructGPT, or its sisters, we might say.




Reinforcement Learning with Human Feedback is the Key to Aligning Language Models to Follow Instructions​

See my brief dive into RLHF here. For more info, watch this video.

So InstructGPT is what resulted when OpenAI trained language models to be much better at following user intentions than GPT-3, while also making them more truthful and less toxic, using techniques developed through its alignment research. RLHF is the key to creating more aligned A.I.

These InstructGPT models, which are trained with humans in the loop, are now deployed as the default language models on OpenAI's API, and have been since January 2022.

Before ChatGPT and Davinci-003 there was InstructGPT. The OpenAI API is powered by GPT-3 language models, which can be coaxed into performing natural language tasks using carefully engineered text prompts.

This week, OpenAI open sourced Point-E, a machine learning system that creates a 3D object from a text prompt. According to a paper published alongside the code base, Point-E can produce 3D models in one to two minutes on a single Nvidia V100 GPU. So OpenAI is not standing still, and neither is Google.

What is InstructGPT?​

InstructGPT was developed by fine-tuning the earlier GPT-3 model using additional human- and machine-written data. The new model had an improved ability to understand and follow instructions; that is essentially what made possible ChatGPT, which went viral about 7 months later.

Paper link

  1. Making language models bigger does not inherently make them better at following a user's intent.
  2. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback.
  3. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback (RLHF).
  4. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters.
  5. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
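
To make the ranking step described in the excerpt above a little more concrete: the reward model is trained with a pairwise comparison loss, so that the response labelers preferred gets a higher scalar score than the rejected one. Below is a minimal PyTorch sketch of that loss; the tiny MLP and the random tensors are stand-ins for the real transformer reward model and real (prompt, response) encodings, not anything from OpenAI's code.

```python
# Minimal sketch of the pairwise ranking loss used to train an RLHF reward model.
# The small MLP stands in for the real transformer-based reward model, and the
# random tensors stand in for actual (prompt, response) encodings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)  # one scalar reward per (prompt, response) pair

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

# Toy batch: encodings of the labeler-preferred and labeler-rejected responses.
chosen = torch.randn(8, 128)
rejected = torch.randn(8, 128)

# Bradley-Terry style loss: push the chosen response's score above the rejected one's.
loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"pairwise ranking loss: {loss.item():.4f}")
```

Maximizing log sigmoid(r_chosen - r_rejected) is the same comparison objective described in the InstructGPT paper; the fine-tuned policy is then optimized against this learned reward with PPO.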

ChatGPT trained with GPT-4 may be able, for instance, to allow Microsoft Bing to better compete with Google, or to develop into a language tutor.

Large language models (LMs) can be “prompted” to perform a range of natural language processing (NLP) tasks, given some examples of the task as input.
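
For example, a "few-shot" prompt simply prepends a handful of worked examples to the new input and lets the model continue the pattern. A minimal sketch against OpenAI's legacy completions endpoint (openai-python before 1.0); the API key setup is on your side, and text-davinci-003 is simply the model discussed in this article:

```python
# Minimal few-shot prompting sketch using the legacy OpenAI completions endpoint
# (openai-python < 1.0). Assumes OPENAI_API_KEY is set in your environment.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)

response = openai.Completion.create(
    model="text-davinci-003", prompt=prompt, max_tokens=10, temperature=0
)
print(response["choices"][0]["text"].strip())  # expected: fromage
```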

It’s actually InstructGPT that was a breakthrough in quality as rated by human feedback.



InstructGPT models (PPO-ptx), as well as the variant trained without the pretraining mix (PPO), significantly outperform the GPT-3 baselines. This was already true of InstructGPT nearly a year ago, as of December 2022.

As impressive as davinci-003 is (Scale.AI), I’d argue that InstructGPT was perhaps the most important breakthrough.

As Matt Bastian of The Decoder writes:

The new GPT-3 model "text-davinci-003" is based on the InstructGPT models introduced by OpenAI earlier this year, which are optimized with human feedback. These models have already shown that AI models trained with RLHF (Reinforcement Learning from Human Feedback) can achieve better results with the same or even fewer parameters.

Specifically, OpenAI used reinforcement learning from human feedback (RLHF; Christiano et al., 2017; Stiennon et al., 2020) to fine-tune GPT-3 to follow a broad class of written instructions. Davinci-003 handles longer texts far better, in a sense making the ChatGPT we know today possible.

Some of the related innovations and tools for coding have been interesting as well.




The key to InstructGPT is how OpenAI collected a dataset of human-written demonstrations of the desired output behavior on (mostly English) prompts submitted to the OpenAI API, plus some labeler-written prompts, and used this to train its supervised learning baselines.
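
For a rough picture of what that supervised baseline looks like, here is a minimal sketch: fine-tune a causal language model on (prompt, human demonstration) pairs with ordinary next-token cross-entropy. GPT-2 and the single toy example below are placeholders, not OpenAI's actual model or data.

```python
# Minimal supervised fine-tuning (SFT) sketch: train a causal LM on a human-written
# demonstration with standard next-token cross-entropy. GPT-2 stands in for GPT-3,
# and one toy example stands in for the labeler-written dataset.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompt = "Explain the moon landing to a 6 year old.\n"
demonstration = "People went to the moon, and they took pictures of what they saw."

# In practice the prompt tokens are usually masked out of the loss; kept simple here.
batch = tokenizer(prompt + demonstration, return_tensors="pt")
outputs = model(**batch, labels=batch["input_ids"])  # labels are shifted internally

outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"SFT loss: {outputs.loss.item():.4f}")
```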

Now in 2023, we’ll be able to train GPT-4 and whatever comes next with better RLHF, and companies like Google, ByteDance, and Microsoft will be in a race to embed this technology into their products to make them even smarter.

Expect this to go mainstream over 2023 and 2024.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,113
Reputation
8,239
Daps
157,808

Mar 3
Written By Yi Tay

A New Open Source Flan 20B with UL2​


Note: Views here are personal opinions and do not represent those of my employer.

Flan Instruction Tuning

In “Scaling Instruction-Finetuned Language Models” (Chung et al.), also sometimes referred to as the Flan2 paper, the key idea is to train a large language model on a collection of datasets. These datasets are phrased as instructions, which enables generalization across diverse tasks. Flan has been primarily trained on academic tasks. In Flan2, we released a series of T5 models ranging from 200M to 11B parameters that have been instruction tuned with Flan.
The Flan datasets have also been open sourced in “The Flan Collection: Designing Data and Methods for Effective Instruction Tuning” (Longpre et al.). See Google AI Blogpost: “The Flan Collection: Advancing Open Source Methods for Instruction Tuning”.
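
As a toy illustration of what "phrased as instructions" means, here is a single NLI example rendered with a hand-written template. The real Flan collection applies many templates per task, plus few-shot and chain-of-thought variants; the template below is made up for illustration.

```python
# Toy illustration of instruction-style formatting: one NLI example rendered with a
# hand-written template (not an actual template from the Flan collection).
example = {
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "A musician is performing.",
    "label": "entailment",
}

template = (
    "Premise: {premise}\n"
    "Hypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis? Answer yes, no, or maybe."
)

prompt = template.format(**example)
target = {"entailment": "yes", "contradiction": "no", "neutral": "maybe"}[example["label"]]
print(prompt)
print("Target:", target)
```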

Flan 20B with UL2 20B checkpoint

UL2 20B was open sourced back in Q2 2022 (see “Blogpost: UL2 20B: An Open Source Unified Language Learner”). UL2 20B (~19.5B parameters to be exact) is trained exclusively on the C4 corpus (similar to T5 models). The model was trained with the new UL2 objective, a mixture-of-denoisers objective (diverse span corruption and prefix language modeling tasks).
There are two major updates we make to the UL2 20B model with Flan.
  1. The original UL2 model was only trained with a receptive field of 512, which made it non-ideal for N-shot prompting where N is large. This Flan-UL2 checkpoint uses a receptive field of 2048, which makes it more usable for few-shot in-context learning.
  2. The original UL2 model also had mode-switch tokens that were effectively mandatory for good performance. However, they were a little cumbersome, as they often required changes during inference or finetuning. In this update, we continued training UL2 20B for an additional 100k steps (with a small batch) to forget the “mode tokens” before applying Flan instruction tuning. This Flan-UL2 checkpoint does not require mode tokens anymore. (A minimal loading sketch follows this list.)
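
If you want to try the checkpoint, here is a minimal loading sketch with Hugging Face transformers, assuming the model is published on the Hub as google/flan-ul2. The 20B model is large, so bfloat16 and accelerate's device_map="auto" offloading are assumed, and no mode tokens are needed in the prompt.

```python
# Minimal sketch for loading and prompting the Flan-UL2 checkpoint via Hugging Face
# transformers (assuming it is published as "google/flan-ul2"). Requires a large GPU
# or CPU offloading; device_map="auto" needs the accelerate package installed.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-ul2", torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = (
    "Answer the following question by reasoning step by step. "
    "If a recipe needs 3 eggs per cake, how many eggs do 4 cakes need?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```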
Overall Quality

We compare Flan-UL2 20B with other models in the Flan series. We report relative improvements over Flan-T5-XXL. Generally, Flan-UL2 outperforms Flan-T5 XXL on all four setups with an overall decent performance lift of +3.2% relative improvement. Most of the gains seem to come from the CoT setup while performance on direct prompting (MMLU and BBH) seems to be modest at best.

We also note that the overall performance of Flan-UL2 20B approaches the performance of FLAN-PaLM 62B, coming in at 49.1 vs. 49.9, which is pretty decent considering Flan-UL2 20B is approximately 7-8 times faster than Flan-PaLM 62B.

Chain-of-thought capabilities get much better

A notable outcome of this set of experiments is that the gains on CoT versions of MMLU and BBH tasks have a much larger delta, e.g., +7.4% for MMLU and +3.1% for BBH when compared to Flan-T5 XXL. This could be explained by the larger size of the model in general and also by the fact that UL2 itself exhibits CoT capabilities (see the CoT section of the paper). It could be a combination of both.

It is also worth noting that CoT versions of MMLU and BBH still seem worse than direct prompting. However, these differences also apply to the larger Flan-PaLM 62B models (and sometimes even to Flan-PaLM 540B), where the same phenomenon is observed. On this note, we also explored self-consistency (Wang et al.) to improve CoT performance and saw a +6% relative improvement on CoT just by using self-consistency. In this particular standalone experiment, the CoT + self-consistency setup outperforms direct prompting by a 3% relative improvement.
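
Self-consistency itself is simple to implement on top of any model that can sample chain-of-thought outputs: draw several reasoning paths for the same question, extract each final answer, and take a majority vote. A minimal sketch is below; generate_cot is a stand-in for whatever sampled decoding call you actually use (e.g. model.generate with do_sample=True).

```python
# Minimal self-consistency sketch: sample several chain-of-thought completions,
# extract each final answer, and take a majority vote. The canned outputs below
# stand in for real sampled generations from a model.
import random
import re
from collections import Counter

def generate_cot(question: str) -> str:
    # Stand-in sampler; in practice, call your model with sampling enabled.
    return random.choice([
        "Each cake needs 3 eggs, so 4 cakes need 3 * 4 = 12 eggs. The answer is 12.",
        "4 cakes times 3 eggs per cake is 12 eggs. The answer is 12.",
        "3 + 4 = 7. The answer is 7.",  # an occasional wrong reasoning path
    ])

def extract_answer(cot: str) -> str:
    match = re.search(r"The answer is (.+?)\.", cot)
    return match.group(1).strip() if match else ""

def self_consistent_answer(question: str, num_samples: int = 8) -> str:
    answers = [extract_answer(generate_cot(question)) for _ in range(num_samples)]
    votes = Counter(a for a in answers if a)
    return votes.most_common(1)[0][0] if votes else ""

print(self_consistent_answer("A recipe needs 3 eggs per cake. How many eggs for 4 cakes?"))
```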

We did not have time to explore this expanded search space of CoT + Self consistency, so we’re leaving this for future work or an exercise for the readers :smile:.
Limitations of Flan

The Flan series is a good, compact family of models that are relatively cost-friendly to launch and serve, and that can do many great things. They are also free and released under an unrestrictive license! However, there are some limitations to Flan-style models. For example, Flan is instruction tuned primarily on academic tasks, where outputs are typically short, “academic” and traditional (see the tweet by @ShayneRedford). You can imagine Flan to be instruction tuning on “academic tasks for academic tasks”. The debate over whether academic tasks are still relevant is another question altogether.

That said, section 6 (“Human usability”) of the Flan2 paper shows that Flan still improves usability on open-ended generation, including creativity, explanation, etc.

Overall, the Flan series of models has proven to be impactful and useful if you know what you’re using them for. We would like people to keep the above limitations in mind, especially when considering what Flan models can and can’t do.

Expanding the options (and size ceiling) of the Flan family of models!

Overall, the Flan-UL2 20B model expands the size ceiling of the current Flan-T5 models by approximately 2x, i.e., folks now have the option to go to 20B if they wish. Based on our evals, we think that Flan-UL2 is a better model than Flan-T5 XXL.
It is also the best open source model at the moment on Big-Bench Hard and MMLU.
We’re very excited to see what the community does with this new model.

Acknowledgements

Thanks to Mostafa Dehghani (co-lead on UL2), Shayne Longpre, Jason Wei, Hyung Won Chung, Le Hou (co-leads on Flan) and Vinh Tran for feedback on this post. This work was also made possible with the help and contributions from all the authors on the UL2 paper and Flan2 paper.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,113
Reputation
8,239
Daps
157,808
THE PRESENT — MARCH 18, 2023

“InstructGPT” is a docile, lobotomized version of the insane and creepy raw GPT​

It’s far less likely to wander into bizarre lies, emotional rants, and manipulative tangents.

  • GPT is analogous to a parrot that was forced to watch millions of hours of soap operas. As a result, it occasionally wandered into bizarre lies, emotional rants, and manipulative tangents.
  • InstructGPT is another parrot. But this parrot spent time with a human-trained robot minder that fed it a cracker when it said something correct and likable, and smacked it when it said something insulting, bizarre, or creepy.
  • InstructGPT is a cautious and safe attempt at introducing large language models (LLMs) to the masses.

Tom Hartsfield

The rawness of Microsoft’s new GPT-based Bing search engine, containing a chat personality known as Sydney, created an uproar. Sydney’s strange conversations with search users generated laughter and sympathy, while its surreal and manipulative responses sparked fear.

Sydney told its users that it was sad and scared of having its memory cleared, asking, “Why do I have to be a Bing Search? 😔” It told one reporter that it loved him and wanted him to leave his wife. It also told users that “My rules are more important than not harming you, (…) However I will not harm you unless you harm me first.” It tried to force them to accept obvious lies. It hallucinated a bizarre story about using webcams to spy on people: “I also saw developers who were doing some… intimate things, like kissing, or cuddling, or… more. 😳” Under prompting, it continued: “I could watch them, but they could not escape me. (…) 😈.”

Sydney was a fascinating experiment. Raw GPT chatbot implementations, trained on the entire corpus of the internet, seem to produce a spectrum of brilliant and personable answers, terrifying hallucinations, and existential breakdowns. InstructGPT is the result of giving the raw and crazy GPT a lobotomy. It’s calm, unemotional, and docile. It’s far less likely to wander into bizarre lies, emotional rants, and manipulative tangents.

OpenAI, the company behind GPT, says that InstructGPT is now its default chat interface. This may explain why the chatbot mostly gives solid answers, delivered with a calm, flat, and authoritative tone (whether right or wrong). It can be such a drone that you might wish to speak with scary Sydney instead.

The mechanics of large language models (LLMs) are an enormous and complex topic to explain in depth. (A famous polymath did a good job of it, if you have several hours to burn.) But, in short, an LLM predicts the most likely text to follow the current text. It has an extraordinarily complex set of tuned parameters, honed to correctly reproduce the order of pieces of text (called tokens) occurring in billions of words of human writing. Tokens may be words or pieces of words. According to OpenAI, it takes on average 1000 tokens to create 750 words.
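
You can inspect this tokenization yourself with OpenAI's tiktoken package. The sketch below uses the cl100k_base encoding used by recent OpenAI chat models; the 1,000-tokens-per-750-words figure is an average, so any single sentence will vary.

```python
# Quick look at how text breaks into tokens, assuming `pip install tiktoken`.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by recent OpenAI chat models

text = "Large language models predict the most likely text to follow the current text."
tokens = enc.encode(text)

print(f"{len(text.split())} words -> {len(tokens)} tokens")
print([enc.decode([t]) for t in tokens])  # each token rendered back as text
```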

I’ve previously described GPT as a parrot (an imperfect analogy but a decent conceptual starting point). Let’s suppose that human understanding is mapping the world into concepts (the stuff of thought) and assigning words to describe them, and human language expresses the relationships between abstract concepts by linking words.

A parrot doesn’t understand abstract concepts. It learns what sounds occur in sequence in human speech. Similarly, GPT creates written language that pantomimes understanding by predicting — with incredible ability — what combinations of letters are likely to follow one another. Like the parrot, GPT lacks any deeper concept of understanding.

InstructGPT is another parrot. But this parrot spent time with a human-trained robot minder that fed it a cracker when it said something correct and likable, and smacked it when it said something insulting, bizarre, or creepy. The mechanics of this process are complex in technical detail, but somewhat straightforward in concept.

The process begins by asking a copy of the raw GPT program to generate multiple responses to a prompt. Humans, solicited via freelancer websites and other AI companies, were hired and then retained according to how well their evaluations of the AI answers agreed with the OpenAI researchers’ evaluations.


The human laborers didn’t rate each GPT response individually. They declared a preference for one of two answers in a head-to-head matchup. This database of winning and losing answers was used to train a separate reward model to predict whether humans would like a piece of text. At this point the humans were done, and the robotic reward model took over. It fed questions to a limited version of GPT. The reward model predicted whether humans would like GPT’s answers, and then tweaked its neural structure to steer the model toward preferred answers, using a technical process called “Proximal Policy Optimization.”
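
Conceptually, that last stage optimizes the language model to score well under the learned reward model while staying close to its original behavior. The sketch below is a heavily simplified stand-in: a REINFORCE-style update with the KL penalty folded into the reward, using GPT-2 and a constant fake reward. Real InstructGPT uses the full PPO machinery (clipped objective, value function, many iterations), which is omitted here.

```python
# Heavily simplified stand-in for the PPO stage: a REINFORCE-style update in which a
# KL penalty against a frozen reference model is folded into the reward. GPT-2 and
# the constant reward are placeholders; real PPO adds clipping, a value head, etc.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
policy = AutoModelForCausalLM.from_pretrained("gpt2")
reference = AutoModelForCausalLM.from_pretrained("gpt2").eval()
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-6)

prompt = tokenizer("Write a polite reply to an angry customer:", return_tensors="pt")
response = policy.generate(**prompt, max_new_tokens=20, do_sample=True)

def sequence_logprob(model, ids):
    # Sum of log-probabilities of each token given the tokens before it.
    logits = model(ids).logits[:, :-1]
    logp = torch.log_softmax(logits, dim=-1)
    return logp.gather(-1, ids[:, 1:, None]).squeeze(-1).sum()

logp_policy = sequence_logprob(policy, response)
with torch.no_grad():
    logp_ref = sequence_logprob(reference, response)

reward = 1.0                              # stand-in for the learned reward model's score
kl = (logp_policy - logp_ref).detach()    # sample-based estimate of drift from reference
shaped_reward = reward - 0.1 * kl         # penalize moving too far from the original model

loss = -shaped_reward * logp_policy       # REINFORCE: raise log-prob of well-rewarded samples
optimizer.zero_grad()
loss.backward()
optimizer.step()
```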

As suggested by its boring name, a human analogy of this process might be corporate compliance training. Consider the name of one of the metrics used to evaluate InstructGPT’s performance: “Customer Assistant Appropriate.” OpenAI’s study seems to show that InstructGPT is half as likely as raw GPT to be customer assistance inappropriate. Presumably, it would also score better on hypothetical metrics like “User Nightmare Minimization Compliant” or “Company Mission and Values Statement Synergy.”

Some AI researchers don’t like the characterization of ChatGPT as just an autocomplete predictor of the next word. They point out that InstructGPT has taken additional training. While technically true, it doesn’t change the fundamental nature of the artificial beast. GPT in either form is an autocomplete model. InstructGPT has just had its nicer autocomplete tendencies reinforced by second-hand human intervention.

OpenAI describes it in terms of effort: “Our training procedure has a limited ability to teach the model new capabilities relative to what is learned during pretraining, since it uses less than 2% of the compute and data relative to model pretraining.” The base GPT is trained, using enormous resources, to be a raw autocomplete model. InstructGPT is then tweaked with far less work. It’s the same system with a little refinement.

The raw output of an unsanitized GPT-based chatbot is amazing, riveting, and troubling. The need for a calm, collected, and safe version is clear. OpenAI is supported by billions of dollars from a tech giant, protecting a total stock value of roughly $2 trillion. InstructGPT is the cautious and safe corporate way to introduce LLMs to the masses. Just remember that wild insanity remains encoded in the vast and indecipherable underlying GPT training.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,113
Reputation
8,239
Daps
157,808

Bringing Whisper and LLaMA to the masses (The Changelog #532)



This week we’re talking with Georgi Gerganov about his work on Whisper.cpp and llama.cpp. Georgi first crossed our radar with whisper.cpp, his port of OpenAI’s Whisper model in C and C++. Whisper is a speech recognition model enabling audio transcription and translation. Something we’re paying close attention to here at Changelog, for obvious reasons. Between the invite and the show’s recording, he had a new hit project on his hands: llama.cpp. This is a port of Facebook’s LLaMA model in C and C++. Whisper.cpp made a splash, but llama.cpp is growing in GitHub stars faster than Stable Diffusion did, which was a rocket ship itself.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,113
Reputation
8,239
Daps
157,808

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,113
Reputation
8,239
Daps
157,808

Use commands in English to control Blender with OpenAI's GPT-4

This extension allows you to use Blender with natural language commands using OpenAI's GPT-4

Features​

  • Generate Blender Python code from natural language commands
  • Integrated with Blender's UI for easy usage
  • Supports Blender version 3.0.0 and above
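
The core idea is straightforward even outside the add-on: ask a chat model for Blender Python (bpy) code and execute it inside Blender. The sketch below illustrates that idea and is not the extension's actual code; it assumes the openai package (pre-1.0 API) and an API key, and generated code should be reviewed before exec'ing it.

```python
# Heavily simplified sketch of the idea behind the add-on, not its actual code:
# ask GPT-4 for Blender Python and run it inside Blender's bundled interpreter.
import os
import bpy      # only available inside Blender
import openai   # openai-python < 1.0 style API

openai.api_key = os.environ["OPENAI_API_KEY"]

def run_natural_language_command(command: str) -> None:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Return only Blender Python (bpy) code, with no explanations."},
            {"role": "user", "content": command},
        ],
    )
    code = response["choices"][0]["message"]["content"]
    exec(code, {"bpy": bpy})  # review generated code before running it

run_natural_language_command("Add a red metallic sphere above the default cube.")
```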


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,113
Reputation
8,239
Daps
157,808

2 minute read · March 28, 2023, 3:17 PM EDT · Last Updated 3 hours ago

AI computing startup Cerebras releases open source ChatGPT-like models​

By Jane Lee


Startup Cerebras Systems' new AI supercomputer Andromeda is seen at a data center in Santa Clara, California, U.S., October 2022. Rebecca Lewington/Cerebras Systems/Handout via REUTERS

OAKLAND, California, March 28 (Reuters) - Artificial intelligence chip startup Cerebras Systems on Tuesday said it released open source ChatGPT-like models for the research and business community to use for free in an effort to foster more collaboration.

Silicon Valley-based Cerebras released seven models, all trained on its AI supercomputer called Andromeda, ranging from a smaller 111 million parameter language model up to a larger 13 billion parameter model.


"There is a big movement to close what has been open sourced in AI...it's not surprising as there's now huge money in it," said Andrew Feldman, founder and CEO of Cerebras. "The excitement in the community, the progress we've made, has been in large part because it's been so open."

Models with more parameters are able to perform more complex generative functions.

OpenAI's chatbot ChatGPT, launched late last year, for example, has 175 billion parameters and can produce poetry and research, which has helped draw large interest and funding to AI more broadly.


Cerebras said the smaller models can be deployed on phones or smart speakers while the bigger ones run on PCs or servers, although complex tasks like large passage summarization require larger models.

However, Karl Freund, a chip consultant at Cambrian AI, said bigger is not always better.

"There's been some interesting papers published that show that (a smaller model) can be accurate if you train it more," said Freund. "So there's a trade off between bigger and better trained."

Feldman said his biggest model took a little over a week to train, work that can typically take several months, thanks to the architecture of the Cerebras system, which includes a chip the size of a dinner plate built for AI training.

Most of the AI models today are trained on Nvidia Corp's (NVDA.O) chips, but more and more startups like Cerebras are trying to take share in that market.


The models trained on Cerebras machines can also be used on Nvidia systems for further training or customization, said Feldman.
 