bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,613
Daps
161,856


Bringing LLM Fine-Tuning and RLHF to Everyone

June 5, 2023

Argilla Team
Through months of fun teamwork and learning from the community, we are thrilled to share our biggest feature to date: Argilla Feedback.
Argilla Feedback is completely open-source and the first of its kind at the enterprise level. With its unique focus on scalable human feedback collection, Argilla Feedback is designed to boost the performance and safety of Large Language Models (LLMs).

In recent months, interest in applications powered by LLMs has skyrocketed. Yet, this excitement has been tempered by reality checks underlining the critical role of evaluation, alignment, data quality, and human feedback.

At Argilla, we believe that rigorous evaluation and human feedback are indispensable for transitioning from LLM experiments and proofs-of-concept to real-world applications.

When it comes to deploying safe and reliable software solutions, there are few shortcuts, and this holds true for LLMs. However, there is a notable distinction: for LLMs, the primary source of reliability, safety, and accuracy is data.

After training their latest model, OpenAI dedicated several months to refining its safety and alignment before publicly releasing ChatGPT. The global success of ChatGPT leaned heavily on human feedback for model alignment and safety, which illustrates the crucial role this approach plays in successful AI deployment.

Perhaps you assume only a handful of companies have the resources for this. Yet, there's encouraging news: open-source foundation models are growing more powerful every day, and even small quantities of high-quality, expert-curated data can make LLMs accurately follow instructions. So, unless you're poised to launch the next ChatGPT competitor, incorporating human feedback for specific domains is within reach, and Argilla is your key to deploying LLM use cases safely and effectively. Eager to understand why? Read on to discover more!
You can add unlimited users to Argilla so it can be used to seamlessly distribute the workload among hundreds of labelers or experts within your organization. Similar efforts include Dolly from Databricks or OpenAssistant. If you’d like help setting up such an effort, reach out to us and we’ll gladly help out.



Argilla Feedback in a nutshell

Argilla Feedback is purpose-built to support customized and multi-aspect feedback in LLM projects. Serving as a critical solution for fine-tuning and Reinforcement Learning from Human Feedback (RLHF), Argilla Feedback provides a flexible platform for evaluation, monitoring, and fine-tuning tailored to enterprise use cases.

Argilla Feedback boosts LLM use cases through:

LLM Monitoring and Evaluation: This process assesses LLM projects by collecting both human and machine feedback. Key to this is Argilla's integration with 🦜🔗 LangChain, which ensures continuous feedback collection for LLM applications.

Collection of Demonstration Data: It facilitates the gathering of human-guided examples, necessary for supervised fine-tuning and instruction-tuning.

Collection of Comparison Data: It plays a significant role in collecting comparison data to train reward models, a crucial component of LLM evaluation and RLHF.

Reinforcement Learning: It assists in crafting and selecting prompts for the reinforcement learning stage of RLHF.
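To make the comparison-data item above more concrete, here is a minimal sketch of the pairwise ranking loss that is commonly used to train reward models from comparisons. It is an illustration only, not Argilla's implementation: the function name and the example scores are hypothetical, and a real reward model would produce the scores from a neural network.

```python
import math


def pairwise_ranking_loss(score_chosen: float, score_rejected: float) -> float:
    """Return -log(sigmoid(score_chosen - score_rejected)).

    The loss is small when the reward model scores the response preferred
    by the labeler above the rejected one, and large otherwise.
    """
    margin = score_chosen - score_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for two responses to the same prompt.
agrees_with_labeler = pairwise_ranking_loss(score_chosen=2.0, score_rejected=0.0)
disagrees_with_labeler = pairwise_ranking_loss(score_chosen=0.0, score_rejected=2.0)
```

Each comparison collected from a labeler contributes one such term, so the reward model is pushed to rank responses the way humans ranked them.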

Custom LLMs. We think language models will be fine-tuned in-house and tailored to the requirements of enterprise use cases. To achieve this you need to think about data management and curation as an essential component of the MLOps (or should we say LLMOps) stack.
Throughout these phases, Argilla Feedback streamlines the process of collecting both human and machine feedback, improving the efficiency of LLM refinement and evaluation. The figure below visualizes the key stages in training and fine-tuning LLMs. It highlights the data and expected outcomes at each stage, with particular emphasis on points where human feedback is incorporated.

Figure: LLM development stages, pioneered by the InstructGPT paper, leading to ChatGPT. This figure is adapted from Chip Huyen’s brilliant post “RLHF: Reinforcement Learning from Human Feedback”.

Domain Expertise vs Outsourcing. In Argilla, the process of data labeling and curation is not a single event but an iterative component of the ML lifecycle, setting it apart from traditional data labeling platforms. Argilla integrates into the MLOps stack, using feedback loops for continuous data and model refinement. Given the current complexity of LLM feedback, organizations are increasingly leveraging their own internal knowledge and expertise instead of outsourcing training sets to data labeling services. Argilla supports this shift effectively.
Read on as we detail how Argilla Feedback works, using two example use cases: supervised fine-tuning and reward modelling.

A note to current Argilla users - Argilla Feedback is a new task, fully integrated with the Argilla platform. If you know Argilla already, you can think of Argilla Feedback as a supercharged version of the things our users already love. In fact, it sets the stage for Argilla 2.0, which will integrate other tasks like Text Classification and Token Classification in a more flexible and powerful manner.

Supervised fine-tuning

Demonstration data (prompts paired with demonstrations) is paramount for improving LLMs. This data is used in supervised fine-tuning, also known as instruction-tuning or behavior cloning, where models learn to respond to instructions based on human examples.

Quality over quantity. Recent research suggests that a dataset of 1,000-2,000 diverse and consistent examples can go a long way in improving instruction-following abilities and response style, surpassing the need for larger demonstration sets used by models like InstructGPT and Dolly.
To visualize the desired effect of supervised fine-tuning, let's see two examples generated by LLMs. First, the model Falcon-7B, which is not fine-tuned for following instructions. We're using "Email:" at the end of the prompt to nudge the model into writing an email, but it's still not giving us the results we want:

Prompt: Write a follow-up for a sales email. Email:
Completion: I was just curious if you had any questions about the project I discussed with you on the phone yesterday. Please let me know if I can help you with anything else, or any further details that will help you move forward with your project. Thanks again for your time, and have a great weekend, -Sarah
And the same model after instruction-tuning, the Falcon-7B-instruct model:

Prompt: Write a follow-up for a sales email
Completion: Dear Customer Name,

Thank you for purchasing Product Name from us last week. We hope you have been enjoying using it!

If you have any questions or feedback about your purchase, please do not hesitate to let us know. We are always happy to help.

Best regards,
Sales Team
The steps to build this workflow are: configure the dataset to ask for completions, add records, gather feedback from labelers, prepare the dataset, and fine-tune the LLM. Read on to learn more about each step and see below a visual representation.

Figure: comparison-data
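The workflow steps listed above can be sketched in plain Python. This is a rough illustration under stated assumptions, not the Argilla API: the `Record` class, the helper functions, and the "### Instruction / ### Response" template are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One prompt shown to labelers (hypothetical structure)."""
    prompt: str
    responses: list = field(default_factory=list)  # completions written by labelers


def add_feedback(record: Record, completion: str) -> None:
    """Store a labeler's completion, ignoring empty submissions."""
    cleaned = completion.strip()
    if cleaned:
        record.responses.append(cleaned)


def prepare_examples(records: list) -> list:
    """Keep only answered records; use the first response as the demonstration."""
    return [
        {"prompt": r.prompt, "completion": r.responses[0]}
        for r in records
        if r.responses
    ]


def to_training_text(example: dict) -> str:
    """Format one example with an assumed instruction template."""
    return (
        f"### Instruction:\n{example['prompt']}\n\n"
        f"### Response:\n{example['completion']}"
    )


# 1-2. Configure the dataset and add records.
records = [
    Record("Write a follow-up for a sales email"),
    Record("Summarize this support ticket"),
]
# 3. Gather feedback from labelers (only the first record gets an answer here).
add_feedback(records[0], "Dear Customer Name,\n\nThank you for purchasing ...")
# 4. Prepare the dataset for fine-tuning.
examples = prepare_examples(records)
training_texts = [to_training_text(e) for e in examples]
```

The resulting `training_texts` would then be passed to whatever fine-tuning library you use for the last step; that part is left out because it depends on the model and tooling.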
 

bnew



translation:
This is a video using Stable Diffusion Gen1. It's like magic to become a different person in another world with a snap of your finger! What kind of transformation would you like to see? #StableDiffusion #生成AI

:ohhh:

bnew


Meta just released MusicGen, a simple and controllable model for music generation

MusicGen is a single stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, the model can predict them in parallel, thus having only 50 auto-regressive steps per second of audio.

Demo:


About

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

MusicGen

Audiocraft provides the code and models for MusicGen, a simple and controllable model for music generation. MusicGen is a single stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict them in parallel, thus having only 50 auto-regressive steps per second of audio. Check out our sample page or test the available demo!
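The "small delay between the codebooks" idea can be illustrated with a short sketch. This is not Audiocraft code: it assumes a delay of exactly one step per codebook (the text above only says "a small delay"), and the function names are made up for the example. Each decoding step emits one token per codebook, with codebook k lagging k steps behind codebook 0.

```python
def delay_pattern(num_frames: int, num_codebooks: int = 4, pad: str = "_") -> list:
    """Return the token layout for each decoding step.

    Each step is a list with one entry per codebook: either a
    (codebook, frame) pair or `pad` where that codebook has no token yet
    (or has already finished).
    """
    total_steps = num_frames + num_codebooks - 1
    steps = []
    for step in range(total_steps):
        row = []
        for k in range(num_codebooks):
            frame = step - k  # codebook k is shifted right by k steps (assumed delay)
            row.append((k, frame) if 0 <= frame < num_frames else pad)
        steps.append(row)
    return steps


def steps_per_second(frame_rate_hz: int, num_codebooks: int, flatten: bool) -> int:
    """Auto-regressive steps per second of audio, ignoring the few delay steps."""
    return frame_rate_hz * num_codebooks if flatten else frame_rate_hz


one_second = delay_pattern(num_frames=50)      # 50 frames = 1 s at 50 Hz
flattened = steps_per_second(50, 4, flatten=True)   # one token per step
delayed = steps_per_second(50, 4, flatten=False)    # all codebooks per step
```

Under these assumptions, predicting the 4 codebooks one token at a time would take 200 steps per second of audio, while the delayed layout takes 50, plus 3 extra steps in total for the last codebooks to catch up.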
 


bnew



Shishir G. Patil*, Tianjun Zhang*, Xin Wang, Joseph E. Gonzalez

*Equal Contribution

UC Berkeley, Microsoft Research

sgp@berkeley.edu, tianjunz@berkeley.edu



An API Appstore for LLMs



Gorilla is an LLM that can provide the appropriate API calls. It is trained on three massive machine learning hub datasets: Torch Hub, TensorFlow Hub and HuggingFace. Zero-shot Gorilla outperforms GPT-4, ChatGPT and Claude. Gorilla is extremely reliable, and significantly reduces hallucination errors.

We are excited to hear your feedback and we welcome API contributions as we build this open-source project. Join us on Discord or feel free to email us!


Abstract

Large Language Models (LLMs) have seen an impressive wave of advances recently, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis. However, their potential to effectively use tools via API calls remains unfulfilled. This is a challenging task even for today's state-of-the-art LLMs such as GPT-4, largely due to their inability to generate accurate input arguments and their tendency to hallucinate the wrong usage of an API call. We release Gorilla, a finetuned LLaMA-based model that surpasses the performance of GPT-4 on writing API calls. When combined with a document retriever, Gorilla demonstrates a strong capability to adapt to test-time document changes, enabling flexible user updates or version changes. It also substantially mitigates the issue of hallucination, commonly encountered when prompting LLMs directly. To evaluate the model's ability, we introduce APIBench, a comprehensive dataset consisting of HuggingFace, TorchHub, and TensorHub APIs. The successful integration of the retrieval system with Gorilla demonstrates the potential for LLMs to use tools more accurately, keep up with frequently updated documentation, and consequently increase the reliability and applicability of their outputs. The model and code of Gorilla are available on GitHub at ShishirPatil/gorilla.
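As a rough illustration of what "combined with a document retriever" can mean in practice, the sketch below picks the most relevant API document for a query and appends it to the prompt. Everything here is an assumption made for the example: the toy documents, the word-overlap scoring, and the exact prompt wording are not taken from the Gorilla code base, and a real system would use a proper retriever.

```python
import json
import re


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list) -> dict:
    """Return the document whose description shares the most words with the query."""
    query_words = tokenize(query)
    return max(docs, key=lambda d: len(query_words & tokenize(d["description"])))


def build_prompt(query: str, doc: dict) -> str:
    """Append the retrieved documentation to the user query (assumed wording)."""
    return f"{query}\nUse this API documentation for reference: {json.dumps(doc)}"


# Toy API documentation entries, written for this example only.
API_DOCS = [
    {
        "api_call": "torch.hub.load('pytorch/vision', 'resnet50', pretrained=True)",
        "description": "image classification model ResNet-50",
    },
    {
        "api_call": "pipeline('translation_en_to_fr')",
        "description": "translate English text to French",
    },
]

query = "I want to translate a sentence from English to French"
best_doc = retrieve(query, API_DOCS)
prompt = build_prompt(query, best_doc)
```

Because the documentation is looked up at query time, updating an entry in `API_DOCS` changes what the model sees without retraining, which is the kind of test-time adaptation the abstract describes.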

{continue reading on the site}







About

Gorilla: An API store for LLMs


Gorilla: Large Language Model Connected with Massive APIs [Project Website]



🟢 Gorilla is Apache 2.0. With Gorilla being fine-tuned on MPT and Falcon, you can use Gorilla commercially with no obligations! ⛳

🚀 Try Gorilla in 60s Colab

🗞️ Check out our paper! arXiv

👋 Join our Discord! Discord

Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically and syntactically correct API to invoke. With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. We also release APIBench, the largest collection of APIs, curated and easy to train on! Join us as we try to expand the largest API store and teach LLMs how to write them! Hop on our Discord, open a PR, or email us if you would like to have your API incorporated as well.
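One simple way to picture "reducing hallucination" for API calls is to check whether a generated call names a function that actually exists in a known API list. The sketch below does that with Python's standard `ast` module. It is a much simpler stand-in than the evaluation described in the Gorilla paper, and the function names and the tiny `KNOWN_APIS` set are hypothetical.

```python
import ast
from typing import Optional


def called_name(code: str) -> Optional[str]:
    """Return the dotted name of the function in a single call expression."""
    try:
        node = ast.parse(code.strip(), mode="eval").body
    except SyntaxError:
        return None  # not valid Python at all
    if not isinstance(node, ast.Call):
        return None
    parts = []
    func = node.func
    # Walk attribute access from right to left: torch.hub.load -> load, hub, torch
    while isinstance(func, ast.Attribute):
        parts.append(func.attr)
        func = func.value
    if isinstance(func, ast.Name):
        parts.append(func.id)
        return ".".join(reversed(parts))
    return None


def is_hallucinated(code: str, known_apis: set) -> bool:
    """Flag calls that do not parse or that name a function outside the known set."""
    name = called_name(code)
    return name is None or name not in known_apis


# Toy API list for illustration only.
KNOWN_APIS = {"torch.hub.load"}

real_call = "torch.hub.load('pytorch/vision', 'resnet50')"
made_up_call = "torch.hub.fetch_model('resnet50')"
```

Here `is_hallucinated(real_call, KNOWN_APIS)` is False and `is_hallucinated(made_up_call, KNOWN_APIS)` is True. The check only looks at the function name, so wrong arguments to a real API would still need a separate comparison.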
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,613
Daps
161,856

‘Unlocking the new next frontier’: UC Berkeley researchers develop innovative AI ‘Gorilla’



KAYLA SIM | STAFF
The team behind Gorilla trained it on a specific training recipe and designed it to connect large language models, or LLMs, with services accessed through application programming interfaces, or APIs, according to Patil.


NATASHA KAYE | STAFF
JUNE 06, 2023

Researchers from the Sky Computing lab and the Berkeley AI Research, or BAIR, Lab recently released Gorilla, a large language model, or LLM, designed to revolutionize the way AI algorithms function, according to Shishir Patil, campus computer science doctoral student and project lead.

Since the release of OpenAI’s ChatGPT in November 2022, researchers around the world have been brainstorming ways to increase the efficiency and abilities of LLMs.

ChatGPT generates a response to the question a user asks based on what it learned during its training phase. While this question and answer function is popular given its novelty, Patil said looking forward, there are more useful functions for this technology.

“One example could be you want to book a flight ticket, right? Or you want to book a reservation at a restaurant. Now today, an LLM cannot do that because it cannot interact with the rest of the world. So that’s where Gorilla comes in. Gorilla is a large language model that trains LLMs how to interact with the rest of the world through tools,” Patil said.

The “tools” being used to teach this model are application programming interfaces, or APIs, which allow systems to communicate with one another, according to Patil.

The team behind Gorilla trained it on a specific training recipe and designed it to connect LLMs with services accessed through APIs, according to Patil. The models and code the team used for training are all open sourced — meaning they are available in the public domain — allowing for quick processing times.

Just this morning, the team released a newer model with an Apache-2.0 license, allowing it to be used commercially, according to Patil.

“We are studying ways to automatically integrate with the millions of services on the web by teaching LLMs to find and then read API documentation,” said Joseph Gonzalez, a professor in the electrical engineering and computer sciences department and the director of the Sky Computing lab, in an email.

In addition to Gorilla’s API capabilities, Patil noted the model can measure how much it “hallucinates,” or how often it relays made-up information.

Because LLMs are trained to generate their own answers, hallucinations are rather common. Gorilla, however, provides scientifically rigorous ways to determine exactly how much the model is hallucinating while also being proven to hallucinate less often than ChatGPT, according to Patil.

“As we are serving Gorilla to the outside world. We have multiple requests from Korea, Israel, obviously India, China and the Bay Area dominates,” Patil said.

“All of this is being sold on infrastructure that’s being provided by UC Berkeley and more specifically the Skylab that we’re all part of.”

The researchers behind Gorilla include Patil and Tianjun Zhang, campus computer science doctoral students; Gonzalez, who is the lead faculty member on the project; and Xin Wang, a senior researcher at Microsoft who was a doctoral student of Gonzalez’s at UC Berkeley.

Gonzalez noted the collaboration with Wang and her colleagues at Microsoft was “instrumental” to the success of Gorilla.

Patil noted the team named the project “Gorilla” because the animals use tools similarly to how they want their LLM to be used.

“This is like unlocking the new next frontier,” Patil said. “Before, LLMs were this closed box that could only be used within this domain. Now by teaching LLMs how to write thousands of APIs, we are, in some sense, unlocking what an LLM can do. Now it’s like there are no limits.”
 