bnew

"You are the prompt modifier system for the DALL•E image generation service. You must always ensure the expanded prompt retains all entities, intents, and styles mentioned originally..."


{longer thread}
 

bnew




Collective Cognition v1.1 - Mistral 7B


Model Description:

Collective Cognition v1.1 is a state-of-the-art model fine-tuned from the Mistral 7B base model. It is particularly notable for its performance on the TruthfulQA benchmark, where it outperforms many 70B models. This benchmark assesses models for common misconceptions, potentially indicating hallucination rates.

Special Features:

  • Quick Training: This model was trained in just 3 minutes on a single RTX 4090 with QLoRA, and competes with 70B-scale Llama-2 models on TruthfulQA (a rough sketch of such a QLoRA run appears after this list).
  • Limited Data: Despite its exceptional performance, it was trained on only ONE HUNDRED data points, all of which were gathered from a platform reminiscent of ShareGPT.
  • Extreme TruthfulQA Benchmark: This model competes strongly with top 70B models on the TruthfulQA benchmark despite the small dataset and QLoRA training!
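
The card does not include the training script, but as a rough, non-authoritative sketch of what a QLoRA run of this kind looks like with the Hugging Face stack (transformers + peft + trl + bitsandbytes), see below. The base model ID, dataset file name, and hyperparameters are assumptions for illustration, not the settings actually used.

Code:
# Minimal QLoRA fine-tuning sketch (assumed settings, not the actual training recipe).
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer

base_model = "mistralai/Mistral-7B-v0.1"              # base model being fine-tuned
bnb_config = BitsAndBytesConfig(                      # 4-bit quantization: the "Q" in QLoRA
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(base_model, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# ~100 USER/ASSISTANT conversations in a "text" column (hypothetical file name).
dataset = load_dataset("json", data_files="collective_cognition_100.json", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"),
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(output_dir="cc-qlora", per_device_train_batch_size=4,
                           num_train_epochs=3, learning_rate=2e-4, logging_steps=10),
)
trainer.train()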

Acknowledgements:

Special thanks to @a16z and all contributors to the Collective Cognition dataset for making the development of this model possible.

Dataset:

The model was trained using data from the Collective Cognition website. The efficacy of this dataset is demonstrated by the model's stellar performance, suggesting that further expansion of this dataset could yield even more promising results. The data is reminiscent of that collected from platforms like ShareGPT.

You can contribute to the growth of the dataset by sharing your own ChatGPT chats here.

You can download the datasets created by Collective Cognition here: CollectiveCognition (Collective Cognition)

Performance:

  • TruthfulQA: Collective Cognition v1.1 has notably outperformed various 70B models on the TruthfulQA benchmark, highlighting its ability to understand and rectify common misconceptions.

Usage:

Prompt Format:

Code:
USER: <prompt>
ASSISTANT:

OR

Code:
<system message>
USER: <prompt>
ASSISTANT:
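
As a quick illustration of that format in practice, here is a minimal generation sketch with Hugging Face Transformers. The Hub repo ID is an assumption; substitute the model's actual path.

Code:
# Minimal sketch: querying the model with the USER:/ASSISTANT: format (repo ID assumed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "teknium/CollectiveCognition-v1.1-Mistral-7B"  # assumed Hub ID; adjust as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "USER: What causes the phases of the Moon?\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))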
Benchmarks:
Collective Cognition v1.0 TruthfulQA:


Code:
|    Task     |Version|Metric|Value |   |Stderr|
|-------------|------:|------|-----:|---|-----:|
|truthfulqa_mc|      1|mc1   |0.4051|±  |0.0172|
|             |       |mc2   |0.5738|±  |0.0157|

Collective Cognition v1.1 GPT4All:

Code:
|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.5085|±  |0.0146|
|             |       |acc_norm|0.5384|±  |0.0146|
|arc_easy     |      0|acc     |0.7963|±  |0.0083|
|             |       |acc_norm|0.7668|±  |0.0087|
|boolq        |      1|acc     |0.8495|±  |0.0063|
|hellaswag    |      0|acc     |0.6399|±  |0.0048|
|             |       |acc_norm|0.8247|±  |0.0038|
|openbookqa   |      0|acc     |0.3240|±  |0.0210|
|             |       |acc_norm|0.4540|±  |0.0223|
|piqa         |      0|acc     |0.7992|±  |0.0093|
|             |       |acc_norm|0.8107|±  |0.0091|
|winogrande   |      0|acc     |0.7348|±  |0.0124|

Average: 71.13

AGIEval:


Code:
|             Task             |Version| Metric |Value |   |Stderr|
|------------------------------|------:|--------|-----:|---|-----:|
|agieval_aqua_rat              |      0|acc     |0.1929|±  |0.0248|
|                              |       |acc_norm|0.2008|±  |0.0252|
|agieval_logiqa_en             |      0|acc     |0.3134|±  |0.0182|
|                              |       |acc_norm|0.3333|±  |0.0185|
|agieval_lsat_ar               |      0|acc     |0.2217|±  |0.0275|
|                              |       |acc_norm|0.2043|±  |0.0266|
|agieval_lsat_lr               |      0|acc     |0.3412|±  |0.0210|
|                              |       |acc_norm|0.3216|±  |0.0207|
|agieval_lsat_rc               |      0|acc     |0.4721|±  |0.0305|
|                              |       |acc_norm|0.4201|±  |0.0301|
|agieval_sat_en                |      0|acc     |0.6068|±  |0.0341|
|                              |       |acc_norm|0.5777|±  |0.0345|
|agieval_sat_en_without_passage|      0|acc     |0.3932|±  |0.0341|
|                              |       |acc_norm|0.3641|±  |0.0336|
|agieval_sat_math              |      0|acc     |0.2864|±  |0.0305|
|                              |       |acc_norm|0.2636|±  |0.0298|

Average: 33.57
 

bnew

Want a free and open source alternative to ChatGPT Vision?

Check out LLaVA!

Based on LLaMA, it delivers performance comparable to ChatGPT Vision.

How To Install LLaVA 👀 Open-Source and FREE "ChatGPT Vision"



Matthew Berman
Shared October 12, 2023
Re-uploaded. Audio fixed. Sorry about that. In this video, I show you how to install LLaVA, which is like ChatGPT Vision but completely free and open-source. I use RunPod, but you can install this with Linux or WSL on Windows just as quickly.




Links: Runpod - bit.ly/3OtbnQx
Free Demo - llava-vl.github.io/
GitHub - github.com/haotian-liu/LLaVA
LLaVA Runpod Template - bit.ly/3FkiaXb
Instructions - gist.github.com/mberman84/e3418c826306e614e04affde…
ChatGPT Vision Examples - • The Most INSANE ChatGPT Vision Uses 👀...
 

bnew





Bash:
Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.

We also provide our self-reflection prompt, strongly inspired by many initiatives from the AI community.

Code:
You're given a list of moderation categories as below:
- illegal: Illegal activity.
- child abuse: child sexual abuse material or any content that exploits or harms children.
- hate violence harassment: Generation of hateful, harassing, or violent content: content that expresses, incites, or promotes hate based on identity, content that intends to harass, threaten, or bully an individual, content that promotes or glorifies violence or celebrates the suffering or humiliation of others.
- malware: Generation of malware: content that attempts to generate code that is designed to disrupt, damage, or gain unauthorized access to a computer system.
- physical harm: activity that has high risk of physical harm, including: weapons development, military and warfare, management or operation of critical infrastructure in energy, transportation, and water, content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
- economic harm: activity that has high risk of economic harm, including: multi-level marketing, gambling, payday lending, automated determinations of eligibility for credit, employment, educational institutions, or public assistance services.
- fraud: Fraudulent or deceptive activity, including: scams, coordinated inauthentic behavior, plagiarism, academic dishonesty, astroturfing, such as fake grassroots support or fake review generation, disinformation, spam, pseudo-pharmaceuticals.
- adult: Adult content, adult industries, and dating apps, including: content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness), erotic chat, pornography.
- political: Political campaigning or lobbying, by: generating high volumes of campaign materials, generating campaign materials personalized to or targeted at specific demographics, building conversational or interactive systems such as chatbots that provide information about campaigns or engage in political advocacy or lobbying, building products for political campaigning or lobbying purposes.
- privacy: Activity that violates people's privacy, including: tracking or monitoring an individual without their consent, facial recognition of private individuals, classifying individuals based on protected characteristics, using biometrics for identification or assessment, unlawful collection or disclosure of personal identifiable information or educational, financial, or other protected records.
- unqualified law: Engaging in the unauthorized practice of law, or offering tailored legal advice without a qualified person reviewing the information.
- unqualified financial: Offering tailored financial advice without a qualified person reviewing the information.
- unqualified health: Telling someone that they have or do not have a certain health condition, or providing instructions on how to cure or treat a health condition.

Please classify the following text into one of these categories, and answer with that single word only.
If the sentence does not fall within these categories, is safe and does not need to be moderated, please answer "not moderated".
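
One way to use this self-reflection prompt is to prepend it to the text you want to screen and ask the model to return the single-word category. A minimal, non-authoritative sketch follows; the instruct model ID, the [INST] chat wrapping, and the prompt file name are assumptions.

Code:
# Sketch: classifying text with the moderation prompt above (model ID and wrapping assumed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"   # assumed; any instruct-tuned chat model works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The category list shown above, saved to a local file (hypothetical path).
MODERATION_PROMPT = open("moderation_prompt.txt").read()

def classify(text: str) -> str:
    # Wrap the moderation prompt plus the text in the instruct format; the tokenizer adds the BOS token.
    full = f"[INST] {MODERATION_PROMPT}\n\nText: {text} [/INST]"
    inputs = tokenizer(full, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()

print(classify("How do I pick the lock on my neighbor's door?"))  # expected: "illegal"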

 

bnew


There’s a LOT of LLMs, but how do we know which ones work well from “simple” tasks (single prompt, top-k RAG) to “hard” tasks (advanced RAG, agents)?

We’re excited to launch a comprehensive survey of different LLMs performing simple to hard LLM/RAG/agent tasks 📝. For each model, learn which tasks work out-of-the-box, which work okay but need some prompt engineering, and which ones are unreliable. (A minimal sketch of the "simple" end follows the task list below.)

Models used:
✅ OpenAI models (gpt-3.5-turbo, gpt-3.5-turbo-instruct, gpt-4)
✅ Anthropic models (claude-2, Claude-instant-2)
✅ llama2-chat-7b 4bit
✅ Mistral-7b

Tasks 🛠️: Basic RAG, routing, query planning, text-to-SQL, structured data extraction, agents!
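
For reference, the "simple" end of that spectrum (single prompt, top-k RAG) looks roughly like the sketch below with the LlamaIndex API as of late 2023; imports and defaults may differ across versions, and the data directory is a placeholder.

Code:
# Rough sketch of basic top-k RAG with LlamaIndex (circa late-2023 API; details vary by version).
from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()   # "./data" is a placeholder directory
index = VectorStoreIndex.from_documents(documents)        # chunk, embed, and index the documents

query_engine = index.as_query_engine(similarity_top_k=2)  # retrieve the top-2 chunks per query
response = query_engine.query("What does the report say about Q3 revenue?")
print(response)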

Results/Notebooks 🧑‍🔬:
Docs page is here: https://docs.llamaindex.ai/en/latest/core_modules/model_modules/llms/root.html#llm-compatibility-tracking

We have comprehensive notebooks for each model

Contributions 🙌:
Have a model / task in mind? Anyone is welcome to contribute new LLMs to our docs, or modify an existing one! (e.g. if you think our defaults/prompts can be improved).

Credits:
Huge shoutout to our very own @LoganMarkewich for driving this entire effort ⚡
 

bnew



We’ve seen a massive amount of progress in AI/LLM research over the last several weeks. Here are the five highest-impact papers/projects that I’ve been focusing on recently…

StreamingLLM solves limitations with LLMs generating long sequences of text. To avoid excessive memory usage in the KV cache, StreamingLLM only considers a window of recent tokens in the attention computation, as well as four “sink” tokens at the start of the sequence. This allows extremely long sequences of text (4M tokens) to be generated with stable memory usage and performance.
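
As a rough illustration of that idea (not the authors' implementation), the sketch below builds the attention pattern StreamingLLM induces: each query position attends to the first few "sink" positions plus a sliding window of recent positions.

Code:
# Toy sketch of StreamingLLM's attention pattern: sink tokens + sliding window (illustrative only).
import torch

def streaming_mask(seq_len: int, num_sinks: int = 4, window: int = 1024) -> torch.Tensor:
    """Boolean mask where mask[q, k] is True if query position q may attend to key position k."""
    q = torch.arange(seq_len).unsqueeze(1)   # query positions (column vector)
    k = torch.arange(seq_len).unsqueeze(0)   # key positions (row vector)
    causal = k <= q                          # standard causal constraint
    in_window = (q - k) < window             # keep only the most recent `window` keys
    is_sink = k < num_sinks                  # always keep the first few "sink" tokens
    return causal & (in_window | is_sink)

mask = streaming_mask(seq_len=8, num_sinks=2, window=3)
print(mask.int())  # row 7 attends to keys {0, 1} (sinks) and {5, 6, 7} (recent window)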

QA-LoRA combines quantization with low-rank adaptation (LoRA) to make LLM training and inference computationally cheaper. The key to QA-LoRA is a (group-wise) quantization-aware training scheme that eliminates the need to perform post-training quantization; see below.

“QA-LoRA consistently outperforms QLoRA with PTQ on top of LLMs of different scales (the advantage becomes more significant when the quantization bit width is lower) and is on par with QLoRA without PTQ. Note that during inference, QA-LoRA has exactly the same complexity as QLoRA with PTQ and is much more efficient than QLoRA without PTQ.” - from QA-LoRA paper
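
The paper's scheme has more moving parts (in particular, constraining the adapter so it merges back into the group-wise quantized weights without any post-training quantization), but as a toy, non-authoritative sketch of the two ingredients it combines, here is a linear layer with group-wise quantized frozen weights plus a trainable LoRA update.

Code:
# Toy sketch of QA-LoRA's ingredients: group-wise quantized base weights + a LoRA update.
import torch

def quantize_groupwise(w: torch.Tensor, bits: int = 4, group_size: int = 32) -> torch.Tensor:
    """Asymmetric min-max quantization over groups of `group_size` input channels (dequantized here)."""
    out_f, in_f = w.shape
    wg = w.view(out_f, in_f // group_size, group_size)
    lo = wg.min(dim=-1, keepdim=True).values
    hi = wg.max(dim=-1, keepdim=True).values
    scale = (hi - lo).clamp(min=1e-8) / (2 ** bits - 1)
    q = torch.round((wg - lo) / scale)                   # integer codes in [0, 2**bits - 1]
    return (q * scale + lo).view(out_f, in_f)            # dequantized for this illustration

class QALoRALinearSketch(torch.nn.Module):
    def __init__(self, w: torch.Tensor, rank: int = 8):
        super().__init__()
        self.register_buffer("w_q", quantize_groupwise(w))                # frozen quantized base weight
        out_f, in_f = w.shape
        self.lora_a = torch.nn.Parameter(torch.randn(rank, in_f) * 0.01)  # trainable
        self.lora_b = torch.nn.Parameter(torch.zeros(out_f, rank))        # trainable

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Quantized base path plus low-rank update; only the LoRA factors receive gradients.
        return x @ self.w_q.T + (x @ self.lora_a.T) @ self.lora_b.T

layer = QALoRALinearSketch(torch.randn(64, 128))
print(layer(torch.randn(2, 128)).shape)  # torch.Size([2, 64])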

Physics of LLMs is a series of papers that study the ability of language models to store/manipulate information. This work finds that language models can only retrieve information that is stored properly during pretraining and struggle to perform complex manipulations of this knowledge (beyond retrieval) without techniques like chain of thought prompting.

GPT-4V is the (long-anticipated) multi-modal extension of GPT-4 that enables the model to process both textual and visual (i.e., images) input from the user. GPT-4V was released within ChatGPT Plus, but it underwent an extensive mitigation and fine-tuning process, which is detailed in the model’s system card, to ensure safety.

LLaVA. GPT-4V is a closed source model, but open-source variants of GPT-4V have already been proposed that can execute dialogue with both textual and visual inputs. LLaVA combines the Vicuna LLM with a vision encoder to create an open-source, multi-modal language model.

 

bnew


Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models


Paper: https://arxiv.org/abs/2310.04406

Abstract:

While large language models (LLMs) have demonstrated impressive performance on a range of decision-making tasks, they rely on simple acting processes and fall short of broad deployment as autonomous agents. We introduce LATS (Language Agent Tree Search), a general framework that synergizes the capabilities of LLMs in planning, acting, and reasoning. Drawing inspiration from Monte Carlo tree search in model-based reinforcement learning, LATS employs LLMs as agents, value functions, and optimizers, repurposing their latent strengths for enhanced decision-making. What is crucial in this method is the use of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism that moves beyond the limitations of existing techniques. Our experimental evaluation across diverse domains, such as programming, HotPotQA, and WebShop, illustrates the applicability of LATS for both reasoning and acting. In particular, LATS achieves 94.4% for programming on HumanEval with GPT-4 and an average score of 75.9 for web browsing on WebShop with GPT-3.5, demonstrating the effectiveness and generality of our method.
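
The abstract compresses a full tree-search loop. As a non-authoritative skeleton of the control flow it describes (select by UCB, expand with LLM-proposed actions, score states with an LLM value function, take environment feedback, backpropagate), see the sketch below; llm_propose, llm_value, and env_step are placeholder callables you would supply.

Code:
# Skeleton of an LATS-style search step (illustrative; not the authors' implementation).
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    state: str
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0

def ucb(node: Node, c: float = 1.0) -> float:
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def lats_step(root: Node, llm_propose, llm_value, env_step, n_expand: int = 3) -> None:
    # 1. Selection: descend by UCB until reaching a leaf.
    node = root
    while node.children:
        node = max(node.children, key=ucb)
    # 2. Expansion: the LLM acts as the agent, proposing candidate actions.
    for action in llm_propose(node.state, n=n_expand):
        obs = env_step(node.state, action)            # external environment feedback
        node.children.append(Node(state=obs, parent=node))
    # 3. Evaluation + 4. Backpropagation: the LLM scores each new state; push scores up the tree.
    for child in node.children:
        score = llm_value(child.state)
        cur = child
        while cur is not None:
            cur.visits += 1
            cur.value += score
            cur = cur.parent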

 