The A.I Megathread (LLM , GPT , Development)

bnew · Nov 5, 2023

bnew · Nov 5, 2023

01-ai/Yi-34B · Hugging Face

huggingface.co

Introduction

The Yi series models are large language models trained from scratch by developers at 01.AI. The first public release contains two bilingual (English/Chinese) base models with 6B and 34B parameters. Both are trained with a 4K sequence length and can be extended to 32K at inference time.

News

2023/11/02: Released the base models Yi-6B and Yi-34B.

Model Performance

Model	MMLU (5-shot)	CMMLU (5-shot)	C-Eval (5-shot)	GAOKAO (0-shot)	BBH (3-shot@1)	Common-sense Reasoning	Reading Comprehension	Math & Code
LLaMA2-34B	62.6	-	-	-	44.1	69.9	68.0	26.0
LLaMA2-70B	68.9	53.3	-	49.8	51.2	71.9	69.4	36.8
Baichuan2-13B	59.2	62.0	58.1	54.3	48.8	64.3	62.4	23.0
Qwen-14B	66.3	71.0	72.1	62.5	53.4	73.3	72.5	39.8
Skywork-13B	62.1	61.8	60.6	68.1	41.7	72.4	61.4	24.9
InternLM-20B	62.1	59.0	58.8	45.5	52.5	78.3	-	30.4
Aquila-34B	67.8	71.4	63.1	-	-	-	-	-
Falcon-180B	70.4	58.0	57.8	59.0	54.0	77.3	68.8	34.0
Yi-6B	63.2	75.5	72.0	72.2	42.8	72.3	68.7	19.8
Yi-34B	76.3	83.7	81.4	82.8	54.3	80.1	76.4	37.1

While benchmarking open-source models, we observed a disparity between the results generated by our pipeline and those reported in public sources (e.g. OpenCompass). A closer investigation showed that different models may use different prompts, post-processing strategies, and sampling techniques, which can lead to significant variation in the outcomes. Our prompts and post-processing strategy are consistent with the original benchmarks, and we use greedy decoding during evaluation without any post-processing of the generated content. For scores that were not reported by the original authors (including scores reported under different settings), we tried to obtain results with our own pipeline.

To evaluate the models' capabilities comprehensively, we adopted the methodology outlined in Llama 2. Specifically, we included PIQA, SIQA, HellaSwag, WinoGrande, ARC, OBQA, and CSQA to assess common-sense reasoning, and SQuAD, QuAC, and BoolQ to evaluate reading comprehension. CSQA was tested exclusively with a 7-shot setup, while all other tests used a 0-shot configuration. Additionally, we introduced GSM8K (8-shot@1), MATH (4-shot@1), HumanEval (0-shot@1), and MBPP (3-shot@1) under the category "Math & Code". Due to technical constraints, we did not test Falcon-180B on QuAC and OBQA; its score is derived by averaging the scores on the remaining tasks. Since the scores for these two tasks are generally lower than the average, we believe that Falcon-180B's performance was not underestimated.
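The averaging rule for Falcon-180B can be sketched as follows. A category score is the mean of whichever task scores are available. Dropping tasks that score below the category average can only raise that mean, and this is the reasoning behind the claim that Falcon-180B was not underestimated. The scores below are toy values, not real benchmark results, and `category_score` is a hypothetical helper.

```python
from statistics import mean
from typing import Mapping, Optional, Sequence

READING_COMPREHENSION = ["SQuAD", "QuAC", "BoolQ"]


def category_score(scores: Mapping[str, Optional[float]], tasks: Sequence[str]) -> float:
    """Mean over the tasks in a category, skipping tasks with no score (missing or None)."""
    available = [scores[t] for t in tasks if scores.get(t) is not None]
    if not available:
        raise ValueError("no scores available for this category")
    return mean(available)


# Toy numbers for illustration only (not real results).
full = {"SQuAD": 80.0, "QuAC": 50.0, "BoolQ": 70.0}
partial = {"SQuAD": 80.0, "QuAC": None, "BoolQ": 70.0}  # QuAC not run

print(category_score(full, READING_COMPREHENSION))     # about 66.67
print(category_score(partial, READING_COMPREHENSION))  # 75.0
```

Because the skipped task (QuAC here) scores below the average of the others, the partial average comes out higher than the full one.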

TheBloke/Yi-34B-GGUF · Hugging Face

huggingface.co

TheBloke/Yi-34B-GPTQ · Hugging Face

huggingface.co

Daniel Daugherty on LinkedIn: 01-ai/Yi-34B at main

01.AI just released two models which beat all other models of similar sizes: - a 6B which outperforms Mistral-7B - a 34B which outperforms…

www.linkedin.com

bnew · Nov 5, 2023

GitHub - KoljaB/LocalAIVoiceChat: Local AI talk with a custom voice based on Zephyr 7B model. Uses RealtimeSTT with faster_whisper for transcription and RealtimeTTS with Coqui XTTS for synthesis.

github.com

About

Local AI talk with a custom voice based on Zephyr 7B model. Uses RealtimeSTT with faster_whisper for transcription and RealtimeTTS with Coqui XTTS for synthesis.

Local AI Voice Chat

Provides real-time voice conversation with an AI, running completely locally on your PC, with a customizable AI personality and voice.

About the Project

Integrates the powerful Zephyr 7B language model with real-time speech-to-text and text-to-speech libraries to create a fast and engaging voice-based local chatbot.

https://user-images.githubusercontent.com/7604638/280487248-cebacdad-8a57-4a03-bfd1-a469730dda51.mov

Tech Stack

llama_cpp with Zephyr 7B
- library interface for Llama-based language models
RealtimeSTT with faster_whisper
- real-time speech-to-text transcription library
RealtimeTTS with Coqui XTTS
- real-time text-to-speech synthesis library
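The three components chain into a listen, think, speak loop. The following is a minimal sketch of that flow, not the project's code:

- `transcribe`, `generate`, and `speak` are hypothetical stand-ins, not the actual RealtimeSTT, llama_cpp, or RealtimeTTS APIs.
- Buffering streamed tokens into sentences is an assumption about one way to start speaking before generation finishes. It is not necessarily what LocalAIVoiceChat does.

```python
import re
from typing import Callable, Iterable, Iterator, List

# Hypothetical component signatures standing in for the real libraries.
Transcribe = Callable[[], str]             # returns one user utterance as text
Generate = Callable[[str], Iterable[str]]  # streams LLM output tokens for a prompt
Speak = Callable[[str], None]              # synthesizes and plays one chunk of text

_SENTENCE_END = re.compile(r"[.!?]\s*$")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed tokens into sentences so TTS can start before the LLM finishes."""
    buffer = ""
    for token in tokens:
        buffer += token
        if _SENTENCE_END.search(buffer):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():  # flush any trailing text without final punctuation
        yield buffer.strip()


def conversation_turn(transcribe: Transcribe, generate: Generate, speak: Speak) -> str:
    """Run one listen -> think -> speak cycle and return the full reply text."""
    user_text = transcribe()
    spoken: List[str] = []
    for sentence in sentence_chunks(generate(user_text)):
        speak(sentence)
        spoken.append(sentence)
    return " ".join(spoken)
```

Sentence-level chunking trades a little naturalness for lower latency. A naive punctuation rule will also split on abbreviations or decimals such as "3.5", which a real implementation would need to handle.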

Notes

This software is in an experimental alpha state and is not production-ready. The XTTS model currently used for synthesis still has glitches, and Zephyr, while very good for a 7B model, cannot compete with the answer quality of GPT-4, Claude, or Perplexity.

Please treat this as a first attempt at an early version of a local real-time chatbot.

Updates

Bugfix to RealtimeTTS (download of Coqui model did not work properly)

Prerequisites

You will need a GPU with around 8 GB of VRAM to run this in real time.

NVIDIA CUDA Toolkit 11.8:
- Access the NVIDIA CUDA Toolkit Archive.
- Choose version 11.8 and follow the instructions for downloading and installation.
NVIDIA cuDNN 8.7.0 for CUDA 11.x:
- Navigate to NVIDIA cuDNN Archive.
- Locate and download "cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x".
- Follow the provided installation guide.
FFmpeg:
Install FFmpeg according to your operating system:
- Ubuntu/Debian:
  sudo apt update && sudo apt install ffmpeg
- Arch Linux:
  sudo pacman -S ffmpeg
- macOS (Homebrew):
  brew install ffmpeg
- Windows (Chocolatey):
  choco install ffmpeg
- Windows (Scoop):
  scoop install ffmpeg
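A quick check of these prerequisites can save time before running the install script. The following is a minimal sketch with these assumptions:

- PyTorch is importable in the environment. Coqui TTS depends on it.
- The VRAM threshold is 7.5 GB, a slightly relaxed reading of "around 8 GB".
- `check_prerequisites` is a hypothetical helper.

The sketch does not verify the cuDNN version.

```python
import shutil
from typing import Dict, Optional


def check_prerequisites(min_vram_gb: float = 7.5) -> Dict[str, Optional[bool]]:
    """Report whether FFmpeg is on PATH and whether a CUDA GPU with enough VRAM is visible.

    GPU checks need PyTorch; if it is not installed they are reported as None (unknown).
    """
    result: Dict[str, Optional[bool]] = {"ffmpeg": shutil.which("ffmpeg") is not None}
    try:
        import torch
    except ImportError:
        result["cuda"] = None
        result["enough_vram"] = None
        return result
    result["cuda"] = torch.cuda.is_available()
    if result["cuda"]:
        # total_memory is reported in bytes for the first visible GPU.
        total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        result["enough_vram"] = total_gb >= min_vram_gb
    else:
        result["enough_vram"] = False
    return result


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {ok}")
```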

Installation Steps

Clone the repository or download the source code package.
Run the install_win.bat script. This will automatically handle the installation of required dependencies and prepare your environment. There may be warnings about numpy or fsspec incompatibilities; you can ignore them, as it will work regardless.
If you are running Unix or macOS, you need to adjust this script (if someone with more experience on these platforms could send me install scripts, I would love to add them).
Download zephyr-7b-beta.Q5_K_M.gguf from here.
- Open creation_params.json and enter the file path of the downloaded model in model_path.
- Adjust n_gpu_layers (0-35; raise it if you have more VRAM) and n_threads (number of CPU threads; I recommend not using all available cores but leaving some for TTS).
Implement a temporary workaround for an issue in the Coqui TTS library:
- Activate your venv (test_env\Scripts\activate.bat on Windows; presumably source test_env/bin/activate on Unix/macOS).
- Execute the command pip show tts to find the installation path of the Coqui TTS library.
- Navigate to the Coqui installation directory and proceed to TTS/tts/models.
- Locate and open the xtts.py file in a text editor with administrative or sufficient privileges.
- Within the handle_chunks method, change the line
    if wav_overlap is not None:
  to
    if wav_overlap is not None and wav_chunk.shape[0] > 0:
- Note: This modification addresses a specific runtime issue I encountered while working with the Coqui library. Although it resolves the problem, it is a provisional solution. I have not yet submitted a pull request to the Coqui TTS repository, because I honestly do not understand the underlying cause and the full implications of the change well enough to document it. This adjustment ensures functionality but should be approached with caution and technical oversight.
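The two manual edits in the steps above can be scripted: the creation_params.json settings and the xtts.py guard. This is a hedged sketch, not part of the project. It makes these assumptions:

- All function names are hypothetical.
- creation_params.json may hold other keys, so they are preserved. Its three settings presumably map to the `model_path`, `n_gpu_layers` and `n_threads` arguments of llama-cpp-python's `Llama`.
- The Coqui package is importable as `TTS` and still contains the exact line quoted in the README. If it does not, the patch does nothing or raises.

Run it inside the activated venv.

```python
import importlib.util
import json
import os
import re
from pathlib import Path
from typing import Any, Dict, Optional

MAX_GPU_LAYERS = 35  # the README suggests 0-35 for Zephyr 7B
OLD_LINE = "if wav_overlap is not None:"
NEW_LINE = "if wav_overlap is not None and wav_chunk.shape[0] > 0:"


def update_creation_params(path: Path, model_path: str, n_gpu_layers: int,
                           reserve_cores: int = 2) -> Dict[str, Any]:
    """Set model_path, n_gpu_layers and n_threads, keeping any other keys in the file."""
    if not 0 <= n_gpu_layers <= MAX_GPU_LAYERS:
        raise ValueError(f"n_gpu_layers should be between 0 and {MAX_GPU_LAYERS}")
    params = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    # Leave some CPU cores free for TTS, as the README recommends.
    n_threads = max(1, (os.cpu_count() or 1) - reserve_cores)
    params.update(model_path=model_path, n_gpu_layers=n_gpu_layers, n_threads=n_threads)
    path.write_text(json.dumps(params, indent=4), encoding="utf-8")
    return params


def patch_source(source: str) -> str:
    """Add the wav_chunk guard inside handle_chunks only; running it twice is a no-op."""
    start = source.find("def handle_chunks")
    if start == -1:
        raise ValueError("handle_chunks not found; the Coqui source may have changed")
    # Treat the next 'def' (at any indentation) as the end of the method body.
    nxt = re.search(r"\n[ \t]*def ", source[start + 1:])
    end = start + 1 + nxt.start() if nxt else len(source)
    body = source[start:end]
    if OLD_LINE not in body:
        return source  # already patched, or the line no longer exists
    return source[:start] + body.replace(OLD_LINE, NEW_LINE, 1) + source[end:]


def find_xtts_path() -> Optional[Path]:
    """Locate TTS/tts/models/xtts.py in the active environment (what `pip show tts` points to)."""
    spec = importlib.util.find_spec("TTS")
    if spec is None or not spec.submodule_search_locations:
        return None
    candidate = Path(list(spec.submodule_search_locations)[0]) / "tts" / "models" / "xtts.py"
    return candidate if candidate.exists() else None


def patch_xtts(path: Path) -> bool:
    """Patch xtts.py in place, keeping a .bak copy; return True if the file changed."""
    original = path.read_text(encoding="utf-8")
    patched = patch_source(original)
    if patched != original:
        path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
        path.write_text(patched, encoding="utf-8")
    return patched != original
```

Example usage (the model path is a placeholder): `update_creation_params(Path("creation_params.json"), "models/zephyr-7b-beta.Q5_K_M.gguf", n_gpu_layers=35)`. Then, if `find_xtts_path()` returns a path, pass it to `patch_xtts`. The guard is restricted to `handle_chunks` because the README scopes the change to that method.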

Running the Application

To start the AI voice chat, run start.bat.

bnew · Nov 5, 2023

bnew · Nov 5, 2023

https://archive.ph/R1Ce3

Finetuning Llama 2 and Mistral

A beginner’s guide to finetuning SOTA LLMs with QLoRA

https://archive.ph/xBATa
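The linked guide is referenced here only by title. As a hedged illustration that is not taken from the guide: QLoRA keeps a 4-bit quantized base model frozen and trains small LoRA adapters on top of it. The sketch counts adapter parameters, r·(d_in + d_out) per adapted matrix, and estimates weight memory at a given bit width. It makes these assumptions:

- The shapes are Llama-2-7B-like: 32 layers, with `q_proj` and `v_proj` both 4096 to 4096.
- The rank is 8.
- The model has a nominal 7B parameters.
- The helper names are hypothetical.

```python
from typing import List, Tuple


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) beside a frozen d_out x d_in weight."""
    return rank * (d_in + d_out)


def total_lora_params(n_layers: int, shapes: List[Tuple[int, int]], rank: int) -> int:
    """Sum adapter sizes over every adapted (d_in, d_out) matrix in every layer."""
    return n_layers * sum(lora_params(d_in, d_out, rank) for d_in, d_out in shapes)


def weight_memory_gb(n_params: float, bits: int) -> float:
    """Weights only: ignores quantization constants, activations and optimizer state."""
    return n_params * bits / 8 / 1e9


# Assumed Llama-2-7B-like shapes: q_proj and v_proj are both 4096 -> 4096.
QV_SHAPES = [(4096, 4096), (4096, 4096)]

adapters = total_lora_params(n_layers=32, shapes=QV_SHAPES, rank=8)
print(f"trainable LoRA parameters: {adapters:,}")
print(f"7B weights at 16-bit: {weight_memory_gb(7e9, 16):.1f} GB")
print(f"7B weights at 4-bit:  {weight_memory_gb(7e9, 4):.1f} GB")
```

With these assumptions the adapters amount to about 4.2M trainable parameters, a tiny fraction of the frozen model. The base weights shrink from about 14 GB at 16-bit to about 3.5 GB at 4-bit.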

bnew · Nov 5, 2023

bnew · Nov 5, 2023

https://archive.ph/Nt051

GitHub - OneInterface/realtime-bakllava: llama.cpp with BakLLaVA model describes what it sees

github.com

bnew · Nov 5, 2023

https://archive.ph/fW0uB

bnew · Nov 6, 2023

https://archive.ph/skwiW

SAM And MetaCLIP - a Hugging Face Space by SkalskiP

huggingface.co

bnew · Nov 6, 2023

https://archive.ph/i420k

Capybara - a NousResearch Collection

Un-aligned model for general use, leveraging Amplify-Instruct and novel quality curation techniques, made with a dataset of less than 20K examples.

huggingface.co

bnew · Nov 6, 2023

https://archive.ph/agMzH

We propose #SEINE, a video diffusion model that focuses on generative transition and prediction.

#SEINE supports *video transition generation* and *image-to-video animation*

- Project: https://vchitect.github.io/SEINE-project/
- Paper: https://arxiv.org/abs/2310.20700
- Code: https://github.com/Vchitect/SEINE

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

Paper page: SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

Recently, video generation has made substantial progress, with realistic results. Nevertheless, existing AI-generated videos are usually very short clips ("shot-level") depicting a single scene. To deliver a coherent long video ("story-level"), it is desirable to have creative transition and prediction effects across different clips. This paper presents a short-to-long video diffusion model, SEINE, that focuses on generative transition and prediction. The goal is to generate high-quality long videos with smooth and creative transitions between scenes and shot-level videos of varying lengths. Specifically, we propose a random-mask video diffusion model to automatically generate transitions based on textual descriptions. By providing the images of different scenes as inputs, combined with text-based control, our model generates transition videos that ensure coherence and visual quality. Furthermore, the model can be readily extended to various tasks such as image-to-video animation and autoregressive video prediction. To conduct a comprehensive evaluation of this new generative task, we propose three assessment criteria for smooth and creative transitions: temporal consistency, semantic similarity, and video-text semantic alignment. Extensive experiments validate the effectiveness of our approach over existing methods for generative transition and prediction, enabling the creation of story-level long videos.
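The abstract describes a random-mask video diffusion model conditioned on images of different scenes. The sketch below is a hedged reading of that idea and is not SEINE's code. The real model operates on latent frames inside a diffusion network. Here, per-frame conditioning masks for the three tasks it names are plain lists: 1 marks a frame supplied as input, and 0 marks a frame to be generated. All function names are hypothetical.

```python
import random
from typing import Iterable, List


def condition_mask(num_frames: int, visible: Iterable[int]) -> List[int]:
    """1 = frame given as a condition, 0 = frame masked out and generated."""
    mask = [0] * num_frames
    for i in visible:
        mask[i] = 1
    return mask


def transition_mask(num_frames: int) -> List[int]:
    # Scene A's image as the first frame, scene B's image as the last frame.
    return condition_mask(num_frames, [0, num_frames - 1])


def image_to_video_mask(num_frames: int) -> List[int]:
    # Only the first frame (the input image) is given.
    return condition_mask(num_frames, [0])


def prediction_mask(num_frames: int, num_context: int) -> List[int]:
    # Autoregressive prediction: frames carried over from the previous clip seed the next.
    return condition_mask(num_frames, range(num_context))


def random_training_mask(num_frames: int, rng: random.Random) -> List[int]:
    # Training: reveal a random subset, always leaving at least one frame to generate.
    k = rng.randint(0, num_frames - 1)
    return condition_mask(num_frames, rng.sample(range(num_frames), k))


print(transition_mask(8))
print(image_to_video_mask(8))
print(prediction_mask(8, 2))
```

Training on random masks is what would let one model cover all three inference-time patterns, which matches the abstract's claim that it extends readily to image-to-video animation and prediction.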

bnew · Nov 6, 2023

https://archive.ph/UA5Wc

bnew · Nov 6, 2023

https://archive.ph/54m3P

Explore Wonder3D, which generates high-fidelity 3D meshes from single-view images.

Great work Xiaoxiao Long, Yuan-Chen Guo, @chenglin6161, @YuanLiu41955461, and team.
Code- github.com/xxlong0/Wonder3D/
Gradio Demo- huggingface.co/spaces/flameh…

bnew · Nov 6, 2023

https://archive.ph/v8Ula

bnew · Nov 6, 2023

https://archive.ph/0GOJD

GitHub - huggingface/distil-whisper: Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

github.com

About

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Distil-Whisper

Distil-Whisper is a distilled version of Whisper that is 6 times faster, 49% smaller, and performs within 1% word error rate (WER) on out-of-distribution evaluation sets.

Model	Params (M)	Rel. Speed-up	Short-Form WER (%)	Long-Form WER (%)
whisper-large-v2	1550	1.0	9.1	11.7
distil-large-v2	756	5.8	10.1	11.6
distil-medium.en	394	6.8	11.1	12.4

Note: Distil-Whisper is currently only available for English speech recognition. Multilingual support will be provided soon.
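"Within 1% WER" is easy to misread as a relative figure, so a short calculation on the table's own numbers helps. The WER differences are absolute percentage points. The dictionary below copies the table values and stores the relative-latency column as a speed-up factor, where higher means faster. `compare` is a hypothetical helper.

```python
from typing import Dict

# Figures copied from the Distil-Whisper table (params in millions, WER in %).
MODELS: Dict[str, Dict[str, float]] = {
    "whisper-large-v2": {"params_m": 1550, "speedup": 1.0, "short_wer": 9.1, "long_wer": 11.7},
    "distil-large-v2": {"params_m": 756, "speedup": 5.8, "short_wer": 10.1, "long_wer": 11.6},
    "distil-medium.en": {"params_m": 394, "speedup": 6.8, "short_wer": 11.1, "long_wer": 12.4},
}


def compare(student: str, teacher: str = "whisper-large-v2") -> Dict[str, float]:
    """Size ratio, speed-up and WER deltas (absolute percentage points) vs. the teacher."""
    s, t = MODELS[student], MODELS[teacher]
    return {
        "size_ratio": s["params_m"] / t["params_m"],
        "speedup": s["speedup"] / t["speedup"],
        # Rounded to hide floating-point noise such as 10.1 - 9.1 = 0.9999999999999996.
        "short_wer_delta": round(s["short_wer"] - t["short_wer"], 2),
        "long_wer_delta": round(s["long_wer"] - t["long_wer"], 2),
    }


for name in ("distil-large-v2", "distil-medium.en"):
    print(name, compare(name))
```

For distil-large-v2 this gives a size ratio of about 0.49, so it is roughly half the parameters. Its short-form WER is 1.0 point higher and its long-form WER 0.1 point lower than whisper-large-v2. distil-medium.en is 2.0 points higher on short-form and 0.7 higher on long-form, in exchange for a further speed-up.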
