GitHub: baaivision/Emu, "Emu Series: Generative Multimodal Models from BAAI" (github.com)
Hugging Face model: BAAI/Emu2-Gen (huggingface.co)
Project page: "Generative Multimodal Models are In-Context Learners" (baaivision.github.io)
Generative Multimodal Models are In-Context Learners
Quan Sun¹*, Yufeng Cui¹*, Xiaosong Zhang¹*, Fan Zhang¹*, Qiying Yu²,¹*, Zhengxiong Luo¹, Yueze Wang¹, Yongming Rao¹, Jingjing Liu², Tiejun Huang¹,³, Xinlong Wang¹†
¹Beijing Academy of Artificial Intelligence, ²Tsinghua University, ³Peking University
*equal contribution †project lead
arXiv · Code · Demo · HF Demo · HF Model
Abstract
The human ability to easily solve multimodal tasks in context (i.e., with only a few demonstrations or simple instructions) is what current multimodal systems have largely struggled to imitate. In this work, we demonstrate that the task-agnostic in-context learning capabilities of large multimodal models can be significantly enhanced by effective scaling-up. We introduce Emu2, a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences with a unified autoregressive objective. Emu2 exhibits strong multimodal in-context learning abilities, even emerging to solve tasks that require on-the-fly reasoning, such as visual prompting and object-grounded generation. The model sets a new record on multiple multimodal understanding tasks in few-shot settings. When instruction-tuned to follow specific instructions, Emu2 further achieves new state-of-the-art on challenging tasks such as question answering benchmarks for large multimodal models and open-ended subject-driven generation. These achievements demonstrate that Emu2 can serve as a base model and general-purpose interface for a wide range of multimodal tasks. Code and models are publicly available to facilitate future research.
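Since the abstract notes that code and models are public, here is a minimal sketch of fetching the released checkpoint. Only the repo id BAAI/Emu2-Gen comes from the links above; the download call is a generic Hugging Face Hub client call, not the project's own loading code, and the actual inference pipeline should be taken from the model card and the baaivision/Emu repository.

```python
# Minimal sketch (assumption, not the project's loading code): pull the published
# Emu2-Gen checkpoint files from the Hugging Face Hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="BAAI/Emu2-Gen")  # downloads every file in the model repo
print(f"Checkpoint files downloaded to: {local_dir}")
```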
Highlights from the project page:
- A strong multimodal few-shot learner
- An impressive multimodal generalist
- A skilled painter
- Zero-shot subject-driven generation
- Multimodal in-context learning
- Strong multimodal understanding
- Generate image from any prompt sequence
- Generate video from any prompt sequence
AI-generated explanation:
Let's break down this abstract into simpler terms, with some examples:
- Multimodal tasks in context: This refers to tasks that combine several types of data (like text, images, and video) and are specified only by a few demonstrations or a simple instruction in the prompt. For example, after seeing one photo paired with its caption, a person immediately knows how to caption the next photo. Humans do this easily, but machines have largely struggled to imitate it.
- In-context learning capabilities of large multimodal models: This means the ability of a large AI model to pick up a new task purely from the examples or instructions placed in its prompt, without any retraining or weight updates. For instance, if the prompt contains two image-caption pairs followed by a third image, the model infers that it should caption the third image (see the prompt sketch after this list).
- Emu2: This is the name of the new AI model introduced in the paper. It’s a large model with 37 billion parameters, which means it has a lot of capacity to learn from data.
- Unified autoregressive objective: This is a fancy way of saying that Emu2 is trained with a single rule: predict the next element of a sequence (the next word, or the next visual embedding) from everything that came before it, regardless of modality (see the formula after this list).
- Visual prompting and object-grounded generation: These are examples of tasks that emerge in Emu2. Visual prompting means steering the model with visual cues, for example circling or marking a region of an image so the model reasons about that specific region. Object-grounded generation means producing an image that is grounded on specified objects, for example placing a particular subject from a reference photo into a newly generated scene.
- Few-shot settings: This means that Emu2 can learn to do new tasks with only a few examples. For instance, if you show it a few examples of cats and then ask it to identify cats in other images, it can do this effectively.
- Instruction-tuned: This means that Emu2 can be adjusted to follow specific instructions, like answering questions or generating text on a specific topic.
- Code and models are publicly available: This means that the authors have shared their work publicly, so other researchers can use and build upon it.
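To make the in-context learning and few-shot points concrete, here is an illustrative sketch of the shape of a multimodal few-shot prompt: interleaved (image, text) demonstrations followed by a query image. The helper build_prompt and the commented-out emu2.generate call are hypothetical names, not the Emu repository's API; the real interface is in the code linked above.

```python
# Illustrative only: the interleaved prompt structure is the point, not the API names.
from PIL import Image

def build_prompt(demonstrations, query_image):
    """Interleave (image, caption) demonstrations, then append the query image."""
    prompt = []
    for image, caption in demonstrations:
        prompt.append(image)    # the example input
        prompt.append(caption)  # the completion the model should imitate
    prompt.append(query_image)  # the new case the model must complete
    return prompt

# Placeholder images stand in for real photos so the sketch runs as-is.
cat, dog, rabbit = (Image.new("RGB", (224, 224)) for _ in range(3))
demos = [(cat, "a cat sitting on a sofa"), (dog, "a dog running on grass")]
prompt = build_prompt(demos, rabbit)
# response = emu2.generate(prompt)  # hypothetical call; see baaivision/Emu for the real API
```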
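For the unified autoregressive objective, the generic next-element form of such a loss is

$$\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)$$

where each $x_t$ is the next element of the interleaved multimodal sequence (a text token or a visual embedding) and $x_{<t}$ is everything before it. This is the standard textbook formulation rather than the paper's exact equation; the paper specifies how predictions for each modality are actually scored.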