Generative Multimodal Models are In-Context Learners​

Quan Sun1*, Yufeng Cui1*, Xiaosong Zhang1*, Fan Zhang1*, Qiying Yu2,1*, Zhengxiong Luo1, Yueze Wang1, Yongming Rao1, Jingjing Liu2, Tiejun Huang1,3, Xinlong Wang1†

1Beijing Academy of Artificial Intelligence 2Tsinghua University 3Peking University

*equal contribution †project lead

Links: arXiv · Code · Demo · 🤗HF Demo · 🤗HF Model

Abstract​

The human ability to easily solve multimodal tasks in context (i.e., with only a few demonstrations or simple instructions) is something current multimodal systems have largely struggled to imitate. In this work, we demonstrate that the task-agnostic in-context learning capabilities of large multimodal models can be significantly enhanced by effective scaling-up. We introduce Emu2, a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences with a unified autoregressive objective. Emu2 exhibits strong multimodal in-context learning abilities, including emergent abilities to solve tasks that require on-the-fly reasoning, such as visual prompting and object-grounded generation. The model sets a new record on multiple multimodal understanding tasks in few-shot settings. When instruction-tuned to follow specific instructions, Emu2 further achieves a new state of the art on challenging tasks such as question-answering benchmarks for large multimodal models and open-ended subject-driven generation. These achievements demonstrate that Emu2 can serve as a base model and general-purpose interface for a wide range of multimodal tasks. Code and models are publicly available to facilitate future research.

Video​

[Video demonstration]

A strong multimodal few-shot learner​

[Figure: few-shot performance comparison]

An impressive multimodal generalist​

[Figure: radar chart of multimodal benchmark performance]

A skilled painter​

[Figure: image generation quality metrics]

Zero-shot subject-driven generation

Multimodal in-context learning​

[Figure: multimodal in-context learning examples]

Strong multimodal understanding​

[Figures: multimodal understanding examples (counting the sides of a hexagon, guiding a robot, assessing car damage, comparing samples A and B)]

Generate image from any prompt sequence​

[Figures: image generation from arbitrary prompt sequences]

Generate video from any prompt sequence​

[Figure: video generation from prompt sequences]





AI-generated explanation:



Here is the abstract broken down into simpler terms, with examples:
  1. Multimodal tasks in context: Tasks that involve multiple types of data (like text, images, and sound) and require understanding the surrounding context. For example, if you see a picture of a dog and read a story about a dog, you understand that the two are related. Humans do this easily, but machines have largely struggled with it.
  2. In-context learning capabilities of large multimodal models: The ability of a large AI model to pick up a new task from a few examples given directly in the prompt, without any retraining. For instance, shown two example image-caption pairs, the model can caption a third image in the same style.
  3. Emu2: The name of the new AI model introduced in the paper. It is a large model with 37 billion parameters, which means it has a lot of capacity to learn from data.
  4. Unified autoregressive objective: Emu2 learns to predict the next piece of data (like the next word in a sentence, or the next part of an image) based on everything it has seen so far; a toy sketch of this objective follows this list.
  5. Visual prompting and object-grounded generation: Examples of tasks Emu2 can solve. Visual prompting means guiding the model with visual cues (such as marks drawn on an image), while object-grounded generation means generating an image grounded to specific objects or regions that you specify.
  6. Few-shot settings: Emu2 can learn new tasks from only a few examples. For instance, if you show it a few examples of cats and then ask it to identify cats in other images, it can do this effectively.
  7. Instruction-tuned: Emu2 can be further tuned to follow specific instructions, like answering questions or generating text on a specific topic.
  8. Code and models are publicly available: The authors have shared their work publicly, so other researchers can use and build upon it.
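To make item 4 concrete, here is a toy sketch of an autoregressive next-token loss. This is illustrative only, not Emu2's actual training code; in Emu2 the sequence interleaves text tokens with visual embeddings, which is omitted here.

```python
# Toy autoregressive objective: every position predicts the next element
# of the sequence. NOT Emu2's real code; purely illustrative.
import torch
import torch.nn.functional as F

def autoregressive_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab_size); tokens: (batch, seq_len)
    # Shift by one so the prediction at position t is scored against token t+1.
    pred = logits[:, :-1, :].reshape(-1, logits.size(-1))
    target = tokens[:, 1:].reshape(-1)
    return F.cross_entropy(pred, target)

# Example with random data:
batch, seq_len, vocab = 2, 16, 100
loss = autoregressive_loss(
    torch.randn(batch, seq_len, vocab),
    torch.randint(0, vocab, (batch, seq_len)),
)
```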
 



Outlines 〰️



Robust (guided) text generation.
Made with ❤👷️ by the team at .txt. We'd love to have your feedback!

pip install outlines

First time here? Go to our setup guide

Features​

Outlines 〰 has new releases and features coming every week. Make sure to ⭐ star and 👀 watch this repository, and follow @dottxtai to stay up to date!
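To show what "guided text generation" means in practice, here is a minimal sketch of constraining output to a fixed set of choices. The module layout follows Outlines' documented pattern but has shifted between releases, so treat the exact names as assumptions and check the setup guide for your installed version.

```python
# Minimal sketch of guided generation with Outlines. The paths
# (outlines.models.transformers, outlines.generate.choice) follow the
# project's documented pattern but may differ in your installed version.
import outlines

# Load any Hugging Face causal LM; the model name here is illustrative.
model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")

# Constrain generation so the output is guaranteed to be one of the choices.
generator = outlines.generate.choice(model, ["Positive", "Negative"])
sentiment = generator("Review: The pizza arrived cold.\nSentiment:")
print(sentiment)  # always exactly "Positive" or "Negative"
```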
 



[Submitted on 8 Dec 2023]

Language Models, Agent Models, and World Models: The LAW for Machine Reasoning and Planning​

Zhiting Hu, Tianmin Shu
Despite their tremendous success in many applications, large language models often fall short of consistent reasoning and planning in various (language, embodied, and social) scenarios, due to inherent limitations in their inference, learning, and modeling capabilities. In this position paper, we present a new perspective of machine reasoning, LAW, that connects the concepts of Language models, Agent models, and World models for more robust and versatile reasoning capabilities. In particular, we propose that world and agent models are a better abstraction of reasoning, one that introduces the crucial elements of deliberate human-like reasoning: beliefs about the world and other agents, anticipation of consequences, goals/rewards, and strategic planning. Crucially, language models in LAW serve as the backend that implements the system or its elements, and hence provide computational power and adaptability. We review recent studies that have made relevant progress and discuss future research directions towards operationalizing the LAW framework.
Comments: Position paper. Accompanying NeurIPS 2023 Tutorial: this https URL
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
Cite as: arXiv:2312.05230 [cs.AI] (or arXiv:2312.05230v1 [cs.AI] for this version)

Submission history​

From: Zhiting Hu [view email]
[v1] Fri, 8 Dec 2023 18:25:22 UTC (981 KB)




 




**The first 7B model to score above 8.8 on MT-Bench (first turn), with strong performance in Humanities, Coding, and Writing.**

xDAN-AI • Discord · Twitter · Huggingface


########## First turn ##########

| model | turn | score | size |
| --- | --- | --- | --- |
| gpt-4 | 1 | 8.95625 | - |
| xDAN-L1-Chat-RL-v1 | 1 | 8.87500 | 7b |
| xDAN-L2-Chat-RL-v2 | 1 | 8.78750 | 30b |
| claude-v1 | 1 | 8.15000 | - |
| gpt-3.5-turbo | 1 | 8.07500 | 20b |
| vicuna-33b-v1.3 | 1 | 7.45625 | 33b |
| wizardlm-30b | 1 | 7.13125 | 30b |
| oasst-sft-7-llama-30b | 1 | 7.10625 | 30b |
| Llama-2-70b-chat | 1 | 6.98750 | 70b |

########## Second turn ##########

| model | turn | score | size |
| --- | --- | --- | --- |
| gpt-4 | 2 | 9.025000 | - |
| xDAN-L2-Chat-RL-v2 | 2 | 8.087500 | 30b |
| xDAN-L1-Chat-RL-v1 | 2 | 7.825000 | 7b |
| gpt-3.5-turbo | 2 | 7.812500 | 20b |
| claude-v1 | 2 | 7.650000 | - |
| wizardlm-30b | 2 | 6.887500 | 30b |
| vicuna-33b-v1.3 | 2 | 6.787500 | 33b |
| Llama-2-70b-chat | 2 | 6.725000 | 70b |

########## Average turn ##########

| model | score | size |
| --- | --- | --- |
| gpt-4 | 8.990625 | - |
| xDAN-L2-Chat-RL-v2 | 8.437500 | 30b |
| xDAN-L1-Chat-RL-v1 | 8.350000 | 7b |
| gpt-3.5-turbo | 7.943750 | 20b |
| claude-v1 | 7.900000 | - |
| vicuna-33b-v1.3 | 7.121875 | 33b |
| wizardlm-30b | 7.009375 | 30b |
| Llama-2-70b-chat | 6.856250 | 70b |
Prompt Template (Alpaca)

Instruction: "You are a helpful assistant named DAN. You are an expert in worldly knowledge, skilled in employing a probing questioning strategy, carefully considering each step before providing answers."

{Question}

Response:
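For reference, here is a minimal sketch of how a prompt following this Alpaca-style template could be assembled in code. The helper name build_prompt is hypothetical, and the exact whitespace the model was trained on is an assumption.

```python
# Hypothetical helper that assembles the xDAN Alpaca-style prompt above.
# The exact formatting (newlines, labels) the model expects is an assumption.
SYSTEM_INSTRUCTION = (
    "You are a helpful assistant named DAN. You are an expert in worldly "
    "knowledge, skilled in employing a probing questioning strategy, "
    "carefully considering each step before providing answers."
)

def build_prompt(question: str) -> str:
    return f"Instruction: {SYSTEM_INSTRUCTION}\n\n{question}\n\nResponse:"

print(build_prompt("Why is the sky blue?"))
```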

Created by xDAN-AI on 2023-12-15

Evaluated with FastChat (GitHub - lm-sys/FastChat: an open platform for training, serving, and evaluating large language models; release repo for Vicuna and Chatbot Arena).

Check: https://www.xdan.ai

 





About
OCR model for math that outputs LaTeX and markdown

Texify
Texify is an OCR model that converts images or PDFs containing math into markdown and LaTeX that can be rendered by MathJax ($$ and $ are the delimiters). It can run on CPU, GPU, or MPS.

Texify can work with block equations, or equations mixed with text (inline). It will convert both the equations and the text.
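The quick-start from the Texify README (at the time of writing) looks roughly like the sketch below; treat the exact import paths as assumptions, since the API may have evolved.

```python
# Usage sketch based on the Texify README; import paths may have changed.
from texify.inference import batch_inference
from texify.model.model import load_model
from texify.model.processor import load_processor
from PIL import Image

model = load_model()            # downloads weights on first run
processor = load_processor()

image = Image.open("equation.png")  # hypothetical input image
results = batch_inference([image], model, processor)
print(results[0])  # markdown/LaTeX using $ and $$ delimiters (MathJax-renderable)
```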

The closest open-source comparisons to Texify are pix2tex and nougat, although they're designed for different purposes:

- Pix2tex is designed only for block LaTeX equations, and hallucinates more on text.
- Nougat is designed to OCR entire pages, and hallucinates more on small images that contain only math.

Pix2tex is trained on im2latex and nougat on arXiv papers; Texify is trained on a more diverse set of web data and works on a wider range of images.

See more details in the benchmarks section.

 




[Submitted on 13 Dec 2023]

PromptBench: A Unified Library for Evaluation of Large Language Models​

Kaijie Zhu, Qinlin Zhao, Hao Chen, Jindong Wang, Xing Xie
The evaluation of large language models (LLMs) is crucial to assess their performance and mitigate potential security risks. In this paper, we introduce PromptBench, a unified library to evaluate LLMs. It consists of several key components that are easily used and extended by researchers: prompt construction, prompt engineering, dataset and model loading, adversarial prompt attack, dynamic evaluation protocols, and analysis tools. PromptBench is designed to be an open, general, and flexible codebase for research purposes that can facilitate original study in creating new benchmarks, deploying downstream applications, and designing new evaluation protocols. The code is available at: this https URL and will be continuously supported.
Comments: An extension to PromptBench (arXiv:2306.04528) for unified evaluation of LLMs using the same name; code: this https URL
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2312.07910 [cs.AI] (or arXiv:2312.07910v1 [cs.AI] for this version)

Submission history

From: Jindong Wang [view email]
[v1] Wed, 13 Dec 2023 05:58:34 UTC (288 KB)






PromptBench: A Unified Library for Evaluating and Understanding Large Language Models.
Paper · Documentation · Leaderboard · More papers


News and Updates​

  • [16/12/2023] Add support for Gemini, Mistral, Mixtral, Baichuan, Yi models.
  • [15/12/2023] Add detailed instructions for users to add new modules (models, datasets, etc.) examples/add_new_modules.md.
  • [05/12/2023] Published promptbench 0.0.1.

Introduction​

PromptBench is a PyTorch-based Python package for the evaluation of large language models (LLMs). It provides user-friendly APIs for researchers to evaluate LLMs. See the technical report: [2312.07910] PromptBench: A Unified Library for Evaluation of Large Language Models.


What does promptbench currently provide?​

  1. Quick model performance assessment: We offer a user-friendly interface for quick model building, dataset loading, and evaluation of model performance (see the quick-start sketch after this list).
  2. Prompt engineering: We implement several prompt engineering methods, for example Few-shot Chain-of-Thought [1], EmotionPrompt [2], and Expert Prompting [3].
  3. Evaluating adversarial prompts: PromptBench integrates prompt attacks [4], enabling researchers to simulate black-box adversarial prompt attacks on models and evaluate their robustness (see details here).
  4. Dynamic evaluation to mitigate potential test data contamination: We integrate the dynamic evaluation framework DyVal [5], which generates evaluation samples on the fly with controlled complexity.
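The documented quick-start pattern looks roughly like the sketch below. Names such as DatasetLoader, LLMModel, and Eval.compute_cls_accuracy follow the repo's examples at the time, but treat the exact signatures as assumptions.

```python
# Sketch of PromptBench's quick-start flow; exact signatures are assumptions
# based on the repo's examples.
import promptbench as pb

dataset = pb.DatasetLoader.load_dataset("sst2")                        # load a dataset
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10)   # load a model

prompt = "Classify the sentence as positive or negative: "
preds, labels = [], []
for item in dataset:
    output = model(prompt + item["content"])
    preds.append(1 if "positive" in output.lower() else 0)
    labels.append(item["label"])

print("accuracy:", pb.Eval.compute_cls_accuracy(preds, labels))
```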
 




Project Title: Scrape Contents of Files into a Text File

Project Description: A Python program that aggregates text from files (.txt, .csv, .json, .xml, .cs) into a single document. It features easy directory selection and a simple GUI, and is useful for preparing data for AI chat prompts.
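The program itself isn't shown here, but the core idea can be sketched in a few lines. This hypothetical version omits the GUI and simply walks a directory tree, concatenating matching files into one output document.

```python
# Hypothetical minimal version of the scraper's core logic (no GUI):
# walk a directory tree and concatenate matching files into one document.
from pathlib import Path

EXTENSIONS = {".txt", ".csv", ".json", ".xml", ".cs"}

def scrape_to_file(root: str, out_path: str = "combined.txt") -> None:
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file() and path.suffix.lower() in EXTENSIONS:
                out.write(f"\n===== {path} =====\n")  # header keeps sources identifiable
                out.write(path.read_text(encoding="utf-8", errors="replace"))

scrape_to_file(".")
```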

[Picture of the interface]
 