The A.I Megathread (LLM , GPT , Development)

bnew · Jan 22, 2024

https://archive.is/UmKBQ

Old Photo Restoration - a Hugging Face Space by modelscope

Discover amazing ML apps made by the community

huggingface.co

bnew · Jan 22, 2024

https://archive.is/6Irtr

01-ai/Yi-VL-6B · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

01-ai/Yi-VL-34B · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

01.AI just released Yi-VL-34B on Hugging Face

Yi Visual Language (Yi-VL) model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images.

huggingface.co/01-ai/Yi-VL-6…

huggingface.co/01-ai/Yi-VL-3…

Yi-VL-34B ranks first among all existing open-source models in the latest benchmarks, including MMMU in English and CMMMU in Chinese.

bnew · Jan 22, 2024

https://archive.is/8Dwd6

https://archive.is/J2QPr

Old Photo Restoration - a Hugging Face Space by modelscope

Discover amazing ML apps made by the community

huggingface.co

piddnad/DDColor-models · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

piddnad/ddcolor – Run with an API on Replicate

Towards Photo-Realistic Image Colorization via Dual Decoders

replicate.com

bnew · Jan 22, 2024

Paper page - Zero Bubble Pipeline Parallelism

Join the discussion on this paper page

huggingface.co

bnew · Jan 22, 2024

Paper page - Understanding Video Transformers via Universal Concept Discovery

Join the discussion on this paper page

huggingface.co

bnew · Jan 22, 2024

https://archive.is/wip/PWyS3

Adobe presents ActAnywhere

Subject-Aware Video Background Generation

Paper page: ActAnywhere: Subject-Aware Video Background Generation

ActAnywhere is a generative model that automates subject-aware video background generation, a process that traditionally requires tedious manual effort. Our model leverages the power of large-scale video diffusion models, and is specifically tailored for this task. ActAnywhere takes a sequence of foreground subject segmentation as input and an image that describes the desired scene as condition, to produce a coherent video with realistic foreground-background interactions while adhering to the condition frame. We train our model on a large-scale dataset of human-scene interaction videos. Extensive evaluations demonstrate the superior performance of our model, significantly outperforming baselines. Moreover, we show that ActAnywhere generalizes to diverse out-of-distribution samples, including non-human subjects.

bnew · Jan 22, 2024

https://archive.is/rCQ5E

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Paper page: Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

The inference process in Large Language Models (LLMs) is often limited due to the absence of parallelism in the auto-regressive decoding process, resulting in most operations being restricted by the memory bandwidth of accelerators. While methods such as speculative decoding have been suggested to address this issue, their implementation is impeded by the challenges associated with acquiring and maintaining a separate draft model. In this paper, we present Medusa, an efficient method that augments LLM inference by adding extra decoding heads to predict multiple subsequent tokens in parallel. Using a tree-based attention mechanism, Medusa constructs multiple candidate continuations and verifies them simultaneously in each decoding step. By leveraging parallel processing, Medusa introduces only minimal overhead in terms of single-step latency while substantially reducing the number of decoding steps required. We present two levels of fine-tuning procedures for Medusa to meet the needs of different use cases: Medusa-1: Medusa is directly fine-tuned on top of a frozen backbone LLM, enabling lossless inference acceleration. Medusa-2: Medusa is fine-tuned together with the backbone LLM, enabling better prediction accuracy of Medusa heads and higher speedup but needing a special training recipe that preserves the backbone model's capabilities. Moreover, we propose several extensions that improve or expand the utility of Medusa, including a self-distillation to handle situations where no training data is available and a typical acceptance scheme to boost the acceptance rate while maintaining generation quality. We evaluate Medusa on models of various sizes and training procedures. Our experiments demonstrate that Medusa-1 can achieve over 2.2x speedup without compromising generation quality, while Medusa-2 further improves the speedup to 2.3-3.6x.
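For anyone trying to picture the accept/verify loop the abstract describes, here is a rough toy sketch. It is not the Medusa code: `base_next_token`, `make_heads` and `generate` are made-up names, the "backbone" is a small deterministic function rather than an LLM, the candidates form a single chain instead of a tree, and verification is simulated sequentially where a real system would do it in one parallel forward pass. It only illustrates why greedy verification keeps the output identical to plain decoding while cutting the number of decoding steps.

```python
from typing import Callable, List, Sequence, Tuple

VOCAB = 50  # toy vocabulary size (assumption for this sketch)


def base_next_token(prefix: Sequence[int]) -> int:
    """Hypothetical stand-in for the backbone LLM's greedy next token."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def make_heads(num_heads: int, miss_every: int) -> Callable[[Sequence[int]], List[int]]:
    """Build toy 'extra heads' that guess the next num_heads tokens.

    The guesses are right most of the time and deliberately wrong whenever
    the context length is a multiple of miss_every.
    """
    def propose(tokens: Sequence[int]) -> List[int]:
        guesses: List[int] = []
        ctx = list(tokens)
        for _ in range(num_heads):
            tok = base_next_token(ctx)
            if len(ctx) % miss_every == 0:
                tok = (tok + 1) % VOCAB  # inject a wrong guess
            guesses.append(tok)
            ctx.append(tok)
        return guesses
    return propose


def generate(prompt: Sequence[int], n_new: int,
             propose: Callable[[Sequence[int]], List[int]]) -> Tuple[List[int], int]:
    """Greedy decoding with guess-then-verify; returns (tokens, decoding steps)."""
    tokens = list(prompt)
    steps = 0
    while len(tokens) - len(prompt) < n_new:
        steps += 1
        # Accept guesses only while they match what the backbone would emit.
        for guess in propose(tokens):
            if guess != base_next_token(tokens):
                break
            tokens.append(guess)
        # Each step always yields one token from the backbone itself.
        tokens.append(base_next_token(tokens))
    return tokens[: len(prompt) + n_new], steps


if __name__ == "__main__":
    prompt = [1, 2, 3]
    plain, plain_steps = generate(prompt, 20, lambda tokens: [])
    fast, fast_steps = generate(prompt, 20, make_heads(num_heads=3, miss_every=4))
    print("same output:", plain == fast)
    print("steps without heads:", plain_steps, "with heads:", fast_steps)
```

Because a guess is only kept when it equals the backbone's own greedy choice, the generated sequence cannot differ from plain decoding, which is the sense in which the frozen-backbone variant is described as lossless. The step count depends entirely on how often the toy heads guess right, so treat it as an illustration and not as a measurement of the reported speedups.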

bnew · Jan 22, 2024

https://archive.is/AY2IN

MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World

vis-www.cs.umass.edu

GitHub - UMass-Foundation-Model/MultiPLY: Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World

Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World - GitHub - UMass-Foundation-Model/MultiPLY: Code for MultiPLY: A Multisensory Object-Centric Embodied Larg...

github.com

[2401.08577] MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World

Overview

Human beings possess the capability to multiply a mélange of multisensory cues while actively exploring and interacting with the 3D world. Current multi-modal large language models, however, passively absorb sensory data as inputs, lacking the capacity to actively interact with the objects in the 3D environment and dynamically collect their multisensory information. To usher in the study of this area, we propose MultiPLY, a multisensory embodied large language model that could incorporate multisensory interactive data, including visual, audio, tactile, and thermal information into large language models, thereby establishing the correlation among words, actions, and perceptions. MultiPLY can perform a diverse set of multisensory embodied tasks, including multisensory question answering, embodied question answering, task decomposition, object retrieval, and tool use.

bnew · Jan 22, 2024

https://archive.is/ml05Z

Exciting news! @intern_lm 7/20B models are now live on the @huggingface Open LLM Leaderboard!

Highlights:

- 200K context length for base/chat models.
- 20B model is on par with the performance of Yi-34B.
- 7B model is the best in the <= 13B range.

internlm (InternLM)

Org profile for InternLM on Hugging Face, the AI community building the future.

huggingface.co

bnew · Jan 22, 2024

https://archive.is/8potk

bnew · Jan 22, 2024

https://archive.is/saqKj

https://archive.is/NVQbf

bnew · Jan 22, 2024

https://archive.is/85rVC

bnew · Jan 22, 2024

bnew · Jan 22, 2024

https://archive.is/qdeYH

bnew · Jan 22, 2024

https://archive.is/PY3bG

https://archive.is/uwmO6

Discord - Group Chat That’s All Fun & Games

Discord is great for playing games and chilling with friends, or even building a worldwide community. Customize your own space to talk, play, and hang out.

discord.gg
