bnew

Veteran
Joined
Nov 1, 2015
Messages
55,517
Reputation
8,215
Daps
156,913







1/6
DistriFusion

Distributed Parallel Inference for High-Resolution Diffusion Models

Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous

2/6
computational costs, resulting in a prohibitive latency for interactive applications. In this paper, we propose DistriFusion to tackle this problem by leveraging parallelism across multiple GPUs. Our method splits the model input into multiple patches and assigns each patch to a

3/6
GPU. However, naïvely implementing such an algorithm breaks the interaction between patches and loses fidelity, while incorporating such an interaction will incur tremendous communication overhead. To overcome this dilemma, we observe the high similarity between the input

4/6
from adjacent diffusion steps and propose displaced patch parallelism, which takes advantage of the sequential nature of the diffusion process by reusing the pre-computed feature maps from the previous timestep to provide context for the current step.

5/6
Therefore, our method supports asynchronous communication, which can be pipelined by computation. Extensive experiments show that our method can be applied to the recent Stable Diffusion XL with no quality degradation and achieve up to a 6.1× speedup on 8 NVIDIA A100s compared to 1 GPU.
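The idea of reusing the previous step's activations as context can be illustrated with a toy 1-D example. This is only a sketch under stated assumptions, not the paper's implementation: `layer` is a hypothetical 3-tap average standing in for any layer that needs neighbouring values, `displaced_patch_step` is a made-up name, and the devices are simulated by a sequential loop rather than real GPUs with asynchronous communication.

```python
def layer(x):
    """3-tap average with edge clamping; a stand-in for a layer that needs neighbours."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]


def displaced_patch_step(x, stale, num_patches):
    """Apply `layer` patch by patch.

    Each simulated device sees fresh values for its own patch and
    stale values (from the previous step) for everything else.
    """
    n = len(x)
    bounds = [round(k * n / num_patches) for k in range(num_patches + 1)]
    out = [0.0] * n
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        context = list(stale)       # activations gathered at the previous step
        context[lo:hi] = x[lo:hi]   # overwrite with this device's fresh patch
        out[lo:hi] = layer(context)[lo:hi]
    return out


# Current-step input and a slightly different previous-step input.
x = [float(i * i % 7) for i in range(16)]
stale = [v + 0.01 for v in x]

exact = layer(x)
approx = displaced_patch_step(x, stale, num_patches=4)
max_err = max(abs(a - b) for a, b in zip(exact, approx))
```

In this toy setting the only error comes from the stale values at patch boundaries, so it shrinks as the inputs of adjacent steps become more similar, which is the observation the abstract relies on.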

6/6
paper page:
 

bnew

1/9
Trajectory Consistency Distillation

Latent Consistency Model (LCM) extends the Consistency Model to the latent space and leverages the guided consistency distillation technique to achieve impressive performance in accelerating text-to-image synthesis. However, we observed that

2/9
LCM struggles to generate images with both clarity and detailed intricacy. To address this limitation, we initially delve into and elucidate the underlying causes. Our investigation identifies that the primary issue stems from errors in three distinct areas. Consequently, we

3/9
introduce Trajectory Consistency Distillation (TCD), which encompasses trajectory consistency function and strategic stochastic sampling. The trajectory consistency function diminishes the distillation errors by broadening the scope of the self-consistency boundary condition and

4/9
endowing the TCD with the ability to accurately trace the entire trajectory of the Probability Flow ODE. Additionally, strategic stochastic sampling is specifically designed to circumvent the accumulated errors inherent in multi-step consistency sampling, which is meticulously

5/9
tailored to complement the TCD model. Experiments demonstrate that TCD not only significantly enhances image quality at low NFEs but also yields more detailed results compared to the teacher model at high NFEs.

6/9
paper page:

7/9
demo:

8/9
model:

9/9
"An astronaut riding a green horse"
 

bnew

1/5
Priority Sampling of Large Language Models for Compilers

Large language models show great potential in generating and optimizing code. Widely used sampling methods such as Nucleus Sampling increase the diversity of generation but often produce repeated samples for low

2/5
temperatures and incoherent samples for high temperatures. Furthermore, the temperature coefficient has to be tuned for each task, limiting its usability. We present Priority Sampling, a simple and deterministic sampling technique that produces unique samples ordered by the

3/5
model's confidence. Each new sample expands the unexpanded token with the highest probability in the augmented search tree. Additionally, Priority Sampling supports generation based on regular expression that provides a controllable and structured exploration process. Priority

4/5
Sampling outperforms Nucleus Sampling for any number of samples, boosting the performance of the original model from 2.87% to 5% improvement over -Oz.
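The expansion rule described in this thread can be sketched with a priority queue. This is a minimal illustration, not the paper's implementation: `toy_model`, `priority_sampling`, and the choice to rank pending branches by the joint probability of their prefix are assumptions made for the example, and the regular-expression constraint is left out.

```python
import heapq
import math

EOS = "<eos>"


def toy_model(prefix):
    """Hypothetical next-token distribution: two free tokens, then end of sequence."""
    if len(prefix) >= 2:
        return {EOS: 1.0}
    return {"a": 0.6, "b": 0.3, "c": 0.1}


def priority_sampling(next_probs, num_samples, max_len=8):
    """Return up to `num_samples` unique sequences, most confident branches first."""
    samples = []
    # Heap of unexpanded branches: (negative log-probability of prefix, prefix).
    heap = [(0.0, ())]
    while heap and len(samples) < num_samples:
        neg_logp, prefix = heapq.heappop(heap)
        seq, logp = list(prefix), -neg_logp
        # Greedy completion from this branch; every token not taken is queued.
        while len(seq) < max_len and (not seq or seq[-1] != EOS):
            ranked = sorted(next_probs(tuple(seq)).items(), key=lambda kv: -kv[1])
            best_tok, best_p = ranked[0]
            for tok, p in ranked[1:]:
                if p > 0:
                    heapq.heappush(heap, (-(logp + math.log(p)), tuple(seq) + (tok,)))
            seq.append(best_tok)
            logp += math.log(best_p)
        samples.append(tuple(seq))
    return samples


samples = priority_sampling(toy_model, num_samples=4)
```

The procedure is deterministic and needs no temperature: the first sample is the greedy decode, and each later sample starts from the most probable branch that has not been expanded yet, so no sequence is produced twice.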

5/5
paper page:
 

bnew

1/4
Google presents Griffin

Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN

2/4
with gated linear recurrences, and Griffin, a hybrid model that mixes gated linear recurrences with local attention. Hawk exceeds the reported performance of Mamba on downstream tasks, while Griffin matches the performance of Llama-2 despite being trained on over 6 times fewer

3/4
tokens. We also show that Griffin can extrapolate on sequences significantly longer than those seen during training. Our models match the hardware efficiency of Transformers during training, and during inference they have lower latency and significantly higher throughput.
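For readers unfamiliar with the term, a gated linear recurrence can be sketched in a few lines. The abstract does not give the exact update rule, so this is a generic form and an assumption, not the layer used in Hawk or Griffin: the gate parameters `w_gate` and `b_gate` and the function name are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_linear_recurrence(xs, w_gate=1.0, b_gate=0.0, h0=0.0):
    """Scan h_t = a_t * h_{t-1} + (1 - a_t) * x_t over a scalar sequence.

    The gate a_t depends only on the current input, so the update
    stays linear in the hidden state h.
    """
    h = h0
    states = []
    for x in xs:
        a = sigmoid(w_gate * x + b_gate)  # hypothetical gate parameterisation
        h = a * h + (1.0 - a) * x
        states.append(h)
    return states


states = gated_linear_recurrence([1.0] * 5)
```

Because the state has a fixed size and each step touches only the previous state, inference cost per token does not grow with sequence length, which is the property the abstract attributes to RNNs.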

4/4
paper page:
 

bnew

1/6
Amazon presents ViewFusion

Towards Multi-View Consistency via Interpolated Denoising

Novel-view synthesis through diffusion models has demonstrated remarkable potential for generating diverse and high-quality images.

3/6
Yet, the independent process of image generation in these prevailing methods leads to challenges in maintaining multiple-view consistency. To address this, we introduce ViewFusion, a novel, training-free algorithm that can be seamlessly integrated into existing pre-trained

4/6
diffusion models. Our approach adopts an auto-regressive method that implicitly leverages previously generated views as context for the next view generation, ensuring robust multi-view consistency during the novel-view generation process.

5/6
Through a diffusion process that fuses known-view information via interpolated denoising, our framework successfully extends single-view conditioned models to work in multiple-view conditional settings without any additional fine-tuning.

6/6
paper page:
 

bnew

1/8
Snap presents Panda-70M

Captioning 70M Videos with Multiple Cross-Modality Teachers

The quality of the data and annotation upper-bounds the quality of a downstream model. While there exist large text corpora and image-text pairs, high-quality video-text data is much harder to

2/8
collect. First of all, manual labeling is more time-consuming, as it requires an annotator to watch an entire video. Second, videos have a temporal dimension, consisting of several scenes stacked together, and showing multiple actions.

3/8
Accordingly, to establish a video dataset with high-quality captions, we propose an automatic approach leveraging multimodal inputs, such as textual video description, subtitles, and individual video frames.

4/8
Specifically, we curate 3.8M high-resolution videos from the publicly available HD-VILA-100M dataset. We then split them into semantically consistent video clips, and apply multiple cross-modality teacher models to obtain captions for each video.

5/8
Next, we finetune a retrieval model on a small subset where the best caption of each video is manually selected and then employ the model in the whole dataset to select the best caption as the annotation. In this way, we get 70M videos paired with high-quality text captions.

6/8
We dub the dataset as Panda-70M. We show the value of the proposed dataset on three downstream tasks: video captioning, video and text retrieval, and text-driven video generation.

7/8
paper page:

 

bnew

1/6
Differential Diffusion

Giving Each Pixel Its Strength

Text-based image editing has advanced significantly in recent years. With the rise of diffusion models, image editing via textual instructions has become ubiquitous.

2/6
Unfortunately, current models lack the ability to customize the quantity of the change per pixel or per image fragment, resorting to changing the entire image in an equal amount, or editing a specific region using a binary mask.

3/6
In this paper, we suggest a new framework which enables the user to customize the quantity of change for each image fragment, thereby enhancing the flexibility and verbosity of modern diffusion models.

4/6
Our framework does not require model training or fine-tuning, but instead performs everything at inference time, making it easily applicable to an existing model.

5/6
paper page:

6/6
Beyond Language Models

Byte Models are Digital World Simulators

Traditional deep learning often overlooks bytes, the basic units of the digital world, where all forms of information and operations are encoded and manipulated in binary format. Inspired by the success of next
 

bnew

1/3
Microsoft's "The Era of 1-bit LLMs" paper is now the most upvoted paper of all time on HF paper pages, beating Apple's "LLM in a Flash"

2/3
paper page:

 

bnew

1/2
Terribly excited about open-source + on-device AI these days! Great to see @Qualcomm release 80+ models optimized and curated for their devices and chips on @huggingface: qualcomm (Qualcomm)

2/2
We just crossed 100,000 organizations on HF!

Some of my favorites:
- The MLX community for on-device AI: https://
- The @AiEleuther org with 150+ datasets: https://
- The @Bloomberg org to show big financial institutions can use the hub:…
 

bnew

1/2
Video as the New Language for Real-World Decision Making

Both text and video data are abundant on the internet and support large-scale self-supervised learning through next token or frame prediction. However, they have not been equally leveraged: language models have had significant real-world impact, whereas video generation has remained largely limited to media entertainment. Yet video data captures important information about the physical world that is difficult to express in language. To address this gap, we discuss an under-appreciated opportunity to extend video generation to solve tasks in the real world. We observe how, akin to language, video can serve as a unified interface that can absorb internet knowledge and represent diverse tasks. Moreover, we demonstrate how, like language models, video generation can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning, planning and reinforcement learning. We identify major impact opportunities in domains such as robotics, self-driving, and science, supported by recent work that demonstrates how such advanced capabilities in video generation are plausibly within reach. Lastly, we identify key challenges in video generation that mitigate progress. Addressing these challenges will enable video generation models to demonstrate unique value alongside language models in a wider array of AI applications.

2/2
paper page:
 

bnew

1/9
Announcing `10k_prompts_ranked`, the first dataset release from Data Is Better Together. Created in <2 weeks by the community. Includes:
- 10,000+ prompt quality rankings
- Human and synthetic data rankings
- Generated by 300+ contributors
on how + why collaborative datasets

2/9
It's no secret that high-quality open data is essential for creating better open models. The open source community shares 100s of models, datasets and demos openly weekly, but collectively building open datasets has been less explored.

3/9
Datasets have a massive role in shaping what models can be created. If we want more high-quality models for all languages, domains and tasks, we need more and better open datasets for all languages, domains and tasks!

4/9
To explore how the community could build impactful datasets collectively, @argilla_io added support for HF authentication for Argilla instances hosted on a @huggingface Space. Anyone with an HF login could begin contributing to a dataset in <1 minute.

5/9
To test this new workflow, we launched a task to rank the quality of prompts (human and synthetically generated). The @nomic_ai Atlas gives an excellent sense of the coverage of the topics in the prompts.

6/9
You can find the dataset here:

7/9
In less than two weeks, we built a community of over 300 contributors for this dataset. This dataset became a reality thanks to the dedication of all the individuals who lent their support

To see the amazing people behind this dataset, visit https://ompt-collective-dashboard…

8/9
This is just the beginning! We aim to empower the community to build new datasets collaboratively. This could include:
- Preference datasets for a low-resource language
- Evaluations for a specific domain
- Datasets for novel tasks

9/9
If this sounds interesting, keep your eyes peeled for further announcements later this week
 

bnew

1/2
Meta presents Rainbow Teaming

Open-Ended Generation of Diverse Adversarial Prompts

As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance. Existing methods for identifying adversarial prompts tend to focus on specific domains, lack diversity, or require extensive human annotations. To address these limitations, we present Rainbow Teaming, a novel approach for producing a diverse collection of adversarial prompts. Rainbow Teaming casts adversarial prompt generation as a quality-diversity problem, and uses open-ended search to generate prompts that are both effective and diverse. It can uncover a model's vulnerabilities across a broad range of domains including, in this paper, safety, question answering, and cybersecurity. We also demonstrate that fine-tuning on synthetic data generated by Rainbow Teaming improves the safety of state-of-the-art LLMs without hurting their general capabilities and helpfulness, paving the path to open-ended self-improvement.

2/2
paper page:
 

bnew

1/3
FuseChat

Knowledge Fusion of Chat Models

While training large language models (LLMs) from scratch can indeed lead to models with distinct capabilities and strengths, this approach incurs substantial costs and may lead to potential redundancy in competencies. An alternative strategy is to combine existing LLMs into a more robust LLM, thereby diminishing the necessity for expensive pre-training. However, due to the diverse architectures of LLMs, direct parameter blending proves to be unfeasible. Recently, FuseLLM introduced the concept of knowledge fusion to transfer the collective knowledge of multiple structurally varied LLMs into a target LLM through lightweight continual training. In this report, we extend the scalability and flexibility of the FuseLLM framework to realize the fusion of chat LLMs, resulting in FuseChat. FuseChat comprises two main stages. Firstly, we undertake knowledge fusion for structurally and scale-varied source LLMs to derive multiple target LLMs of identical structure and size via lightweight fine-tuning. Then, these target LLMs are merged within the parameter space, wherein we propose a novel method for determining the merging weights based on the variation ratio of parameter matrices before and after fine-tuning. We validate our approach using three prominent chat LLMs with diverse architectures and scales, namely NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B. Experimental results spanning various chat domains demonstrate the superiority of FuseChat-7B across a broad spectrum of chat LLMs at 7B and 34B scales, even surpassing GPT-3.5 (March) and approaching Mixtral-8x7B-Instruct.
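The second stage, merging fine-tuned models with weights derived from how much each parameter matrix changed, can be sketched as follows. The abstract does not give the formula, so the squared-change normalisation here is an assumption, and `variation_weights` and `merge` are hypothetical names operating on flattened parameter lists.

```python
def variation_weights(base, targets):
    """One weight per target model, proportional to its squared change from `base`."""
    changes = [sum((t - b) ** 2 for t, b in zip(tgt, base)) for tgt in targets]
    total = sum(changes)
    if total == 0:
        # No model moved away from the base: fall back to a plain average.
        return [1.0 / len(targets)] * len(targets)
    return [c / total for c in changes]


def merge(base, targets):
    """Weighted average of the target parameters for one (flattened) matrix."""
    weights = variation_weights(base, targets)
    return [
        sum(w * tgt[i] for w, tgt in zip(weights, targets))
        for i in range(len(base))
    ]


# One flattened parameter matrix before fine-tuning and in two fine-tuned models.
base = [0.0, 0.0]
targets = [[1.0, 0.0], [0.0, 3.0]]

weights = variation_weights(base, targets)
merged = merge(base, targets)
```

In this sketch the model whose matrix moved further during fine-tuning dominates the merge for that matrix; in practice the weights would be computed separately for each parameter matrix.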

2/3
paper page:

 

bnew

1/3
Google announces Do Large Language Models Latently Perform Multi-Hop Reasoning?

We study whether Large Language Models (LLMs) latently perform multi-hop reasoning with complex prompts such as "The mother of the singer of 'Superstition' is". We look for evidence of a latent reasoning pathway where an LLM (1) latently identifies "the singer of 'Superstition'" as Stevie Wonder, the bridge entity, and (2) uses its knowledge of Stevie Wonder's mother to complete the prompt. We analyze these two hops individually and consider their co-occurrence as indicative of latent multi-hop reasoning. For the first hop, we test if changing the prompt to indirectly mention the bridge entity instead of any other entity increases the LLM's internal recall of the bridge entity. For the second hop, we test if increasing this recall causes the LLM to better utilize what it knows about the bridge entity. We find strong evidence of latent multi-hop reasoning for the prompts of certain relation types, with the reasoning pathway used in more than 80% of the prompts. However, the utilization is highly contextual, varying across different types of prompts. Also, on average, the evidence for the second hop and the full multi-hop traversal is rather moderate and only substantial for the first hop. Moreover, we find a clear scaling trend with increasing model size for the first hop of reasoning but not for the second hop. Our experimental findings suggest potential challenges and opportunities for future development and applications of LLMs.

2/3
paper page:

 