bnew


1/1
CHAMP: Controllable and Consistent Human Image Animation

Model weights on @huggingface:
Code:
Curious to see Champ in action? Build your app & submit for Spaces GPU grants!
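If you want to grab the released checkpoints yourself, here's a minimal sketch using huggingface_hub; the repo id below is an assumption, so check the project's Hub page for the real one.
Code:
# Minimal sketch: pull the Champ weights from the Hugging Face Hub.
# The repo id is assumed, not confirmed -- verify it on the project page.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("fudan-generative-ai/champ")  # assumed repo id
print("weights downloaded to:", local_dir)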
 

bnew


1/1
Congratulations to @aidangomez and the @cohere team! A well-deserved reception from the open-source community.
Play with the Command R+ uber-cool chatbot here: C4AI Command R Plus - a Hugging Face Space by CohereForAI
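If you'd rather run it locally than poke at the Space, a rough transformers sketch follows; the model id matches the Hub release, but the full 104B model needs multiple large GPUs, so treat this as illustrative rather than a recipe.
Code:
# Illustrative sketch: chat with Command R+ via transformers. The
# full-precision 104B model needs several large GPUs; quantized
# variants on the Hub are friendlier for smaller setups.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Hello, what can you do?"}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids.to(model.device), max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))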
 

bnew


1/3
Introducing HairFastGAN: a creative approach to realistic and robust hair transfer!

High-res output at near real-time speeds.

Utilizes a new architecture in StyleGAN's FS latent space, enhanced inpainting, and improved encoders.

2/3
Calling the Gradio community!
Let's bring HairFastGAN to life by building an interactive demo on Spaces using Gradio!

Showcase the fun and interesting hair transfer technology and make it accessible to everyone with Gradio.

3/3
Open in Simple Colab: Google Colaboratory
Project:
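In the spirit of that Gradio call-to-action, a bare-bones skeleton of what such a Space could look like; hair_fast_transfer is a hypothetical stand-in for the repo's actual inference call.
Code:
# Skeleton for a HairFastGAN demo Space. `hair_fast_transfer` is a
# hypothetical placeholder for the model's real inference pipeline.
import gradio as gr

def hair_fast_transfer(face_img, shape_img, color_img):
    # swap in the actual HairFastGAN pipeline here
    return face_img

demo = gr.Interface(
    fn=hair_fast_transfer,
    inputs=[
        gr.Image(label="Source face"),
        gr.Image(label="Hairstyle shape reference"),
        gr.Image(label="Hair color reference"),
    ],
    outputs=gr.Image(label="Result"),
    title="HairFastGAN (sketch)",
)

if __name__ == "__main__":
    demo.launch()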
 

bnew




1/3
103.5 toks-per-sec, 4-bit Mistral 7B, M2 Ultra, MLX

Your move
@ggerganov

2/3
*caveat* Not yet in main. Still

If curious, some relevant PRs (mostly reducing overheads):

https://github.com/ml-explore/mlx/p...//github.com/ml-explore/mlx-examples/pull/651

3/3
192 GB
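To try this class of numbers on your own Apple silicon, a minimal mlx-lm sketch; the 4-bit checkpoint name is an assumed mlx-community repo, and throughput will vary a lot by chip.
Code:
# Minimal sketch: 4-bit Mistral 7B generation with mlx-lm on Apple
# silicon. The checkpoint name is an assumed mlx-community repo;
# verbose=True prints generation stats, including tokens-per-second.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")
text = generate(model, tokenizer,
                prompt="Explain MLX in one sentence.", verbose=True)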
 

bnew



1/2
Collect 30 examples of good and bad outputs.

Build a DSPy classifier to evaluate outputs. Optimize it on those 30 examples.

Use this as the metric.

2/2
Yes, 30 input-output pairs could be interesting if your classifier needs to see inputs too. It depends on the problem really.
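A rough sketch of that recipe, assuming DSPy's 2024-era API; the signature and field names (JudgeOutput, output, verdict) are invented for illustration.
Code:
# Sketch: a small DSPy judge optimized on ~30 labeled outputs, then
# reused as a metric. Names like JudgeOutput are invented here.
import dspy
from dspy.teleprompt import BootstrapFewShot

dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"))

class JudgeOutput(dspy.Signature):
    """Decide whether a model output is good or bad."""
    output = dspy.InputField()
    verdict = dspy.OutputField(desc="exactly 'good' or 'bad'")

judge = dspy.Predict(JudgeOutput)

# Your ~30 labeled (output_text, "good"/"bad") pairs go here.
labeled_pairs = [("helpful, grounded answer", "good"),
                 ("rambling, off-topic answer", "bad")]
trainset = [dspy.Example(output=o, verdict=v).with_inputs("output")
            for o, v in labeled_pairs]

def agree(example, pred, trace=None):
    return example.verdict == pred.verdict

compiled_judge = BootstrapFewShot(metric=agree).compile(judge, trainset=trainset)

# The compiled judge then serves as the metric for your real program;
# `pred.answer` assumes the program's output field is named `answer`.
def metric(example, pred, trace=None):
    return compiled_judge(output=pred.answer).verdict == "good"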
 

bnew



1/2
Introducing Know Your Neighbors (KYN) (#CVPR2024), a single-view scene reconstruction approach that recovers occluded geometry using vision-language spatial reasoning

Project: KYN
Paper: Paper page - Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
Code: GitHub - ruili3/Know-Your-Neighbors: [CVPR 2024] ๐ŸกKnow Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

2/2
A team effort with @TobiasFischer11, @MattiaSegu, @mapo1, Luc Van Gool, and @fedassa.
 

bnew


1/7
Sigma

Siamese Mamba Network for Multi-Modal Semantic Segmentation

Multi-modal semantic segmentation significantly enhances AI agents' perception and scene understanding, especially under adverse conditions like low-light or overexposed environments. Leveraging additional

2/7
modalities (X-modality) like thermal and depth alongside traditional RGB provides complementary information, enabling more robust and reliable segmentation. In this work, we introduce Sigma, a Siamese Mamba network for multi-modal semantic segmentation, utilizing the Selective

3/7
Structured State Space Model, Mamba. Unlike conventional methods that rely on CNNs, with their limited local receptive fields, or Vision Transformers (ViTs), which offer global receptive fields at the cost of quadratic complexity, our model achieves global receptive fields

4/7
coverage with linear complexity. By employing a Siamese encoder and innovating a Mamba fusion mechanism, we effectively select essential information from different modalities. A decoder is then developed to enhance the channel-wise modeling ability of the model. Our

5/7
method, Sigma, is rigorously evaluated on both RGB-Thermal and RGB-Depth segmentation tasks, demonstrating its superiority and marking the first successful application of State Space Models (SSMs) in multi-modal perception tasks.

6/7
paper page:

7/7
daily papers:
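To make the Siamese-encoder idea concrete, a toy PyTorch sketch: one weight-shared encoder over RGB and the X-modality, with a 1x1 convolution standing in for the paper's Mamba-based selective fusion. This is illustration only, not the authors' architecture.
Code:
import torch
import torch.nn as nn

class SiameseFusionSeg(nn.Module):
    """Toy sketch only: a weight-shared (Siamese) encoder applied to RGB
    and an X-modality (here assumed 3-channel), with a 1x1-conv fusion
    standing in for the paper's Mamba-based selective fusion and decoder."""
    def __init__(self, dim=64, num_classes=19):
        super().__init__()
        self.encoder = nn.Sequential(            # shared by both branches
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        self.fuse = nn.Conv2d(2 * dim, dim, 1)   # naive fusion placeholder
        self.head = nn.Conv2d(dim, num_classes, 1)

    def forward(self, rgb, x_mod):
        f_rgb = self.encoder(rgb)                # same weights...
        f_x = self.encoder(x_mod)                # ...applied to each modality
        fused = self.fuse(torch.cat([f_rgb, f_x], dim=1))
        return self.head(fused)

seg = SiameseFusionSeg()
out = seg(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128))
print(out.shape)  # torch.Size([1, 19, 128, 128])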
 

bnew


1/7
Robust Gaussian Splatting

In this paper, we address common error sources for 3D Gaussian Splatting (3DGS) including blur, imperfect camera poses, and color inconsistencies, with the goal of improving its robustness for practical applications

2/7
like reconstructions from handheld phone captures. Our main contribution involves modeling motion blur as a Gaussian distribution over camera poses, allowing us to address both camera pose refinement and motion blur correction in a unified way. Additionally, we

3/7
propose mechanisms for defocus blur compensation and for addressing color inconsistencies caused by ambient light, shadows, or camera-related factors like varying white balancing settings. Our proposed solutions integrate seamlessly with the 3DGS

4/7
formulation while maintaining its benefits in terms of training efficiency and rendering speed. We experimentally validate our contributions on relevant benchmark datasets

5/7
including Scannet++ and Deblur-NeRF, obtaining state-of-the-art results and thus consistent improvements over relevant baselines.

6/7
paper page:

7/7
daily papers:
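The motion-blur idea is easy to sketch: treat the blurry observation as the average of renders over a Gaussian on camera poses. A toy numpy version follows; render is a hypothetical rasterizer, and the paper optimizes the pose distribution jointly rather than fixing it.
Code:
import numpy as np

def blurred_render(render, pose_mean, pose_cov, n_samples=8):
    """Toy sketch: motion blur as the mean of renders at camera poses
    drawn from a Gaussian. `render` stands in for a 3DGS rasterizer and
    the 6-vector pose is a simplification of a real SE(3) parameterization."""
    poses = np.random.multivariate_normal(pose_mean, pose_cov, size=n_samples)
    return np.mean([render(p) for p in poses], axis=0)

# usage with a stand-in renderer
dummy_render = lambda pose: np.full((4, 4, 3), pose[0])
img = blurred_render(dummy_render, np.zeros(6), 1e-4 * np.eye(6))
print(img.shape)  # (4, 4, 3)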
 

bnew



1/2
The first project of my PhD, Generative Rendering, will be presented at #CVPR2024!

Generative Rendering is a framework that renders low-fidelity animated meshes directly into finished animations, letting a creator block out very basic geometry and motion in Blender within minutes and then hallucinate the remaining parts of the rendering process with generative models. Inspired by Deferred Rendering, we perform UV-space feature & noise unification and use only the 2D Stable Diffusion model, without any further tuning/distillation or additional video/3D data.

Website: Generative Rendering
Paper: [2312.01409] Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

I had a long philosophical conversation with @Michael_J_Black in early 2023, and since then have been wanting to do some research targeting animation, and bring some blocks from the very polished traditional CG pipeline into generative AI. With the amazing opportunity provided at @AdobeResearch by @guerrera_desesp, we were able to put some thoughts together! A huge thanks to my mentors: @guerrera_desesp, @gadelha_m, @paulchhuang, Tuanfeng Wang, and @GordonWetzstein.

2/2
Thanks!
 

bnew


1/6
Social Skill Training with Large Language Models

People rely on social skills like conflict resolution to communicate effectively and to thrive in both work and personal life. However, practice environments for social skills are typically out of reach for most people.

2/6
How can we make social skill training more available, accessible, and inviting? Drawing upon interdisciplinary research from communication and psychology, this perspective paper identifies social-skill barriers to entering specialized fields. Then we present a solution that

3/6
leverages large language models for social skill training via a generic framework. Our AI Partner, AI Mentor framework merges experiential learning with realistic practice and

4/6
tailored feedback. This work ultimately calls for cross-disciplinary innovation to address the broader implications for workforce development and social equality.

5/6
paper page:

6/6
daily papers:
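As a schematic of the AI Partner / AI Mentor split, here is a tiny sketch: one LLM roleplays the conversation partner, another critiques the learner's turn. chat is a hypothetical stub; swap in any chat-completion API.
Code:
# Schematic sketch of the AI Partner / AI Mentor idea. `chat` is a
# hypothetical stub -- replace it with a real chat-completion call.
def chat(system: str, prompt: str) -> str:
    return f"[{system[:24]}...] reply to: {prompt[:40]}"  # stub response

def practice_turn(scenario: str, learner_message: str) -> tuple[str, str]:
    partner = chat(f"Roleplay the other party in: {scenario}. Stay in character.",
                   learner_message)                      # AI Partner: practice
    mentor = chat("You are a communication coach. Give brief, tailored feedback.",
                  f"Scenario: {scenario}\nLearner said: {learner_message}")
    return partner, mentor                               # AI Mentor: feedback

reply, feedback = practice_turn("a salary negotiation with a manager",
                                "I think I deserve a raise because...")
print(reply)
print(feedback)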
 

bnew


1/9
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation performance of multimodal models, such as CLIP for classification/retrieval and Stable-

2/9
Diffusion for image generation. However, it is unclear how meaningful the notion of "zero-shot" generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream concepts targeted during "zero-shot"

3/9
evaluation. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets? We comprehensively investigate this question across 34 models and five

4/9
standard pretraining datasets (CC-3M, CC-12M, YFCC-15M, LAION-400M, LAION-Aesthetics), generating over 300GB of data artifacts. We consistently find that, far from exhibiting "zero-shot" generalization, multimodal models require exponentially more data to achieve linear

5/9
improvements in downstream "zero-shot" performance, following a sample inefficient log-linear scaling trend. This trend persists even when controlling for sample-level similarity between pretraining and downstream datasets, and testing on purely synthetic data

6/9
distributions. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly. We contribute this long-tail test set as the "Let it Wag!" benchmark to further research in this

7/9
direction. Taken together, our study reveals an exponential need for training data which implies that the key to "zero-shot" generalization capabilities under large-scale training paradigms remains to be found.

8/9
paper page:

9/9
daily papers:
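The reported trend is simple to picture: accuracy grows roughly linearly in the log of concept frequency, so each fixed accuracy gain costs about an order of magnitude more pretraining data. A tiny fit with invented numbers, purely to show the functional form:
Code:
# Illustration of a log-linear scaling fit. The frequencies and
# accuracies below are invented, only to show the functional form.
import numpy as np

freq = np.array([1e2, 1e3, 1e4, 1e5, 1e6])      # concept count in pretraining
acc = np.array([0.12, 0.25, 0.38, 0.51, 0.64])  # downstream "zero-shot" accuracy

slope, intercept = np.polyfit(np.log10(freq), acc, deg=1)
print(f"accuracy ~= {slope:.3f} * log10(freq) + {intercept:.3f}")
# each +0.13 accuracy step here costs ~10x more data, not 2x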
 

bnew


1/1
Open Sora Plan v1.0.0
@Gradio demo is live on @huggingface, courtesy of @fffiloni


space: Open Sora Plan v1.0.0 - a Hugging Face Space by fffiloni

1/3
We are thrilled to present Open-Sora-Plan v1.0.0, which can generate 10 seconds of 1024×1024 video at 24 FPS. It can also generate high-resolution images.
If you like our project, please give us a star on GitHub for the latest updates.

2/3
Sunset_over_the_sea
65×1024×1024

3/3
A_quiet_beach_at_dawn,_the_waves_gently_lapping_at_the_shore_and_the_sky_painted_in_pastel_hues

65×1024×1024 without super resolution and frame interpolation.
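To call the Space programmatically, a gradio_client sketch; both the Space id and the endpoint name are assumptions, so check the Space's "Use via API" panel for the real signature.
Code:
# Sketch: query the Open-Sora-Plan Space from Python. The Space id and
# api_name are assumed -- verify both on the Space's API page.
from gradio_client import Client

client = Client("fffiloni/Open-Sora-Plan-v1-0-0")
video_path = client.predict("Sunset over the sea", api_name="/infer")
print(video_path)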