bnew


1/1
CHAMP: Controllable and Consistent Human Image Animation

Model weights on @huggingface:
Code:
Curious to see Champ in action? Build your app & submit for Spaces GPU grants!
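If you want to grab the released checkpoints yourself, here's a minimal sketch using huggingface_hub; the repo id below is an assumption, so check the project's Hub page for the real one.
Code:
# Minimal sketch: pull the Champ weights from the Hugging Face Hub.
# The repo id is assumed, not confirmed -- verify it on the project page.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("fudan-generative-ai/champ")  # assumed repo id
print("weights downloaded to:", local_dir)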
 

bnew


1/1
Congratulations to @aidangomez and the @cohere team! A well-deserved reception from the open-source community.
Play with the Command R+ uber-cool chatbot here: C4AI Command R Plus - a Hugging Face Space by CohereForAI
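If you'd rather run it locally than poke at the Space, a rough transformers sketch follows; the model id matches the Hub release, but the full 104B model needs multiple large GPUs, so treat this as illustrative rather than a recipe.
Code:
# Illustrative sketch: chat with Command R+ via transformers. The
# full-precision 104B model needs several large GPUs; quantized
# variants on the Hub are friendlier for smaller setups.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Hello, what can you do?"}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids.to(model.device), max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))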
 

bnew


1/3
Introducing HairFastGAN: a creative approach to realistic and robust hair transfer!

High-res output at near real-time speeds.

Utilizes a new architecture in StyleGAN's FS latent space, enhanced inpainting, and improved encoders.

2/3
Calling the Gradio community!
Let's bring HairFastGAN to life by building an interactive demo on Spaces using Gradio!

Showcase the fun and interesting hair transfer technology and make it accessible to everyone with Gradio.

3/3
Open in Simple Colab: Google Colaboratory
Project:
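In the spirit of that Gradio call-to-action, a bare-bones skeleton of what such a Space could look like; hair_fast_transfer is a hypothetical stand-in for the repo's actual inference call.
Code:
# Skeleton for a HairFastGAN demo Space. `hair_fast_transfer` is a
# hypothetical placeholder for the model's real inference pipeline.
import gradio as gr

def hair_fast_transfer(face_img, shape_img, color_img):
    # swap in the actual HairFastGAN pipeline here
    return face_img

demo = gr.Interface(
    fn=hair_fast_transfer,
    inputs=[
        gr.Image(label="Source face"),
        gr.Image(label="Hairstyle shape reference"),
        gr.Image(label="Hair color reference"),
    ],
    outputs=gr.Image(label="Result"),
    title="HairFastGAN (sketch)",
)

if __name__ == "__main__":
    demo.launch()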
 

bnew




1/3
103.5 toks-per-sec, 4-bit Mistral 7B, M2 Ultra, MLX

Your move
@ggerganov

2/3
*caveat* Not yet in main. Still

If curious, some relevant PRs (mostly reducing overheads):

https://github.com/ml-explore/mlx/p...//github.com/ml-explore/mlx-examples/pull/651

3/3
192 GB
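To try this class of numbers on your own Apple silicon, a minimal mlx-lm sketch; the 4-bit checkpoint name is an assumed mlx-community repo, and throughput will vary a lot by chip.
Code:
# Minimal sketch: 4-bit Mistral 7B generation with mlx-lm on Apple
# silicon. The checkpoint name is an assumed mlx-community repo;
# verbose=True prints generation stats, including tokens-per-second.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")
text = generate(model, tokenizer,
                prompt="Explain MLX in one sentence.", verbose=True)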
 

bnew



1/2
Collect 30 examples of good and bad outputs.

Build a DSPy classifier to evaluate outputs. Optimize it on those 30 examples.

Use this as the metric.

2/2
Yes, 30 input-output pairs could be interesting if your classifier needs to see inputs too. It depends on the problem really.
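A rough sketch of that recipe, assuming DSPy's 2024-era API; the signature and field names (JudgeOutput, output, verdict) are invented for illustration.
Code:
# Sketch: a small DSPy judge optimized on ~30 labeled outputs, then
# reused as a metric. Names like JudgeOutput are invented here.
import dspy
from dspy.teleprompt import BootstrapFewShot

dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"))

class JudgeOutput(dspy.Signature):
    """Decide whether a model output is good or bad."""
    output = dspy.InputField()
    verdict = dspy.OutputField(desc="exactly 'good' or 'bad'")

judge = dspy.Predict(JudgeOutput)

# Your ~30 labeled (output_text, "good"/"bad") pairs go here.
labeled_pairs = [("helpful, grounded answer", "good"),
                 ("rambling, off-topic answer", "bad")]
trainset = [dspy.Example(output=o, verdict=v).with_inputs("output")
            for o, v in labeled_pairs]

def agree(example, pred, trace=None):
    return example.verdict == pred.verdict

compiled_judge = BootstrapFewShot(metric=agree).compile(judge, trainset=trainset)

# The compiled judge then serves as the metric for your real program;
# `pred.answer` assumes the program's output field is named `answer`.
def metric(example, pred, trace=None):
    return compiled_judge(output=pred.answer).verdict == "good"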
 

bnew



1/2
Introducing Know Your Neighbors (KYN) (#CVPR2024), a single-view scene reconstruction approach that recovers occluded geometry using vision-language spatial reasoning

Project: KYN
Paper: Paper page - Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
Code: GitHub - ruili3/Know-Your-Neighbors: [CVPR 2024] ๐ŸกKnow Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

2/2
A team effort with @TobiasFischer11, @MattiaSegu, @mapo1, Luc Van Gool, and @fedassa.
 

bnew


1/7
Sigma

Siamese Mamba Network for Multi-Modal Semantic Segmentation

Multi-modal semantic segmentation significantly enhances AI agents' perception and scene understanding, especially under adverse conditions like low-light or overexposed environments. Leveraging additional

2/7
modalities (X-modality) like thermal and depth alongside traditional RGB provides complementary information, enabling more robust and reliable segmentation. In this work, we introduce Sigma, a Siamese Mamba network for multi-modal semantic segmentation, utilizing the Selective

3/7
Structured State Space Model, Mamba. Unlike conventional methods that rely on CNNs, with their limited local receptive fields, or Vision Transformers (ViTs), which offer global receptive fields at the cost of quadratic complexity, our model achieves global receptive fields

4/7
coverage with linear complexity. By employing a Siamese encoder and innovating a Mamba fusion mechanism, we effectively select essential information from different modalities. A decoder is then developed to enhance the channel-wise modeling ability of the model. Our

5/7
method, Sigma, is rigorously evaluated on both RGB-Thermal and RGB-Depth segmentation tasks, demonstrating its superiority and marking the first successful application of State Space Models (SSMs) in multi-modal perception tasks.

6/7
paper page:

7/7
daily papers:
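To make the Siamese-encoder idea concrete, a toy PyTorch sketch: one weight-shared encoder over RGB and the X-modality, with a 1x1 convolution standing in for the paper's Mamba-based selective fusion. This is illustration only, not the authors' architecture.
Code:
import torch
import torch.nn as nn

class SiameseFusionSeg(nn.Module):
    """Toy sketch only: a weight-shared (Siamese) encoder applied to RGB
    and an X-modality (here assumed 3-channel), with a 1x1-conv fusion
    standing in for the paper's Mamba-based selective fusion and decoder."""
    def __init__(self, dim=64, num_classes=19):
        super().__init__()
        self.encoder = nn.Sequential(            # shared by both branches
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        self.fuse = nn.Conv2d(2 * dim, dim, 1)   # naive fusion placeholder
        self.head = nn.Conv2d(dim, num_classes, 1)

    def forward(self, rgb, x_mod):
        f_rgb = self.encoder(rgb)                # same weights...
        f_x = self.encoder(x_mod)                # ...applied to each modality
        fused = self.fuse(torch.cat([f_rgb, f_x], dim=1))
        return self.head(fused)

seg = SiameseFusionSeg()
out = seg(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128))
print(out.shape)  # torch.Size([1, 19, 128, 128])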
 

bnew


1/7
Robust Gaussian Splatting

In this paper, we address common error sources for 3D Gaussian Splatting (3DGS) including blur, imperfect camera poses, and color inconsistencies, with the goal of improving its robustness for practical applications

2/7
like reconstructions from handheld phone captures. Our main contribution involves modeling motion blur as a Gaussian distribution over camera poses, allowing us to address both camera pose refinement and motion blur correction in a unified way. Additionally, we

3/7
propose mechanisms for defocus blur compensation and for addressing color inconsistencies caused by ambient light, shadows, or camera-related factors like varying white balancing settings. Our proposed solutions integrate seamlessly with the 3DGS

4/7
formulation while maintaining its benefits in terms of training efficiency and rendering speed. We experimentally validate our contributions on relevant benchmark datasets

5/7
including Scannet++ and Deblur-NeRF, obtaining state-of-the-art results and thus consistent improvements over relevant baselines.

6/7
paper page:

7/7
daily papers:
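The motion-blur idea is easy to sketch: treat the blurry observation as the average of renders over a Gaussian on camera poses. A toy numpy version follows; render is a hypothetical rasterizer, and the paper optimizes the pose distribution jointly rather than fixing it.
Code:
import numpy as np

def blurred_render(render, pose_mean, pose_cov, n_samples=8):
    """Toy sketch: motion blur as the mean of renders at camera poses
    drawn from a Gaussian. `render` stands in for a 3DGS rasterizer and
    the 6-vector pose is a simplification of a real SE(3) parameterization."""
    poses = np.random.multivariate_normal(pose_mean, pose_cov, size=n_samples)
    return np.mean([render(p) for p in poses], axis=0)

# usage with a stand-in renderer
dummy_render = lambda pose: np.full((4, 4, 3), pose[0])
img = blurred_render(dummy_render, np.zeros(6), 1e-4 * np.eye(6))
print(img.shape)  # (4, 4, 3)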
 

bnew



1/2
The first project of my PhD, Generative Rendering, will be presented at #CVPR2024!

Generative Rendering is a framework that renders low-fidelity animated meshes directly into finished animations, letting a creator block out very basic geometry and motion in Blender within minutes and then hallucinate the remaining parts of the rendering process with generative models. Inspired by Deferred Rendering, we perform UV-space feature & noise unification and use only the 2D Stable Diffusion model, without any further tuning/distillation or additional video/3D data.

Website: Generative Rendering
Paper: [2312.01409] Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

I had a long philosophical conversation with @Michael_J_Black in early 2023, and since then have been wanting to do some research targeting animation, and bring some blocks from the very polished traditional CG pipeline into generative AI. With the amazing opportunity provided at @AdobeResearch by @guerrera_desesp, we were able to put some thoughts together! A huge thanks to my mentors: @guerrera_desesp, @gadelha_m, @paulchhuang, Tuanfeng Wang, and @GordonWetzstein.

2/2
Thanks!
 

bnew


1/6
Social Skill Training with Large Language Models

People rely on social skills like conflict resolution to communicate effectively and to thrive in both work and personal life. However, practice environments for social skills are typically out of reach for most people.

2/6
How can we make social skill training more available, accessible, and inviting? Drawing upon interdisciplinary research from communication and psychology, this perspective paper identifies social-skill barriers to entering specialized fields. Then we present a solution that

3/6
leverages large language models for social skill training via a generic framework. Our AI Partner, AI Mentor framework merges experiential learning with realistic practice and

4/6
tailored feedback. This work ultimately calls for cross-disciplinary innovation to address the broader implications for workforce development and social equality.

5/6
paper page:

6/6
daily papers:
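As a schematic of the AI Partner / AI Mentor split, here is a tiny sketch: one LLM roleplays the conversation partner, another critiques the learner's turn. chat is a hypothetical stub; swap in any chat-completion API.
Code:
# Schematic sketch of the AI Partner / AI Mentor idea. `chat` is a
# hypothetical stub -- replace it with a real chat-completion call.
def chat(system: str, prompt: str) -> str:
    return f"[{system[:24]}...] reply to: {prompt[:40]}"  # stub response

def practice_turn(scenario: str, learner_message: str) -> tuple[str, str]:
    partner = chat(f"Roleplay the other party in: {scenario}. Stay in character.",
                   learner_message)                      # AI Partner: practice
    mentor = chat("You are a communication coach. Give brief, tailored feedback.",
                  f"Scenario: {scenario}\nLearner said: {learner_message}")
    return partner, mentor                               # AI Mentor: feedback

reply, feedback = practice_turn("a salary negotiation with a manager",
                                "I think I deserve a raise because...")
print(reply)
print(feedback)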
 

bnew


1/9
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation performance of multimodal models, such as CLIP for classification/retrieval and Stable-

2/9
Diffusion for image generation. However, it is unclear how meaningful the notion of "zero-shot" generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream concepts targeted during "zero-shot"

3/9
evaluation. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets? We comprehensively investigate this question across 34 models and five

4/9
standard pretraining datasets (CC-3M, CC-12M, YFCC-15M, LAION-400M, LAION-Aesthetics), generating over 300GB of data artifacts. We consistently find that, far from exhibiting "zero-shot" generalization, multimodal models require exponentially more data to achieve linear

5/9
improvements in downstream "zero-shot" performance, following a sample inefficient log-linear scaling trend. This trend persists even when controlling for sample-level similarity between pretraining and downstream datasets, and testing on purely synthetic data

6/9
distributions. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly. We contribute this long-tail test set as the "Let it Wag!" benchmark to further research in this

7/9
direction. Taken together, our study reveals an exponential need for training data which implies that the key to "zero-shot" generalization capabilities under large-scale training paradigms remains to be found.

8/9
paper page:

9/9
daily papers:
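The reported trend is simple to picture: accuracy grows roughly linearly in the log of concept frequency, so each fixed accuracy gain costs about an order of magnitude more pretraining data. A tiny fit with invented numbers, purely to show the functional form:
Code:
# Illustration of a log-linear scaling fit. The frequencies and
# accuracies below are invented, only to show the functional form.
import numpy as np

freq = np.array([1e2, 1e3, 1e4, 1e5, 1e6])      # concept count in pretraining
acc = np.array([0.12, 0.25, 0.38, 0.51, 0.64])  # downstream "zero-shot" accuracy

slope, intercept = np.polyfit(np.log10(freq), acc, deg=1)
print(f"accuracy ~= {slope:.3f} * log10(freq) + {intercept:.3f}")
# each +0.13 accuracy step here costs ~10x more data, not 2x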
 

bnew


1/1
Open Sora Plan v1.0.0
@Gradio demo is live on @huggingface, courtesy of @fffiloni


space: Open Sora Plan v1.0.0 - a Hugging Face Space by fffiloni

1/3
We are thrilled to present Open-Sora-Plan v1.0.0, which can generate 10 seconds of 1024×1024 video at 24 FPS. It can also generate high-resolution images.
If you like our project, please give us a star on GitHub for the latest updates.

2/3
Sunset_over_the_sea
65×1024×1024

3/3
A_quiet_beach_at_dawn,_the_waves_gently_lapping_at_the_shore_and_the_sky_painted_in_pastel_hues

65×1024×1024 without super resolution and frame interpolation.
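To call the Space programmatically, a gradio_client sketch; both the Space id and the endpoint name are assumptions, so check the Space's "Use via API" panel for the real signature.
Code:
# Sketch: query the Open-Sora-Plan Space from Python. The Space id and
# api_name are assumed -- verify both on the Space's API page.
from gradio_client import Client

client = Client("fffiloni/Open-Sora-Plan-v1-0-0")
video_path = client.predict("Sunset over the sea", api_name="/infer")
print(video_path)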