bnew

Veteran
Joined
Nov 1, 2015
Messages
57,843
Reputation
8,562
Daps
161,302




1/3
StructLM

Towards Building Generalist Models for Structured Knowledge Grounding

Structured data sources, such as tables, graphs, and databases, are ubiquitous knowledge sources. Despite the demonstrated capabilities of large language models (LLMs) on plain text, their proficiency in interpreting and utilizing structured data remains limited. Our investigation reveals a notable deficiency in LLMs' ability to process structured data, e.g., ChatGPT lags behind the state-of-the-art (SoTA) model by an average of 35%. To augment the Structured Knowledge Grounding (SKG) capabilities in LLMs, we have developed a comprehensive instruction tuning dataset comprising 1.1 million examples. Utilizing this dataset, we train a series of models, referred to as StructLM, based on the Code-LLaMA architecture, ranging from 7B to 34B parameters. Our StructLM series surpasses task-specific models on 14 out of 18 evaluated datasets and establishes new SoTA achievements on 7 SKG tasks. Furthermore, StructLM demonstrates exceptional generalization across 6 novel SKG tasks. Contrary to expectations, we observe that scaling model size offers marginal benefits, with StructLM-34B showing only slight improvements over StructLM-7B. This suggests that structured knowledge grounding is still a challenging task and requires more innovative design to push to a new level.

3/3
Beyond Language Models

Byte Models are Digital World Simulators

Traditional deep learning often overlooks bytes, the basic units of the digital world, where all forms of information and operations are encoded and manipulated in binary format. Inspired by the success of next

bnew

Outfit Anyone: Ultra-high quality virtual try-on for Any Clothing and Any Person​

Institute for Intelligent Computing, Alibaba Group

GitHub


Abstract​

Virtual try-on has become a transformative technology, empowering users to experiment with fashion without ever having to physically try on clothing. However, existing methods often struggle with generating high-fidelity and detail-consistent results. Diffusion models have demonstrated their ability to generate high-quality and photorealistic images, but when it comes to conditional generation scenarios like virtual try-ons, they still face challenges in achieving control and consistency. Outfit Anyone addresses these limitations by leveraging a two-stream conditional diffusion model, enabling it to adeptly handle garment deformation for more lifelike results. It distinguishes itself with scalability—modulating factors such as pose and body shape—and broad applicability, extending from anime to in-the-wild images. Outfit Anyone's performance in diverse scenarios underscores its utility and readiness for real-world deployment.​

Method​

The conditional Diffusion Model central to our approach processes images of the model, garments, and accompanying text prompts, using garment images as the control factor. Internally, the network segregates into two streams for independent processing of model and garment data. These streams converge within a fusion network that facilitates the embedding of garment details onto the model's feature representation. On this foundation, we have established Outfit Anyone, comprising two key elements: the Zero-shot Try-on Network for initial try-on imagery, and the Post-hoc Refiner for detailed enhancement of clothing and skin texture in the output images.

Various Try-On Results​


Real World​

We showcase Outfit Anyone's capability for versatile outfit changes, including full ensembles and individual pieces, in realistic scenarios.

Individual Garment​

Outfit​

Bizarre Fashion​

Here we showcase our model's ability to handle a wide range of eccentric and unique clothing styles, dress them onto the models, and even create corresponding outfit combinations when necessary.



Various Body Shapes​

Our model demonstrates the ability to generalize to various body types, including fit, curvy, and petite, thereby catering to the try-on demands of individuals from all walks of life.



Anime​

We demonstrate the powerful generalization ability of our model, which can support the creation of new animation characters.

Refiner​

Furthermore, we showcase the effects before and after using the Refiner, demonstrating its ability to significantly enhance the texture and realism of the clothing while maintaining consistency in the apparel.


bnew


1/2
Ideogram AI presents Ideogram 1.0

text-to-image model

offers state-of-the-art text rendering, unprecedented photorealism, exceptional prompt adherence, and a new feature called Magic Prompt to help with prompting






AI in practice

Feb 29, 2024


Ideogram 1.0 outshines Midjourney and DALL-E 3 with impressive text rendering​

Ideogram prompted by THE DECODER

Matthias Bastian

Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.


Ideogram has released its most advanced text-to-image model to date, Ideogram 1.0, which aims to differentiate itself from the competition with text rendering, photorealism, improved prompt following and a new feature called Magic Prompt.

Until now, image AIs have not been good at rendering text properly within AI-generated images. Ideogram 1.0 addresses this issue with reliable text rendering capabilities that Ideogram says can be used to create personalized messages, memes, posters, t-shirt designs, birthday cards, logos and more.

The company claims that Ideogram 1.0 reduces the text error rate by nearly half compared to DALL-E 3. Midjourney is worse than DALL-E 3 when it comes to text rendering.


Ideogram is not perfect when it comes to text rendering, but it should be much better than DALL-E 3 and Midjourney. First tests confirm this. | Image: Ideogram

In comparison tests, users rated images created with Ideogram better than those created with DALL-E 3 and Midjourney v6 in all areas.


In benchmarks conducted by Ideogram, people rated images generated by Ideogram better than images generated by DALL-E 3 and Midjourney. In both cases, rendering text was the biggest advantage. | Image: Ideogram

Ideogram is capable of generating images in a wide range of aspect ratios and styles, from photorealistic to more artistic results, and is designed to handle long and complex prompts well.

The "Magic Prompt" feature, similar to OpenAI's DALL-E ChatGPT integration, automatically rewrites a short prompt into a detailed image description. Unlike in DALL-E 3, this rewriting can be turned off in Ideogram.

A first test shows that Ideogram holds its own against Midjourney in terms of image quality, and may even have slight advantages over Midjourney v6 and DALL-E 3 in terms of prompt following.

Ideogram definitely has a clear advantage when it comes to text rendering, even if it is not perfect, especially when several texts are to be included in one image. For this reason, Ideogram cannot create precise infographics.


Prompt: "The letters "SORA" being generated on a digital screen" | Image: Midjourney
Prompt: "The letters "SORA" being generated on a digital screen" - Ideogram follows the prompt better and writes SORA correctly every time. | Image: Ideogram prompted by THE DECODER

In terms of image quality and composition, Midjourney and Ideogram surpass OpenAI's often kitschy and colorful DALL-E 3. Of the three, Midjourney currently offers the most features for image editing, such as changing individual elements in the image using text commands.
 

bnew

1/1
Want to serve LLaMA-7B with a context length of up to 1 million tokens on a single A100-80GB GPU, and up to 10 million on an 8-GPU system?

Paper - "KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization"

The existing problem - LLMs are seeing growing use for applications such as document analysis and summarization which require large context windows, and with these large context windows KV cache activations surface as the dominant contributor to memory consumption during inference. Quantization is a promising approach for compressing KV cache activations; however, existing solutions fail to represent activations accurately in ultra-low precisions, such as sub-4-bit.
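
To put a rough number on why the KV cache dominates memory at long context, here is a back-of-the-envelope sketch. The layer count (32) and hidden size (4096) are LLaMA-7B's published configuration; the function name is made up for illustration. The estimate ignores model weights, activations, and any quantization metadata such as scales or separately stored outliers, so treat it as a lower bound rather than the paper's accounting.

```python
def kv_cache_bytes(num_tokens, num_layers=32, hidden_size=4096, bits_per_value=16):
    """Approximate KV cache size: one Key and one Value vector per layer per token."""
    values_per_token = 2 * num_layers * hidden_size  # Key + Value for every layer
    return num_tokens * values_per_token * bits_per_value / 8


GIB = 1024 ** 3

# Compare fp16 storage against a few low-bit settings for a 1M-token context.
for bits in (16, 4, 3, 2):
    size = kv_cache_bytes(1_000_000, bits_per_value=bits)
    print(f"{bits:>2}-bit KV cache, 1M tokens: {size / GIB:.1f} GiB")
```

Under these assumptions the cache alone is about 488 GiB at 16 bits for 1M tokens, and about 61 GiB at 2 bits, which is why low-bit quantization matters for fitting long contexts on a single 80GB GPU.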

This paper presents KVQuant, which addresses this problem by incorporating novel methods for quantizing cached KV activations, including:

(i) Per-Channel Key Quantization, where we adjust the dimension along which we quantize the Key activations to better match the distribution;

(ii) Pre-RoPE Key Quantization, where we quantize Key activations before the rotary positional embedding to mitigate its impact on quantization;

(iii) Non-Uniform KV Cache Quantization, where we derive per-layer sensitivity-weighted non-uniform datatypes that better represent the distributions;

(iv) Per-Vector Dense-and-Sparse Quantization, where we isolate outliers separately for each vector to minimize skews in quantization ranges; and

(v) Q-Norm, where we normalize quantization centroids in order to mitigate distribution shift, providing additional benefits for 2-bit quantization.

By applying this method to the LLaMA, LLaMA-2, and Mistral models, the paper achieves <0.1 perplexity degradation with 3-bit quantization on both Wikitext-2 and C4, outperforming existing approaches.
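
To illustrate item (i) only, here is a toy sketch of why the quantization axis matters. It is not the paper's implementation: it uses plain asymmetric min-max uniform quantization (not the non-uniform datatypes of item iii), synthetic data, and hypothetical function names. The assumption baked into the data is that one Key channel has a consistently large magnitude, so sharing a range across a token's channels wastes most of the available levels.

```python
import random


def quantize_dequantize(values, bits=3):
    """Asymmetric min-max uniform quantization of values that share one range."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    if scale == 0:
        return list(values)  # constant input, nothing to quantize
    return [lo + round((v - lo) / scale) * scale for v in values]


def per_token(keys, bits=3):
    """One quantization range per row (token)."""
    return [quantize_dequantize(row, bits) for row in keys]


def per_channel(keys, bits=3):
    """One quantization range per column (channel)."""
    columns = [quantize_dequantize(list(col), bits) for col in zip(*keys)]
    return [list(row) for row in zip(*columns)]


def mean_abs_error(a, b):
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Synthetic Keys: 64 tokens x 8 channels, channel 0 sits around 50 (assumed outlier channel).
rng = random.Random(0)
keys = [[rng.gauss(0, 1) + (50 if c == 0 else 0) for c in range(8)] for _ in range(64)]

err_token = mean_abs_error(keys, per_token(keys))
err_channel = mean_abs_error(keys, per_channel(keys))
print(f"per-token error:   {err_token:.3f}")
print(f"per-channel error: {err_channel:.3f}")
```

On data shaped like this, the per-channel variant gives a noticeably lower reconstruction error, because each channel's range only has to cover that channel's own spread.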

----

On a related note, the recent "Activation Beacon" paper was about extending context length, whereas KVQuant, the technique proposed in this paper, is about making the context more compact in memory.

KVQuant does not extend context beyond the trained-in limit. It only makes the KV cache more compact by quantizing it. Quantization cannot prevent the catastrophic loss of coherence that occurs when an 8K-context model goes beyond 8K tokens.

And per page 6 of the KVQuant paper, for context extension they used LongLoRA and LM-Infinite.

Memory footprint is also not the only concern. Transformers slow down as context grows because each decoding step requires looking back at all preceding tokens. Over long ranges this becomes too slow. The Activation Beacon paper tries to address that too.

bnew

1/7
The next generation of visual storytelling is here.

Join the waitlist at AI Powered Filmmaking | LTX Studio

3/7
Of course.

4/7
We license closed models, use open ones, and train our own in order to build the best products we can. Given no single provider currently offers all needed models for story creation—stories, visuals, sounds, dialogue, music—we enhance our in-house training. We also release models…

5/7
That’s right

6/7
Join the waitlist!

7/7
For early access, be sure to join our waitlist
 

bnew

1/2
Groq-powered inference for Mixtral is now available on Poe! You can use Mixtral-8x7b-Groq and experience 400 token/second responses. (1/2)

2/2
The bot is available at Mixtral-8x7b-Groq - Poe and across the Poe iOS, Android, Mac, and Windows apps in the official bots category. (2/2)

1/3
Just launched: Playground v2.5 on Poe! This new image model produces more realistic and visually compelling images, with key improvements in color vibrancy and contrast, finer human-related details, and multi-aspect ratio support. (1/3)

2/3
Playground’s internal evaluation and user studies have found that v2.5 demonstrates a significant increase in aesthetic quality and outperforms almost all other models. (2/3)

3/3
Playground v2.5 is available now at https:// and across all Poe apps. (3/3)

bnew

1/10
AI gives every student a 1-on-1 tutor.

I hooked up GPT-4 Vision & OpenAI Whisper/TTS to a camera for a 5min prototype.

Showed it a math problem, and it explained it.

Imagine giving a better version of this to *every* student in the world.

The future of education is so bright.

3/10
Somebody I respect in education asked me “Where are these tools at for AI tutors?” and I sent them this and said “You can do this in 5min.”

They were pleasantly surprised.

Now give a real team working in this space these tools, let them cook, and boom…

Way better education.

4/10
And tech like this will proliferate to all sorts of domains

5/10
Can this apply to hands-on trades such as HVAC technician, plumber, etc.?

6/10
Also - major emphasis on this being a 5min prototype to demonstrate a basic concept.

And this is the worst the tech will ever be.

7/10
Praying Zuck allows this

8/10
I’m hoping OpenAI Sora paves the way for this

9/10
Last year I did some toying around with object detection + GPT but I need to try again with the vision model

10/10
The AI will be able to converse back and forth with you to help you learn.

Don’t let the simple example give you tunnel vision. It goes far beyond a simple how-to for a multiplication problem.

As I like to say, this is the worst it’ll ever be.
 

bnew

About
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch



We have released the implementation of BitNet b1.58!

BitNet b1.58 is a hot topic after the theory was announced by Microsoft, but since there is no ternary implementation that matches the paper, we implemented it ourselves and published it on GitHub!

I've started training on the Wikipedia corpus, and it seems to be off to a good start!
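
For anyone wondering what "ternary" means here: the BitNet b1.58 paper describes an absmean scheme in which each weight is divided by the mean absolute weight, rounded, and clipped to {-1, 0, +1}. Below is a minimal pure-Python sketch of that rule. It is not taken from the linked repository; the function name and the example weights are made up for illustration.

```python
def absmean_ternarize(weights, eps=1e-8):
    """Quantize a 2-D weight matrix to {-1, 0, +1} using absmean scaling."""
    magnitudes = [abs(w) for row in weights for w in row]
    gamma = sum(magnitudes) / len(magnitudes)  # mean absolute weight
    ternary = [
        [max(-1, min(1, round(w / (gamma + eps)))) for w in row]
        for row in weights
    ]
    return ternary, gamma


# Hypothetical example weights.
weights = [[0.9, -0.05, 0.4], [-1.2, 0.02, -0.3]]
ternary, gamma = absmean_ternarize(weights)
print(ternary, round(gamma, 4))
```

The scale `gamma` is kept alongside the ternary matrix so that outputs can be rescaled after the (now multiplication-free) matrix product.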




4/4
I modified the original BitNet to use ternary values, following the 1.58b paper.
GitHub - frodo821/BitNet-Transformers: 0️⃣1️⃣🤗 BitNet-Transformers: Huggingface Transformers Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch with Llama(2) Architecture

 

bnew

1/4
The Orca-Math paper does a comparison of DPO and KTO for mathematical reasoning, finding that KTO is slightly better when all data is used and 25+ pts better when you have fewer positive examples than negative examples.

2/4
Big gap (25 points!) when you discard prompts for which there are only positive responses. Great work by

@Arindam1408 @corby_rosset @hamedkhanpour @AhmedHAwadallah !

3/4
cc @4evaBehindSOTA @Teknium1 @fentpot @alexgraveley

4/4
Going from a 1:1 ratio of positive:negative examples to a 1:5 or 5:1 ratio only drops winrate by like 5 points, so pretty robust, provided you adjust the weights on the losses so that the effective impact of positive and negative examples is the same, despite the imbalance. Going

1/2
Microsoft presents Orca-Math

Unlocking the potential of SLMs in Grade School Math

Mathematical word problem-solving has long been recognized as a complex task for small language models (SLMs). A recent study hypothesized that the smallest model size needed to achieve over 80% accuracy on the GSM8K benchmark is 34 billion parameters. To reach this level of performance with smaller models, researchers often train SLMs to generate Python code or use tools to help avoid calculation errors. Additionally, they employ ensembling, where outputs of up to 100 model runs are combined to arrive at a more accurate result. Result selection is done using consensus, majority vote, or a separate verifier model used in conjunction with the SLM. Ensembling provides a substantial boost in accuracy but at a significant cost increase with multiple calls to the model (e.g., Phi-GSM uses top-48 to boost the performance from 68.2 to 81.5). In this work, we present Orca-Math, a 7-billion-parameter SLM based on Mistral-7B, which achieves 86.81% on GSM8K without the need for multiple model calls or the use of verifiers, code execution, or any other external tools. Our approach has the following key elements: (1) A high-quality synthetic dataset of 200K math problems created using a multi-agent setup where agents collaborate to create the data, (2) An iterative learning technique that enables the SLM to practice solving problems, receive feedback on its solutions, and learn from preference pairs incorporating the SLM solutions and the feedback. When trained with Supervised Fine-Tuning alone, Orca-Math achieves 81.50% on the GSM8K pass@1 metric. With iterative preference learning, Orca-Math achieves 86.81% pass@1. Orca-Math surpasses the performance of significantly larger models such as LLAMA-2-70B, WizardMath-70B, Gemini-Pro, and ChatGPT-3.5. It also significantly outperforms other smaller models while using much less data (hundreds of thousands vs. millions of problems).
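
For context, the majority-vote ensembling that the abstract contrasts against (and that Orca-Math avoids) can be sketched in a few lines. This is a generic illustration, not code from the paper; the function name and the sample answers are hypothetical, and a real setup would obtain each answer from a separate model call.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer and the fraction of runs that agreed on it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


# Hypothetical final answers extracted from five sampled model runs.
samples = ["42", "42", "41", "42", "40"]
answer, agreement = majority_vote(samples)
print(answer, agreement)
```

The cost is the point of the comparison: every extra vote is another full model call, which is what a single-call pass@1 result avoids.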


bnew

1/1
Trending collections are a great way to stay up to date with exciting open-source releases: Collections - Hugging Face

Trending this week:
- StarCoder2
- Gemma
- OpenCodeInterpreter
- Qwen 1.5
- MobileLlama
- Sailor LMs
- Matryoshka Embedding
- Zephyr Gemma
- Sora reference papers
- Tower

bnew

1/2
Introducing: Zephyr Gemma!

The community has struggled to do a good preference-tune of Gemma, so we built an open-source recipe and trained a model to help people get started.

Model: HuggingFaceH4/zephyr-7b-gemma-v0.1 · Hugging Face
Demo: Zephyr Gemma Chat - a Hugging Face Space by HuggingFaceH4
Handbook: GitHub - huggingface/alignment-handbook: Robust recipes to align language models with human and AI preferences

2/2
The model's MT-Bench score is quite strong, and it outperforms the officially released instruct model on several benchmarks.

It was fine-tuned on DEITA and then trained with DPO on an Argilla dataset.

Enjoy!