bnew

{post is incomplete, go to the site for more info/examples}

Stability AI releases DeepFloyd IF, a powerful text-to-image model that can smartly integrate text into images​

28 Apr
Today Stability AI, together with its multimodal AI research lab DeepFloyd, announced the research release of DeepFloyd IF, a powerful text-to-image cascaded pixel diffusion model.
DeepFloyd IF is a state-of-the-art text-to-image model released on a non-commercial, research-permissible license that provides an opportunity for research labs to examine and experiment with advanced text-to-image generation approaches. In line with other Stability AI models, Stability AI intends to release a DeepFloyd IF model fully open source at a future date.

Description and Features
  • Deep text prompt understanding:
The generation pipeline utilizes the large language model T5-XXL-1.1 as a text encoder. A large number of text-image cross-attention layers also provides better alignment between the prompt and the generated image.
  • Application of text description into images:
    Incorporating the intelligence of the T5 model, DeepFloyd IF generates coherent and clear text alongside objects of different properties appearing in various spatial relations. Until now, these use cases have been challenging for most text-to-image models.
  • A high degree of photorealism:
    This property is reflected by the impressive zero-shot FID score of 6.66 on the COCO dataset (FID is a main metric used to evaluate the performance of text-to-image models; the lower the score, the better).
  • Aspect ratio shift:
    The ability to generate images with a non-standard aspect ratio, vertical or horizontal, as well as the standard square aspect.
  • Zero-shot image-to-image translations:
    Image modification is conducted by (1) resizing the original image to 64 pixels, (2) adding noise through forward diffusion, and (3) using backward diffusion with a new prompt to denoise the image (in inpainting mode, the process happens in the local zone of the image). The style can be changed further through super-resolution modules via a prompt text description. This approach gives the opportunity to modify style, patterns and details in output while maintaining the basic form of the source image – all without the need for fine-tuning.
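For orientation, here is a minimal sketch of that zero-shot image-to-image flow, assuming the Hugging Face diffusers DeepFloyd IF pipelines (IFImg2ImgPipeline and its super-resolution counterpart); the model IDs, prompt, and strength value are illustrative assumptions rather than code from the announcement:

```python
# Hedged sketch: zero-shot image-to-image with DeepFloyd IF via diffusers (assumed API).
import torch
from PIL import Image
from diffusers import IFImg2ImgPipeline, IFImg2ImgSuperResolutionPipeline

# Step (1): work from a 64-pixel version of the source image.
original = Image.open("source.jpg").convert("RGB").resize((64, 64))

# Stage I: add noise via forward diffusion, then denoise toward the new prompt (steps 2-3).
stage_1 = IFImg2ImgPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
).to("cuda")
prompt = "the same scene, repainted as an impressionist oil painting"
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)
low_res = stage_1(
    image=original,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    strength=0.7,  # how much noise is injected; higher values restyle more aggressively
    output_type="pt",
).images

# Stage II: the prompt-guided super-resolution module refines style and detail.
stage_2 = IFImg2ImgSuperResolutionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
).to("cuda")
result = stage_2(
    image=low_res,
    original_image=original,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
).images[0]
result.save("restyled.png")
```

Because the editing happens inside the diffusion process itself, no fine-tuning is needed to apply a new style to a given source image.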
[Example generations and an image-to-image demo GIF (deep_floyd_if_image_2_image.gif) appear here in the original post.]




Definitions and processes
DeepFloyd IF is a modular, cascaded, pixel diffusion model. We break down the definitions of each of these descriptors here:
  • Modular:
    DeepFloyd IF consists of several neural modules (neural networks that can solve independent tasks, like generating images from text prompts and upscaling) whose interactions in one architecture create synergy.

  • Cascaded:
DeepFloyd IF models high-resolution data in a cascading manner, using a series of individually trained models at different resolutions. The process starts with a base model that generates unique low-resolution samples (a ‘player’), which are then upsampled by successive super-resolution models (‘amplifiers’) to produce high-resolution images.

  • Diffusion:
    DeepFloyd IF’s base and super-resolution models are diffusion models, where a Markov chain of steps is used to inject random noise into data before the process is reversed to generate new data samples from the noise.

  • Pixel:
    DeepFloyd IF works in pixel space. The diffusion is implemented on a pixel level, unlike latent diffusion models (like Stable Diffusion), where latent representations are used.
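To make the ‘player’/‘amplifier’ cascade concrete, here is a minimal sketch of the three-stage text-to-image pipeline as exposed through Hugging Face diffusers; the model IDs, the x4 upscaler used as the final stage, and the offloading call are assumptions about that integration, not an official recipe:

```python
# Hedged sketch of the cascaded pixel-diffusion pipeline (assumed diffusers API).
import torch
from diffusers import DiffusionPipeline

# Stage I ('player'): base model that generates 64x64 samples in pixel space.
stage_1 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
)
# Stage II ('amplifier'): super-resolution diffusion model, 64 -> 256 pixels.
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
# Stage III ('amplifier'): a x4 upscaler for the final high-resolution output.
stage_3 = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
)
for pipe in (stage_1, stage_2, stage_3):
    pipe.enable_model_cpu_offload()

prompt = 'a photo of a corgi holding a sign that reads "deep floyd"'
# The T5-XXL text encoder runs once; its embeddings condition the stages that need them.
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

image = stage_1(
    prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, output_type="pt"
).images
image = stage_2(
    image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds,
    output_type="pt",
).images
image = stage_3(prompt=prompt, image=image, noise_level=100).images[0]
image.save("deepfloyd_if_cascade.png")
```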
 

Regular Developer

Thanks for updating this. I don't have time to search for all this stuff myself, and I'm a little wary about how it's gonna affect the software development industry. I might get GitHub Copilot for my personal projects. I'm not going to take the time to learn no damn react-native.
 

bnew


Stability AI releases StableVicuna, the AI World’s First Open Source RLHF LLM Chatbot​


28 Apr
“A Stable Vicuña” — Stable Diffusion XL
Background
In recent months, there has been a significant push in the development and release of chatbots. From Character.ai's chatbot last spring to ChatGPT in November and Bard in December, the user experience created by tuning language models for chat has been a hot topic. The emergence of open access and open-source alternatives has further fueled this interest.

The Current Environment of Open Source Chatbots
The success of these chat models is due to two training paradigms: instruction finetuning and reinforcement learning from human feedback (RLHF). While there have been significant efforts to build open source frameworks to help train these kinds of models, such as trlX, trl, DeepSpeed Chat and ColossalAI, there is a lack of open access and open source models that have both paradigms applied. In most models, instruction finetuning is applied without RLHF training because of the complexity involved.
Recently, Open Assistant, Anthropic, and Stanford have begun to make chat RLHF datasets readily available to the public. Those datasets, combined with the straightforward RLHF training provided by trlX, are the backbone for the first large-scale instruction-finetuned and RLHF-trained model we present here today: StableVicuna.

Introducing the First Large-Scale Open Source RLHF LLM Chatbot
We are proud to present StableVicuna, the first large-scale open source chatbot trained via reinforcement learning from human feedback (RLHF). StableVicuna is a further instruction-finetuned and RLHF-trained version of Vicuna v0 13B, which is itself an instruction-finetuned LLaMA 13B model. For the interested reader, you can find more about Vicuna here.
Here are some examples of what you can do with our chatbot:
  1. Ask it to do basic math
  2. Ask it to write code
  3. Ask it to help you with grammar

Similarly, here are a number of benchmarks showing the overall performance of StableVicuna compared to other similarly sized open source chatbots.
[Benchmark comparison table image from the original post.]

In order to achieve StableVicuna’s strong performance, we utilize Vicuna as the base model and follow the typical three-stage RLHF pipeline outlined by Stiennon et al. and Ouyang et al. Concretely, we further train the base Vicuna model with supervised finetuning (SFT) using a mixture of three datasets:
  • OpenAssistant Conversations Dataset (OASST1), a human-generated, human-annotated assistant-style conversation corpus comprising 161,443 messages distributed across 66,497 conversation trees, in 35 different languages;
  • GPT4All Prompt Generations, a dataset of 437,605 prompts and responses generated by GPT-3.5 Turbo;
  • And Alpaca, a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine.
We then use trlX to train a reward model, initialized from our SFT model, on the following RLHF preference datasets:
OpenAssistant Conversations Dataset (OASST1), which contains 7,213 preference samples;
  • Anthropic HH-RLHF, a dataset of preferences about AI assistant helpfulness and harmlessness containing 160,800 human labels;
  • And Stanford Human Preferences (SHP), a dataset of 348,718 collective human preferences over responses to questions/instructions in 18 different subject areas, from cooking to philosophy.
Finally, we use trlX to run Proximal Policy Optimization (PPO) reinforcement learning on the SFT model, completing the RLHF training and arriving at StableVicuna!
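For readers who want a feel for what that last step looks like in practice, here is a heavily simplified, hypothetical trlX sketch of a PPO run; the config path, model path, prompts, and the dummy reward function are placeholders, not the actual StableVicuna training setup (which scores samples with the trained reward model described above):

```python
# Hypothetical sketch of PPO-based RLHF with trlX; paths and reward logic are placeholders.
import trlx
from trlx.data.configs import TRLConfig

config = TRLConfig.load_yaml("configs/ppo_config.yml")   # assumed PPO config from the trlX examples
config.model.model_path = "path/to/vicuna-13b-sft"       # the SFT model to be optimized

def reward_fn(samples, **kwargs):
    # The real pipeline scores each generated sample with the trained reward model;
    # this dummy function only illustrates the expected interface (one float per sample).
    return [float(len(sample)) for sample in samples]

prompts = [
    "Explain the difference between supervised finetuning and RLHF.",
    "Write a short poem about vicunas.",
]

trainer = trlx.train(
    reward_fn=reward_fn,
    prompts=prompts,
    eval_prompts=prompts,
    config=config,
)
```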

Obtaining StableVicuna-13B
StableVicuna is of course on the HuggingFace Hub! The model is downloadable as a weight delta against the original LLaMA model. To obtain StableVicuna-13B, you can download the weight delta from here. However, please note that you also need to have access to the original LLaMA model, which requires you to apply for LLaMA weights separately using the link provided in the GitHub repo or here. Once you have both the weight delta and the LLaMA weights, you can use a script provided in the GitHub repo to combine them and obtain StableVicuna-13B.
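For illustration only (the script in the repo is the authoritative way to do this), applying a weight delta amounts to adding each delta tensor to the corresponding base LLaMA tensor. The Hub ID and paths below are assumptions, and the real script also reconciles tokenizer and embedding-size differences:

```python
# Rough sketch of applying delta weights to base LLaMA; use the official script in practice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "path/to/llama-13b-hf"                  # original LLaMA weights (obtained from Meta)
delta_path = "CarperAI/stable-vicuna-13b-delta"     # assumed Hub ID of the released delta
out_path = "stable-vicuna-13b"

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
delta = AutoModelForCausalLM.from_pretrained(delta_path, torch_dtype=torch.float16)

# Add the delta to the base weights, parameter by parameter.
base_state = base.state_dict()
delta_state = delta.state_dict()
for name in base_state:
    base_state[name] += delta_state[name]

base.save_pretrained(out_path)
AutoTokenizer.from_pretrained(delta_path).save_pretrained(out_path)
```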

Announcing Our Upcoming Chatbot Interface
Alongside our chatbot, we are excited to preview our upcoming chat interface which is in the final stages of development. The following screenshots offer a glimpse of what users can expect.
 

bnew


Engineering and Product

Large Language Models for Commercial Use​

This blog explains what a license for an LLM is and why it is important. We resolve any doubts that you might have about the licensing of these models so that you do not run into legal trouble while using, modifying, or sharing them.


Truefoundry

Apr 27, 2023 • 5 min read

With LLMs (large language models) being used for a variety of value-generating tasks across the industry, every business wants to get its hands on them. However, before you start using these models commercially, it is important to understand the licensing and legal norms around them.

In this blog, we will resolve any doubts that you might have about the licensing of these models so that you do not run into legal troubles while using, modifying, or sharing them.

📌
We will continuously update this blog to cover all major LLMs and their licensing implications.

Can you just start using LLMs for your business?​

We had a chat with a few leaders, and it turns out that licensing LLMs for commercial use is more complicated than it first appears. Let's take the example of Vicuna.

Vicuna is an open-source chatbot trained by fine-tuning LLaMA.

If you deploy the Vicuna 13B model using Hugging Face, you will find that the team behind the project has released just the delta weights of the model, which in turn need to be applied to the original LLaMA weights to make it work.

The Vicuna model card shows the license as Apache 2.0, which would lead one to believe that the model can be used commercially.

Vicuna 13B Delta weights model Hugging Face Model Card
However, the LLaMA weights are not available for commercial use, making the Vicuna model, in turn, usable only in research settings, not commercially.

Confusing, right? Let us try to explain how this works

Different types of licenses and what they mean​

Here is a table with some of the common licenses that LLMs are found to have:

| License | LLMs | Permissive or Copyleft | Patent Grant | Commercial Use | Redistribution | Modification |
| --- | --- | --- | --- | --- | --- | --- |
| Apache 2.0 | BERT, XLNet, and XLM-RoBERTa | Permissive | Yes | Yes (with attribution) | Yes | Yes |
| MIT | GPT-2, T5, and BLOOM | Permissive | No | Yes (with attribution) | Yes | Yes |
| GPL-3.0 | GLM-130B and NeMO LLM | Copyleft | Yes (for GPL-3.0 licensed software only) | Yes (with source code) | Yes (with source code) | Yes |
| Proprietary | GPT-3, LaMDA and Cohere | Varies | Varies | Varies | Varies | Varies |
Copyleft licenses like GPL-3.0 require that any derivative works of the software be licensed under the same license. This means that if you use GPL-3.0-licensed software in your project, your project must also be licensed under GPL-3.0.

Permissive licenses like Apache 2.0 and MIT allow users to use, modify and distribute the software under the license with minimal restrictions on how they use it or how they distribute it.

Explaining some of the common licenses that LLMs are licensed under:

Apache 2.0 License​

Under this license, users must give credit to the original authors, include a copy of the license, and state any changes made to the software. Users must also not use any trademarks or logos associated with the software without permission.

MIT License​

This license allows anyone to use, modify, and distribute the software for any purpose as long as they include a copy of the license and a notice of the original authors. The MIT License is similar to the Apache 2.0 License, but it does not have any conditions regarding trademarks or logos.

GPL-3.0 License​

It allows anyone to use, modify, and distribute the software for any purpose as long as they share their source code under the same license. This means that users cannot create proprietary versions of the software or incorporate it into closed-source software without disclosing their code. The GPL-3.0 License also has some other conditions, such as providing a disclaimer of warranty and liability and ensuring that users can access or run the software without any restrictions.

Proprietary License​

The last type of license for LLMs that we will discuss is the proprietary license, which is a non-open source license that grants limited rights to use the software under certain terms and conditions. The proprietary license usually requires users to pay a fee or obtain permission to access or use the software and may impose restrictions on how the software can be used or modified. The proprietary license may also prohibit users from sharing or distributing the software or its outputs without authorization.

RAIL License​

The RAIL license is a new copyright license that combines an open-access approach to licensing with behavioral restrictions aimed at enforcing a vision of responsible AI. It carries use-based restrictions; for example, the licensed model cannot be used for:

  1. Anything that violates laws and regulations
  2. Exploiting or harming minors, or uses that discriminate against or harm “individuals or groups based on social behavior or known or predicted personal or personality characteristics.”
Some models released under this license are OPT, Stable Diffusion, and BLOOM.

The following table summarizes the license details of each LLM:

[Embedded Google Sheet “LLM License” with columns Model, Domain, License, Number of Parameters (in millions), Pretraining Dataset, Results, Creator, Inference Speed, and Release Date. The first visible row lists Alpaca-medium (general, MIT, 7,000, base model LLaMA fine-tuned on interactions collected from the DaVinci model, almost equivalent to the GPT-3 variant of ChatGPT, Stanford); the rest is truncated in the original post.]


Which models can I use?​

The problem with LLMs for commercial use is that they may not be open source or may not allow commercial use (for example, models built on top of Meta's LLaMA). This means that companies may have to pay to use them or may not be able to use them at all. Additionally, some companies may prefer to use open-source models for reasons such as transparency and the ability to modify the code.

There are several open-source language models that can be used commercially for free.

Bloom​

Bloom is an open-access multilingual language model that contains 176 billion parameters and is trained for 3.5 months on 384 A100–80GB GPUs.

It is licensed under the bigscience-bloom-rail-1.0 license, which restricts BLOOM from being used for certain use cases, such as giving medical advice or interpreting medical results. This is in addition to the other restrictions present under the RAIL license (described above).

Dolly 2.0​

Dolly 2.0 is a 12B-parameter language model based on the EleutherAI Pythia model family and fine-tuned exclusively on a new, high-quality, human-generated instruction-following dataset crowdsourced among Databricks employees. It is the first open-source, instruction-following LLM fine-tuned on a human-generated instruction dataset licensed for research and commercial use. The entirety of Dolly 2.0, including the training code, the dataset, and the model weights, is open-sourced and suitable for commercial use.

RWKV Raven​

RWKV-LM is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it combines the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, “infinite” ctx_len, and free sentence embedding.

It is licensed under Apache 2.0.

Eleuther AI Models (Polyglot, GPT Neo, GPT NeoX, GPT-J, Pythia)​

EleutherAI has trained and released several LLMs and the codebases used to train them. Several of these LLMs were the largest or most capable available at the time and have been widely used since in open-source research applications.
 

bnew





https://web.archive.org/web/20230429231517/https://twitter.com/bohanhou1998/status/1652151502012837890



MLC LLM is a universal solution that allows any language model to be deployed natively on a diverse set of hardware backends and native applications, plus a productive framework for everyone to further optimize model performance for their own use cases.

Our mission is to enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

Everything runs locally with no server support and is accelerated with local GPUs on your phone and laptop. Supported platforms include:

  • iPhone
  • Metal GPUs and Intel/ARM MacBooks;
  • AMD and NVIDIA GPUs via Vulkan on Windows and Linux;
  • NVIDIA GPUs via CUDA on Windows and Linux;
  • WebGPU on browsers (through companion project WebLLM).
Check out our instruction page to try it out!


What is MLC LLM?​

In recent years, there has been remarkable progress in generative artificial intelligence (AI) and large language models (LLMs), which are becoming increasingly prevalent. Thanks to open-source initiatives, it is now possible to develop personal AI assistants using open-sourced models. However, LLMs tend to be resource-intensive and computationally demanding. To create a scalable service, developers may need to rely on powerful clusters and expensive hardware to run model inference. Additionally, deploying LLMs presents several challenges, such as their ever-evolving model innovation, memory constraints, and the need for potential optimization techniques.

The goal of this project is to enable the development, optimization, and deployment of AI models for inference across a range of devices, including not just server-class hardware, but also users' browsers, laptops, and mobile apps. To achieve this, we need to address the diverse nature of compute devices and deployment environments. Some of the key challenges include:

  • Supporting different models of CPUs, GPUs, and potentially other co-processors and accelerators.
Deploying on the native environment of user devices, which may not have Python or other necessary dependencies readily available.
  • Addressing memory constraints by carefully planning allocation and aggressively compressing model parameters.
MLC LLM offers a repeatable, systematic, and customizable workflow that empowers developers and AI system researchers to implement models and optimizations in a productivity-focused, Python-first approach. This methodology enables quick experimentation with new models, new ideas and new compiler passes, followed by native deployment to the desired targets. Furthermore, we are continuously expanding LLM acceleration by broadening TVM backends to make model compilation more transparent and efficient.
 

bnew


About​

An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) for key-frame segmentation and Associating Objects with Transformers (AOT) for efficient tracking and propagation purposes.


Segment and Track Anything (SAM-Track)​

Online Demo: Open In Colab

Tutorial: tutorial-v1.5 (Text), tutorial-v1.0 (Click & Brush)


Segment and Track Anything is an open-source project that focuses on the segmentation and tracking of any objects in videos, utilizing both automatic and interactive methods. The primary algorithms utilized include the SAM (Segment Anything Models) for automatic/interactive key-frame segmentation and the DeAOT (Decoupling features in Associating Objects with Transformers) (NeurIPS2022) for efficient multi-object tracking and propagation. The SAM-Track pipeline enables dynamic and automatic detection and segmentation of new objects by SAM, while DeAOT is responsible for tracking all identified objects.
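As a rough sketch of the key-frame half of this pipeline, the snippet below runs SAM's automatic "segment everything" mode on a single frame, assuming the publicly released segment-anything package and ViT-H checkpoint; the tracking and propagation side (DeAOT) is handled by SAM-Track's own code and is not shown:

```python
# Hedged sketch: key-frame segmentation with SAM; DeAOT tracking is not included here.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")  # released ViT-H weights
sam.to("cuda")
mask_generator = SamAutomaticMaskGenerator(sam)

# SAM expects an RGB array; OpenCV loads BGR, so convert first.
frame = cv2.cvtColor(cv2.imread("keyframe.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(frame)  # one dict per detected object ('segmentation', 'area', ...)

# In SAM-Track, each key-frame mask would then be handed to DeAOT to track through later frames.
print(f"SAM proposed {len(masks)} object masks in the key frame")
```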



This video showcases the segmentation and tracking capabilities of SAM-Track in various scenarios, such as street views, AR, cells, animations, aerial shots, and more.

Demo1 showcases SAM-Track's ability to interactively segment and track individual objects. The user specified a man playing street basketball for SAM-Track to track.



Demo2 showcases SAM-Track's ability to interactively add specified objects for tracking. The user customized the addition of objects to be tracked on top of SAM-Track's segmentation of everything in the scene.

 

bnew


About​

Raising the Cost of Malicious AI-Powered Image Editing

gradientscience.org/photoguard/

Raising the Cost of Malicious AI-Powered Image Editing​

This repository contains the code for our recent work on safeguarding images against manipulation by ML-powered photo-editing models such as Stable Diffusion.

Raising the Cost of Malicious AI-Powered Image Editing
Hadi Salman*, Alaa Khaddaj*, Guillaume Leclerc*, Andrew Ilyas, Aleksander Madry
Paper: [2302.06588] Raising the Cost of Malicious AI-Powered Image Editing
Blog post: Raising the Cost of Malicious AI-Powered Image Editing
Interactive demo: Photoguard - a Hugging Face Space by hadisalman (check below for how to run it locally)

@article{salman2023raising,
  title={Raising the Cost of Malicious AI-Powered Image Editing},
  author={Salman, Hadi and Khaddaj, Alaa and Leclerc, Guillaume and Ilyas, Andrew and Madry, Aleksander},
  journal={arXiv preprint arXiv:2302.06588},
  year={2023}
}


 

bnew



https://web.archive.org/web/20230430005058/https://twitter.com/lupantech/status/1652022897563795456

@lupantech
🚀65B LLaMA-Adapter-V2 code & checkpoint are NOW ready at GitHub - ZrrSkywalker/LLaMA-Adapter: Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters!
🛠️Big update enhancing multimodality & chatbot.
🔥LLaMA-Adapter-V2 surpasses #ChatGPT in response quality (102%:100%) & beats #Vicuna in win-tie-lost (50:14).
☕️Thanks to Peng Gao & @opengvlab!

 

bnew


https://web.archive.org/save/https://twitter.com/gdb/status/1652369023609470976




edit:
copied the prompt using 🖼️ Image-to-Multilingual-OCR 👁️ Gradio - a Hugging Face Space by awacke1

Act as a dual PhD in sports psychology and neuroscience. Your job is to design a system to get someone addicted to something that will positively impact their life; in this case, starting an exercise habit (running). Create a 60-day plan using research-backed principles to have anyone--even someone who hates running--build a running habit if they follow the plan. Incorporate research such as BF Skinner's study of addiction, BJ Fogg's Behavioral Model, and similar research on addiction and compulsion.

Outline a week-by-week plan, but give a detailed day-by-day plan for the first week.
 

bnew


https://web.archive.org/web/20230430012709/https://twitter.com/DrEalmutairi/status/1652272468105543681


ChatGPT and Artificial Intelligence in higher education: Quick start guide
Portrait created by DALL·E 2, an AI system that can create realistic images and art in response to a text description. The AI was asked to produce an impressionist portrait of how artificial intelligence would look going to university. Concept by UNESCO IESALC.
 

bnew


https://web.archive.org/web/20230430015051/https://twitter.com/jbrowder1/status/1652187049255120897
 

bnew



https://web.archive.org/web/20230430015207/https://twitter.com/gdb/status/1652411976713371649
 

bnew


https://web.archive.org/web/20230430010034/https://twitter.com/emollick/status/1652170706312896512
 