bnew

Veteran
Joined
Nov 1, 2015
Messages
51,799
Reputation
7,926
Daps
148,650

Installing SDXL 1.0 on Local Computer​

by Harmeet G | Jul 29, 2023 | Resources | 6 comments

SDXL 1.0 (Stable Diffusion XL) was released earlier this week, which means you can run the model on your own computer and generate images using your own GPU. I wanted to document the steps required to run the model yourself and share some tips to ensure that you are starting on the right foot. If you want to see the specifications of my PC, have a read of this post.

What you need​

  • Your own hardware with at least 16 GB of GPU memory. I have not found the official specs, but my GPU usage is 12 GB (out of 16 GB total) for a 1024x1024px image.
  • Automatic1111 v1.5.0 or higher
  • SDXL models, downloaded only from their official Hugging Face page
You will need to move the model file into the sd-webui\models\stable-diffusion directory

Automatic1111 Installation on Windows​

  • Install Python 3.10.6 (make sure you choose the option “Add Python.exe to path” during the installation)
  • Install git.
  • Download the stable-diffusion-webui repository by running the command below. Make sure you are in the directory where you want to install, e.g. C:\AI

  • git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
  • Run webui-user.bat from Windows Explorer as a normal, non-administrator user.

Automatic1111 Installation on Linux​

Install the dependencies:
# Debian-based:
sudo apt install wget git python3 python3-venv
# Red Hat-based:
sudo dnf install wget git python3
# Arch-based:
sudo pacman -S wget git python3
Make sure you are in the directory where you want to install, e.g. /home/AI. Launch a command-line terminal and execute:

wget -q https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui/master/webui.sh
Run webui.sh, check webui-user.sh for options.

Follow these steps and you will be up and running with SDXL 1.0 in no time.

Already running SD 1.x or 2.x with Automatic1111​

If you are already running Automatic1111 with Stable Diffusion (any 1.x or 2.x version), then all you need to do is run your webui-user.bat file with the command git pull added. Your file should look like this:

Code:
@echo off
set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--xformers --no-half-vae
git pull
call webui.bat
After you run this, you can re-edit the file to remove the git pull line, or add REM at the start to turn it into a comment so that it is not executed. You can also choose to keep it, which means that every time you launch Stable Diffusion, the script will check the A1111 repo online and update your instance, keeping you up to date all the time.

Lower GPU Tip​

If you have an 8 GB or 10 GB GPU, then you might want to run webui-user.bat with additional arguments like --xformers --no-half-vae, which optimize memory usage and reduce "CUDA out of memory" errors. Reducing the image size to 512x512px or 768x768px will also help.

Running SDXL 1.0​

Launch the webui-user.bat file and the command line will execute, printing a URL which you can open in a browser.

(Screenshot: command-line output from webui-user.bat showing the local URL)

Next, enter the Prompt and any Negative prompt, and set the Width and Height to 1024px for best results. Click Generate and within a few seconds you should have the image (time taken depends on GPU speed). For me, SDXL 1.0 was able to generate a new image in under 10 seconds.

(Screenshot: the txt2img interface in the browser)
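If you prefer to script generation outside the web UI, the same base model can also be driven from Python with the diffusers library. This is a minimal sketch, not part of the Automatic1111 setup above; the prompt and settings are placeholders.

Code:
import torch
from diffusers import StableDiffusionXLPipeline

# Load the SDXL 1.0 base model in half precision; fp16 keeps VRAM usage
# roughly in line with the ~12 GB figure mentioned earlier.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

image = pipe(
    prompt="a photograph of an astronaut riding a horse",
    negative_prompt="blurry, low quality",
    width=1024,
    height=1024,
    num_inference_steps=30,
).images[0]
image.save("sdxl-test.png")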

Image Sizes​

For best results you should be using 1024x1024px, but what if you want to generate taller or wider images? You can do this with SDXL 1.0 as well; however, the documentation suggests using the following dimensions:

  • 1024 x 1024
  • 1152 x 896
  • 896 x 1152
  • 1216 x 832
  • 832 x 1216
  • 1344 x 768
  • 768 x 1344
  • 1536 x 640
  • 640 x 1536
The objective is to keep the total pixel count close to 1024x1024 (roughly one megapixel) while varying the aspect ratio. In my experimentation, however, I have tried different ratios as well and the results are good; occasionally you will get some elongation or other artifacts.
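A quick way to sanity-check that list (a small sketch; the pairs are exactly those quoted above):

Code:
# Every recommended SDXL resolution stays close to 1024x1024 (~1.05 million pixels),
# just at a different aspect ratio.
sizes = [(1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216),
         (1344, 768), (768, 1344), (1536, 640), (640, 1536)]
for w, h in sizes:
    print(f"{w} x {h}: {w * h / 1e6:.2f} MP, aspect ratio {w / h:.2f}")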

Sample Images​

The images below have been generated using the txt2img base model; they were not run through img2img using the refiner model.

(Gallery of sample images generated with the SDXL base model.)
 

WizardCoder-34B surpasses GPT-4, ChatGPT-3.5 and Claude-2 on HumanEval with 73.2% pass@1​







News​

  • 🔥🔥🔥[2023/08/26] We released WizardCoder-Python-34B-V1.0, which achieves 73.2 pass@1 and surpasses GPT4 (2023/03/15), ChatGPT-3.5, and Claude2 on the HumanEval benchmarks.
  • [2023/06/16] We released WizardCoder-15B-V1.0, which achieves 57.3 pass@1 and surpasses Claude-Plus (+6.8), Bard (+15.3) and InstructCodeT5+ (+22.3) on the HumanEval benchmarks.
❗Note: There are two sets of HumanEval results for GPT4 and ChatGPT-3.5. The 67.0 and 48.1 are reported in the official GPT4 report (2023/03/15) from OpenAI. The 82.0 and 72.5 were measured by ourselves with the latest API (2023/08/26).

Model | Checkpoint | Paper | HumanEval | MBPP | Demo | License
WizardCoder-Python-34B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 73.2 | 61.2 | Demo | Llama2
WizardCoder-15B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 59.8 | 50.6 | -- | OpenRAIL-M
  • Our WizardMath-70B-V1.0 model slightly outperforms some closed-source LLMs on the GSM8K, including ChatGPT 3.5, Claude Instant 1 and PaLM 2 540B.
  • Our WizardMath-70B-V1.0 model achieves 81.6 pass@1 on the GSM8k Benchmarks, which is 24.8 points higher than the SOTA open-source LLM, and achieves 22.7 pass@1 on the MATH Benchmarks, which is 9.2 points higher than the SOTA open-source LLM.
Model | Checkpoint | Paper | GSM8k | MATH | Online Demo | License
WizardMath-70B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | 81.6 | 22.7 | Demo | Llama 2
WizardMath-13B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | 63.9 | 14.0 | Demo | Llama 2
WizardMath-7B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | 54.9 | 10.7 | Demo | Llama 2
Model | Checkpoint | Paper | MT-Bench | AlpacaEval | GSM8k | HumanEval | License
WizardLM-70B-V1.0 | 🤗 HF Link | 📃 Coming Soon | 7.78 | 92.91% | 77.6% | 50.6 | Llama 2 License
WizardLM-13B-V1.2 | 🤗 HF Link | -- | 7.06 | 89.17% | 55.3% | 36.6 | Llama 2 License
WizardLM-13B-V1.1 | 🤗 HF Link | -- | 6.76 | 86.32% | -- | 25.0 | Non-commercial
WizardLM-30B-V1.0 | 🤗 HF Link | -- | 7.01 | -- | -- | 37.8 | Non-commercial
WizardLM-13B-V1.0 | 🤗 HF Link | -- | 6.35 | 75.31% | -- | 24.0 | Non-commercial
WizardLM-7B-V1.0 | 🤗 HF Link | 📃 [WizardLM] | -- | -- | -- | 19.1 | Non-commercial

Comparing WizardCoder-Python-34B-V1.0 with Other LLMs.​

🔥 The following figure shows that our WizardCoder-Python-34B-V1.0 attains the second position in this benchmark, surpassing GPT4 (2023/03/15, 73.2 vs. 67.0), ChatGPT-3.5 (73.2 vs. 72.5) and Claude2 (73.2 vs. 71.2).
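For reference, the pass@1 numbers quoted throughout this post are HumanEval's standard metric: the estimated probability that a single sampled completion passes a problem's unit tests. A minimal sketch of the unbiased pass@k estimator from the original HumanEval paper follows; the example numbers are made up.

Code:
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n: completions sampled per problem, c: completions that passed, k: budget."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 200 samples for one problem, 147 of them pass.
print(pass_at_k(200, 147, 1))   # 0.735; averaging this over all problems gives pass@1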



AI Has Already Created As Many Images As Photographers Have Taken in 150 Years. Statistics for 2023​

In the past year, dozens of communities dedicated to AI art have sprung up across the Internet, from Reddit to Twitter to Discord, with thousands of AI artists honing their skills at crafting precise prompts and sharing the results with others. The amount of content created during this time is hard to measure, but whatever it is, it’s incredibly big. We’ve kept track of some AI image statistics and facts and tried to estimate (at least roughly) how much content has been created since text-to-image algorithms took off last year. Read on to learn more about how we arrived at this number and how some of the most prominent algorithms contribute to it.

Featured image created by Discord user Sixu using Midjourney
AI image statistics: Number of AI-created images

Key Insights​

  • More than 15 billion images have been created using text-to-image algorithms since last year. To put this in perspective, it took photographers 150 years, from the first photograph taken in 1826 until 1975, to reach the 15 billion mark.
  • Since the launch of DALLE-2, people are creating an average of 34 million images per day.
  • The fastest-growing product is Adobe Firefly, the suite of AI algorithms built into Adobe Photoshop. It reached 1 billion images created in just three months since its launch.
  • Midjourney has 15 million users, the largest user base of any image generation platform for which we have publicly available statistics. Compare: Adobe Creative Cloud, which consists of Adobe Photoshop and other graphic design and video editing software, including the generative AI tool Adobe Firefly, has 30 million users, according to Prodesigntools.
  • Approximately 80% of the images (i.e. 12.590 billion) were created using models, services, platforms, and applications based on Stable Diffusion, which is open source.

DALL-E 2​

In April 2022, OpenAI released its image-generation model, DALL-E 2. For the first few months, it was available by invitation only, and the company gradually expanded access until September 2022, when the tool became available to all users without any barriers. Then OpenAI reported that users were generating more than 2 million images per day with DALL-E 2. We don’t know for sure what time period OpenAI meant by this number, or if they took an average volume of images generated. We assume that this is an average, which means that approximately 916 million images have been generated on a single platform in 15 months.

Midjourney​

Another prominent generative AI model, Midjourney, went live in July 2022. According to Photutorial’s estimate, Midjourney’s Discord (the algorithm is only available through Discord) receives about 20 to 40 jobs per second, with 15 million registered users and 1.5-2.5 million members active at any given time. With this in mind, we used 30 jobs per second as the average rate, which works out to roughly 2.5 million images created daily. As a result, about 964 million images have been created with Midjourney since its launch.
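To make the arithmetic behind that estimate explicit, here is the back-of-envelope calculation as a small sketch; the rate and the day count are the article's assumptions, not measured values.

Code:
jobs_per_second = 30                       # midpoint of the observed 20-40 jobs per second
images_per_day = jobs_per_second * 60 * 60 * 24
print(f"{images_per_day:,} images per day")        # 2,592,000, rounded to ~2.5 million above

days_since_launch = 385                    # roughly July 2022 to August 2023
total_images = 2_500_000 * days_since_launch
print(f"{total_images / 1e6:.0f} million in total")  # ~962 million, in line with the ~964M figure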

Stable Diffusion​

Stable Diffusion, the text-to-image model from the AI company Stability AI, was released in August 2022. So far, there are two official places to test Stable Diffusion: DreamStudio and Stability AI’s space on Hugging Face. According to Emad Mostaque, CEO of Stability AI, Stable Diffusion has more than 10 million users across all channels. If we extrapolate the Midjourney numbers and trends that we have at hand, it turns out that users generate 2 million images per day through the official Stable Diffusion channels, and in the year-plus since the release, this number has reached 690 million images.

However, the most challenging part is that Stable Diffusion is an open-source model, which means that the amount of content created with this model is not limited to what was produced on the official spaces owned by Stability AI. We have multiple platforms, applications, and services built on top of Stable Diffusion technology. The total audience of all these entities is also quite large and incalculable, and the amount of content they produce on a daily basis is really hard to estimate. And it’s growing all the time.

To get at least an idea of the scale, we took a look at some of the most popular repositories, such as GitHub, HuggingFace, and Civitai, which together have as many as tens of thousands of Stable Diffusion-based models uploaded by their users. We then went back to the Midjourney case and applied its trends to Stable Diffusion models on these platforms. However, just before we hit “publish,” we received an email from the Civitai team with some valuable statistics about their platform that helped us make our estimates more precise and accurate. For example, the Civitai team shared that they have a total of 213,994,828 model downloads on their platform, while the top 10 most downloaded models account for 2% of the total weekly downloads.

As a result, we recalculated some of our estimates and found that more than 11 billion images have been created using models from these three repositories. If we add other popular models (such as Runway, which we count separately) and the official channels of Stability AI, the number of images created with Stable Diffusion increases to 12.590 billion, which represents 80% of all AI images created with text-to-image algorithms.

Adobe Firefly​

The last and most recent model in our research was released in March 2023, when Adobe revealed Firefly, a suite of generative AI models focused on visual content creation. Within 6 weeks of launch, users created more than 100 million assets. With Firefly’s integration into Adobe Photoshop in May 2023, the number of images has grown exponentially, given the number of people using Photoshop worldwide. In its latest press release, Adobe shared its AI image statistics: the number of images created with Adobe Firefly reached 1 billion just 3 months after launch.


In total, more than 15 billion AI-created images have been generated using Stable Diffusion, Adobe Firefly, Midjourney, and DALLE-2. That’s more than Shutterstock’s entire library of photos, vectors, and illustrations, and one-third of the number of images ever uploaded to Instagram.
AI image statistics: amount of content across platforms vs. AI-created images
AI image statistics: time it took to reach 15 billion

Limitations​

To sum up, our exploration of AI image statistics, based on available data and extrapolations, has shed light on the scope of this phenomenon. However, it’s important to acknowledge the limitations of our research.

While we’ve strived to provide insights into the volume of images generated by AI algorithms, obtaining accurate and up-to-date AI image statistics remains a challenge. In addition, the rapidly evolving nature of AI technology, combined with the increasing breadth of models and applications, makes it difficult to keep track of current data, which becomes outdated on a daily basis. As the landscape continues to evolve, we’re committed to updating these statistics as new information becomes available. Feel free to share your feedback along with any data, insights, or observations you may have to help us keep AI image statistics current and accurate.

Sources​

Civitai
EarthWeb, How Many Pictures Are on Instagram in 2023? (Photo Statistics), 2023
Photutorial, Midjourney statistics (Updated: August 2023), 2023
Photutorial, Shutterstock statistics (2023): Revenue, subscribers, market share, & more, 2023
Adobe, Adobe Firefly Expands Globally, Supports Prompts in Over 100 Languages, 2023
Social Shepherd, 22 Essential Pinterest Statistics You Need to Know in 2023, 2023
Bloomberg, Stability AI Raises Seed Round at $1 Billion Value, 2022
OpenAI, DALL·E now available without waitlist, 2022
Insider, Facebook Users Are Uploading 350 Million New Photos Each Day, 2013
Fstoppers, [Stats] How Many Photos Have Ever Been Taken?, 2012
 



Google Gemini Eats The World – Gemini Smashes GPT-4 By 5X, The GPU-Poors​

Compute Resources That Make Everyone Look GPU-Poor​


DYLAN PATEL
AND
DANIEL NISHBALL
AUG 27, 2023
∙ PAID


Before Covid, Google released the MEENA model, which for a short period of time was the best large language model in the world. The blog and paper Google wrote were incredibly cute, because they specifically compared it against OpenAI.

Compared to an existing state-of-the-art generative model, OpenAI GPT-2, Meena has 1.7x greater model capacity and was trained on 8.5x more data.
This model required more than 14x the FLOPS of GPT-2 to train, but this was largely irrelevant because only a few months later OpenAI dropped GPT-3, which had >65x more parameters, >60x the token count, and >4,000x more FLOPS. The performance difference between these two models was massive.

The MEENA model sparked an internal memo written by Noam Shazeer titled "MEENA Eats The World.” In this memo, he predicted many of the things that the rest of the world woke up to after the release of ChatGPT. The key takeaways were that language models would get increasingly integrated into our lives in a variety of ways, and that they would dominate the globally deployed FLOPS. Noam was so far ahead of his time when he wrote this, but it was mostly ignored or even laughed at by key decision makers.

Let’s go on a tangent about how far ahead of his time Noam really was. He was part of the team that did the original Transformer paper, “Attention Is All You Need.” He was also part of the first modern Mixture of Experts paper, Switch Transformer, Image Transformer, and various elements of LaMDA and PaLM. One of the ideas from 2018 he hasn’t yet gotten broader credit for is speculative decoding, which we detailed here in our exclusive tell-all about GPT-4. Speculative decoding reduces the cost of inference by multiple-fold.

The point here is Google had all the keys to the kingdom, but they fumbled the bag. A statement that is obvious to everyone.

The statement that may not be obvious is that the sleeping giant, Google, has woken up, and they are iterating at a pace that will smash GPT-4’s total pre-training FLOPS by 5x before the end of the year. The path is clear to 20x by the end of next year given their current infrastructure buildout. Whether Google has the stomach to put these models out publicly without neutering their creativity or their existing business model is a different discussion.

Today we want to discuss Google’s training systems for Gemini, the iteration velocity for Gemini models, Google’s Viperfish (TPUv5) ramp, Google’s competitiveness going forward versus the other frontier labs, and a crowd we are dubbing the GPU-Poor.

The GPU-Rich

Access to compute is a bimodal distribution. There are a handful of firms with 20k+ A/H100 GPUs, where individual researchers can access 100s or 1,000s of GPUs for pet projects. Chief among these are researchers at OpenAI, Google, Anthropic, Inflection, X, and Meta, who will have the highest ratios of compute resources to researchers. A few of the firms above, as well as multiple Chinese firms, will have 100k+ by the end of next year, although we are unsure of the ratio of researchers in China, only the GPU volumes.

One of the funniest trends we see in the Bay Area is top ML researchers bragging about how many GPUs they have or will soon have access to. In fact, this has become so pervasive over the last ~4 months that it’s become a measuring contest that is directly influencing where top researchers decide to go. Meta, which will have the second-largest number of H100 GPUs in the world, is actively using it as a recruiting tactic.

The GPU-Poor

Then there are a whole host of startups and open-source researchers who are struggling with far fewer GPUs. They are spending significant time and effort attempting to do things that simply don’t help, or frankly, matter. For example, many researchers are spending countless hours agonizing on fine-tuning models with GPUs that don’t have enough VRAM. This is an extremely counter-productive use of their skills and time.

These startups and open-source researchers are using larger LLMs to fine-tune smaller models for leaderboard style benchmarks with broken evaluation methods that give more emphasis to style rather than accuracy or usefulness. They are generally ignorant that pretraining datasets and IFT data need to be significantly larger/higher quality for smaller open models to improve in real workloads.

Yes, being efficient with GPUs is very important, but in many ways that’s being ignored by the GPU-poor. They aren’t concerned with efficiency at scale, and their time isn’t being spent productively. What can be done commercially in their GPU-poor environment is mostly irrelevant to a world that will be flooded by more than 3.5 million H100s by the end of next year. For learning and experimenting, smaller, weaker gaming GPUs are just fine.

The GPU poor are still mostly using dense models because that’s what Meta graciously dropped on their lap with the LLAMA series of models. Without God’s Zuck’s good grace, most open source projects would be even worse off. If they were actually concerned with efficiency, especially on the client side, they’d be running sparse model architectures like MoE, training on these larger datasets, and implementing speculative decoding like the Frontier LLM Labs (OpenAI, Anthropic, Google Deepmind).

The underdogs should be focusing on tradeoffs that improve model performance or token to token latency by upping compute and memory capacity requirements in favor of reduced memory bandwidth because that’s what the edge needs. They should be focused on efficient serving of multiple finetuned models on shared infrastructure without paying the horrendous cost penalties of small batch sizes. Instead, they continually are focused on memory capacity constraints or quantizing too far while covering their eyes about real quality decreases.

To take the rant on a slight tangent, in general, model evaluation is broken. While there is a lot of effort in the closed world to improve this, the land of open benchmarks is pointless and measures almost nothing useful. For some reason there is an unhealthy obsession over the leaderboard-ification of LLMs, and meming with silly names for useless models (WizardVicunaUncensoredXPlusPlatypus). Hopefully the open efforts are redirected towards evaluations, speculative decoding, MoE, open IFT data, and clean pre-training datasets with over 10 trillion tokens, otherwise, there is no way for the open source to compete with commercial giants.

While the US and China will be able to keep racing ahead, the European startups and government-backed supercomputers such as Jules Verne are also completely uncompetitive. Europe will fall behind in this race due to its inability to make big investments and its choice to stay GPU-poor. Even multiple Middle Eastern countries are investing more in enabling large-scale infrastructure for AI.

Being GPU-poor isn’t limited to only scrappy startups though. Some of the most well recognized AI firms, HuggingFace, Databricks (MosaicML), and Together are also part of this GPU-poor group. In fact, they may be the most GPU-poor groups out there with regard to both the number of world class researchers per GPU and the number of GPUs versus the ambition/potential customer demand. They have world class researchers, but all of them are limited by working on systems with orders of magnitude less capabilities. These firms have tremendous inbound from enterprises on training real models, and on the order of thousands of H100s coming in, but that won’t be enough to grab much of the market.

Nvidia is eating their lunch with multiple times as many GPUs in their DGX Cloud service and various in-house supercomputers. Nvidia’s DGX Cloud offers pretrained models, frameworks for data processing, vector databases and personalization, optimized inference engines, APIs, and support from NVIDIA experts to help enterprises tune models for their custom use cases. That service has also already racked up multiple larger enterprises from verticals such as SaaS, insurance, manufacturing, pharmaceuticals, productivity software, and automotive. While not all customers are announced, even the public list of Amgen, Adobe, CCC, ServiceNow, Accenture, AstraZeneca, Getty Images, Shutterstock, Morningstar, Evozyne, Insilico Medicine, Quantiphi, InstaDeep, Oxford Nanopore, Peptone, Relation Therapeutics, ALCHEMAB Therapeutics, and Runway is quite impressive.

This is a far longer list than the other players have, and Nvidia has many other undisclosed partnerships too. To be clear, revenue from these announced customers of Nvidia’s DGX cloud service is unknown, but given the size of Nvidia’s cloud spending and in-house supercomputer construction, it seems that more services can/will be purchased from Nvidia’s Cloud than HuggingFace, Together, and Databricks can hope to offer, combined.

The few hundred million that HuggingFace and Together have raised collectively means they will remain GPU-poor, getting left in the dust as they will be unable to train N-1 LLMs that can serve as the base to fine tune for customers. This means they will ultimately be unable to capture high share at enterprises who can just access Nvidia’s service today anyways.

HuggingFace in particular has one of the biggest names in the industry, and they need to leverage that to invest a huge amount and build a lot more model, customization, and inference capability. Their recent round was done at too high a valuation to garner the investment they need to compete. HuggingFace’s leaderboards show how truly blind they are, because they are actively hurting the open-source movement by tricking it into creating a bunch of models that are useless for real usage.

Databricks (MosaicML) could at least maybe catch up, due to their data and enterprise connections. The issue is they need to accelerate spend by multiple times if they want to have hopes of serving their over 7,000 customers. The $1.3B acquisition of MosaicML was a big bet on this vertical, but they also need to throw a similar amount of money at infrastructure. Unfortunately for Databricks, they can’t pay for GPUs in shares. They need to do a large offering via their upcoming private round/IPO, and use that cold hard cash to quadruple down on hardware.

The economic argument falls flat on its face because they must build before customers can come, and because Nvidia is throwing money at their service. To be clear, many folks buying loads of compute are not making their money back (Cohere, Saudi Arabia, UAE), but it is a prerequisite to compete.

The picks-and-shovels training and inference ops firms (Databricks, HuggingFace, and Together) are behind their chief competition, who also happens to be the source of almost all of their compute. The next largest operator of customized models is simply the fine-tuning API from OpenAI.

The key here is everyone from Meta to Microsoft to startups are simply serving as a pipeline of capital to Nvidia’s bank account.

Can anyone save us from Nvidia slavery?

Yes, there is one potential savior.

Google – The Most Compute Rich Firm In The World

While Google does use GPUs internally, as well as selling a significant number via GCP, they have a few aces up their sleeve. These include Gemini and the next iteration, which has already begun training. The most important advantage they have is their unbeatably efficient infrastructure.

Before getting into Gemini and their cloud business, we will share some datapoints on their insane buildout. The chart below shows the total advanced chips added by quarter. Here we give OpenAI every benefit of the doubt: that the total number of GPUs they have will 4x over two years. For Google, we ignore their entire existing fleet of TPUv4 (Pufferfish), TPUv4 lite, and internally used GPUs. Furthermore, we are not including the TPUv5 lite either, despite that likely being the workhorse for inference of smaller language models. Google’s growth in this chart is only TPUv5 (Viperfish).
 



Reinforced Self-Training (ReST) for Language Modeling​



Reinforcement learning from human feedback (RLHF) can improve the quality of large language model (LLM) outputs by aligning them with human preferences. We propose a simple algorithm for aligning LLMs with human preferences, inspired by growing batch reinforcement learning (RL), which we call Reinforced Self-Training (ReST). Given an initial LLM policy, ReST produces a dataset by generating samples from the policy, which are then used to improve the LLM policy using offline RL algorithms. ReST is more efficient than typical online RLHF methods because the training dataset is produced offline, which allows data reuse. While ReST is a general approach applicable to all generative learning settings, we focus on its application to machine translation. Our results show that ReST can substantially improve translation quality, as measured by automated metrics and human evaluation on machine translation benchmarks, in a compute- and sample-efficient manner.
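The abstract describes an alternating grow/improve procedure; a toy sketch of that loop is below. The policy, reward model, and offline training step are stand-ins (random strings and random scores), not anything from the paper, and the threshold schedule is purely illustrative.

Code:
import random

def sample_from_policy(policy, prompt, n=4):
    # Stand-in for an LLM generating candidate outputs for a prompt.
    return [f"{prompt} -> candidate {random.randint(0, 99)}" for _ in range(n)]

def reward_model(prompt, output):
    # Stand-in for a learned reward model or automated quality metric.
    return random.random()

def train_offline(policy, examples):
    # Stand-in for an offline RL / filtered behavioural-cloning update.
    print(f"  improve: training on {len(examples)} examples")
    return policy

def rest(policy, prompts, grow_steps=2, improve_steps=3):
    dataset = []
    for _ in range(grow_steps):
        # Grow: sample from the current policy and score offline, so the data can be reused.
        for p in prompts:
            for y in sample_from_policy(policy, p):
                dataset.append((p, y, reward_model(p, y)))
        # Improve: fine-tune on progressively higher-reward subsets of the dataset.
        rewards = sorted(r for _, _, r in dataset)
        for s in range(improve_steps):
            threshold = rewards[int(len(rewards) * min(0.5 + 0.15 * s, 0.95))]
            filtered = [(p, y) for p, y, r in dataset if r >= threshold]
            policy = train_offline(policy, filtered)
    return policy

rest(policy="toy-llm", prompts=["translate: bonjour", "translate: danke"])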
 

SmartGPT: Major Benchmark Broken - 89.0% on MMLU + Exam's Many Errors

 


AI images are getting harder to spot. Google thinks it has a solution.​

The tech giant is among companies pushing out AI tools while promising to build more tools to protect against their misuse​


By Gerrit De Vynck
August 29, 2023 at 8:00 a.m. EDT


Illustration by Elena Lacey/The Washington Post; iStock

Artificial intelligence-generated images are becoming harder to distinguish from real ones as tech companies race to improve their AI products. As the 2024 presidential campaign ramps up, concern is quickly rising that such images might be used to spread false information.

On Tuesday, Google announced a new tool — called SynthID — that it says could be part of the solution. The tool embeds a digital “watermark” directly into the image that can’t be seen by the human eye but can be picked up by a computer that’s been trained to read it. Google said its new watermarking tech is resistant to tampering, making it a key step toward policing the spread of fake images and slowing the dissemination of disinformation.

AI image generators have been available for several years and have been increasingly used to create “deepfakes” — false images purporting to be real. In March, fake AI images of former president Donald Trump running away from police went viral online, and in May a fake image showing an explosion at the Pentagon caused a momentary crash in stock markets. Companies have placed visible logos on AI images, as well as attached text “metadata” noting an image’s origin, but both techniques can be cropped or edited out relatively easily.

“Clearly the genie’s already out of the bottle,” Rep. Yvette D. Clarke (D-N.Y.), who has pushed for legislation requiring companies to watermark their AI images, said in an interview. “We just haven’t seen it maximized in terms of its weaponization.”

For now, the Google tool is available only to some paying customers of its cloud computing business, and it works only with images that were made with Google’s image-generator tool, Imagen. The company says it’s not requiring customers to use it because it’s still experimental.

The ultimate goal is to help create a system where most AI-created images can be easily identified using embedded watermarks, said Pushmeet Kohli, vice president of research at Google DeepMind, the company’s AI lab, who cautioned that the new tool isn’t totally foolproof. “The question is, do we have the technology to get there?”

As AI gets better at creating images and video, politicians, researchers and journalists are concerned that the line between what’s real and false online will be eroded even further, a dynamic that could deepen existing political divides and make it harder to spread factual information. The improvement in deepfake tech is coming as social media companies are stepping back from trying to police disinformation on their platforms.

Watermarking is one of the ideas that tech companies are rallying around as a potential way to decrease the negative impact of the “generative” AI tech they are rapidly pushing out to millions of people. In July, the White House hosted a meeting with the leaders of seven of the most powerful AI companies, including Google and ChatGPT maker OpenAI. The companies all pledged to create tools to watermark and detect AI-generated text, videos and images.

Microsoft has started a coalition of tech companies and media companies to develop a common standard for watermarking AI images, and the company has said it is researching new methods to track AI images. The company also places a small visible watermark in the corner of images generated by its AI tools. OpenAI, whose Dall-E image generator helped kick off the wave of interest in AI last year, also adds a visible watermark. AI researchers have suggested ways of embedding digital watermarks that the human eye can’t see but can be identified by a computer.

Kohli, the Google executive, said Google’s new tool is better because it works even after the image has been significantly changed — a key improvement over previous methods that could be easily thwarted by modifying or even flipping an image.


Google’s new tool digitally embeds watermarks not visible to the human eye onto AI-generated images. Even if an AI-generated image has been edited or manipulated, as seen here, the tool will still be able to detect the digital watermark. (Washington Post illustration; Google)

“There are other techniques that are out there for embedded watermarking, but we don’t think they are that reliable,” he said.

Even if other major AI companies like Microsoft and OpenAI develop similar tools and social media networks implement them, images made with open-source AI generators would still be undetectable. Open-source tools like those made by AI start-up Stability AI, which can be modified and used by anyone, are already being used to create nonconsensual sexual images of real people, as well as new child sexual exploitation material.

“The last nine months to a year, we’ve seen this massive increase in deepfakes,” said Dan Purcell, founder of Ceartas, a company that helps online content creators identify if their content is being reshared without their permission. In the past, the company’s main clients have been adult content makers trying to stop their videos and images from being illicitly shared. But more recently, Purcell has been getting requests from people who have had their social media images used to make AI-generated pornography against their will.

As the United States heads toward the 2024 presidential election, there’s growing pressure to develop tools to identify and stop fake AI images. Already, politicians are using the tools in their campaign ads. In June, Florida Gov. Ron DeSantis’s campaign released a video that included fake images of Donald Trump hugging former White House coronavirus adviser Anthony S. Fauci.

U.S. elections have always featured propaganda, lies and exaggerations in official campaign ads, but researchers, democracy activists and some politicians are concerned that AI-generated images, combined with targeted advertising and social media networks, will make it easier to spread false information and mislead voters.

“That could be something as simple as putting out a visual depiction of an essential voting place that has been shut down,” said Clarke, the Democratic congresswoman. “It could be something that creates panic among the public, depicting some sort of a violent situation and creating fear.”

AI could be used by foreign governments that have already proved themselves willing to use social media and other technology to interfere in U.S. elections, she said. “As we get into the heat of the political season, as things heat up, we could easily see interference coming from our adversaries internationally.”

Looking closely at an image from Dall-E or Imagen usually reveals some inconsistency or bizarre feature, such as a person having too many fingers, or the background blurring into the subject of the photo. But fake image generators will “absolutely, 100 percent get better and better,” said Dor Leitman, head of product and research and development at Connatix, a company that builds tools that help marketers use AI to edit and generate videos.

The dynamic is going to be similar to how cybersecurity companies are locked in a never-ending arms race with hackers trying to find their way past newer and better protections, Leitman said. “It’s an ongoing battle.”

Those who want to use fake images to deceive people are also going to keep finding ways to confound deepfake detection tools. Kohli said that’s the reason Google isn’t sharing the underlying research behind its watermarking tech. “If people know how we have done it, they will try to attack it,” he said.
 


Phind-CodeLlama-34B-v2

We've fine-tuned Phind-CodeLlama-34B-v1 on an additional 1.5B tokens of high-quality programming-related data, achieving 73.8% pass@1 on HumanEval. It's the current state of the art among open-source models.

Furthermore, this model is instruction-tuned on the Alpaca/Vicuna format to be steerable and easy-to-use.

More details can be found on our blog post.

Model Details​

This model is fine-tuned from Phind-CodeLlama-34B-v1 and achieves 73.8% pass@1 on HumanEval.

Phind-CodeLlama-34B-v2 is multi-lingual and is proficient in Python, C/C++, TypeScript, Java, and more.

Dataset Details​

We fine-tuned on a proprietary dataset of 1.5B tokens of high-quality programming problems and solutions. This dataset consists of instruction-answer pairs instead of code-completion examples, making it structurally different from HumanEval. LoRA was not used; both models are native fine-tunes. We used DeepSpeed ZeRO 3 and Flash Attention 2 to train these models in 15 hours on 32 A100-80GB GPUs, with a sequence length of 4096 tokens.
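As a quick way to try the model, here is a minimal sketch of loading it with Hugging Face transformers. The generation settings are arbitrary, and the prompt layout follows the Alpaca/Vicuna-style instruction format mentioned above; check the model card for the exact template.

Code:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Phind/Phind-CodeLlama-34B-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" needs the accelerate package and enough GPU memory for a 34B model.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = (
    "### System Prompt\nYou are an intelligent programming assistant.\n\n"
    "### User Message\nWrite a Python function that reverses a singly linked list.\n\n"
    "### Assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))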

 


THE POISON OF ALIGNMENT​



ABSTRACT​

From the perspective of content safety, alignment has been shown to limit large language models’ (LLMs) harmful content generation. This intentional method of reinforcing models not to respond to certain user inputs seems to be present in many modern open-source instruction-tuning datasets such as OpenAssistant or Guanaco. We introduce a novel insight into how an instruction-tuned model’s performance is affected by the presence of alignment in the supervised fine-tuning dataset. Specifically, we noticed that alignment acts as if it is poisoning the instruction dataset. Experimentally, we demonstrate that aligned answers significantly worsen the performance of the resulting fine-tuned model on various reasoning benchmarks such as Big Bench (BBH), Massive Multitask Language Understanding (MMLU), HumanEval, and Discrete Reasoning Over Paragraphs (DROP), performing 4-33% worse than the counterpart tuned without alignment.
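The abstract does not spell out how "aligned" answers are identified, but one plausible preprocessing step is to drop instruction-answer pairs whose responses read like refusals. The sketch below is only an illustration of that idea; the phrase list is a hypothetical heuristic, not the paper's method.

Code:
REFUSAL_MARKERS = (
    "i'm sorry, but",
    "as an ai language model",
    "i cannot help with",
    "it would not be appropriate",
)

def looks_aligned(response: str) -> bool:
    # Heuristic: treat refusal-style boilerplate as an "aligned" answer.
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def filter_sft_dataset(pairs):
    """pairs: iterable of (instruction, response) tuples."""
    return [(inst, resp) for inst, resp in pairs if not looks_aligned(resp)]

examples = [
    ("Write a haiku about rain.", "Soft rain on rooftops / ..."),
    ("Explain how locks work.", "I'm sorry, but I can't help with that request."),
]
print(filter_sft_dataset(examples))  # keeps only the first pair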
 