bnew

Veteran
Joined
Nov 1, 2015
Messages
58,208
Reputation
8,623
Daps
161,872

Chinese Nobel laureate Mo Yan shocks audience after revealing he used ChatGPT to write speech​

‘I was supposed to write a commendation for [Yu Hua] as per tradition, but I struggled for several days,’ novelist said​

Peony Hirwani
1 day ago

Chinese Nobel laureate Mo Yan revealed he used ChatGPT to write a speech praising fellow author Yu Hua.

This week, the 68-year-old novelist presented a book award to Yu at the Shanghai Dance Centre during the 65th-anniversary celebration of Shouhuo magazine.


“The person who is receiving this award is truly remarkable and, of course, he is also my good friend. He is extraordinary, so I must be too,” Mo said during his speech.


“A few days ago, I was supposed to write a commendation for him as per tradition, but I struggled for several days and couldn’t come up with anything. So I asked a doctoral student to help me by using ChatGPT.”

According to the South China Morning Post, there was an “audible gasp” from the audience when they found out that the Nobel Prize winner crafted his speech using artificial intelligence.

The Independent has reached out to Mo’s representatives for comment.

Mo is a Chinese novelist and short story writer.

In 2012, he was awarded the Nobel Prize in Literature for his work as a writer “who with hallucinatory realism merges folk tales, history and the contemporary”.


He is best known to global readers for his 1986 novel Red Sorghum, the first two parts of which were adapted into the Golden Bear-winning film of the same name.

The author won the 2005 International Nonino Prize in Italy. In 2009, he was the first recipient of the University of Oklahoma’s Newman Prize for Chinese Literature.

So far, Mo has written 11 novels, and several novellas and short story collections.

A video of Mo’s speech at the Shanghai Dance Centre has gone viral on the Chinese microblogging website Weibo.

In the comments section, some people pointed out that Mo could face legal trouble for mentioning ChatGPT as the service has not yet been made available in China.
 

bnew


About​

Consistent Image Synthesis and Editing

ljzycmd.github.io/projects/MasaCtrl/

MasaCtrl: Tuning-free Mutual Self-Attention Control for Consistent Image Synthesis and Editing​

PyTorch implementation of MasaCtrl: Tuning-free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, Xiaohu Qie, Yinqiang Zheng



MasaCtrl performs a range of consistent, non-rigid image synthesis and editing tasks without fine-tuning or optimization.

Updates​

  • [2023/5/13] The inference code of MasaCtrl with T2I-Adapter is available.
  • [2023/4/28] Hugging Face demo released.
  • [2023/4/25] Code released.
  • [2023/4/17] Paper is available here.

Introduction​

We propose MasaCtrl, a tuning-free method for non-rigid consistent image synthesis and editing. The key idea is to use Mutual Self-Attention Control to combine the contents of the source image with the layout synthesized from the text prompt and additional controls, producing the desired synthesized or edited image.

Main Features​

1 Consistent Image Synthesis and Editing​

MasaCtrl can perform prompt-based image synthesis and editing that changes the layout while maintaining the contents of the source image.

The target layout is synthesized directly from the target prompt.
Consistent synthesis results; real image editing results

2 Integration to Controllable Diffusion Models​

Directly modifying the text prompt often fails to generate the target layout of the desired image, so we further integrate our method into existing controllable diffusion pipelines (like T2I-Adapter and ControlNet) to obtain stable synthesis and editing results.

The target layout controlled by additional guidance.
Synthesis (left part) and editing (right part) results with T2I-Adapter

3 Generalization to Other Models: Anything-V4​

Our method also generalizes well to other Stable-Diffusion-based models.

Results on Anything-V4
 


bnew





Official implementation of T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models.


🚩 New Features/Updates


🔥🔥🔥 Support CoAdapter (Composable Adapter).
You can find details and demos of CoAdapter in coadapter.md


Introduction​


We propose T2I-Adapter, a simple and small (~70M parameters, ~300M storage space) network that can provide extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models.

T2I-Adapter aligns internal knowledge in T2I models with external control signals. We can train various adapters according to different conditions, and achieve rich control and editing effects.


⏬ Download Models​

Put the downloaded models in the T2I-Adapter/models folder.

  1. You can find the pretrained T2I-Adapters, CoAdapters, and third-party models at TencentARC/T2I-Adapter · Hugging Face.
  2. A base SD model is still needed for inference. We recommend Stable Diffusion v1.5, but the adapters should also work well on other SD models that are fine-tuned from SD-V1.4 or SD-V1.5. You can download these models from HuggingFace or civitai; all of the tested models mentioned below (e.g., the Anything anime model) can be found there.
  3. [Optional] If you want to use the mmpose adapter, you need to download the pretrained keypose detection models, which include FasterRCNN (human detection) and HRNet (pose detection).
 

bnew





A responsible path to generative AI in healthcare​

April 13, 2023


Aashima Gupta​

Global Director of Healthcare Strategy & Solutions, Google Cloud

Amy Waldron​

Global Director of Health Plan Strategy & Solutions, Google Cloud


Healthcare breakthroughs change the world and bring hope to humanity through scientific rigor, human insight, and compassion. We believe AI can contribute to this, with thoughtful collaboration between researchers, healthcare organizations and the broader ecosystem.

Today, we're sharing exciting progress on these initiatives, with the announcement of limited access to Google’s medical large language model, or LLM, called Med-PaLM 2. It will be available in coming weeks to a select group of Google Cloud customers for limited testing, to explore use cases and share feedback as we investigate safe, responsible, and meaningful ways to use this technology.

Med-PaLM 2 harnesses the power of Google’s LLMs, aligned to the medical domain to more accurately and safely answer medical questions. As a result, Med-PaLM 2 was the first LLM to perform at an “expert” test-taker level on the MedQA dataset of US Medical Licensing Examination (USMLE)-style questions, reaching 85%+ accuracy, and it was the first AI system to reach a passing score on the MedMCQA dataset comprising Indian AIIMS and NEET medical examination questions, scoring 72.3%.

Industry-tailored LLMs like Med-PaLM 2 are part of a burgeoning family of generative AI technologies that have the potential to significantly enhance healthcare experiences. We’re looking forward to working with our customers to understand how Med-PaLM 2 might be used to facilitate rich, informative discussions, answer complex medical questions, and find insights in complicated and unstructured medical texts. They might also explore its utility to help draft short- and long-form responses and summarize documentation and insights from internal data sets and bodies of scientific knowledge.

Innovating responsibly with AI​

Since last year, we’ve been researching and evaluating Med-PaLM and Med-PaLM 2, assessing it against multiple criteria — including scientific consensus, medical reasoning, knowledge recall, bias, and likelihood of possible harm — which were evaluated by clinicians and non-clinicians from a range of backgrounds and countries.

Med-PaLM 2's impressive performance on medical exam-style questions is a promising development, but we need to learn how this can be harnessed to benefit healthcare workers, researchers, administrators, and patients. In building Med-PaLM 2, we’ve been focused on safety, equity, and evaluations of unfair bias. Our limited access for select Google Cloud customers will be an important step in furthering these efforts, bringing in additional expertise across the healthcare and life sciences ecosystem.

What’s more, when Google Cloud brings new AI advances to our products, our commitment is two-fold: to not only deliver transformative capabilities, but also ensure our technologies include proper protections for our organizations, their users, and society. To this end, our AI Principles, established in 2017, form a living constitution that guides our approach to building advanced technologies, conducting research, and drafting our product development policies.

From AI to generative AI​

Google's deep history in AI informs our work in generative AI technologies, which can find complex relationships in large sets of training data, then generalize from what they learn to create new data. Breakthroughs such as the Transformer have enabled LLMs and other large models to scale to billions of parameters, letting generative AI move beyond the limited pattern-spotting of earlier AIs and into the creation of novel expressions of content, from speech to scientific modeling.

Google Cloud is committed to bringing to market products that are informed by our research efforts across Alphabet. In 2022, we introduced a deep integration between Google Cloud and Alphabet's AI research organizations, which allows Vertex AI to run DeepMind's groundbreaking protein structure prediction system, AlphaFold.

Much more is on the way. In one sense, generative AI is revolutionary. In another, it's the familiar technology story of more and better computing creating new industries, from desktop publishing to the internet, social networks, mobile apps, and now, generative AI.

Building on AI leadership​

Additionally, today we’re announcing a new AI-enabled Claims Acceleration Suite, designed to streamline processes for health insurance prior authorization and claims processing. The Claims Acceleration Suite helps both health plan providers and healthcare providers create operational efficiencies and reduce administrative burdens and costs by converting unstructured data into structured data that helps experts make faster decisions and improves access to timely patient care.

On the clinical side, last year we announced Medical Imaging Suite, an AI-assisted diagnosis technology being used by Hologic to improve cervical cancer diagnoses and Hackensack Meridian Health to predict metastasis in patients with prostate cancer. Elsewhere, Mayo Clinic and Google have collaborated on an AI algorithm to improve the care of head and neck cancers, and Google Health recently partnered with iCAD to improve breast cancer screening with AI.

From these examples and more, it's clear that the healthcare industry has moved from testing AI to deploying it to improve workflows, solve business problems, and speed healing. With this in mind, we expect rapid interest in and uptake of generative AI technologies. Healthcare organizations are eager to learn about generative AI and how they can use it to make a real difference.

Looking ahead​

The power of AI has reinforced Google Cloud's commitment to privacy, security, and transparency. Our platforms are designed to be flexible, including data and model lineage capabilities, integrated security and identity management services, support for third-party models, choice and transparency on models and costs, integrated billing and entitlement support, and support across many languages.

While we’ll have some innovations like Med-PaLM 2 that are tuned for healthcare, we also have products that are relevant across industries. Last month, we announced several generative AI capabilities coming to Google Cloud, including Generative AI support in Vertex AI and Generative AI App Builder, which are already being tested by a number of customers. Developers and businesses already use Vertex AI to build and deploy machine learning models and AI applications at scale, and we recently added Generative AI support in Vertex AI. This gives customers foundation models they can fine-tune with their own data, and the ability to deploy applications with this powerful new technology. We also launched Generative AI App Builder to help organizations build their own AI-powered chat interfaces and digital assistants in minutes or hours by connecting conversational AI flows with out-of-the-box search experiences and foundation models.

As AI proves its value, it's likely there will be increased focus on high-quality data collection and curation in healthcare and life sciences. Improving the flow and unification of data across healthcare systems, referred to as data interoperability, is one of the most important building blocks to leveraging AI: it helps organizations run more effectively, improves patient care, and helps people live healthier lives. We expect to continue our investments in technology, infrastructure, and data governance.

We're committed to realizing the potential of this technology in healthcare. By working with a handful of trusted healthcare organizations early on, we’ll learn more about what can be achieved, and how this technology can safely advance. For all of us, the prospects are inspiring, humbling, and exciting.

If you’re interested in exploring generative AI on Cloud, you can sign up for our Trusted Tester program or reach out to your Google Cloud sales representative.
 

Majestyx

Duck Season
Joined
May 2, 2012
Messages
17,079
Reputation
2,495
Daps
39,502
Reppin
Los Scandalous
my goodness! it opened its mouth. :mindblown:


edit:

whoa


DragGAN.gif


project page:
me and some brehs that work in animation were talking about this, brehs is looking for new jobs :francis:
 

bnew


About​

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model​

Foundation models have made significant strides in various applications, including text-to-image generation, panoptic segmentation, and natural language processing. This paper presents Instruct2Act, a framework that utilizes Large Language Models to map multi-modal instructions to sequential actions for robotic manipulation tasks. Specifically, Instruct2Act employs the LLM model to generate Python programs that constitute a comprehensive perception, planning, and action loop for robotic tasks. In the perception section, pre-defined APIs are used to access multiple foundation models where the Segment Anything Model (SAM) accurately locates candidate objects, and CLIP classifies them. In this way, the framework leverages the expertise of foundation models and robotic abilities to convert complex high-level instructions into precise policy codes. Our approach is adjustable and flexible in accommodating various instruction modalities and input types and catering to specific task demands. We validated the practicality and efficiency of our approach by assessing it on robotic tasks in different scenarios within tabletop manipulation domains. Furthermore, our zero-shot method outperformed many state-of-the-art learning-based policies in several tasks.




google bard summary:

  • Researchers created a new way to make robots follow instructions.
  • The new way uses a large language model to translate instructions into code that the robot can understand.
  • The new way is more flexible and efficient than previous methods.
  • The new way was tested on a variety of tasks and it worked well.
Here's a more detailed explanation of how the new method works:

The new method uses a large language model (LLM) to translate instructions into code. The LLM is a computer program that has been trained on a massive dataset of text and code. This training allows the LLM to understand the meaning of instructions and to generate code that can be executed by a robot.

The instructions that the LLM can translate are called multi-modal instructions. Multi-modal instructions are instructions that include information from multiple sources. For example, a multi-modal instruction might include a picture of an object, the name of the object, and a description of what to do with the object.

The LLM uses the information from the multi-modal instruction to generate code that tells the robot what to do. The code that the LLM generates is specific to the task that the robot is supposed to perform. For example, if the robot is supposed to pick up a cup, the LLM will generate code that tells the robot how to move its arm and its hand to pick up the cup.

The new method is more flexible and efficient than previous methods for making robots follow instructions. Previous methods were limited to translating simple instructions. The new method can translate complex instructions, including instructions that include information from multiple sources. The new method is also more efficient than previous methods. It can translate instructions much faster than previous methods.

The new method was tested on a variety of tasks and it worked well. The method was able to successfully translate instructions and make robots follow those instructions. The method was also able to generalize to new tasks. This means that the method can be used to make robots follow instructions for tasks that it has never seen before.

The new method is a significant improvement over previous methods for making robots follow instructions. It is more flexible, efficient, and generalizable than previous methods. The new method has the potential to make robots more useful and more accessible.
 

bnew



Pretty sure Bard is worse than Vicuna-13B. Vicuna-13B is almost as good as 3.5-turbo for most tasks except coding (verified by Microsoft as well https://github.com/microsoft/guidance/blob/8677f3aa269e05ecbb942585560a44db51d507ca/notebooks/chatgpt_vs_open_source_on_harder_tasks.ipynb…) Not sure what the plan is at Goog HQ. GPT-4 is leaps and bounds better than every model rn.


Mother of all LLM benchmarks!

  • Use GPT-4 if you need best quality
  • Use claude-instant-v1 for everything else
  • Google PaLM2 is nowhere near OpenAI/Anthropic
  • OpenAI models are painfully slow compared to competition
  • Open-source models next

source: https://github.com/kagisearch/pyllms




 

bnew


Numbers every LLM Developer should know​

At Google, there was a document put together by Jeff Dean, the legendary engineer, called Numbers every Engineer should know. It’s really useful to have a similar set of numbers for LLM developers, for use in back-of-the-envelope calculations. Here we share the particular numbers we at Anyscale use, why each number is important, and how to use it to your advantage.

Notes on the Github version​

Last updated: 2023-05-17

If you feel there's an issue with the accuracy of the numbers, please file an issue. Think there are more numbers that should be in this doc? Let us know or file a PR.

We are thinking the next thing we should add here is some stats on tokens per second of different models.

Prompts​

40-90%: Amount saved by appending “Be Concise” to your prompt

It’s important to remember that you pay by the token for responses. This means that asking an LLM to be concise can save you a lot of money. This can be broadened beyond simply appending “be concise” to your prompt: if you are using GPT-4 to come up with 10 alternatives, maybe ask it for 5 and keep the other half of the money.

1.3:1 -- Average tokens per word​

LLMs operate on tokens. Tokens are words or sub-parts of words, so “eating” might be broken into two tokens, “eat” and “ing”. A 750-word document in English will be about 1000 tokens. For languages other than English, the number of tokens per word increases depending on how common the language is in the LLM's embedding corpus.

Knowing this ratio is important because most billing is done in tokens, and the LLM’s context window size is also defined in tokens.
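As a rough illustration of how the 1.3:1 ratio can be used, here is a minimal sketch that estimates token counts from word counts and checks them against a context window. The function names and the 4096-token window in the usage line are illustrative assumptions, not values from the text; a real tokenizer will give different counts for any specific document.

```python
# Average tokens per English word, taken from the ratio quoted above.
TOKENS_PER_WORD = 1.3


def estimate_tokens(word_count: int, tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Back-of-the-envelope token estimate for an English document."""
    return round(word_count * tokens_per_word)


def fits_in_context(word_count: int, context_window_tokens: int) -> bool:
    """True if the estimated token count fits in the given context window."""
    return estimate_tokens(word_count) <= context_window_tokens


if __name__ == "__main__":
    print(estimate_tokens(750))         # close to the ~1000 tokens quoted above
    print(fits_in_context(4000, 4096))  # hypothetical 4096-token window
```

This is only an estimate for budgeting; billing and context limits are enforced on the actual token count.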

Prices

Prices are of course subject to change, but given how expensive LLMs are to operate, the numbers in this section are critical. We use OpenAI for the numbers here, but prices from other providers you should check out (Anthropic, Cohere) are in the same ballpark.

~50:1 -- Cost Ratio of GPT-4 to GPT-3.5 Turbo

What this means is that for many practical applications, it’s much better to use GPT-4 for things like generation and then use that data to fine tune a smaller model. It is roughly 50 times cheaper to use GPT-3.5-Turbo than GPT-4 (the “roughly” is because GPT-4 charges differently for the prompt and the generated output) – so you really need to check on how far you can get with GPT-3.5-Turbo. GPT-3.5-Turbo is more than enough for tasks like summarization for example.

5:1 -- Cost Ratio of generation of text using GPT-3.5-Turbo vs OpenAI embedding​

This means it is way cheaper to look something up in a vector store than to ask an LLM to generate it. E.g. “What is the capital of Delaware?” when looked up in a neural information retrieval system costs about 5x less than if you asked GPT-3.5-Turbo. The cost difference compared to GPT-4 is a whopping 250x!
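The 250x figure follows from chaining the two ratios in this section: roughly 50:1 for GPT-4 versus GPT-3.5-Turbo and 5:1 for GPT-3.5-Turbo generation versus an embedding lookup. A small sketch of that arithmetic, using relative cost units rather than real prices (the dictionary keys are just labels chosen for illustration):

```python
# Ratios quoted in this section; both are approximate.
GPT4_TO_GPT35 = 50       # ~50:1, GPT-4 vs GPT-3.5-Turbo
GPT35_TO_EMBEDDING = 5   # 5:1, GPT-3.5-Turbo generation vs embedding lookup

# Express everything relative to one embedding lookup = 1 cost unit.
relative_cost = {
    "embedding-lookup": 1,
    "gpt-3.5-turbo": GPT35_TO_EMBEDDING,
    "gpt-4": GPT35_TO_EMBEDDING * GPT4_TO_GPT35,
}

if __name__ == "__main__":
    for name, units in relative_cost.items():
        print(f"{name}: {units}x the cost of an embedding lookup")
```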

10:1 -- Cost Ratio of OpenAI embedding to Self-Hosted embedding​

Note: this number is sensitive to load and embedding batch size, so please consider this approximate.
In our blog post, we noted that using a g4dn.4xlarge (on-demand price: $1.20/hr) we were able to embed at about 9000 tokens per second using Hugging Face’s SentenceTransformers (which are pretty much as good as OpenAI’s embeddings). Doing some basic math of that rate and that node type indicates it is considerably cheaper (factor of 10 cheaper) to self-host embeddings (and that is before you start to think about things like ingress and egress fees).
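The "basic math" can be sketched as follows. The instance price and throughput come from the paragraph above; the OpenAI embedding price of $0.40 per million tokens is an assumption added here for illustration (it is not stated in the text) and should be checked against current pricing.

```python
def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """USD per one million tokens embedded on a node running at full load."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


# Figures from the text: g4dn.4xlarge on-demand price and measured throughput.
self_hosted = cost_per_million_tokens(hourly_price_usd=1.20, tokens_per_second=9000)

# ASSUMPTION (not from the text): hosted embedding price in USD per million tokens.
OPENAI_EMBEDDING_USD_PER_MILLION = 0.40

ratio = OPENAI_EMBEDDING_USD_PER_MILLION / self_hosted

if __name__ == "__main__":
    print(f"self-hosted: ${self_hosted:.3f} per 1M tokens")
    print(f"hosted/self-hosted cost ratio: {ratio:.1f}")
```

The ratio only holds if the node is kept busy; at low utilization the self-hosted cost per token rises accordingly.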

6:1 -- Cost Ratio of OpenAI fine tuned vs base model queries​

It costs you 6 times as much to serve a fine tuned model as it does the base model on OpenAI. This is pretty exorbitant, but might make sense because of the possible multi-tenancy of base models. It also means it is far more cost effective to tweak the prompt for a base model than to fine tune a customized model.

1:1 -- Cost Ratio of Self-Hosted base vs fine-tuned model queries​

If you’re self hosting a model, then it more or less costs the same amount to serve a fine tuned model as it does to serve a base one: the models have the same number of parameters.

Training and Fine Tuning​

~$1 million: Cost to train a 13 billion parameter model on 1.4 trillion tokens​

The LLaMa paper mentions it took them 21 days to train LLaMa using 2048 A100 80GB GPUs. We considered training our own model on the Red Pajama training set, then we ran the numbers. The above is assuming everything goes right, nothing crashes, and the calculation succeeds on the first time, etc. Plus it involves the coordination of 2048 GPUs. That’s not something most companies can do (shameless plug time: of course, we at Anyscale can – that’s our bread and butter! Contact us if you’d like to learn more). The point is that training your own LLM is possible, but it’s not cheap. And it will literally take days to complete each run. Much cheaper to use a pre-trained model.

< 0.001: Cost ratio of fine tuning vs training from scratch​

This is a bit of a generalization, but the cost of fine tuning is negligible. We showed for example that you can fine tune a 6B parameter model for about $7. Even at OpenAI’s rate for its most expensive fine-tunable model, Davinci, it is 3c per 1000 tokens. That means to fine tune on the entire works of Shakespeare (about 1 million words), you’re looking at $40. However, fine tuning is one thing and training from scratch is another …
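The Shakespeare estimate can be reproduced from numbers already given in this document: about 1 million words, roughly 1.3 tokens per word, and 3 cents per 1000 tokens. A minimal sketch (the function name is illustrative, and it assumes a single pass over the data):

```python
def fine_tune_cost_usd(
    word_count: int,
    usd_per_1k_tokens: float,
    tokens_per_word: float = 1.3,  # average tokens per English word
) -> float:
    """Rough fine-tuning cost for one pass over a corpus of `word_count` words."""
    tokens = word_count * tokens_per_word
    return tokens / 1000 * usd_per_1k_tokens


# Entire works of Shakespeare (~1M words) at 3c per 1000 tokens.
shakespeare_cost = fine_tune_cost_usd(1_000_000, usd_per_1k_tokens=0.03)

if __name__ == "__main__":
    print(f"${shakespeare_cost:.2f}")
```

Under these assumptions the result comes out at about $39.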

GPU Memory​

If you’re self-hosting a model, it’s really important to understand GPU memory because LLMs push your GPU’s memory to the limit. The following statistics are specifically about inference. You need considerably more memory for training or fine tuning.

V100: 16GB, A10G: 24GB, A100: 40/80GB: GPU Memory Capacities​

It may seem strange, but it’s important to know the amount of memory different types of GPUs have. This will cap the number of parameters your LLM can have. Generally, we like to use A10Gs because they cost $1.50 to $2 per hour each at AWS on-demand prices and have 24GB of GPU memory, vs the A100s which will run you about $5 per hour each at AWS on-demand prices.

2x number of parameters: Typical GPU memory requirements of an LLM for serving​

For example, if you have a 7 billion parameter model, it takes about 14GB of GPU space. This is because most of the time, one 16-bit float (or 2 bytes) is required per parameter. There’s usually no need to go beyond 16-bit accuracy, and most of the time when you go to 8-bit accuracy you start to lose resolution (though that may be acceptable in some cases). Of course there are efforts to reduce this, notably llama.cpp which runs a 13 billion parameter model on a 6GB GPU by quantizing aggressively down to 4 bits (and 8 bits without too much impact), but that’s atypical.
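A minimal sketch of this rule of thumb, combined with the GPU capacities listed earlier in this section. It counts weights only, so it is a lower bound; activations and per-token output memory come on top. The function names are illustrative.

```python
# GPU memory capacities in GB, as listed in this section.
GPU_MEMORY_GB = {"V100": 16, "A10G": 24, "A100-40GB": 40, "A100-80GB": 80}


def serving_memory_gb(params_billion: float, bits_per_param: int = 16) -> float:
    """Approximate memory for the weights: parameters x bytes per parameter."""
    return params_billion * bits_per_param / 8


def gpus_that_fit(params_billion: float, bits_per_param: int = 16) -> list[str]:
    """GPUs whose memory can hold the weights alone (no headroom included)."""
    needed = serving_memory_gb(params_billion, bits_per_param)
    return [name for name, capacity in GPU_MEMORY_GB.items() if capacity >= needed]


if __name__ == "__main__":
    print(serving_memory_gb(7))        # 7B parameters at 16-bit
    print(serving_memory_gb(13, 4))    # 13B parameters quantized to 4-bit
    print(gpus_that_fit(13))           # 13B at 16-bit
```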

~1GB: Typical GPU memory requirements of an embedding model​

Whenever you are doing sentence embedding (a very typical thing you do for clustering, semantic search and classification tasks), you need an embedding model like sentence transformers. OpenAI also has its own embeddings that they provide commercially.

You typically don’t have to worry about how much memory embeddings take on the GPU, they’re fairly small. We’ve even had the embedding and the LLM on the same GPU.

>10x: Throughput improvement from batching LLM requests​

Running an LLM query through a GPU is very high latency: it may take, say, 5 seconds, with a throughput of 0.2 queries per second. The funny thing is, though, if you run two tasks, it might only take 5.2 seconds. This means that if you can bundle 25 queries together, it would take about 10 seconds, and our throughput has improved to 2.5 queries per second. However, see the next point.
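The throughput figures in this example work out as follows. This sketch only restates the arithmetic with the illustrative latencies from the paragraph above; real latencies depend on the model, the hardware, and the serving stack.

```python
def throughput_qps(queries: int, seconds: float) -> float:
    """Queries completed per second for a batch that takes `seconds` to run."""
    return queries / seconds


single = throughput_qps(queries=1, seconds=5)     # one query at a time
batched = throughput_qps(queries=25, seconds=10)  # 25 queries bundled together
improvement = batched / single

if __name__ == "__main__":
    print(f"single: {single} qps, batched: {batched} qps")
    print(f"improvement: {improvement:.1f}x")
```

Note that latency per request doubles in this example (5 s to 10 s) even as throughput rises, which is the usual trade-off with batching.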

~1 MB: GPU Memory required for 1 token of output with a 13B parameter model​

The amount of memory you need is directly proportional to the maximum number of tokens you want to generate. So for example, if you want to generate outputs of up to 512 tokens (about 380 words), you need 512MB. No big deal you might say – I have 24GB to spare, what’s 512MB? Well, if you want to run bigger batches it starts to add up. So if you want to do batches of 16, you need 8GB of space. There are some techniques being developed that overcome this, but it’s still a real issue.
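A minimal sketch of this estimate, assuming the ~1 MB per output token figure for a 13B parameter model and treating 1024 MB as 1 GB. The function name is illustrative, and the per-token figure will differ for other model sizes.

```python
def output_memory_gb(max_tokens: int, batch_size: int = 1, mb_per_token: float = 1.0) -> float:
    """Memory needed for generated tokens across a batch, in GB."""
    return max_tokens * batch_size * mb_per_token / 1024


if __name__ == "__main__":
    print(output_memory_gb(512))                 # single request, 512 tokens
    print(output_memory_gb(512, batch_size=16))  # batch of 16
```

Added to the memory taken by the weights, this shows how quickly batch size eats into a 24GB card.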

Cheatsheet​

[Image: cheatsheet screenshot, 2023-05-17]
 