bnew
Veteran · Joined Nov 1, 2015 · Messages 51,832 · Reputation 7,926 · Daps 148,769

Stable Diffusion WebUI Txt/Img To 3D Model​

A custom extension for AUTOMATIC1111/stable-diffusion-webui that allows you to generate a 3D model from text or an image, based on OpenAI Shap-E.
[Screenshots: overall_img_to_3d, overall_txt_to_3d]
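Under the hood the extension drives OpenAI's Shap-E models. For a sense of what that looks like outside the webui, here is a minimal text-to-3D sketch against the openai/shap-e library itself; the prompt, output filename, and sampler settings are illustrative choices, not the extension's own code or defaults.

```python
# Minimal text-to-3D sketch with the openai/shap-e library (what the extension wraps);
# the prompt, filename, and sampler settings below are illustrative assumptions.
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import decode_latent_mesh

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
xm = load_model("transmitter", device=device)        # latent -> geometry/texture decoder
model = load_model("text300M", device=device)        # text-conditional latent prior
diffusion = diffusion_from_config(load_config("diffusion"))

latents = sample_latents(
    batch_size=1,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=["a red office chair"]),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# Export the sample as an .obj mesh, similar in spirit to what the extension hands back.
mesh = decode_latent_mesh(xm, latents[0]).tri_mesh()
with open("chair.obj", "w") as f:
    mesh.write_obj(f)
```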
 

bnew
Veteran · Joined Nov 1, 2015 · Messages 51,832 · Reputation 7,926 · Daps 148,769

Try chatting with fine-tuned models for Falcon-7B, Falcon-40B, and the new Open-Llama-7B​



Making h2oGPT models available to everyone. For more information, visit our GitHub pages: H2O LLM Studio and h2oGPT
 

bnew
Veteran · Joined Nov 1, 2015 · Messages 51,832 · Reputation 7,926 · Daps 148,769

About​

H2O LLM Studio - a framework and no-code GUI for fine-tuning LLMs


Welcome to H2O LLM Studio, a framework and no-code GUI designed for
fine-tuning state-of-the-art large language models (LLMs).​

[Screenshots: home, logs]


With H2O LLM Studio, you can​

  • easily and effectively fine-tune LLMs without the need for any coding experience.
  • use a graphical user interface (GUI) specially designed for large language models.
  • fine-tune any LLM using a large variety of hyperparameters.
  • use recent fine-tuning techniques such as Low-Rank Adaptation (LoRA) and 8-bit model training with a low memory footprint (a rough sketch of this setup follows the list).
  • use advanced evaluation metrics to judge the answers generated by the model.
  • track and compare your model performance visually. In addition, Neptune integration can be used.
  • chat with your model and get instant feedback on your model performance.
  • easily export your model to the Hugging Face Hub and share it with the community.
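The LoRA and 8-bit options above correspond to standard Hugging Face tooling. As a rough sketch of the kind of setup the GUI configures for you (the base checkpoint name and hyperparameters here are assumptions, and this is not H2O LLM Studio's internal code):

```python
# Illustrative LoRA + 8-bit fine-tuning setup with Hugging Face transformers/peft;
# this approximates what a no-code tool wires up for you, it is not H2O LLM Studio code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v2"  # assumed example checkpoint

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    load_in_8bit=True,          # 8-bit weights for a low memory footprint
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["query_key_value"],  # Falcon-style fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```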
 

bnew
Veteran · Joined Nov 1, 2015 · Messages 51,832 · Reputation 7,926 · Daps 148,769

OpenAI, DeepMind will open up models to UK government​


BY LAURIE CLARKE
JUNE 12, 2023 12:07 PM CET


LONDON — Google DeepMind, OpenAI and Anthropic have agreed to open up their AI models to the U.K. government for research and safety purposes, Prime Minister Rishi Sunak announced at London Tech Week on Monday.

The priority access will be granted in order “to help build better evaluations and help us better understand the opportunities and risks of these systems,” Sunak said.

The announcement came in a speech that championed the promise of AI to transform areas such as education and healthcare and heralded the U.K.’s potential as an “island of innovation.”

“AI is surely one of the greatest opportunities before us,” said Sunak. By combining AI models with the power of quantum, “the possibilities are extraordinary,” he marvelled.

“But we must and we will do it safely,” he continued. “I know people are concerned.”

In March, the U.K. government published an AI white paper which set out a “pro-innovation” approach, but more recently Sunak has emphasized the need for “guardrails.”

Sunak said on Monday the U.K.’s ambition was to be “not just the intellectual home, but the geographical home of global AI safety regulation.” He declined to set out specific proposals for regulation.

The lynchpin of these plans is a global summit on AI safety that will be held in the U.K. in the fall, first reported by POLITICO. Sunak likened the summit to an AI version of the UN's COP climate change conferences.

A Foundation Model Taskforce will also pioneer research on AI safety and assurance techniques, backed by £100 million of funding.

In the speech to London Tech Week, Sunak also name-checked semiconductors, synthetic biology and quantum as key areas of focus for the U.K. and said the country’s agile and balanced approach to regulation would continue to make it an attractive place to invest.

Anthropic and OpenAI have recently opened up European headquarters in the U.K. Palantir announced it was setting up an AI research hub in the U.K. last week.
 

morris
Superstar · Joined Oct 8, 2014 · Messages 16,249 · Reputation 4,875 · Daps 35,721
Funny. I used Bing's ChatAI to formulate a writing sample for an interview.

Denied still :snoop:
 

bnew
Veteran · Joined Nov 1, 2015 · Messages 51,832 · Reputation 7,926 · Daps 148,769

GPT Engineer​

Specify what you want it to build, the AI asks for clarification, and then builds it.

GPT Engineer is made to be easy to adapt and extend, and to let your agent learn how you want your code to look. It generates an entire codebase based on a prompt.

Project philosophy​

  • Simple to get value
  • Flexible and easy to add your own "AI steps". See steps.py.
  • Incrementally build towards a user experience of:
    1. high level prompting
    2. giving feedback to the AI that it will remember over time
  • Fast handovers back and forth between AI and human
  • Simplicity: all computation is "resumable" and persisted to the filesystem
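That philosophy boils down to a small loop: prompt, let the model ask clarifying questions, then generate files and persist everything to disk. A toy sketch of such a loop using the OpenAI chat API (this is not the repo's actual steps.py; the prompts and model choice are made up):

```python
# Toy illustration of the "clarify, then build" loop GPT Engineer automates;
# not code from the gpt-engineer repo, and the prompts here are invented.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def ask(messages):
    resp = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    return resp.choices[0].message["content"]

spec = "Build a CLI todo app that stores tasks in a JSON file."

# Step 1: the AI asks clarifying questions before writing any code.
questions = ask([
    {"role": "system", "content": "Ask the user the clarifying questions you need before coding."},
    {"role": "user", "content": spec},
])
print(questions)
answers = input("Your answers: ")

# Step 2: generate the codebase from the clarified spec and persist it to disk,
# so the run is resumable and the human can inspect or edit the output.
code = ask([
    {"role": "system", "content": "Write the full codebase. Prefix each file with its path."},
    {"role": "user", "content": spec + "\n\nClarifications:\n" + answers},
])
with open("generated_codebase.txt", "w") as f:
    f.write(code)
```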


video demo:​

 

bnew
Veteran · Joined Nov 1, 2015 · Messages 51,832 · Reputation 7,926 · Daps 148,769

Augmenting Language Models with Long-Term Memory​

Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei
Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memory retriever and reader. Such a decoupled memory design can easily cache and update long-term past contexts for memory retrieval without suffering from memory staleness. Enhanced with memory-augmented adaptation training, LongMem can thus memorize long past context and use long-term memory for language modeling. The proposed memory retrieval module can handle unlimited-length context in its memory bank to benefit various downstream tasks. Typically, LongMem can enlarge the long-form memory to 65k tokens and thus cache many-shot extra demonstration examples as long-form memory for in-context learning. Experiments show that our method outperforms strong long-context models on ChapterBreak, a challenging long-context modeling benchmark, and achieves remarkable improvements on memory-augmented in-context learning over LLMs. The results demonstrate that the proposed method is effective in helping language models to memorize and utilize long-form contents. Our code is open-sourced at this https URL.
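As a rough mental model of the decoupled design described above: the frozen backbone's key/value states for past context get banked, and a retriever later pulls back the most similar entries for the side-network to read. The following is a conceptual toy sketch only, not the authors' implementation; all names and sizes are made up.

```python
# Conceptual sketch of the cached-memory idea in LongMem: key/value states from a frozen
# backbone are banked, and a retriever pulls the top-k most similar past keys back in as
# extra context. Toy illustration only, not the paper's code.
import torch

class MemoryBank:
    def __init__(self, max_tokens=65_536):
        self.keys, self.values, self.max_tokens = [], [], max_tokens

    def cache(self, k, v):
        """Append frozen-backbone key/value states for past context."""
        self.keys.append(k)
        self.values.append(v)
        # evict the oldest entries once the token budget is exceeded
        while sum(x.shape[0] for x in self.keys) > self.max_tokens:
            self.keys.pop(0)
            self.values.pop(0)

    def retrieve(self, query, top_k=64):
        """Return the top-k cached (key, value) pairs most similar to the query."""
        keys = torch.cat(self.keys)
        values = torch.cat(self.values)
        scores = keys @ query                      # dot-product similarity
        idx = scores.topk(min(top_k, len(scores))).indices
        return keys[idx], values[idx]

# Usage: a side-network would fuse the retrieved states with current-context attention;
# this sketch only shows the cache/retrieve plumbing.
bank = MemoryBank()
bank.cache(torch.randn(128, 64), torch.randn(128, 64))   # 128 past tokens, hidden dim 64
k, v = bank.retrieve(torch.randn(64), top_k=8)
```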

LongMem​


 

bnew
Veteran · Joined Nov 1, 2015 · Messages 51,832 · Reputation 7,926 · Daps 148,769

Function calling and other API updates​

We’re announcing updates including more steerable API models, function calling capabilities, longer context, and lower prices.

June 13, 2023


We released gpt-3.5-turbo and gpt-4 earlier this year, and in only a few short months have seen incredible applications built by developers on top of these models.

Today, we’re following up with some exciting updates:
  • new function calling capability in the Chat Completions API
  • updated and more steerable versions of gpt-4 and gpt-3.5-turbo
  • new 16k context version of gpt-3.5-turbo (vs the standard 4k version)
  • 75% cost reduction on our state-of-the-art embeddings model
  • 25% cost reduction on input tokens for gpt-3.5-turbo
  • announcing the deprecation timeline for the gpt-3.5-turbo-0301 and gpt-4-0314 models
All of these models come with the same data privacy and security guarantees we introduced on March 1 — customers own all outputs generated from their requests and their API data will not be used for training.

Function calling​

Developers can now describe functions to gpt-4-0613 and gpt-3.5-turbo-0613, and have the model intelligently choose to output a JSON object containing arguments to call those functions. This is a new way to more reliably connect GPT's capabilities with external tools and APIs.
These models have been fine-tuned to both detect when a function needs to be called (depending on the user’s input) and to respond with JSON that adheres to the function signature. Function calling allows developers to more reliably get structured data back from the model. For example, developers can:
  • Create chatbots that answer questions by calling external tools (e.g., like ChatGPT Plugins)
    Convert queries such as “Email Anya to see if she wants to get coffee next Friday” to a function call like send_email(to: string, body: string), or “What’s the weather like in Boston?” to get_current_weather(location: string, unit: 'celsius' | 'fahrenheit').
  • Convert natural language into API calls or database queries
    Convert “Who are my top ten customers this month?” to an internal API call such as get_customers_by_revenue(start_date: string, end_date: string, limit: int), or “How many orders did Acme, Inc. place last month?” to a SQL query using sql_query(query: string).
  • Extract structured data from text
    Define a function called extract_people_data(people: [{name: string, birthday: string, location: string}]), to extract all people mentioned in a Wikipedia article.
These use cases are enabled by new API parameters in our /v1/chat/completions endpoint, functions and function_call, that allow developers to describe functions to the model via JSON Schema, and optionally ask it to call a specific function. Get started with our developer documentation and add evals if you find cases where function calling could be improved.

Function calling example​


What’s the weather like in Boston right now?

Step 1·OpenAI API
Call the model with functions and the user’s input

Step 2·Third party API
Use the model response to call your API

Step 3·OpenAI API
Send the response back to the model to summarize


The weather in Boston is currently sunny with a temperature of 22 degrees Celsius.
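Put together in code, the three steps above look roughly like this with the June 2023 openai Python library; the weather lookup is stubbed out and the values are made up.

```python
import json
import openai

def get_current_weather(location, unit="celsius"):
    # Stand-in for a real weather API call (Step 2 happens outside OpenAI).
    return json.dumps({"location": location, "temperature": 22, "unit": unit, "forecast": "sunny"})

functions = [{
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City and state, e.g. Boston, MA"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}]

messages = [{"role": "user", "content": "What's the weather like in Boston right now?"}]

# Step 1: call the model with the function descriptions and the user's input.
first = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613", messages=messages, functions=functions, function_call="auto"
)
msg = first.choices[0].message

if msg.get("function_call"):
    # Step 2: call your own API with the arguments the model produced.
    args = json.loads(msg["function_call"]["arguments"])
    result = get_current_weather(**args)

    # Step 3: send the function result back so the model can summarize it for the user.
    messages += [msg, {"role": "function", "name": "get_current_weather", "content": result}]
    second = openai.ChatCompletion.create(model="gpt-3.5-turbo-0613", messages=messages)
    print(second.choices[0].message["content"])
```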

Since the alpha release of ChatGPT plugins, we have learned much about making tools and language models work together safely. However, there are still open research questions. For example, a proof-of-concept exploit illustrates how untrusted data from a tool’s output can instruct the model to perform unintended actions. We are working to mitigate these and other risks. Developers can protect their applications by only consuming information from trusted tools and by including user confirmation steps before performing actions with real-world impact, such as sending an email, posting online, or making a purchase.

New models​

GPT-4​

gpt-4-0613 includes an updated and improved model with function calling.
gpt-4-32k-0613 includes the same improvements as gpt-4-0613, along with an extended context length for better comprehension of larger texts.
With these updates, we’ll be inviting many more people from the waitlist to try GPT-4 over the coming weeks, with the intent to remove the waitlist entirely with this model. Thank you to everyone who has been patiently waiting; we are excited to see what you build with GPT-4!

GPT-3.5 Turbo​

gpt-3.5-turbo-0613 includes the same function calling as GPT-4 as well as more reliable steerability via the system message, two features that allow developers to guide the model's responses more effectively.
gpt-3.5-turbo-16k offers 4 times the context length of gpt-3.5-turbo at twice the price: $0.003 per 1K input tokens and $0.004 per 1K output tokens. 16k context means the model can now support ~20 pages of text in a single request.

Model deprecations​

Today, we’ll begin the upgrade and deprecation process for the initial versions of gpt-4 and gpt-3.5-turbo that we announced in March. Applications using the stable model names (gpt-3.5-turbo, gpt-4, and gpt-4-32k) will automatically be upgraded to the new models listed above on June 27th. For comparing model performance between versions, our Evals library supports public and private evals to show how model changes will impact your use cases.

Developers who need more time to transition can continue using the older models by specifying gpt-3.5-turbo-0301, gpt-4-0314, or gpt-4-32k-0314 in the ‘model’ parameter of their API request. These older models will be accessible through September 13th, after which requests specifying those model names will fail. You can stay up to date on model deprecations via our model deprecation page. This is the first update to these models, so we eagerly welcome developer feedback to help us ensure a smooth transition.
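For example, pinning the older snapshot explicitly keeps an application on the March model until the September cutoff (a minimal sketch using the pre-1.0 openai Python library):

```python
import openai

# Explicit snapshot; the stable alias "gpt-3.5-turbo" auto-upgrades on June 27th.
resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message["content"])
```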

Lower pricing​

We continue to make our systems more efficient and are passing those savings on to developers, effective today.

Embeddings​

text-embedding-ada-002 is our most popular embeddings model. Today we’re reducing the cost by 75% to $0.0001 per 1K tokens.

GPT-3.5 Turbo​

gpt-3.5-turbo is our most popular chat model and powers ChatGPT for millions of users. Today we're reducing the cost of gpt-3.5-turbo’s input tokens by 25%. Developers can now use this model for just $0.0015 per 1K input tokens and $0.002 per 1K output tokens, which equates to roughly 700 pages per dollar.
gpt-3.5-turbo-16k will be priced at $0.003 per 1K input tokens and $0.004 per 1K output tokens.
Developer feedback is a cornerstone of our platform’s evolution and we will continue to make improvements based on the suggestions we hear. We’re excited to see how developers use these latest models and new features in their applications.
 

bnew
Veteran · Joined Nov 1, 2015 · Messages 51,832 · Reputation 7,926 · Daps 148,769

The Beatles will release a final record, using John Lennon's voice via an AI assist​


June 13, 2023, 12:41 PM ET
By Bill Chappell


The voice of John Lennon, seen here in 1963, will appear on The Beatles' new record, says his former bandmate Paul McCartney.
Hulton Archive/Getty Images


The music has analog roots, but now it's being revived by futuristic technology: The Beatles have completed a new recording using an old demo tape by John Lennon, thanks to AI tools that isolate Lennon's voice, according to Paul McCartney.

"We just finished it up, it'll be released this year," McCartney, Lennon's former bandmate, told the Today program on BBC Radio 4. It will be "the last Beatles record," said McCartney, who along with Ringo Starr is one of two surviving band members.

But if you're picturing McCartney sitting at a keyboard and telling ChatGPT, "sing a John Lennon verse," that's not what happened. Instead, they used source material from a demo recording that Lennon made before his death in 1980.

"We were able to take John's voice and get it pure through this AI, so that then we could mix the record as you would normally do. So, it gives you some sort of leeway."

McCartney says he realized technology could offer a new chance to work on the music after seeing Peter Jackson, the famously technically astute filmmaker, resurrect archival materials for Get Back, his documentary about the band making the Let It Be album.

"He was able to extricate John's voice from a ropey little bit of cassette which had John's voice and a piano," McCartney said of the director.

"He could separate them with AI. They could, they'd tell the machine, 'That's a voice. This is a guitar. Lose the guitar.' And he did that."


"We were able to take John's voice and get it pure through this AI," Paul McCartney says. The Beatles are seen here celebrating after finishing their album, Sgt. Pepper's Lonely Hearts Club Band.
John Pratt/Keystone / Getty Images


McCartney didn't give details about what he says is The Beatles' final record, poised to emerge decades after Lennon was shot and killed in December 1980.

But author Keith Badman has reported that in 1994, Lennon's widow, Yoko Ono, gave McCartney several of the late singer and songwriter's home demo recordings.

The tape included Lennon's love song "Now And Then." As the BBC's Mark Savage notes, previous attempts to finish the song were abandoned due to the poor audio quality of Lennon's voice on the recording.

In the interview, McCartney also said he's concerned with how AI might be used going forward, given its ability to perform trickery like replacing one singer's vocals with another person's.

"All of that is kind of scary," McCartney said, "but exciting — because it's the future."
 

bnew
Veteran · Joined Nov 1, 2015 · Messages 51,832 · Reputation 7,926 · Daps 148,769


The Curse of Recursion: Training on Generated Data Makes Models Forget​

Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, Ross Anderson
Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT introduced such language models to the general public. It is now clear that large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images. In this paper we consider what the future might hold. What will happen to GPT-{n} once LLMs contribute much of the language found online? We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear. We refer to this effect as Model Collapse and show that it can occur in Variational Autoencoders, Gaussian Mixture Models and LLMs. We build theoretical intuition behind the phenomenon and portray its ubiquity amongst all learned generative models. We demonstrate that it has to be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, data collected from genuine human interactions with systems will be increasingly valuable in the presence of content generated by LLMs in data crawled from the Internet.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
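The effect is easy to see even in a toy setting: fit a Gaussian to data, sample a new "training set" from the fit, refit, and repeat. The sketch below only illustrates that feedback loop; it is not one of the paper's experiments.

```python
# Toy illustration of the feedback loop (not the paper's experiments): each generation
# is fit to samples produced by the previous generation's fit, so estimation error
# compounds and the learned distribution drifts away from the original one.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=500)       # generation 0: "human" data, N(0, 1)

for gen in range(1, 51):
    mu, sigma = data.mean(), data.std()                # fit a Gaussian "model" to the data
    data = rng.normal(mu, sigma, size=500)             # next generation trains only on generated data
    if gen % 10 == 0:
        print(f"generation {gen:2d}: mean = {mu:+.3f}, std = {sigma:.3f}")
```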


 

Orbital-Fetus
cross that bridge · Supporter · Joined May 5, 2012 · Messages 40,288 · Reputation 17,645 · Daps 146,066 · Reppin Humanity
 

bnew
Veteran · Joined Nov 1, 2015 · Messages 51,832 · Reputation 7,926 · Daps 148,769

SqueezeLLM: Dense-and-Sparse Quantization​

Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer
Generative Large Language Models (LLMs) have demonstrated remarkable results for a wide range of tasks. However, deploying these models for inference has been a significant challenge due to their unprecedented resource requirements. This has forced existing deployment frameworks to use multi-GPU inference pipelines, which are often complex and costly, or to use smaller and less performant models. In this work, we demonstrate that the main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, specifically for single batch inference. While quantization has emerged as a promising solution by representing model weights with reduced precision, previous efforts have often resulted in notable performance degradation. To address this, we introduce SqueezeLLM, a post-training quantization framework that not only enables lossless compression to ultra-low precisions of up to 3-bit, but also achieves higher quantization performance under the same memory constraint. Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format. When applied to the LLaMA models, our 3-bit quantization significantly reduces the perplexity gap from the FP16 baseline by up to 2.1x as compared to the state-of-the-art methods with the same memory requirement. Furthermore, when deployed on an A6000 GPU, our quantized models achieve up to 2.3x speedup compared to the baseline. Our code is open-sourced and available online.





Google Bard summary:​

SqueezeLLM is a new way to make language models smaller and faster. It does this by using a technique called quantization. Quantization is like taking a picture of a big object and then making it smaller. When you do this, you lose some of the detail, but the object is still recognizable.

SqueezeLLM is able to quantize language models without losing too much detail. This means that they can be used on smaller devices without sacrificing performance.

SqueezeLLM works by first identifying the most important weights in the language model. These weights are then stored in full precision, while the less important weights are stored in lower precision. This allows SqueezeLLM to reduce the size of the language model without sacrificing accuracy.

SqueezeLLM also uses a technique called sparsity to further reduce the size of the language model. Sparsity is a technique that identifies and removes redundant weights from the language model. This can further reduce the size of the language model without sacrificing accuracy.

SqueezeLLM has been shown to be effective in reducing the size and improving the performance of language models. It has been shown to reduce the size of language models by up to 90% while only reducing accuracy by a small margin.

SqueezeLLM is still under development, but it has the potential to make language models more accessible to a wider range of people.
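For intuition, the Dense-and-Sparse decomposition can be mimicked in a few lines: move a small fraction of large-magnitude weights into a sparse full-precision matrix and quantize the rest to eight k-means centroids (3 bits). Note this toy picks outliers by magnitude rather than the paper's sensitivity metric, so it is a sketch of the idea, not the SqueezeLLM implementation.

```python
# Toy dense-and-sparse sketch (idea only, not the SqueezeLLM code): outliers are kept
# sparse in full precision, everything else is quantized to 8 k-means centroids (3 bits).
# Real SqueezeLLM selects sensitive values using second-order information, not a
# simple magnitude threshold as done here.
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256)).astype(np.float32)
W[rng.random(W.shape) < 0.001] *= 50.0                 # inject a few large outliers

# Sparse part: keep the top 0.5% largest-magnitude weights exactly.
threshold = np.quantile(np.abs(W), 0.995)
outlier_mask = np.abs(W) >= threshold
sparse_part = csr_matrix(W * outlier_mask)

# Dense part: 3-bit non-uniform quantization of the remaining weights.
dense_vals = W[~outlier_mask].reshape(-1, 1)
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(dense_vals)
codes = np.zeros(W.shape, dtype=np.uint8)
codes[~outlier_mask] = kmeans.predict(dense_vals)      # one 3-bit code per dense weight
centroids = kmeans.cluster_centers_.ravel()            # 8 shared centroids for the matrix

# Reconstruction: look up centroids for dense weights, then add the sparse outliers back.
W_hat = centroids[codes].astype(np.float32)
W_hat[outlier_mask] = 0.0
W_hat += sparse_part.toarray()
print("mean abs reconstruction error:", np.abs(W - W_hat).mean())
```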
 