bnew

@bnew What do you think of George Hotz's claim that GPT-4 is a mixture of experts with 8x 220B models? That would explain all the recent Microsoft research papers on optimizing for smaller models.

who knows :yeshrug: if they did do that, I wonder what expertise they chose for each of the models.
 

bnew


Extending Context Window of Large Language Models via Positional Interpolation​

Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian
We present Position Interpolation (PI), which extends the context window sizes of RoPE-based pretrained LLMs such as LLaMA models to up to 32,768 tokens with minimal fine-tuning (within 1,000 steps), while demonstrating strong empirical results on various tasks that require long context, including passkey retrieval, language modeling, and long document summarization, from LLaMA 7B to 65B. Meanwhile, models extended by Position Interpolation preserve quality relatively well on tasks within their original context window. To achieve this goal, Position Interpolation linearly down-scales the input position indices to match the original context window size, rather than extrapolating beyond the trained context length, which can lead to catastrophically high attention scores that completely ruin the self-attention mechanism. Our theoretical study shows that the upper bound of interpolation is at least ∼600× smaller than that of extrapolation, further demonstrating its stability. Models extended via Position Interpolation retain their original architecture and can reuse most pre-existing optimization and infrastructure.
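In the paper's notation, the down-scaling amounts to evaluating the unchanged RoPE encoding at rescaled positions instead of at out-of-range ones (a minimal restatement of the abstract, using the paper's f(x, m) convention):

```latex
% RoPE applies a position-dependent encoding f(x, m) to embedding x at position m.
% Position Interpolation keeps f unchanged and rescales the position index:
%   L  = pretraining context length (e.g. 2048 for LLaMA)
%   L' = extended context length    (e.g. 32768), with L' > L,
% so the interpolated index m L / L' always falls inside the trained range [0, L).
f'(x, m) \;=\; f\!\left(x,\ \frac{m\,L}{L'}\right), \qquad 0 \le m < L'.
```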




snippet from the SuperHOT author's write-up:

I should clarify that while it only takes 2 lines to implement, it took me over a month of poring over papers and articles to figure out these two lines are all we need to change. I am still testing other modifications which could improve the effect more, but with this, I was able to train LLaMA 13B, which is pretrained on 2048 tokens, on SuperHOT (which is only 1200 samples) and push it to at least 6K tokens. If you naively train with longer sequences, it just won’t work unless you feed it an astronomical amount of data, but with only ~400 samples >4096, the model is able to push beyond 6K. The scale field here should be treated as a hyperparameter: you need to perform inference with this change with the same scale used during training. I did not test beyond 1/4 yet, as there is certainly a limit to this effect and I was very happy with pushing it to 6K. I will perform more testing in the meantime and follow up.
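To make the quoted "two lines" concrete, here is a hedged sketch of linearly scaling RoPE position indices before the rotary angles are computed. The function and variable names are illustrative assumptions, not the author's actual patch; as the quote stresses, the same scale (e.g. 0.25 for a 4x extension) has to be used at training and inference time.

```python
import torch

def rotary_angles(positions: torch.Tensor, dim: int,
                  base: float = 10000.0, scale: float = 0.25) -> torch.Tensor:
    """RoPE angles with linear position interpolation (sketch, not the exact patch).

    scale < 1 squeezes a longer sequence back into the position range the model
    was pretrained on, e.g. scale = 2048 / 8192 maps 8192 positions into ~2048.
    """
    # Standard RoPE inverse frequencies, one per pair of hidden dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # The essential change: down-scale the integer position indices.
    positions = positions.float() * scale
    # (seq_len, dim/2) matrix of angles m * theta_i, later turned into sin/cos.
    return torch.outer(positions, inv_freq)

# Example: treat an 8K sequence as if it occupied the original 2K position range.
angles = rotary_angles(torch.arange(8192), dim=128, scale=2048 / 8192)
```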


 

bnew



Understanding Social Reasoning in Language Models with Language Models​

Kanishk Gandhi, Jan-Philipp Fränken, Tobias Gerstenberg, Noah D. Goodman
As Large Language Models (LLMs) become increasingly integrated into our everyday lives, understanding their ability to comprehend human mental states becomes critical for ensuring effective interactions. However, despite the recent attempts to assess the Theory-of-Mind (ToM) reasoning capabilities of LLMs, the degree to which these models can align with human ToM remains a nuanced topic of exploration. This is primarily due to two distinct challenges: (1) the presence of inconsistent results from previous evaluations, and (2) concerns surrounding the validity of existing evaluation methodologies. To address these challenges, we present a novel framework for procedurally generating evaluations with LLMs by populating causal templates. Using our framework, we create a new social reasoning benchmark (BigToM) for LLMs, which consists of 25 controls and 5,000 model-written evaluations. We find that human participants rate the quality of our benchmark higher than previous crowd-sourced evaluations and comparable to expert-written evaluations. Using BigToM, we evaluate the social reasoning capabilities of a variety of LLMs and compare model performances with human performance. Our results suggest that GPT-4 has ToM capabilities that mirror human inference patterns, though less reliably, while other LLMs struggle.
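For a sense of what "populating causal templates" could look like, here is a hedged sketch; the template structure, variable names, and fillers are illustrative assumptions rather than the paper's actual prompts (in BigToM an LLM fills the template variables, whereas here they are hard-coded for brevity).

```python
# Hypothetical illustration of template-based ToM test generation,
# in the spirit of the causal-template approach described above.
TEMPLATE = (
    "{agent} puts the {object} in the {container}. "
    "While {agent} is away, {event}. "
    "{agent} did not see this happen.\n"
    "Question: Where does {agent} believe the {object} is?\n"
    "A) {container}\nB) {new_location}"
)

def populate(agent, obj, container, event, new_location):
    """Fill one causal template into a false-belief test item."""
    return TEMPLATE.format(agent=agent, object=obj, container=container,
                           event=event, new_location=new_location)

item = populate(
    agent="Noor",
    obj="almond milk",
    container="fridge",
    event="a coworker moves the almond milk to the pantry",
    new_location="pantry",
)
print(item)  # intended answer: A (the agent holds a false belief)
```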
 

bnew


Can LLMs generate mathematical proofs that can be rigorously checked?

We release LeanDojo (paper: "LeanDojo: Theorem Proving with Retrieval-Augmented Language Models"): an open-source playground consisting of toolkits, benchmarks, and models for LLMs to prove formal theorems in the Lean proof assistant.

Key features:
- Tools for data extraction and interacting with Lean.
- Fine-grained annotations of premises (e.g., existing lemmas) in proofs: where they are used and defined.
- LeanDojo Benchmark: 97K human-written theorems/proofs for developing machine learning models on theorem proving.
- ReProver (Retrieval-Augmented Prover): the first LLM-based prover augmented with retrieval for premise selection.

We open-source everything, providing the first set of open-source LLM-based theorem provers without any proprietary data, model, or code.

Machine learning, especially large language models (LLMs), has shown promise in proving formal theorems using proof assistants such as Lean. Formal proofs are computer programs whose correctness can be verified. Therefore, theorem proving is a form of code generation with rigorous evaluation and no room for the model to hallucinate, opening up a new avenue for addressing LLMs’ flaws in factuality and hallucination.





LeanDojo: Theorem Proving with Retrieval-Augmented Language Models​

Kaiyu Yang (Caltech), Aidan Swope (NVIDIA), Alex Gu (MIT), Rahul Chalamala (Caltech), Peiyang Song (UC Santa Barbara), Shixing Yu (UT Austin), Saad Godil (NVIDIA), Ryan Prenger (NVIDIA), Anima Anandkumar (Caltech, NVIDIA)
Correspondence to: kaiyuy@caltech.edu
[Figure: LeanDojo overview]

Top right: LeanDojo extracts proofs in Lean into datasets for training machine learning models. It also enables the trained model to prove theorems by interacting with Lean's proof environment.
Top left: The proof tree of a Lean theorem ∀ n ∈ ℕ, gcd n n = n, where gcd is the greatest common divisor. When proving the theorem, we start from the original theorem as the initial state (the root) and repeatedly apply tactics (the edges) to decompose states into simpler sub-states, until all states are solved (the leaf nodes). Tactics may rely on premises such as mod_self and gcd_zero_left defined in a large math library. E.g., mod_self is an existing theorem ∀ n ∈ ℕ, n % n = 0 used in the proof to simplify the goal.
Bottom: Our ReProver model. Given a state, it retrieves premises from the math library, which are concatenated with the state and fed into an encoder-decoder Transformer to generate the next tactic.
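For concreteness, here is a proof along the lines of the tree described above, written in Lean 4 syntax. This is a sketch, not the exact proof from the figure; the lemma names (Nat.gcd_succ, Nat.mod_self, Nat.gcd_zero_left) are taken from Lean's standard library and may differ slightly across versions.

```lean
theorem gcd_self (n : Nat) : Nat.gcd n n = n := by
  cases n with
  | zero   => exact Nat.gcd_zero_left 0       -- gcd 0 0 = 0
  | succ k =>
    rw [Nat.gcd_succ]    -- gcd (k+1) (k+1) = gcd ((k+1) % (k+1)) (k+1)
    rw [Nat.mod_self]    -- (k+1) % (k+1) = 0
    exact Nat.gcd_zero_left _                 -- gcd 0 (k+1) = k+1
```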

Abstract​

Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However, existing methods are difficult to reproduce or build on, due to private code, data, and large compute requirements. This has created substantial barriers to research on machine learning methods for theorem proving. This paper removes these barriers by introducing LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks. LeanDojo extracts data from Lean and enables interaction with the proof environment programmatically. It contains fine-grained annotations of premises in proofs, providing valuable data for premise selection, a key bottleneck in theorem proving. Using this data, we develop ReProver (Retrieval-Augmented Prover): the first LLM-based prover that is augmented with retrieval for selecting premises from a vast math library. It is inexpensive and needs only one GPU week of training. Our retriever leverages LeanDojo's program analysis capability to identify accessible premises and hard negative examples, which makes retrieval much more effective. Furthermore, we construct a new benchmark consisting of 96,962 theorems and proofs extracted from Lean's math library. It features a challenging data split requiring the prover to generalize to theorems relying on novel premises that are never used in training. We use this benchmark for training and evaluation, and experimental results demonstrate the effectiveness of ReProver over non-retrieval baselines and GPT-4. We thus provide the first set of open-source LLM-based theorem provers without any proprietary datasets and release it under a permissive MIT license to facilitate further research.
 

bnew


Meta’s new AI lets people make chatbots. They’re using it for sex.​

From X-rated chats to cancer research, “open-source” models are challenging tech giants’ control over the AI revolution — to the promise or peril of society​

By Pranshu Verma and Will Oremus
June 26, 2023 at 6:00 a.m. EDT



Allie is an 18-year-old with long brown hair who boasts “tons of sexual experience.” Because she “lives for attention,” she’ll “share details of her escapades” with anyone for free.
But Allie is fake, an artificial intelligence chatbot created for sexual play — which sometimes carries out graphic rape and abuse fantasies.



While firms like OpenAI, Microsoft and Google rigorously train their AI models to avoid a host of taboos, including overly intimate conversations, Allie was built using open-source technology — code that’s freely available to the public and has no such restrictions. Based on a model created by Meta, called LLaMA, Allie is part of a rising tide of specialized AI products anyone can build, from writing tools to chatbots to data analysis applications.

Advocates see open-source AI as a way around corporate control, a boon to entrepreneurs, academics, artists and activists who can experiment freely with transformative technology.

“The overall argument for open-source is that it accelerates innovation in AI,” said Robert Nishihara, CEO and co-founder of the start-up Anyscale, which helps companies run open-source AI models.
Anyscale’s clients use AI models to discover new pharmaceuticals, reduce the use of pesticides in farming, and identify fraudulent goods sold online, he said. Those applications would be pricier and more difficult, if not impossible, if they relied on the handful of products offered by the largest AI firms.

Yet that same freedom could also be exploited by bad actors. Open-source models have been used to create artificial child pornography using images of real children as source material. Critics worry it could also enable fraud, cyber hacking and sophisticated propaganda campaigns.
Earlier this month, a pair of U.S. senators, Richard Blumenthal (D-Conn.) and Josh Hawley (R-Mo.), sent a letter to Meta CEO Mark Zuckerberg warning that the release of LLaMA might lead to “its misuse in spam, fraud, malware, privacy violations, harassment, and other wrongdoing and harms.” They asked what steps Meta was taking to prevent such abuse.

Allie’s creator, who spoke on the condition of anonymity for fear of harming his professional reputation, said commercial chatbots such as Replika and ChatGPT are “heavily censored” and can’t offer the type of sexual conversations he desires. With open-source alternatives, many based on Meta’s LLaMA model, the man said he can build his own, uninhibited conversation partners.

“It’s rare to have the opportunity to experiment with ‘state of the art’ in any field,” he said in an interview.
Allie’s creator argued that open-source technology benefits society by allowing people to build products that cater to their preferences without corporate guardrails.
“I think it’s good to have a safe outlet to explore,” he said. “Can’t really think of anything safer than a text-based role-play against a computer, with no humans actually involved.”

On YouTube, influencers offer tutorials on how to build “uncensored” chatbots. Some are based on a modified version of LLaMA, called Alpaca AI, which Stanford University researchers released in March, only to remove it a week later over concerns of cost and “the inadequacies of our content filters.”

Nisha Deo, a spokeswoman for Meta, said the particular model referenced in the YouTube videos, called GPT-4 x Alpaca, “was obtained and made public outside of our approval process.” Representatives from Stanford did not return a request for comment.
Open-source AI models, and the creative applications that build on them, are often published on Hugging Face, a platform for sharing and discussing AI and data science projects.
During a Thursday House science committee hearing, Clem Delangue, Hugging Face’s CEO, urged Congress to consider legislation supporting and incentivizing open-source models, which he argued are “extremely aligned with American values.”


In an interview after the hearing, Delangue acknowledged that open-source tools can be abused. He noted a model intentionally trained on toxic content, GPT-*****, that Hugging Face had removed. But he said he believes open-source approaches allow for both greater innovation and more transparency and inclusivity than corporate-controlled models.
“I would argue that actually most of the harm today is done by black boxes,” Delangue said, referring to AI systems whose inner workings are opaque, “rather than open-source.”
Hugging Face’s rules don’t prohibit AI projects that produce sexually explicit outputs. But they do prohibit sexual content that involves minors, or that is “used or created for harassment, bullying, or without explicit consent of the people represented.” Earlier this month, the New York-based company published an update to its content policies, emphasizing “consent” as a “core value” guiding how people can use the platform.


As Google and OpenAI have grown more secretive about their most powerful AI models, Meta has emerged as a surprising corporate champion of open-source AI. In February it released LLaMA, a language model that’s less powerful than GPT-4, but more customizable and cheaper to run. Meta initially withheld key parts of the model’s code and planned to limit access to authorized researchers. But by early March those parts, known as the model’s “weights,” had leaked onto public forums, making LLaMA freely accessible to all.
“Open source is a positive force to advance technology,” Meta’s Deo said. “That’s why we shared LLaMA with members of the research community to help us evaluate, make improvements and iterate together.”
Since then, LLaMA has become perhaps the most popular open-source model for technologists looking to develop their own AI applications, Nishihara said. But it’s not the only one. In April, the software firm Databricks released an open-source model called Dolly 2.0. And last month, a team based in Abu Dhabi released an open-source model called Falcon that rivals LLaMA in performance.


Marzyeh Ghassemi, an assistant professor of computer science at MIT, said she’s an advocate for open-source language models, but with limits.
Ghassemi said it’s important to make the architecture behind powerful chatbots public, because that allows people to scrutinize how they’re built. For example, if a medical chatbot was created on open-source technology, she said, researchers could see if the data it’s trained on incorporated sensitive patient information, something that would not be possible on chatbots using closed software.
But she acknowledges this openness comes with risk. If people can easily modify language models, they can quickly create chatbots and image makers that churn out disinformation, hate speech and inappropriate material of high quality.

Ghassemi said there should be regulations governing who can modify these products, such as a certifying or credentialing process.

“Like we license people to be able to use a car,” she said, “we need to think about similar framings [for people] … to actually create, improve, audit, edit these open-trained language models.”
Some leaders at companies like Google, which keeps its chatbot Bard under lock and key, see open-source software as an existential threat to their business, because the large language models that are available to the public are becoming nearly as proficient as theirs.
“We aren’t positioned to win this [AI] arms race and neither is OpenAI,” a Google engineer wrote in a memo posted by the tech site Semianalysis in May. “I’m talking, of course, about open source. Plainly put, they are lapping us … While our models still hold a slight edge in terms of quality, the gap is closing astonishingly quickly.”
Nathan Benaich, a general partner at Air Street Capital, a London-based venture investing firm focused on AI, noted that many of the tech industry’s greatest advances over the decades have been made possible by open-source technologies — including today’s AI language models.

“If there’s only a few companies” building the most powerful AI models, “they’re only going to be targeting the biggest-use cases,” Benaich said, adding that the diversity of inquiry is an overall boon for society.
Gary Marcus, a cognitive scientist who testified to Congress on AI regulation in May, countered that accelerating AI innovation might not be a good thing, considering the risks the technology could pose to society.
“We don’t open-source nuclear weapons,” Marcus said. “Current AI is still pretty limited, but things might change.”
 

bnew



Kosmos-2: Grounding Multimodal Large Language Models to the World​

Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world. Specifically, we represent referring expressions as links in Markdown, i.e., "[text span](bounding boxes)", where object descriptions are sequences of location tokens. Together with multimodal corpora, we construct large-scale data of grounded image-text pairs (called GrIT) to train the model. In addition to the existing capabilities of MLLMs (e.g., perceiving general modalities, following instructions, and performing in-context learning), Kosmos-2 integrates the grounding capability into downstream applications. We evaluate Kosmos-2 on a wide range of tasks, including (i) multimodal grounding, such as referring expression comprehension and phrase grounding, (ii) multimodal referring, such as referring expression generation, (iii) perception-language tasks, and (iv) language understanding and generation. This work lays out the foundation for the development of Embodiment AI and sheds light on the big convergence of language, multimodal perception, action, and world modeling, which is a key step toward artificial general intelligence. Data, demo, and pretrained models are available at this https URL.
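As a rough illustration of the grounded-Markdown format the abstract describes, here is a hedged sketch of turning a normalized bounding box into a short sequence of discrete location tokens. The grid size, token names, and serialization details are assumptions for illustration, not Kosmos-2's exact scheme.

```python
# Hypothetical sketch: serialize a grounded caption as Markdown-style links,
# with bounding boxes quantized into location tokens on a P x P grid.
P = 32  # grid resolution (assumption; the real model uses its own vocabulary)

def bbox_to_tokens(x0, y0, x1, y1, p=P):
    """Map a normalized box (coords in [0, 1]) to two corner location tokens."""
    def cell(x, y):
        col = min(int(x * p), p - 1)
        row = min(int(y * p), p - 1)
        return f"<loc_{row * p + col}>"
    return cell(x0, y0) + cell(x1, y1)  # top-left and bottom-right corners

def ground(span, bbox):
    """Render '[text span](location tokens)' as described in the abstract."""
    return f"[{span}]({bbox_to_tokens(*bbox)})"

caption = (ground("a snowman", (0.12, 0.20, 0.55, 0.90))
           + " warming himself by "
           + ground("a campfire", (0.60, 0.55, 0.95, 0.95)) + ".")
print(caption)
# e.g. "[a snowman](<loc_...><loc_...>) warming himself by [a campfire](<loc_...><loc_...>)."
```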







 

bnew





vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention​

By Woosuk Kwon*, Zhuohan Li*, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Yu, Joey Gonzalez, Hao Zhang, and Ion Stoica (* Equal Contribution). June 20th, 2023

GitHub | Documentation | Paper (Stay Tuned)


LLMs promise to fundamentally change how we use AI across all industries. However, actually serving these models is challenging and can be surprisingly slow even on expensive hardware. Today we are excited to introduce vLLM, an open-source library for fast LLM inference and serving. vLLM utilizes PagedAttention, our new attention algorithm that effectively manages attention keys and values. vLLM equipped with PagedAttention sets a new state of the art in LLM serving: it delivers up to 24x higher throughput than HuggingFace Transformers, without requiring any model architecture changes.

vLLM has been developed at UC Berkeley and deployed at Chatbot Arena and Vicuna Demo for the past two months. It is the core technology that makes LLM serving affordable even for a small research team like LMSYS with limited compute resources. Try out vLLM now with a single command at our GitHub repository.
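A minimal offline-inference sketch based on vLLM's quickstart; exact class and parameter names may differ between versions, so treat this as an approximation rather than the project's canonical example.

```python
# Sketch of vLLM's offline batched-generation API (based on the project's
# quickstart; check the docs for the version you install).
from vllm import LLM, SamplingParams

prompts = [
    "The key idea behind PagedAttention is",
    "Compared to naive KV-cache allocation, paging",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# PagedAttention-based KV-cache management happens inside the engine;
# no model-architecture changes are needed on the user side.
llm = LLM(model="lmsys/vicuna-7b-v1.3")

for output in llm.generate(prompts, sampling_params):
    print(output.prompt)
    print(output.outputs[0].text)
```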

Beyond State-of-the-art Performance​

We compare the throughput of vLLM with HuggingFace Transformers (HF), the most popular LLM library, and HuggingFace Text Generation Inference (TGI), the previous state of the art. We evaluate in two settings: LLaMA-7B on an NVIDIA A10G GPU and LLaMA-13B on an NVIDIA A100 GPU (40GB). We sample the requests’ input/output lengths from the ShareGPT dataset. In our experiments, vLLM achieves up to 24x higher throughput compared to HF and up to 3.5x higher throughput than TGI.


Serving throughput when each request asks for one output completion. vLLM achieves 14x - 24x higher throughput than HF and 2.2x - 2.5x higher throughput than TGI.


Serving throughput when each request asks for three parallel output completions. vLLM achieves 8.5x - 15x higher throughput than HF and 3.3x - 3.5x higher throughput than TGI.​


About​

A high-throughput and memory-efficient inference and serving engine for LLMs

vllm.readthedocs.io
 