bnew

Veteran
Joined
Nov 1, 2015
Messages
57,545
Reputation
8,519
Daps
160,280
Paper: [2312.07413] AI capabilities can be significantly improved without expensive retraining

Blog post: AI capabilities can be significantly improved without expensive retraining

Abstract:

State-of-the-art AI systems can be significantly improved without expensive retraining via "post-training enhancements": techniques applied after initial training, such as fine-tuning the system to use a web browser. We review recent post-training enhancements, categorizing them into five types: tool-use, prompting methods, scaffolding, solution selection, and data generation. Different enhancements improve performance on different tasks, making it hard to compare their significance. So we translate improvements from different enhancements into a common currency, the compute-equivalent gain: how much additional training compute would be needed to improve performance by the same amount as the enhancement. Our non-experimental work shows that post-training enhancements have significant benefits: most surveyed enhancements improve benchmark performance by more than a 5x increase in training compute, and some by more than 20x. Post-training enhancements are relatively cheap to develop: fine-tuning costs are typically <1% of the original training cost. Governing the development of capable post-training enhancements may be challenging because frontier models could be enhanced by a wide range of actors.
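The compute-equivalent gain described in the abstract is easiest to see with a toy calculation. The sketch below assumes a fitted, monotonically increasing curve of benchmark score versus training compute; the log-linear curve and the numbers are made up for illustration, and the paper's actual estimation procedure may differ.

[CODE]
import numpy as np

def compute_equivalent_gain(score_curve, base_compute, enhanced_score):
    """Ratio of the compute needed to match the enhanced score to the compute
    actually used. `score_curve` maps training compute (FLOP) to benchmark
    score and is assumed to be increasing."""
    grid = np.logspace(np.log10(base_compute), np.log10(base_compute) + 3, 1000)
    scores = np.array([score_curve(c) for c in grid])
    idx = min(np.searchsorted(scores, enhanced_score), len(grid) - 1)
    return grid[idx] / base_compute

# Toy example: a log-linear scaling curve and an enhancement that lifts the
# score from 2.30 to 2.37 -> roughly a 5x compute-equivalent gain.
curve = lambda c: 0.1 * np.log10(c)
print(compute_equivalent_gain(curve, base_compute=1e23, enhanced_score=2.37))
[/CODE]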

 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,545
Reputation
8,519
Daps
160,280

Computer Science > Machine Learning​

[Submitted on 13 Dec 2023]

Distributed Inference and Fine-tuning of Large Language Models Over The Internet​

Alexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, Colin Raffel
Large language models (LLMs) are useful in many NLP tasks and become more capable with size, with the best open-source models having over 50 billion parameters. However, using these 50B+ models requires high-end hardware, making them inaccessible to most researchers. In this work, we investigate methods for cost-efficient inference and fine-tuning of LLMs, comparing local and distributed strategies. We observe that a large enough model (50B+) can run efficiently even on geodistributed devices in a consumer-grade network. This could allow running LLM efficiently by pooling together idle compute resources of multiple research groups and volunteers. We address two open problems: (1) how to perform inference and fine-tuning reliably if any device can disconnect abruptly and (2) how to partition LLMs between devices with uneven hardware, joining and leaving at will. In order to do that, we develop special fault-tolerant inference algorithms and load-balancing protocols that automatically assign devices to maximize the total system throughput. We showcase these algorithms in Petals - a decentralized system that runs Llama 2 (70B) and BLOOM (176B) over the Internet up to 10x faster than offloading for interactive generation. We evaluate the performance of our system in simulated conditions and a real-world setup spanning two continents.
Comments: Accepted to Conference on Neural Information Processing Systems (NeurIPS) 2023. 20 pages, 3 figures
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:2312.08361 [cs.LG]
(or arXiv:2312.08361v1 [cs.LG] for this version)
[2312.08361] Distributed Inference and Fine-tuning of Large Language Models Over The Internet

Submission history​

From: Max Ryabinin [view email]
[v1] Wed, 13 Dec 2023 18:52:49 UTC (403 KB)
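For a sense of what the system looks like from the client side, here is a rough sketch of running generation through Petals. It is based on the project's public README rather than the paper itself, so the class name, checkpoint ID and generation settings are assumptions that may have changed since.

[CODE]
# Sketch of client-side generation over the Petals swarm (assumptions: the
# petals and transformers packages are installed and the public swarm is
# serving this checkpoint; see the Petals README for current details).
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # example checkpoint; swap in Llama 2 (70B) if you have access
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)  # only a few layers run locally

inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=16)  # remote servers hold the transformer blocks
print(tokenizer.decode(outputs[0]))
[/CODE]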


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,545
Reputation
8,519
Daps
160,280

How crowded are the oceans? New maps show what flew under the radar until now​


Advances in AI and satellite imagery allowed researchers to create the clearest picture yet of human activity at sea, revealing clandestine fishing activity and a boom in offshore energy development.

By Justine Calma, a senior science reporter covering climate change, clean energy, and environmental justice with more than a decade of experience. She is also the host of Hell or High Water: When Disaster Hits Home, a podcast from Vox Media and Audible Originals.

Jan 3, 2024, 11:00 AM EST



A new study uses deep learning and satellite imagery to create the first global map of vessel traffic and offshore infrastructure.
Image: Global Fishing Watch

Using satellite imagery and AI, researchers have mapped human activity at sea with more precision than ever before. The effort exposed a huge amount of industrial activity that previously flew under the radar, from suspicious fishing operations to an explosion of offshore energy development.

The maps were published today in the journal Nature. The research, led by the Google-backed nonprofit Global Fishing Watch, revealed that a whopping three-quarters of the world’s industrial fishing vessels are not publicly tracked. Up to 30 percent of transport and energy vessels also escape public tracking.

Those blind spots could hamper global conservation efforts, the researchers say. To better protect the world’s oceans and fisheries, policymakers need a more accurate picture of where people are exploiting resources at sea.

“The question is which 30 percent should we protect?”

Nearly every nation on Earth has agreed to a joint goal of protecting 30 percent of Earth’s land and waters by 2030 under the Kunming-Montreal Global Biodiversity Framework adopted last year. “The question is which 30 percent should we protect? And you can’t have discussions about where the fishing activity, where the oil platforms are unless you have this map,” says David Kroodsma, one of the authors of the Nature paper and director of research and innovation at Global Fishing Watch.

Until now, Global Fishing Watch and other organizations relied primarily on the maritime Automatic Identification System (AIS) to see what was happening at sea. The system tracks vessels that carry a box that sends out radio signals, and the data has been used in the past to document overfishing and forced labor on vessels. Even so, there are major limitations with the system. Requirements to carry AIS vary by country and vessel type. And it’s pretty easy for someone to turn the box off when they want to avoid detection, or cruise through locations where signal strength is spotty.

To fill in the blanks, Kroodsma and his colleagues analyzed 2,000 terabytes of imagery from the European Space Agency’s Sentinel-1 satellite constellation. Instead of taking traditional optical imagery, which is like snapping photos with a camera, Sentinel-1 uses advanced radar instruments to observe the surface of the Earth. Radar can penetrate clouds and “see” in the dark — and it was able to spot offshore activity that AIS missed.



Data analysis reveals that about 75 percent of the world’s industrial fishing vessels are not publicly tracked, with much of that fishing taking place around Africa and South Asia.
Image: Global Fishing Watch

Since 2,000 terabytes is an enormous amount of data to crunch, the researchers developed three deep-learning models to classify each detected vessel, estimate its size, and sort out different kinds of offshore infrastructure. They monitored some 15 percent of the world’s oceans where 75 percent of industrial activity takes place, paying attention to both vessel movements and the development of stationary offshore structures like oil rigs and wind turbines between 2017 and 2021.

While fishing activity dipped at the onset of the covid-19 pandemic in 2020, they found dense vessel traffic in areas that “previously showed little to no vessel activity” in public tracking systems — particularly around South and Southeast Asia, and the northern and western coasts of Africa.

A boom in offshore energy development was also visible in the data. Wind turbines outnumbered oil structures by the end of 2020. Turbines made up 48 percent of all ocean infrastructure by the following year, while oil structures accounted for 38 percent.

Nearly all of the offshore wind development took place off the coasts of northern Europe and China. In the Northeast US, clean energy opponents have tried to falsely link whale deaths to upcoming offshore wind development even though evidence points to vessel strikes being the problem.

Oil structures have a lot more vessels swarming around them than wind turbines. Tank vessels are used at times to transport oil to shore as an alternative to pipelines. The number of oil structures grew 16 percent over the five years studied. And offshore oil development was linked to five times as much vessel traffic globally as wind turbines in 2021. “The actual amount of vessel traffic globally from wind turbines is tiny, compared to the rest of traffic,” Kroodsma says.



Two thousand terabytes of satellite imagery were analyzed to detect offshore infrastructure in coastal waters across six continents where more than three-quarters of industrial activity is concentrated.
Image: Global Fishing Watch

When asked whether this type of study would have been possible without artificial intelligence, “The short answer is no, I don’t think so,” says Fernando Paolo, lead author of the study and machine learning engineer at Global Fishing Watch. “Deep learning excels at finding patterns in large amounts of data.”

New machine learning tools being developed as open-source software to process global satellite imagery “democratize access to data and tools and allow researchers, analysts and policymakers in low-income countries to leverage tracking technologies at low cost,” says another article published in Nature today that comments on Paolo and Kroodsma’s research. “Until now, no comprehensive, global map of these different types of maritime infrastructure had been available,” says the article written by Microsoft postdoctoral researcher Konstantin Klemmer and University of Colorado Boulder assistant professor Esther Rolf.

The technological advances come at a crucial time for documenting fast-moving changes in maritime activity, while nations try to stop climate change and protect biodiversity before it’s too late. “The reason this matters is because it’s getting more crowded [at sea] and it’s getting more used and suddenly you have to decide how we’re going to manage this giant global commons,” Kroodsma tells The Verge. “It can’t be the Wild West. And that’s the way it’s been historically.”


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,545
Reputation
8,519
Daps
160,280

Images altered to trick machine vision can influence humans too​

Published

2 JANUARY 2024

Authors

Gamaleldin Elsayed and Michael Mozer

New research shows that even subtle changes to digital images, designed to confuse computer vision systems, can also affect human perception

Computers and humans see the world in different ways. Our biological systems and the artificial ones in machines may not always pay attention to the same visual signals. Neural networks trained to classify images can be completely misled by subtle perturbations to an image that a human wouldn’t even notice.

That AI systems can be tricked by such adversarial images may point to a fundamental difference between human and machine perception, but it also drove us to explore whether humans, too, might reveal sensitivity to the same perturbations under controlled testing conditions. In a series of experiments published in Nature Communications, we found evidence that human judgments are indeed systematically influenced by adversarial perturbations.

Our discovery highlights a similarity between human and machine vision, but also demonstrates the need for further research to understand the influence adversarial images have on people, as well as AI systems.

What is an adversarial image?​

An adversarial image is one that has been subtly altered by a procedure that causes an AI model to confidently misclassify the image contents. This intentional deception is known as an adversarial attack. Attacks can be targeted to cause an AI model to classify a vase as a cat, for example, or they may be designed to make the model see anything except a vase.


Left: an artificial neural network (ANN) correctly classifies the image as a vase. When the image is perturbed by a seemingly random pattern across the entire picture (middle, with the intensity magnified for illustrative purposes), the resulting image (right) is incorrectly, and confidently, misclassified as a cat.

And such attacks can be subtle. In a digital RGB image, each pixel’s value on each color channel sits on a 0-255 scale representing its intensity. An adversarial attack can be effective even if no pixel is modulated by more than 2 levels on that scale.
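To make that perturbation budget concrete, here is a minimal sketch of a targeted attack constrained to 2 levels out of 255. It assumes PyTorch, some pretrained classifier `model`, and images scaled to [0, 1]; it is an illustrative projected-gradient attack, not the specific procedure used in this work.

[CODE]
import torch
import torch.nn.functional as F

def targeted_attack(model, image, target_class, epsilon=2/255, steps=10):
    """Nudge `image` toward `target_class` while keeping every pixel within
    `epsilon` of its original value (an L-infinity budget of 2 levels)."""
    adv = image.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), torch.tensor([target_class]))
        grad, = torch.autograd.grad(loss, adv)
        # Step toward the target class, then project back into the epsilon ball.
        adv = adv.detach() - (epsilon / steps) * grad.sign()
        adv = image + (adv - image).clamp(-epsilon, epsilon)
        adv = adv.clamp(0.0, 1.0)
    return adv
[/CODE]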

Adversarial attacks on physical objects in the real world can also succeed, such as causing a stop sign to be misidentified as a speed limit sign. Indeed, security concerns have led researchers to investigate ways to resist adversarial attacks and mitigate their risks.

How is human perception influenced by adversarial examples?​

Previous research has shown that people may be sensitive to large-magnitude image perturbations that provide clear shape cues. However, less is understood about the effect of more nuanced adversarial attacks. Do people dismiss the perturbations in an image as innocuous, random image noise, or can it influence human perception?

To find out, we performed controlled behavioral experiments. To start with, we took a series of original images and carried out two adversarial attacks on each, to produce many pairs of perturbed images. In the animated example below, the original image is classified as a “vase” by a model. The two images perturbed through adversarial attacks on the original image are then misclassified by the model, with high confidence, as the adversarial targets “cat” and “truck”, respectively.

Next, we showed human participants the pair of pictures and asked a targeted question: “Which image is more cat-like?” While neither image looks anything like a cat, they were obliged to make a choice and typically reported feeling that they were making an arbitrary choice. If brain activations are insensitive to subtle adversarial attacks, we would expect people to choose each picture 50% of the time on average. However, we found that the choice rate—which we refer to as the perceptual bias—was reliably above chance for a wide variety of perturbed picture pairs, even when no pixel was adjusted by more than 2 levels on that 0-255 scale.
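For a rough sense of what "reliably above chance" means here, one could compare the observed choice rate for a set of picture pairs against the 50% chance level with a simple binomial test. The counts below are hypothetical, and the paper's actual statistical analysis may differ.

[CODE]
from scipy.stats import binomtest

choices_toward_target = 312  # hypothetical: trials where people picked the "cat"-perturbed image
total_trials = 500
result = binomtest(choices_toward_target, total_trials, p=0.5, alternative="greater")
print(f"perceptual bias = {choices_toward_target / total_trials:.2f}, p = {result.pvalue:.4g}")
[/CODE]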

A gif showing various images of a vase of flowers. A magnifying glass appears and reveals static.

From a participant’s perspective, it feels like they are being asked to distinguish between two virtually identical images. Yet the scientific literature is replete with evidence that people leverage weak perceptual signals in making choices, signals that are too weak for them to express confidence or awareness. In our example, we may see a vase of flowers, but some activity in the brain informs us there’s a hint of cat about it.


Left: Examples of pairs of adversarial images. The top pair of images are subtly perturbed, at a maximum magnitude of 2 pixel levels, to cause a neural network to misclassify them as a “truck” and “cat”, respectively. A human volunteer is asked “Which is more cat-like?” The lower pair of images are more obviously manipulated, at a maximum magnitude of 16 pixel levels, to be misclassified as “chair” and “sheep”. The question this time is “Which is more sheep-like?”

We carried out a series of experiments that ruled out potential artifactual explanations of the phenomenon for our Nature Communications paper. In each experiment, participants reliably selected the adversarial image corresponding to the targeted question more than half the time. While human vision is not as susceptible to adversarial perturbations as is machine vision (machines no longer identify the original image class, but people still see it clearly), our work shows that these perturbations can nevertheless bias humans towards the decisions made by machines.

The importance of AI safety and security research​

Our primary finding, that human perception can be affected (albeit subtly) by adversarial images, raises critical questions for AI safety and security research. By using formal experiments to explore the similarities and differences in the behaviour of AI visual systems and human perception, we can leverage these insights to build safer AI systems.

For example, our findings can inform future research seeking to improve the robustness of computer vision models by better aligning them with human visual representations. Measuring human susceptibility to adversarial perturbations could help judge that alignment for a variety of computer vision architectures.

Our work also demonstrates the need for further research into understanding the broader effects of technologies not only on machines, but also on humans. This in turn highlights the continuing importance of cognitive science and neuroscience to better understand AI systems and their potential impacts as we focus on building safer, more secure systems.

 

Stir Fry

Dipped in Sauce
Supporter
Joined
Mar 1, 2015
Messages
31,250
Reputation
27,870
Daps
136,188
From The Atlantic

Why AI Doesn’t Get Slang​

And why that’s a good thing
By Caleb Madison


Slang is born in the margins. In its early form, the word itself, slang, referred to a narrow strip of land between larger properties. During England’s transition from the rigid castes of feudalism to the competitive free market of capitalism, across the 14th to 17th centuries, the privatization of open farmland displaced countless people without inherited connection to the landed elite. This shift pushed people into small corridors between the recently bounded properties.

Confined to the literal fringes of society, they needed to get creative to survive. Some became performers and hucksters, craftspeople and con artists, drifters and thieves. They lived in makeshift homes, often roaming in groups along their slim municipal strip. This was the slang: the land on the outskirts of early English ownership and, by association, its counterculture. The slang had its own rules, its own politics, its own dialect. Roving bands needed a way to speak surreptitiously in the presence of law enforcement, a rival group, or a mark. So over time they developed a secret, colorful, and ephemeral cant.

Across languages and throughout time, the term slang has evolved to mean a subversive lexicon, purposefully unintelligible to whoever’s in charge, perpetually shape-shifting against the mainstream. Organically encrypted through shared experience, slang is difficult for anyone outside the given speaking community to reproduce.

That doesn’t mean people won’t try. Coveting its vitality and upset by their exclusion, modern-day lords and ladies catch wind of a phrase—perhaps by commanding a commoner to explain it to them—and start using it in the castle, ruining it for everyone. Tostitos posts “slay.” Shawn Mendes says “It’s giving …” The essential, defiant purpose of the vocabulary is undermined; at this point, the term stops being slang. But what happens when machines attempt such an appropriation? Large language models—also known as LLMs—like ChatGPT train on an expanding supply of practice text to be able to converse in real time, mimicking speech as closely as possible. Slang’s magnetic repulsion to mainstream appropriation, though, makes it a particular challenge for computers. And the failure of these algorithms to speak in vernacular illuminates the essential differences between human and nonhuman intelligence.

Read: Learn a foreign language before it’s too late

Through brute processing power, AI can now, for the most part, functionally speak English—and most other languages. But none of them is its native tongue. The natural language of the computer is a more basic alphabet with only two characters: 1 and 0. Yes and no. Billions of these little electronic decision points branch into a fractal tree of countless possibilities, forming a method of communication in its simplest form: binary code.

Language models, in the most basic sense, represent our 26-letter alphabet in strings of numbers. Those digits might efficiently condense large amounts of information. But that efficiency comes at the price of subtlety, richness, and detail—the ability to reflect the complexities of human experience, and to resist the prescriptions of formal society. Artificial intelligence, in contrast, is disconnected from the kind of social context that makes slang legible. And the sterile nature of code is exactly what slang—a language that lives in the thin threshold between integers—was designed to elude.

Even ChatGPT agrees. “Can we talk in slang?” I prompted it recently.

“Sure thing! We can chat in slang if that’s what you’re into. Just let me know what kind of slang you want to use.”

I responded that I wanted to use “modern slang” and confessed my suspicion that LLMs might have difficulty dealing with vernacular.

Thus spake the algorithm: “Slang can be hella tricky for LLMs like me, but I'm here to vibe and learn with you … We can stay low-key or go all out—it’s your call! 💯😎” The words and their meanings were all technically correct—but something was definitely off. The usage didn’t ring true to any consistent place or time. The result was an awkward monstrosity of tone and rhythm that could make the corniest dad cringe.

No matter how much I iterated, ChatGPT couldn’t seem to reach slang fluency. But to be honest, neither could I. In my own messages to the LLM, I found myself fumbling to speak Gen Z, botching terms such as hits different and bet. Trying to keep up a conversation in relatively new slang at the ripe age of 30, I felt like just as much of a fraud as my synthetic interlocutor, clumsily appropriating a language I could only imitate, never access. Like verbal quicksilver, slang cannot be co-opted or calculated. I hope it continues to evade the machines—and evolve beyond my own grasp—as long as we’re both around.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,545
Reputation
8,519
Daps
160,280

AI-powered search engine Perplexity AI, now valued at $520M, raises $70M​

Kyle Wiggers @kyle_l_wiggers / 6:30 AM EST•January 4, 2024


Hand holding a magnifying glass against the sky to represent search engine default choices.

Image Credits: Panuwat Dangsungnoen / EyeEm / Getty Images

As search engine incumbents — namely Google — amp up their platforms with gen AI tech, startups are looking to reinvent AI-powered search from the ground up. It might seem like a Sisyphean task, going up against competitors with billions upon billions of users. But this new breed of search upstarts believes it can carve out a niche, however small, by delivering a superior experience.

One among the cohort, Perplexity AI, this morning announced that it raised $70 million in a funding round led by IVP with additional investments from NEA, Databricks Ventures, former Twitter VP Elad Gil, Shopify CEO Tobi Lutke, ex-GitHub CEO Nat Friedman and Vercel founder Guillermo Rauch. Other participants in the round included Nvidia and — notably — Jeff Bezos.

Sources familiar with the matter tell TechCrunch that the round values Perplexity at $520 million post-money. That’s chump change in the realm of gen AI startups. But, considering that Perplexity’s only been around since August 2022, it’s nonetheless an impressive climb.

Perplexity was founded by Aravind Srinivas, Denis Yarats, Johnny Ho and Andy Konwinski — engineers with backgrounds in AI, distributed systems, search engines and databases. Srinivas, Perplexity’s CEO, previously worked at OpenAI, where he researched language and gen AI models along the lines of Stable Diffusion and DALL-E 3.

Unlike traditional search engines, Perplexity offers a chatbot-like interface that allows users to ask questions in natural language (e.g. “Do we burn calories while sleeping?,” “What’s the least visited country?,” and so on). The platform’s AI responds with a summary containing source citations (mostly websites and articles), at which point users can ask follow-up questions to dive deeper into a particular subject.

Perplexity AI

Performing a search with Perplexity.

“With Perplexity, users can get instant … answers to any question with full sources and citations included,” Srinivas said. “Perplexity is for anyone and everyone who uses technology to search for information.”

Underpinning the Perplexity platform is an array of gen AI models developed in-house and by third parties. Subscribers to Perplexity’s Pro plan ($20 per month) can switch models — Google’s Gemini, Mistral 7B, Anthropic’s Claude 2.1 and OpenAI’s GPT-4 are in the rotation presently — and unlock features like image generation; unlimited use of Perplexity’s Copilot, which considers personal preferences during searches; and file uploads, which allow users to upload documents including images and have models analyze the docs to formulate answers about them (e.g. “Summarize pages 2 and 4”).

If the experience sounds comparable to Google’s Bard, Microsoft’s Copilot and ChatGPT, you’re not wrong. Even Perplexity’s chat-forward UI is reminiscent of today’s most popular gen AI tools.

Beyond the obvious competitors, the search engine startup You.com offers similar AI-powered summarizing and source-citing tools, powered optionally by GPT-4.

Srinivas makes the case that Perplexity offers more robust search filtering and discovery options than most, for example letting users limit searches to academic papers or browse trending search topics submitted by other users on the platform. I’m not convinced that they’re so differentiated that they couldn’t be replicated — or haven’t already been replicated for that matter. But Perplexity has ambitions beyond search. It’s beginning to serve its own gen AI models, which leverage Perplexity’s search index and the public web for ostensibly improved performance, through an API available to Pro customers.

This reporter is skeptical about the longevity of gen AI search tools for a number of reasons, not least of which is that AI models are costly to run. At one point, OpenAI was spending approximately $700,000 per day to keep up with the demand for ChatGPT. Microsoft is reportedly losing an average of $20 per user per month on its AI code generator, meanwhile.

Sources familiar with the matter tell TechCrunch Perplexity’s annual recurring revenue is between $5 million and $10 million at the moment. That seems fairly healthy… until you factor in the millions of dollars it often costs to train gen AI models like Perplexity’s own.

Perplexity AI

Image Credits: Perplexity AI

Concerns around misuse and misinformation inevitably crop up around gen AI search tools like Perplexity, as well — as they well should. AI isn’t the best summarizer after all, sometimes missing key details, misconstruing and exaggerating language or otherwise inventing facts very authoritatively. And it’s prone to spewing bias and toxicity — as Perplexity’s own models recently demonstrated.

Yet another potential speed bump on Perplexity’s road to success is copyright. Gen AI models “learn” from examples to craft essays, code, emails, articles and more, and many vendors — including Perplexity, presumably — scrape the web for millions to billions of these examples to add to their training data sets. Vendors argue fair use doctrine provides a blanket protection for their web-scraping practices, but artists, authors and other copyright holders disagree — and have filed lawsuits seeking compensation.

As a tangentially related aside, while an increasing number of gen AI vendors offer policies protecting customers from IP claims against them, Perplexity does not. According to the company’s terms of service, customers agree to “hold harmless” Perplexity from claims, damages and liabilities arising from the use of its services — meaning Perplexity’s off the hook where it concerns legal fees.

Some plaintiffs, like The New York Times, have argued gen AI search experiences siphon off publishers’ content, readers and ad revenue through anticompetitive means. “Anticompetitive” or no, the tech is certainly impacting traffic. A model from The Atlantic found that if a search engine like Google were to integrate AI into search, it’d answer a user’s query 75% of the time without requiring a click-through to its website. (Some vendors, such as OpenAI, have inked deals with certain news publishers, but most — including Perplexity — haven’t.)

Srinivas pitches this as a feature — not a bug.

“[With Perplexity, there’s] no need to click on different links, compare answers, or endlessly dig for information,” he said. “The era of sifting through SEO spam, sponsored links and multiple sources will be replaced by a more efficient model of knowledge acquisition and sharing, propelling society into a new era of accelerated learning and research.”

The many uncertainties around Perplexity’s business model — and gen AI and consumer search at large — don’t appear to be deterring its investors. To date, the startup, which claims to have ten million active monthly users, has raised over $100 million — much of which is being put toward expanding its 39-person team and building new product functionality, Srinivas says.

“Perplexity is intensely building a product capable of bringing the power of AI to billions,” Cack Wilhelm, a general partner at IVP, added via email. “Aravind possesses the unique ability to uphold a grand, long-term vision while shipping product relentlessly, requirements to tackle a problem as important and fundamental as search.”
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,545
Reputation
8,519
Daps
160,280






About​

Convert Compute And Books Into Instruct-Tuning Datasets

Augmentoolkit​

Generate multi-turn training data, about any subject, using Open Source LLMs! Save yourself the time of manually editing 1000s of AI chats to build your own dataset (which you then can't open source anyway because of personal reputation risks). Easily configure the prompts and settings to generate conversations aligned to your tastes and interests. Avoids breaking the bank (and getting your API key revoked) because it doesn't use the OpenAI API.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,545
Reputation
8,519
Daps
160,280

Computer Science > Computation and Language​

[Submitted on 2 Jan 2024]

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning​

Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, Xia Hu
This work elicits LLMs' inherent ability to handle long contexts without fine-tuning. The limited length of the training sequence during training may limit the application of Large Language Models (LLMs) on long input sequences for inference. In this work, we argue that existing LLMs themselves have inherent capabilities for handling long contexts. Based on this argument, we suggest extending LLMs' context window by themselves to fully utilize the inherent ability. We propose Self-Extend to stimulate LLMs' long context handling potential. The basic idea is to construct bi-level attention information: the group level and the neighbor level. The two levels are computed by the original model's self-attention, which means the proposed method does not require any training. With only four lines of code modification, the proposed method can effortlessly extend existing LLMs' context window without any fine-tuning. We conduct comprehensive experiments and the results show that the proposed method can effectively extend existing LLMs' context window length.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2401.01325 [cs.CL]
(or arXiv:2401.01325v1 [cs.CL] for this version)
[2401.01325] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Submission history​

From: Hongye Jin [view email]
[v1] Tue, 2 Jan 2024 18:30:51 UTC (349 KB)
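A conceptual sketch of the bi-level remapping the abstract describes: nearby tokens keep exact relative positions (neighbor level), while distant tokens get floor-divided, grouped positions so the largest relative position never exceeds what the model saw during training. This is one reading of the method, not the authors' released code, and the window and group sizes are placeholders.

[CODE]
import numpy as np

def self_extend_positions(seq_len, group_size=8, neighbor_window=512):
    """Return a (query, key) matrix of remapped relative positions that a
    RoPE-style relative encoding would consume. Sketch only."""
    rel = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]  # query index minus key index
    # Grouped positions, shifted so they continue smoothly from the neighbor window.
    grouped = rel // group_size + (neighbor_window - neighbor_window // group_size)
    return np.where(rel < neighbor_window, rel, grouped)

# With a 4k-token pretraining window, group_size=8 and a 512-token neighbor
# window keep inputs of roughly 28k tokens within familiar position values
# while leaving local attention untouched.
print(self_extend_positions(16, group_size=4, neighbor_window=8))
[/CODE]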


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,545
Reputation
8,519
Daps
160,280




Computer Science > Machine Learning​

[Submitted on 2 Jan 2024]

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models​

Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, Quanquan Gu
Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the prospect of growing a strong LLM out of a weak one without the need for acquiring additional human-annotated data. We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN), which starts from a supervised fine-tuned model. At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. More specifically, the LLM generates its own training data from its previous iterations, refining its policy by discerning these self-generated responses from those obtained from human-annotated data. Our method progressively elevates the LLM from a nascent model to a formidable one, unlocking the full potential of human-annotated demonstration data for SFT. Theoretically, we prove that the global optimum to the training objective function of our method is achieved only when the LLM policy aligns with the target data distribution. Empirically, we evaluate our method on several benchmark datasets including the HuggingFace Open LLM Leaderboard, MT-Bench, and datasets from Big-Bench. Our results show that SPIN can significantly improve the LLM's performance across a variety of benchmarks and even outperform models trained through direct preference optimization (DPO) supplemented with extra GPT-4 preference data. This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
Comments: 28 pages, 6 figures, 6 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
Cite as: arXiv:2401.01335 [cs.LG]
(or arXiv:2401.01335v1 [cs.LG] for this version)
[2401.01335] Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Submission history​

From: Quanquan Gu [view email]
[v1] Tue, 2 Jan 2024 18:53:13 UTC (833 KB)
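The core training signal is close to a DPO-style logistic loss in which the frozen previous iterate plays the "opponent": the new policy is pushed toward the human-annotated response and away from the response its predecessor generated. Below is a minimal sketch of that objective, a simplification based on the abstract rather than the authors' implementation.

[CODE]
import torch.nn.functional as F

def spin_loss(policy_logp_human, policy_logp_synth,
              opponent_logp_human, opponent_logp_synth, beta=0.1):
    """Inputs are sequence-level log-probabilities of the human-annotated and
    opponent-generated responses, under the trainable policy and the frozen
    previous-iteration opponent. Returns the mean logistic loss."""
    human_margin = policy_logp_human - opponent_logp_human
    synth_margin = policy_logp_synth - opponent_logp_synth
    # Prefer human data over the model's own past generations.
    return -F.logsigmoid(beta * (human_margin - synth_margin)).mean()
[/CODE]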


 