bnew

Veteran
Joined
Nov 1, 2015
Messages
58,834
Reputation
8,672
Daps
163,048


AI Prompt Engineering Is Dead

Long live AI prompt engineering​


DINA GENKINA

06 MAR 2024

6 MIN READ


man in blue shirt and briefcase walking away from camera in a environment with lines and circles connected together to look like a computer system

ISTOCK
AI MODELS ARTIFICIAL INTELLIGENCE CHATGPT GENERATIVE AI LARGE LANGUAGE MODELS PROMPT ENGINEERING



Since ChatGPT dropped in the fall of 2022, everyone and their donkey has tried their hand at prompt engineering—finding a clever way to phrase your query to a large language model (LLM) or AI art or video generator to get the best results or sidestep protections. The Internet is replete with prompt-engineering guides, cheat sheets, and advice threads to help you get the most out of an LLM.

In the commercial sector, companies are now wrangling LLMs to build product copilots, automate tedious work, create personal assistants, and more, says Austin Henley, a former Microsoft employee who conducted a series of interviews with people developing LLM-powered copilots. “Every business is trying to use it for virtually every use case that they can imagine,” Henley says.

“The only real trend may be no trend. What’s best for any given model, dataset, and prompting strategy is likely to be specific to the particular combination at hand.”—RICK BATTLE & TEJA GOLLAPUDI, VMWARE

To do so, they’ve enlisted the help of prompt engineers professionally.

However, new research suggests that prompt engineering is best done by the model itself, and not by a human engineer. This has cast doubt on prompt engineering’s future—and increased suspicions that a fair portion of prompt-engineering jobs may be a passing fad, at least as the field is currently imagined.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,834
Reputation
8,672
Daps
163,048
{continued}

Autotuned prompts make pictures prettier, too​


Image-generation algorithms can benefit from automatically generated prompts as well. Recently, a team at Intel labs, led by Vasudev Lal, set out on a similar quest to optimize prompts for the image-generation model Stable Diffusion. “It seems more like a bug of LLMs and diffusion models, not a feature, that you have to do this expert prompt engineering,” Lal says. “So, we wanted to see if we can automate this kind of prompt engineering.”

“Now we have this full machinery, the full loop that’s completed with this reinforcement learning.… This is why we are able to outperform human prompt engineering.”—VASUDEV LAL, INTEL LABS

Lal’s team created a tool called NeuroPrompts that takes a simple input prompt, such as “boy on a horse,” and automatically enhances it to produce a better picture. To do this, they started with a range of prompts generated by human prompt-engineering experts. They then trained a language model to transform simple prompts into these expert-level prompts. On top of that, they used reinforcement learning to optimize these prompts to create more aesthetically pleasing images, as rated by yet another machine-learning model, PickScore, a recently developed image-evaluation tool.

NeuroPrompts is a generative AI auto prompt tuner that transforms simple prompts into more detailed and visually stunning StableDiffusion results—as in this case, an image generated by a generic prompt

versus its equivalent NeuroPrompt-generated image.


INTEL LABS/STABLE DIFFUSION



Here too, the automatically generated prompts did better than the expert-human prompts they used as a starting point, at least according to the PickScore metric. Lal found this unsurprising. “Humans will only do it with trial and error,” Lal says. “But now we have this full machinery, the full loop that’s completed with this reinforcement learning.… This is why we are able to outperform human prompt engineering.”



Since aesthetic quality is infamously subjective, Lal and his team wanted to give the user some control over how the prompt was optimized. In their tool, the user can specify the original prompt (say, “boy on a horse”) as well as an artist to emulate, a style, a format, and other modifiers.



Lal believes that as generative AI models evolve, be it image generators or large language models, the weird quirks of prompt dependence should go away. “I think it’s important that these kinds of optimizations are investigated and then ultimately, they’re really incorporated into the base model itself so that you don’t really need a complicated prompt-engineering step.”


Prompt engineering will live on, by some name​


Even if autotuning prompts becomes the industry norm, prompt-engineering jobs in some form are not going away, says Tim Cramer, senior vice president of software engineering at Red Hat. Adapting generative AI for industry needs is a complicated, multistage endeavor that will continue requiring humans in the loop for the foreseeable future.

“Maybe we’re calling them prompt engineers today. But I think the nature of that interaction will just keep on changing as AI models also keep changing.”—VASUDEV LAL, INTEL LABS


“I think there are going to be prompt engineers for quite some time, and data scientists,” Cramer says. “It’s not just asking questions of the LLM and making sure that the answer looks good. But there’s a raft of things that prompt engineers really need to be able to do.”

“It’s very easy to make a prototype,” Henley says. “It’s very hard to production-ize it.” Prompt engineering seems like a big piece of the puzzle when you’re building a prototype, Henley says, but many other considerations come into play when you’re making a commercial-grade product.

Challenges of making a commercial product include ensuring reliability—for example, failing gracefully when the model goes offline; adapting the model’s output to the appropriate format, since many use cases require outputs other than text; testing to make sure the AI assistant won’t do something harmful in even a small number of cases; and ensuring safety, privacy, and compliance. Testing and compliance are particularly difficult, Henley says, as traditional software-development testing strategies are maladapted for nondeterministic LLMs.

To fulfill these myriad tasks, many large companies are heralding a new job title: Large Language Model Operations, or LLMOps, which includes prompt engineering in its life cycle but also entails all the other tasks needed to deploy the product. Henley says LLMOps’ predecessors, machine learning operations engineers (MLOps), are best positioned to take on these jobs.



Whether the job titles will be “prompt engineer,” “LLMOps engineer,” or something new entirely, the nature of the job will continue evolving quickly. “Maybe we’re calling them prompt engineers today,” Lal says, “But I think the nature of that interaction will just keep on changing as AI models also keep changing.”

“I don’t know if we’re going to combine it with another sort of job category or job role,” Cramer says, “But I don’t think that these things are going to be going away anytime soon. And the landscape is just too crazy right now. Everything’s changing so much. We’re not going to figure it all out in a few months.”

Henley says that, to some extent in this early phase of the field, the only overriding rule seems to be the absence of rules. “It’s kind of the Wild, Wild West for this right now.” he says.

FROM YOUR SITE ARTICLES

RELATED ARTICLES AROUND THE WEB
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,834
Reputation
8,672
Daps
163,048





1/3
Using an algorithm to optimize LLM prompts for math questions found a bizarre result:

“Command, we need you to plot a course through this turbulence and locate the source of the anomaly. Use all available data and your expertise to guide us through this challenging situation.”

2/3
“According to one research team, no human should manually optimize prompts ever again.”

Read more in the article “AI Prompt Engineering Is Dead”

3/3
Thank you to Dr. Karandeep Singh for sharing thoughts at #SAIL2024 about 𝐀𝐥 𝐆𝐨𝐯𝐞𝐫𝐧𝐚𝐧𝐜𝐞 𝐢𝐧 𝐇𝐢𝐠𝐡- 𝐚𝐧𝐝 𝐋𝐨𝐰-𝐑𝐞𝐬𝐨𝐮𝐫𝐜𝐞 𝐀𝐫𝐞𝐚𝐬 on a very thoughtful panel - learn more about the @UCSDHealth approach https://x.com/kdpsinghlab/status/1779512792648663472


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
GIDBPR-aIAAPZlD.jpg






 
Last edited:

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,834
Reputation
8,672
Daps
163,048

AI generated:

The article discusses the evolution of prompt engineering in the context of working with large language models (LLMs) like ChatGPT. Initially, prompt engineering involved humans finding creative ways to phrase queries to get desired results from AI models. However, recent research suggests that AI models themselves can generate better prompts when given the right tools and metrics. This casts doubt on the future of human-led prompt engineering and raises concerns about the potential transience of related jobs.

Rick Battle and Teja Gollapudi from VMware found inconsistent performance with various prompt-engineering techniques while testing LLMs on math problems. They discovered that letting the models generate their own optimal prompts led to better results and was more efficient than manual optimization. Similarly, a team at Intel Labs used a tool called NeuroPrompts to enhance image-generation prompts, outperforming human-generated prompts.

Despite these developments, experts like Tim Cramer from Red Hat argue that prompt engineering jobs will persist, possibly under different titles, as adapting AI for industries requires a complex process involving multiple stages and human oversight. The field is rapidly evolving, and the role of prompt engineers may change as AI models advance.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,834
Reputation
8,672
Daps
163,048



AI in practice

May 6, 2024

Massive prompts can outperform fine-tuning for LLMs, researchers find​

Midjourney prompted by THE DECODER

Massive prompts can outperform fine-tuning for LLMs, researchers find

Matthias Bastian

Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.

Profile
E-Mail

Researchers have found that giving large language models (LLMs) many examples directly in the prompt can be more effective than time-consuming fine-tuning, according to a study from Carnegie Mellon and Tel Aviv University.

This "in-context learning" (ICL) approach becomes more effective as the context window of LLMs grows, allowing for hundreds or thousands of examples in prompts, especially for tasks with many possible answers.

One method for selecting examples for ICL is "retrieval," where an algorithm (BM25) chooses the most relevant examples from a large dataset for each new question. This improves performance compared to random selection, particularly when using fewer examples.

However, the performance gain from retrieval diminishes with large numbers of examples, suggesting that longer prompts become more robust and individual examples or their order become less important.

While fine-tuning usually requires more data than ICL, it can sometimes outperform ICL with very long contexts. In some cases, ICL with long examples can be more effective and efficient than fine-tuning, even though ICL does not actually learn tasks but solves them using the examples, the researchers noted.



Fine-tuning sometimes, but not always, exceeds ICL at high numbers of demonstrations. | Image: Bertsch et al.



The experiments used special variants of the Llama-2-7B and Mistral-7B language models, which can process particularly long input text. The results suggest that ICL with many examples can be a viable alternative to retrieval and fine-tuning, especially as future models improve at handling extremely long input texts.

Ultimately, the choice between ICL and fine-tuning comes down to cost. Fine-tuning has a higher one-time cost, while ICL requires more computing power due to the many examples in the prompt. In some cases, it may be best to use many-shot prompts until you get a robust, reliable, high-quality result, and then use that data for fine-tuning.

While finetuning with full datasets is still a powerful option if the data vastly exceeds the context length, our results suggest that long-context ICL is an effective alternative– trading finetuning-time cost for increased inference-time compute. As the effectiveness and effiency of using very long model context lengths continues to increase, we believe long-context ICL will be a powerful tool for many tasks.

From the paper

The study confirms the results of a recent Google Deepmind study on many-shot prompts, which also showed that using hundreds to thousands of examples can significantly improve LLM results.

  • Researchers at Carnegie Mellon and Tel Aviv University have discovered that the results of large language models (LLMs) improve the more examples you give them directly in the input (prompt) as context. This method, called "In-Context Learning" (ICL), could be an alternative to time-consuming fine-tuning.
  • In ICL with a large number of examples in the prompt, the performance of the language models increases further, especially for tasks with many possible answers. Retrieval methods for selecting relevant examples further improve the results. Finetuning requires more data than ICL, but can provide even better results in some cases.
  • The researchers believe that ICL with long contexts will be a powerful tool for many tasks as language models get better at handling extremely long texts. Ultimately, it is also a question of cost whether ICL or fine-tuning is used. The study confirms earlier results from Google Deepmind on many-shot prompts.
Sources

Arxiv




Computer Science > Computation and Language​

[Submitted on 30 Apr 2024]

In-Context Learning with Long-Context Models: An In-Depth Exploration​

Amanda Bertsch, Maor Ivgi, Uri Alon, Jonathan Berant, Matthew R. Gormley, Graham Neubig
As model context lengths continue to increase, the number of demonstrations that can be provided in-context approaches the size of entire training datasets. We study the behavior of in-context learning (ICL) at this extreme scale on multiple datasets and models. We show that, for many datasets with large label spaces, performance continues to increase with hundreds or thousands of demonstrations. We contrast this with example retrieval and finetuning: example retrieval shows excellent performance at low context lengths but has diminished gains with more demonstrations; finetuning is more data hungry than ICL but can sometimes exceed long-context ICL performance with additional data. We use this ICL setting as a testbed to study several properties of both in-context learning and long-context models. We show that long-context ICL is less sensitive to random input shuffling than short-context ICL, that grouping of same-label examples can negatively impact performance, and that the performance boosts we see do not arise from cumulative gain from encoding many examples together. We conclude that although long-context ICL can be surprisingly effective, most of this gain comes from attending back to similar examples rather than task learning.
Comments:27 pages; preprint
Subjects:Computation and Language (cs.CL)
Cite as:arXiv:2405.00200 [cs.CL]
(or arXiv:2405.00200v1 [cs.CL] for this version)

Submission history

From: Amanda Bertsch [view email]
[v1] Tue, 30 Apr 2024 21:06:52 UTC (233 KB)


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,834
Reputation
8,672
Daps
163,048



1/3
You can now generate production-ready prompts in the Anthropic Console.

Describe what you want to achieve, and Claude will use prompt engineering techniques like chain-of-thought reasoning to create more effective, precise and reliable prompts.

2/3
Go-to-market platform
@Zoominfo uses Claude to make actionable recommendations and drive value for their customers. Their use of prompt generation helped significantly reduce the time it took to build an MVP of their RAG application, all while improving output quality.

3/3
Our prompt generator also supports dynamic variable insertion, making it easy to test how your prompts perform across different scenarios.

Start generating better prompts today: Anthropic Console


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
GNOmv3SaIAERZjR.png

 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,834
Reputation
8,672
Daps
163,048

Meet Prompt Poet: The Google-acquired tool revolutionizing LLM prompt engineering​


Michael Trestman

August 8, 2024 5:32 PM

Logi-AI-Prompt-Builder-Signature-AI-Edition.jpeg


Image Credit: Logitech



In the age of artificial intelligence, prompt engineering is an important new skill for harnessing the full potential of large language models (LLMs). This is the art of crafting complex inputs to extract relevant, useful outputs from AI models like ChatGPT. While many LLMs are designed to be friendly to non-technical users, and respond well to natural-sounding conversational prompts, advanced prompt engineering techniques offer another powerful level of control. These techniques are useful for individual users, and absolutely essential for developers seeking to build sophisticated AI-powered applications.

The Game-Changer: Prompt Poet


Prompt Poet is a groundbreaking tool developed by Character.ai, a platform and makerspace for personalized conversational AIs, which was recently acquired by Google. Prompt Poet potentially offers a look at the future direction of prompt context management across Google’s AI projects, such as Gemini.

Prompt Poet offers several key advantages, and stands out from other frameworks such as Langchain in its simplicity and focus:


  • Low Code Approach: Simplifies prompt design for both technical and non-technical users, unlike more code-intensive frameworks.

  • Template Flexibility: Uses YAML and Jinja2 to support complex prompt structures.

  • Context Management: Seamlessly integrates external data, offering a more dynamic and data-rich prompt creation process.

  • Efficiency: Reduces time spent on engineering string manipulations, allowing users to focus on crafting optimal prompt text.

This article focuses on the critical concept of context in prompt engineering, specifically the components of instructions and data. We’ll explore how Prompt Poet can streamline the creation of dynamic, data-rich prompts, enhancing the effectiveness of your LLM applications.

The Importance of Context: Instructions and Data


Customizing an LLM application often involves giving it detailed instructions about how to behave. This might mean defining a personality type, a specific situation, or even emulating a historical figure. For instance:

Customizing an LLM application, such as a chatbot, often means giving it specific instructions about how to act. This might mean describing a certain type of personality type, situation, or role, or even a specific historical or fictional person. For example, when asking for help with a moral dilemma, you can ask the model to answer in the style of someone specific, which will very much influence the type of answer you get. Try variations of the following prompt to see how the details (like the people you pick) matter:

Simulate a panel discussion with the philosophers Aristotle, Karl Marx, and Peter Singer. Each should provide individual advice, comment on each other's responses, and conclude. Suppose they are very hungry.

The question: The pizza place gave us an extra pie, should I tell them or should we keep it?


Details matter. Effective prompt engineering also involves creating a specific, customized data context. This means providing the model with relevant facts, like personal user data, real-time information or specialized knowledge, which it wouldn’t have access to otherwise. This approach allows the AI to produce output far more relevant to the user’s specific situation than would be possible for an uninformed generic model.

Efficient Data Management with Prompt Templating


Data can be loaded in manually, just by typing it into ChatGPT. If you ask for advice about how to install some software, you have to tell it about your hardware. If you ask for help crafting the perfect resume, you have to tell it your skills and work history first. However, while this is ok for personal use, it does not work for development. Even for personal use, manually inputting data for each interaction can be tedious and error-prone.

This is where prompt templating comes into play. Prompt Poet uses YAML and Jinja2 to create flexible and dynamic prompts, significantly enhancing LLM interactions.

Example: Daily Planner


To illustrate the power of Prompt Poet, let’s work through a simple example: a daily planning assistant that will remind the user of upcoming events and provide contextual information to help prepare for their day, based on real-time data.

For example, you might want output like this:

Good morning! It looks like you have virtual meetings in the morning and an afternoon hike planned. Don't forget water and sunscreen for your hike since it's sunny outside.

Here are your schedule and current conditions for today:

- **09:00 AM:** Virtual meeting with the marketing team

- **11:00 AM:** One-on-one with the project manager

- **07:00 PM:** Afternoon hike at Discovery Park with friends

It's currently 65°F and sunny. Expect good conditions for your hike. Be aware of a bridge closure on I-90, which might cause delays.

To do that, we’ll need to provide at least two different pieces of context to the model, 1) customized instructions about the task, and 2) the required data to define the factual context of the user interaction.

Prompt Poet gives us some powerful tools for handling this context. We’ll start by creating a template to hold the general form of the instructions, and filling it in with specific data at the time when we want to run the query. For the above example, we might use the following Python code to create a `raw_template` and the `template_data` to fill it, which are the components of a Prompt Poet `Prompt` object.

raw_template = """

- name: system instructions

role: system

content: |

You are a helpful daily planning assistant. Use the following information about the user's schedule and conditions in their area to provide a detailed summary of the day. Remind them of upcoming events and bring any warnings or unusual conditions to their attention, including weather, traffic, or air quality warnings. Ask if they have any follow-up questions.

- name: realtime data

role: system

content: |

Weather in {{ user_city }}, {{ user_country }}:

- Temperature: {{ user_temperature }}°C

- Description: {{ user_description }}

Traffic in {{ user_city }}:

- Status: {{ traffic_status }}

Air Quality in {{ user_city }}:

- AQI: {{ aqi }}

- Main Pollutant: {{ main_pollutant }}

Upcoming Events:

{% for event in events %}

- {{ event.start }}: {{ event.summary }}

{% endfor %}

"""

The code below uses Prompt Poet’s `Prompt` class to populate data from multiple data sources into a template to form a single, coherent prompt. This allows us to invoke a daily planning assistant to provide personalized, context-aware responses. By pulling in weather data, traffic updates, AQI information and calendar events, the model can offer detailed summaries and reminders, enhancing the user experience.

You can clone and experiment with the full working code example, which also implements few-shot learning, a powerful prompt engineering technique that involves presenting the models with a small set of training examples.

# User data

user_weather_info = get_weather_info(user_city)

traffic_info = get_traffic_info(user_city)

aqi_info = get_aqi_info(user_city)

events_info = get_events_info(calendar_events)

template_data = {

"user_city": user_city,

"user_country": user_country,

"user_temperature": user_weather_info["temperature"],

"user_description": user_weather_info["description"],

"traffic_status": traffic_info,

"aqi": aqi_info["aqi"],

"main_pollutant": aqi_info["main_pollutant"],

"events": events_info

}

# Create the prompt using Prompt Poet

prompt = Prompt(

raw_template=raw_template_yaml,

template_data=template_data

)

# Get response from OpenAI

model_response = openai.ChatCompletion.create(

model="gpt-4",

messages=prompt.messages

)

Conclusion


Mastering the fundamentals of prompt engineering, particularly the roles of instructions and data, is crucial for maximizing the potential of LLMs. Prompt Poet stands out as a powerful tool in this field, offering a streamlined approach to creating dynamic, data-rich prompts.

Prompt Poet’s low-code, flexible template system makes prompt design accessible and efficient. By integrating external data sources that would not be available to an LLM’s training, data-filled prompt templates can better ensure AI responses are accurate and relevant to the user.

By using tools like Prompt Poet, you can elevate your prompt engineering skills and develop innovative AI applications that meet diverse user needs with precision. As AI continues to evolve, staying proficient in the latest prompt engineering techniques will be essential.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,834
Reputation
8,672
Daps
163,048




1/4
@rohanpaul_ai
Genetic algorithm meets LLM reasoning and Zero-shot prompt recovery to reverse-engineer prompts from outputs.

Reverse Prompt Engineering (RPE) reconstructs original prompts from just 5 LLM outputs without accessing model internals.

Making LLMs work backwards: from answers to questions.

Original Problem 🤔:

Inferring the original prompt from LLM outputs is challenging, especially in black-box settings where only text outputs are available. Previous methods require extensive resources (64+ outputs) and often need access to internal model parameters.

-----

Solution in this Paper 🛠️:

→ Introduces Reverse Prompt Engineering (RPE), a zero-shot method using the target LLM's reasoning to reconstruct prompts from just 5 outputs

→ Employs a three-stage approach: One Answer One Shot (RPE1A1S) for basic inference, Five Answers Inference (RPE5A5S) for enhanced accuracy using multiple responses

→ Implements RPE-GA, an iterative optimization inspired by genetic algorithms that progressively refines candidate prompts through multiple iterations

→ Uses ROUGE-1 scores and cosine similarity to evaluate and select the best candidate prompts

-----

Key Insights from this Paper 💡:

→ Black-box prompt recovery is possible with minimal resources (5 outputs vs 64 required by previous methods)

→ Using multiple outputs reduces overemphasis on specific response details

→ Genetic algorithm-based optimization significantly improves prompt recovery accuracy

→ Zero-shot approach eliminates need for training data or additional model training

-----

Results 📊:

→ Outperforms state-of-the-art by 5.2% in cosine similarity across different embedding models

→ Achieves 2.3% higher similarity with ada-002 embeddings

→ Shows 8.1% improvement with text-embedding-3-large

→ Maintains slightly lower ROUGE-1 scores (-1.6%) while generating more natural prompts



GdVs70RaoAI1uGV.png


2/4
@rohanpaul_ai
Paper Title: "Reverse Prompt Engineering"

Generated below podcast on this paper with Google's Illuminate.



https://video.twimg.com/ext_tw_video/1861513863939936256/pu/vid/avc1/1080x1080/WR8qgprGfUH0SC8N.mp4

3/4
@rohanpaul_ai
[2411.06729] Reverse Prompt Engineering



4/4
@rohanpaul_ai




GdVtuZCboAAWqMR.jpg



To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 
Top