Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
Chrisantha Fernando, Dylan Banarse, Henryk Michalewski, Simon Osindero, Tim Rocktäschel

Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt-strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts, and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way. That is, Promptbreeder is not just improving task-prompts, but it is also improving the mutation-prompts that improve these task-prompts. Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as: arXiv:2309.16797 [cs.CL] (or arXiv:2309.16797v1 [cs.CL] for this version)
Submission history
From: Chrisantha Fernando [v1] Thu, 28 Sep 2023 19:01:07 UTC (697 KB)
Optimizing Language Model Prompts with Promptbreeder
Overview and Objective:
The objective is to automate the prompt-engineering process and thereby improve the performance of Large Language Models (LLMs) across various domains, using Promptbreeder's genetic-algorithm approach to evolve increasingly effective task-prompts and mutation-prompts.
Steps and Intermediate Deliverables:
1. Initial Setup
- Objective: Prepare the environment and identify the domain for which you want to optimize prompts.
- Deliverable: A list of domains and initial task-prompts.
- Example: Domains like solving math word problems or classifying hate speech.
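For concreteness, here is a minimal Python sketch of what this setup might look like. The Task dataclass, the domain description, and the example problems are illustrative assumptions, not taken from the paper.
```python
# A sketch of the "initial setup" deliverable: pick a domain and hold a small
# training set of (question, answer) pairs.
from dataclasses import dataclass

@dataclass
class Task:
    question: str
    answer: str

# Hypothetical example domain: math word problems.
domain_description = "Solve the math word problem, giving your answer as an integer."

training_set = [
    Task("Tom has 3 apples and buys 4 more. How many apples does he have?", "7"),
    Task("A book costs $12. How many dollars do 5 books cost?", "60"),
    # ... in practice, load a full benchmark such as GSM8K here
]
```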
2. Selection of Thinking Styles
- Objective: Choose the thinking styles that will guide the prompt generation process.
- Deliverable: A list of selected thinking styles.
- Example: "Let's think step-by-step," "Focus on the creative approach," "Consider the problem from multiple angles."
3. Initial Mutation Prompts
- Objective: Generate initial mutation prompts that will be used to modify task-prompts.
- Deliverable: A list of initial mutation prompts.
- Example: "Rephrase the instruction to make it more engaging," "Explain the instruction as if you're talking to a beginner," "Condense and refine the following instruction."
4. Task Prompt Structure
- Objective: Create the initial task prompts using the problem description and selected thinking styles.
- Deliverable: Initial task prompts.
- Example: Start with a problem description, choose a thinking style, apply an initial mutation prompt to generate the first task prompt, and then use a second mutation prompt to generate a second task prompt.
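A sketch of this initialization, assuming a placeholder llm(prompt) -> str call (swap in a real model client) and the seed lists above; the exact way the pieces are concatenated, and the unit structure, are assumptions made for illustration.
```python
import random

def llm(prompt: str) -> str:
    """Placeholder for a single call to your language model; replace with a real client."""
    raise NotImplementedError

def init_unit(domain_description, thinking_styles, mutation_prompts):
    """Build one evolutionary unit: two task-prompts plus the mutation-prompt
    used to create them (structure assumed for this sketch)."""
    style = random.choice(thinking_styles)
    mutation_prompt = random.choice(mutation_prompts)
    # First task-prompt: mutation-prompt + thinking style + problem description.
    task_prompt_1 = llm(f"{mutation_prompt} {style} INSTRUCTION: {domain_description}")
    # Second task-prompt: a second, independently chosen mutation-prompt.
    second_mutation = random.choice(mutation_prompts)
    task_prompt_2 = llm(f"{second_mutation} {style} INSTRUCTION: {domain_description}")
    return {"task_prompts": [task_prompt_1, task_prompt_2],
            "mutation_prompt": mutation_prompt}
```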
5. Fitness Evaluation
- Objective: Evaluate the effectiveness of the generated prompts.
- Deliverable: Performance metrics of the prompt pairs on a subset of training data.
- Example: Evaluate the generated prompt pairs on a random subset of training data to measure their effectiveness.
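A sketch of fitness evaluation under the same assumptions (the Task records from the setup sketch, and the llm placeholder passed in as a parameter). The containment check is a crude stand-in for benchmark-specific answer extraction.
```python
import random

def evaluate_fitness(task_prompt, training_set, llm, sample_size=20):
    """Score a task-prompt as accuracy on a random subset of the training data."""
    batch = random.sample(training_set, min(sample_size, len(training_set)))
    correct = 0
    for task in batch:
        answer = llm(f"{task_prompt}\n\nQ: {task.question}\nA:")
        # Crude scoring: count it correct if the reference answer appears
        # anywhere in the model's output.
        if task.answer in answer:
            correct += 1
    return correct / len(batch)
```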
6. Mutation Operators
- Objective: Apply mutation operators to create variations in the prompts.
- Deliverable: A new set of mutated prompts.
- Example: zero-order prompt generation, first-order prompt generation, lineage-based mutation, hypermutation of mutation-prompts, context shuffling, and Lamarckian mutation (three of these are sketched below).
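Sketches of three of the listed operator families, again with llm passed in; the exact wording of the meta-instructions is an assumption made for illustration.
```python
def zero_order_prompt_gen(domain_description, llm):
    # Zero-order: generate a fresh task-prompt straight from the problem
    # description, without looking at any existing prompt.
    return llm(f"INSTRUCTION: {domain_description}\nA list of 100 hints:\n1.")

def first_order_prompt_gen(task_prompt, mutation_prompt, llm):
    # First-order: rewrite an existing task-prompt under the guidance of a
    # mutation-prompt.
    return llm(f"{mutation_prompt}\nINSTRUCTION: {task_prompt}\nINSTRUCTION MUTANT:")

def hypermutate(mutation_prompt, llm):
    # Hypermutation: mutate the mutation-prompt itself -- the self-referential
    # step that lets the system improve the way it improves task-prompts.
    return llm(f"Please improve the following instruction for rewriting prompts:\n{mutation_prompt}")
```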
7. Evolution Process
- Objective: Iteratively refine the prompts based on their performance metrics.
- Deliverable: An optimized set of prompts.
- Example: Iteratively evaluate the effectiveness of task and mutation prompts, select the most effective ones, and generate a new set through crossover and mutation.
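A sketch of one possible evolution loop, reusing the helpers from the earlier sketches. The paper uses a binary-tournament genetic algorithm; this loop follows that spirit (compare two random units, overwrite the loser with a mutated copy of the winner) rather than reproducing its exact details.
```python
import random

def evolve(population, training_set, llm, steps=100):
    """Steady-state loop: repeatedly pit two random units against each other
    and replace the loser with a mutated copy of the winner."""
    for _ in range(steps):
        i, j = random.sample(range(len(population)), 2)
        fit_i = evaluate_fitness(population[i]["task_prompts"][0], training_set, llm)
        fit_j = evaluate_fitness(population[j]["task_prompts"][0], training_set, llm)
        winner, loser = (i, j) if fit_i >= fit_j else (j, i)
        child = dict(population[winner])
        # Mutate the winner's task-prompts using its own mutation-prompt.
        child["task_prompts"] = [
            first_order_prompt_gen(p, child["mutation_prompt"], llm)
            for p in child["task_prompts"]
        ]
        population[loser] = child
    return population
```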
8. Diversity Maintenance
- Objective: Ensure that the prompt set remains diverse to avoid local optima.
- Deliverable: A diverse set of optimized prompts.
- Example: Introduce random mutations, use fitness proportionate selection, apply population distribution-based mutation to maintain diversity.
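Two small sketches of the diversity mechanisms named above: fitness-proportionate (roulette-wheel) selection, and occasional random replacement of a unit's mutation-prompt. Both are illustrative assumptions rather than the paper's exact mechanisms.
```python
import random

def fitness_proportionate_select(population, fitnesses):
    """Roulette-wheel selection: fitter units are favoured, but weaker units
    retain a nonzero chance of being chosen, which keeps the pool diverse."""
    total = sum(fitnesses)
    if total == 0:
        return random.choice(population)
    return random.choices(population, weights=fitnesses, k=1)[0]

def maybe_randomize_mutation_prompt(unit, mutation_prompts, rate=0.1):
    """With small probability, swap in a fresh random mutation-prompt,
    injecting variation that counteracts premature convergence."""
    if random.random() < rate:
        unit = dict(unit)
        unit["mutation_prompt"] = random.choice(mutation_prompts)
    return unit
```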
Final Deliverable:
A set of highly optimized task-prompts and mutation-prompts for the selected domain, along with performance metrics demonstrating their effectiveness.