EvoPrompt – Evolutionary Algorithms Meet Prompt Engineering: A Powerful Duo

A Detailed Review of an Academic Approach to Prompt Engineering with Evolutionary Algorithms

Austin Starks
AI Advances


I’ve been writing about genetic algorithms (GAs) for a few months now. These algorithms are elegant and powerful, drawing inspiration from the natural process of evolution. When I discuss GAs, many of my readers wonder what a practical use case for them is outside the realm of finance. Well, I just found one.

Researchers at Tsinghua University, Northwestern University, and Microsoft have developed a novel algorithm for optimizing prompts, inspired by evolutionary algorithms. They released a preprint, titled Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers, describing how to perform automated prompt engineering with their approach.

This article will be a detailed review of this preprint paper. Some aspects of the approach may be incorporated into my future Prompt Engineering as a Service idea.

I will discuss the researchers’ approach, some of the benefits I see, and some of the downsides. I’ll also discuss what the paper is missing (in my opinion) and how it could be improved.

Summary of the Approach

This paper presents a novel approach to prompt engineering. Traditional prompt engineering is usually extremely manual, and requires substantial human investment with little concrete guidance.

So, the authors had the idea of using evolutionary algorithms for prompt engineering, with some caveats. With traditional evolutionary algorithms, the crossover and mutation operators tend to operate on individual tokens independently. This is detrimental for prompt optimization because it destroys the coherence of the sentence structure: if we replace a random word in a sentence, we can easily end up with a nonsensical phrase.

To overcome this limitation, the authors had the brilliant idea of using a Large Language Model to simulate the evolutionary operators instead of implementing them in the traditional way. The LLM performs the operations needed to generate new candidate solutions.

They call this architecture EvoPrompt. In their evaluations, EvoPrompt significantly outperforms manual prompt engineering and other automated approaches. It’s extremely easy to implement, only requiring a Large Language Model. And, it works with relatively small population sizes.

The paper discusses two different Evolutionary Algorithms: Genetic Algorithms and Differential Evolution. Both approaches outperform manual prompt engineering and score similarly to each other. And, because I’m biased towards genetic algorithms (being a biology major from Cornell), I will only discuss that approach for the rest of this article.

Genetic Algorithm (GA) Implemented by LLMs

Figure 1 in the EvoPrompt Paper

A MAJOR benefit of this approach is that it’s extremely easy to implement, only requiring access to a Large Language Model like GPT-3.5. The sequence of steps is similar to ordinary genetic optimization: initialization, selection, crossover, mutation, and evaluation.

In the initialization phase, the user supplies a list of candidate prompts and can optionally generate additional prompts using GPT. Each prompt is then evaluated and given a fitness score.
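The scoring details depend on the task, but for classification-style benchmarks a fitness function can be as simple as accuracy on a small labeled development set. Here’s a minimal sketch of that idea, assuming a hypothetical `ask_llm(prompt, text)` helper that returns the model’s predicted label for one example (my illustration, not the paper’s code):

```python
from typing import Callable, List, Tuple

def prompt_fitness(
    candidate_prompt: str,
    dev_set: List[Tuple[str, str]],        # (input text, ground-truth label) pairs
    ask_llm: Callable[[str, str], str],    # hypothetical client: (prompt, text) -> predicted label
) -> float:
    """Score a candidate prompt by its accuracy on a small labeled dev set."""
    correct = 0
    for text, label in dev_set:
        prediction = ask_llm(candidate_prompt, text)
        if prediction.strip().lower() == label.strip().lower():
            correct += 1
    return correct / len(dev_set)
```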

Next, during selection, two individuals are chosen from the population. While they experimented with different selection strategies, the best approach seemed to be the roulette wheel strategy. This method is a probabilistic approach for selecting parents for the next generation, where the chance of selection is proportional to an individual’s fitness.
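As a quick illustration of what that looks like in code (a minimal sketch, not the authors’ implementation), fitness-proportional selection can be done with Python’s `random.choices`:

```python
import random
from typing import List, Tuple

def roulette_select(population: List[Tuple[str, float]], k: int = 2) -> List[str]:
    """Pick k parent prompts, with probability proportional to each prompt's fitness."""
    prompts = [prompt for prompt, _ in population]
    weights = [max(score, 0.0) for _, score in population]  # clamp negatives so weights stay valid
    if sum(weights) == 0:
        return random.sample(prompts, k)                    # degenerate case: fall back to uniform
    return random.choices(prompts, weights=weights, k=k)
```

Note that `random.choices` samples with replacement, so a strong prompt can occasionally be drawn as both parents; whether to deduplicate is a small design choice left to the implementer.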

Next we have crossover, where the parents “breed” to create a new child. Essentially, you supply both parents to the LLM and say the following:

1. Cross over the following prompts and generate a new prompt

This generates a new prompt that’s a mixture of the parents. After crossover, the child prompt undergoes mutation, which uses a very similar instruction:

2. Mutate the prompt generated in Step 1 and generate a final prompt bracketed with <prompt> and </prompt>
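To make these two instructions concrete, here’s a hedged sketch of how the crossover-and-mutation step could be wired up. The `complete` callable stands in for whatever LLM client you use (hypothetical, not the paper’s code), the template simply paraphrases the two steps above, and the child is pulled out of the `<prompt>...</prompt>` tags the instruction asks for:

```python
import re
from typing import Callable

EVOLUTION_TEMPLATE = """Please follow the instructions step by step.
1. Cross over the following prompts and generate a new prompt:
Prompt 1: {parent_a}
Prompt 2: {parent_b}
2. Mutate the prompt generated in Step 1 and generate a final prompt bracketed with <prompt> and </prompt>."""

def breed(parent_a: str, parent_b: str, complete: Callable[[str], str]) -> str:
    """Ask the LLM to perform crossover + mutation, then parse out the child prompt."""
    response = complete(EVOLUTION_TEMPLATE.format(parent_a=parent_a, parent_b=parent_b))
    match = re.search(r"<prompt>(.*?)</prompt>", response, re.DOTALL)
    if match is None:
        raise ValueError("LLM response did not contain a <prompt>...</prompt> block")
    return match.group(1).strip()
```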

Afterwards, the newly generated child prompt is evaluated and added to the population. This is repeated a fixed number of times, then the population is sorted and the worst-performing prompts are eliminated.

This whole cycle is repeated for several generations, until we end up with a population of prompts that each score better than the ones we started with.
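Putting the pieces together, here’s one plausible shape for the whole loop (my own sketch of my reading of the algorithm, not the authors’ released code). The fitness, selection, and breeding functions are passed in as callables, e.g. the sketches above:

```python
from typing import Callable, List, Tuple

def evoprompt_ga(
    initial_prompts: List[str],
    fitness: Callable[[str], float],                          # e.g. dev-set accuracy
    select: Callable[[List[Tuple[str, float]]], List[str]],   # e.g. roulette-wheel pick of 2 parents
    breed: Callable[[str, str], str],                         # LLM-driven crossover + mutation
    generations: int = 10,
    children_per_generation: int = 10,
) -> List[Tuple[str, float]]:
    """Run a simple EvoPrompt-style GA loop and return the final scored population."""
    population = [(prompt, fitness(prompt)) for prompt in initial_prompts]
    for _ in range(generations):
        children = []
        for _ in range(children_per_generation):
            parent_a, parent_b = select(population)
            child = breed(parent_a, parent_b)
            children.append((child, fitness(child)))
        # Survivor selection: keep only the best-scoring prompts for the next generation
        population = sorted(population + children, key=lambda pair: pair[1],
                            reverse=True)[: len(initial_prompts)]
    return population
```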

The Strengths of EvoPrompt

EvoPrompt is a novel approach to prompt engineering that outperforms other methods, while also being extremely easy to implement. The algorithm is effective even with small population sizes, and it only requires around 8 iterations to converge near the optimal solution. This is extremely beneficial because it only takes a few minutes to optimize a prompt, unlike manual prompt engineering, which can take hours. This also makes the approach highly scalable and inexpensive.

Additionally, the idea behind EvoPrompt can be extended. Even with an extremely simple implementation, EvoPrompt outperforms manual methods of prompt engineering. Someone who develops a more sophisticated, robust framework could probably generate even better solutions than EvoPrompt. For example, a novel approach might use multi-objective optimization to optimize for both cost and accuracy at the same time.
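As a toy illustration of that idea (mine, not the paper’s), you could fold a rough cost proxy such as prompt length into the fitness score, though a true multi-objective method would keep the objectives separate and track a Pareto front:

```python
def cost_aware_fitness(accuracy: float, prompt: str, penalty_per_token: float = 0.001) -> float:
    """Blend task accuracy with a length penalty as a crude proxy for per-call token cost."""
    approx_tokens = len(prompt.split())   # rough word count; a real tokenizer would be more accurate
    return accuracy - penalty_per_token * approx_tokens
```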

Lastly, the EvoPrompt paper does an excellent job at explaining the algorithm and benchmarking different approaches. It’s very accessible, even to someone who isn’t an expert on genetic optimization. They also provided optimized prompts for a variety of common LLM tasks.

Drawbacks and Potential Room for Improvement

While this approach is extremely promising, the paper does leave a little to be desired. The biggest thing missing is detail about how each candidate prompt is evaluated. The authors largely hand-wave this part, which is arguably one of the most important parts of the paper.

For example, do they manually assign a score to each prompt? Do they use another LLM as a judge? The tasks they evaluate are sentiment analysis and classification, so we can infer that they probably scored prompts against ground-truth labels, but that process isn’t robust enough for real-world prompt engineering use cases. In many LLM-powered applications, the “best” response is subjective, so a broader discussion of how this approach could be applied to real-world prompt engineering would be much appreciated.
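One possible workaround for subjective tasks (my speculation; the paper doesn’t describe this) is to use a second LLM as a judge and treat its numeric rating as the fitness signal, for example:

```python
from typing import Callable

JUDGE_TEMPLATE = """You are grading the response below for the given task on a scale from 1 to 10.
Task: {task}
Response: {response}
Reply with only the number."""

def judge_score(task: str, response: str, complete: Callable[[str], str]) -> float:
    """Ask a judge LLM (hypothetical `complete` client) to rate a response from 1 to 10."""
    reply = complete(JUDGE_TEMPLATE.format(task=task, response=response))
    try:
        return float(reply.strip())
    except ValueError:
        return 0.0  # treat an unparseable rating as a failing score
```

Even this is imperfect, though: judge models bring their own biases, which is exactly why more discussion of evaluation in the paper would have been valuable.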

In the same vein, performing this kind of optimization is much harder for real-world problems. Take NexusTrade, an AI-Powered automated investing platform that features a powerful AI Chat. One of the “prompts” in the system is the “Create a Portfolio” prompt.

The “Create a Portfolio” prompt in NexusTrade

This prompt iteratively creates a portfolio configuration from a conversation with the Large Language Model. Thinking about how to apply the EvoPrompt approach to this prompt is challenging: the portfolio is built up over an entire conversation, and evaluating performance for something like this would be extremely convoluted.

Ultimately, while EvoPrompt is an interesting and novel approach, applying it to problems like those in the NexusTrade platform isn’t straightforward, and more work would need to be done to extend this framework to real-world problems.

Conclusion

The innovative EvoPrompt method, leveraging evolutionary algorithms for prompt engineering, presents a significant advancement in optimizing interactions with Large Language Models (LLMs). Its simplicity, efficiency, and scalability highlight its potential as a powerful tool for developers and researchers alike.

However, the approach’s application to complex, real-world scenarios, such as those encountered in systems like NexusTrade, remains a challenge. The paper’s lack of detail on solution evaluation and adaptation to subjective outcomes underscores the need for further research and development. Despite these limitations, EvoPrompt’s successes in outperforming manual and other automated methods of prompt engineering offer promising insights into the future of AI-driven optimization.

EvoPrompt is a step in the right direction for automated prompt engineering. Many of the techniques they use can be incorporated into other prompt engineering optimization approaches, including ones that use more traditional genetic algorithms. Overall, I’m extremely excited about this direction of work and seeing how it evolves in the future!

Thank you for reading! If you enjoyed this article, please give me some claps and share this article with a friend (or social media)! I have several newsletters you could follow. Aurora’s Insights is the perfect blog if you’re interested in artificial intelligence, machine learning, finance, investing, trading, and the intersection between these disciplines. You can also create a free account on NexusTrade to get access to a next-generation algorithmic trading platform.

NexusGenAI is the platform that hosts NexusTrade’s AI-Powered Chat. It is also open for users on the waitlist!

🤝 Connect with me on LinkedIn

🐦 Follow me on Twitter

👨‍💻 Explore my projects on GitHub

📸 Catch me on Instagram

🎵 Dive into my TikTok

PS, did you share with a friend? 🤨
