Table of Contents:
- What is Prompt Engineering?
- Defining Prompt Engineering
- The Art of Prompt Engineering: An Empirical Approach
- The New Paradigm in Programming with LLMs
- Exploring the Two Levels of Prompt Engineering
- The Easy Way: Using ChatGPT and Similar Tools
- The Less Easy Way: Programmatic Interaction with LLMs
- Building AI Apps with Prompt Engineering
- A New Paradigm for Software Development
- Case Study: Creating an Automatic Grader
- Seven Essential Tricks for Effective Prompt Engineering
- Trick 1: Being Descriptive for Better Responses
- Trick 2: Providing Examples for Improved Performance
- Trick 3: Using Structured Text for Consistent Outputs
- Trick 4: Implementing Chain of Thought for Complex Tasks
- Trick 5: Utilizing Chatbot Personas for Tailored Interactions
- Trick 6: The Flipped Approach: Letting the Model Ask Questions
- Trick 7: Reflect, Review, and Refine for Enhanced Accuracy
- Demo: Building an Automatic Grader with Python and LangChain
- Setting Up Your Environment and Imports
- Creating and Using the Prompt Template
- Implementing Output Parsers for Refined Results
- Testing the Automatic Grader: Results and Analysis
- Limitations of Prompt Engineering
- Model Dependency and Optimal Prompt Strategies
- Context Window Constraints and Information Limitations
- General-Purpose vs. Specialized Models
- Looking Ahead: Fine-Tuning and Future Content
- Introduction to Model Fine-Tuning
- Upcoming Articles and Tutorials
- Conclusion and Further Reading
- Additional Resources and References
What is Prompt Engineering?
Defining Prompt Engineering
Hey everyone, I'm Icoversai. This is the fourth article in my larger series on using large language models in practice. Today I'll be talking about prompt engineering, and before all the technical folks come after me with their pitchforks: if you're a technical person, like this version of Tony Stark, you might be rolling your eyes at the very idea of prompt engineering.
You might say prompt engineering is not real engineering, that it's way overhyped, or even that it's a complete waste of time. When I first heard about the concept, I had a similar attitude. I was more interested in the model development side of things, like how to fine-tune a large language model.
But after spending more time with it, my perspective on prompt engineering has changed. My goal with this article is to give a sober and practical overview of prompt engineering, and for the technical people out there still rolling their eyes, maybe by the end of this prompt engineering will be another tool in your AI, data science, and software development arsenal. Since this is a long one, I apologize in advance.
First, I'll talk about what prompt engineering is. Then I'll cover two different levels of prompt engineering: what I call the easy way and the less easy way. Next, we'll talk about how you can build AI apps with prompt engineering. Then I'll share seven tricks for prompt engineering, and finally, we'll walk through a concrete example of how to create an automatic grader using Python and LangChain.
So, what is prompt engineering? The way I like to define it is: any use of an LLM out of the box. But there's a lot more that can be said about it. Here are a few comments on prompt engineering that have stood out to me. The first comes from the paper by White et al., which describes prompt engineering as the way LLMs are programmed with prompts. This raises the idea that prompt engineering is a new way to program computers, which was really eye-opening for me.
The Art of Prompt Engineering: An Empirical Approach
Instead of writing code, you simply describe what you want in natural language. Another definition comes from a second paper, which defines prompt engineering as an empirical art of composing and formatting the prompt to maximize a model's performance on a desired task. The reason this one stood out to me is that it highlights the empirical nature of prompt engineering. At this point, it's not really a science; it's a collection of heuristics, with people throwing things against the wall and accidentally stumbling across techniques. Through that messy process, some tricks and heuristics are starting to emerge.
The New Paradigm in Programming with LLMs
This might be part of why people are so put off by prompt engineering: it doesn't seem like a serious science, and that's because it isn't one yet. We're still way too early in this new paradigm of large language models, and it's going to take a while to understand what these models are actually doing and why they work. With that understanding, we'll have a better sense of how to manipulate them, how to throw stuff at them and get the desired results out. The final comment that I really liked about prompt engineering comes from Andrej Karpathy in his State of GPT talk at Microsoft Build 2023, where he said that language models just want to complete documents.
I feel like this captures the essence of prompt engineering. Language models are not explicitly trained to do the vast majority of tasks we ask them to do.
All these language models want to do is predict the next token, then the next one, and the next one after that. That's why I love this framing of tricking the AI into solving your problems: that's essentially all prompt engineering is, constructing text that generates the desired outcome from the large language model.
Exploring the Two Levels of Prompt Engineering
The Easy Way: Using ChatGPT and Similar Tools
The way I like to think about it is that there are two levels of prompt engineering. The first level is what I call the easy way, which is essentially ChatGPT or something similar. Google has Bard, Microsoft has Bing Chat, and all of these applications provide a user-friendly and intuitive interface for interacting with large language models. This is the easiest and cheapest way to interact with them, but it comes with a tradeoff.
The Less Easy Way: Programmatic Interaction with LLMs
The easy way is a bit restrictive in that you can't really use ChatGPT to build an app. It might help you write some code, but you can't integrate ChatGPT itself into a piece of software or a larger application you want to build out. That's where the less easy way comes in: interacting with these large language models programmatically. You could use Python for this, or JavaScript, or whatever programming language you like. The key upside of the less easy way is that you can fully customize how a large language model fits into a larger piece of software.
Building AI Apps with Prompt Engineering
A New Paradigm for Software Development
This, in many ways, unlocks a new paradigm for programming and software development, and that brings us to building AI apps with prompt engineering. To demonstrate this new paradigm, let's look at a specific use case. Suppose we wanted to make an automatic grader for a high school history class. This might be easy enough if the questions are multiple choice or true/false, but it becomes a bit more difficult when the answers are short-form or even long-form text responses. Here's an example.
Consider the question: who was the 35th president of the United States? You might think there's only one answer, John F. Kennedy, but many answers are reasonable and could be considered correct. Here are a few examples. There's "John F. Kennedy", but "JFK", a very common abbreviation of his name, could also be considered correct. There's "Jack Kennedy", a common nickname. There's "John Fitzgerald Kennedy", his full name, probably from someone trying to get extra credit. And then there's "John F. Kenedy", where the student may have just forgotten one of the n's in his last name.
Case Study: Creating an Automatic Grader
Let's see how we can make a piece of software that does this grading process automatically. First, there's the traditional paradigm. This is how programming has always been done: it's on the developer to figure out the logic to handle all the variations and edge cases. That's the hard part of programming, writing a robust piece of software that handles all the different edge cases. This approach might require the user to input a list of all possible correct answers, which is difficult in practice.
For a homework set with a bunch of questions, you can't anticipate every possible answer a student might write down, and traditionally, if you're evaluating text against some target text, you'd probably use some kind of exact or fuzzy string-matching algorithm. Now let's look at the new paradigm, where we incorporate large language models into the logic of our software. Here you can use an LLM to handle all the logic of the automatic grading task via prompt engineering, instead of writing code that does exact or fuzzy matching and working out the logic that gives you the desired outcome.
Instead, we write a prompt template with a place for the question, a place for the single correct answer, and a place for the student's answer. All of this is fed to a large language model, which generates a completion saying whether the student's answer is correct or wrong, and maybe even some reasoning behind why it's wrong.
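To make this concrete, here is a minimal sketch of what such a grading prompt might look like. The wording below is illustrative rather than the verbatim prompt from the demo later on; the placeholders in curly braces get filled in for each question.

```
You are a high school history teacher grading homework submissions.
Based on the homework question and the correct answer, grade the student's answer as correct or wrong.
Minor misspellings of the correct answer should still be graded as correct.

Question: {question}
Correct answer: {correct_answer}
Student answer: {student_answer}
```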
Taking a step back and comparing the two approaches: the first was to manually sit down, think, and write out a string-matching algorithm that tried to handle all the different edge cases and variations of potentially correct answers. I'm an okay programmer at best, so it would probably take me a week or so to get a piece of software that did an okay job of that. Compare that to how long it took me to write this prompt, which was about two minutes, and think of the time savings.
I could have spent a week trying to use string matching to solve this problem, or I could have spent a couple of minutes writing a prompt. Of course, this is just the core logic of the application; it doesn't include the peripherals like user interfaces and boilerplate code. But that's the kind of cost savings we're talking about: minutes versus days or weeks of software development. That's the power of prompt engineering and this new way of thinking about programming and software development. Now let's talk about best practices for prompt engineering.
Seven Essential Tricks for Effective Prompt Engineering
Here I'm going to cover seven tricks you can use to write better prompts. This is definitely not a complete or comprehensive list, just a set of tricks I've extracted from comparing and contrasting a few resources. If you want to dive deeper into any one of them, check out the blog published on Towards Data Science, where I talk more about these tricks and point to additional resources. Let's run through them.
The first trick is to be descriptive. In a lot of writing tasks less is more, but when doing prompt engineering it's kind of the opposite: more is better. Trick two is to give examples. This is the idea of few-shot learning: you include a few demonstrations, such as question-answer pairs, in your prompt, which tends to improve the LLM's performance. Trick three is to use structured text, which we'll see later.
Trick four is chain of thought, which is essentially having the LLM think step by step. Trick five is using chatbot personas, basically assigning a role or expertise to the large language model. Trick six is the flipped approach: instead of you asking the large language model questions, you prompt it to ask you questions, so it can extract the information it needs to generate a more helpful completion.
Finally, trick seven is what I summarize as reflect, review, and refine: having the large language model look back at its past responses and refine them, either by improving them or by identifying errors. Let's see what these look like via a demo. Here I'm going to use ChatGPT, and it's important to know which large language model you're using, because optimal prompting strategies depend on the model.
ChatGPT is a fine-tuned model, so you don't have to break your back on prompt engineering to get reasonable responses. But if you're working with a base model like GPT-3, you're going to have to do a lot more work on the prompt engineering side to get useful responses, because GPT-3 is not fine-tuned; it only does next-word prediction. ChatGPT, on the other hand, was fine-tuned to follow instructions, and on top of that, reinforcement learning with human feedback was used to refine its responses.
Trick 1: Being Descriptive for Better Responses
Trick one is to be descriptive. Let's compare an example with and without this trick. Say I want to use ChatGPT to help me write a birthday message for my dad. The naive thing to do would be to type the following prompt: "Write me a birthday message for my dad." It will do that, and while the result might be fine for some use cases, I don't write messages this verbose, and the response is a bit generic: "Dad, you've been my rock, my guide, my source of inspiration throughout my life."
"Your wisdom, kindness, and unwavering support have shaped me into who I am today, and for that I am eternally grateful." That's very nice, but I tend to be a bit more cheeky with these kinds of birthday messages. Instead, we can employ the trick of being descriptive to get a better response from ChatGPT. That might look like: "Write me a birthday message for my dad, no longer than 200 characters. This is a big birthday because he's turning 50."
The prompt continues with more context: "To celebrate, I booked us a boys' trip to Cancun. Be sure to include some cheeky humor; he loves that." I'm giving ChatGPT more to work with so it can tailor the response to something closer to what I would actually write. Let's see what this response looks like.
The result is a lot more concise, which I like: "Happy 50th, Dad! Time to fiesta like you're 21 again in Cancun. Cheers to endless adventures ahead! #DadInCancun." That's actually pretty funny. Maybe I wouldn't use this exactly, but I could see it as a starting point for actually writing a birthday message.
Trick 2: Providing Examples for Improved Performance
The second trick is to give examples. Let's compare prompts without and with this trick. Without examples, we might prompt ChatGPT as follows: "Given the title of a Towards Data Science blog article, write a subtitle for it." We put in the title "Prompt Engineering: How to Trick AI into Solving Your Problems", which is the title of the blog associated with this article, and leave the subtitle blank. The completion it spits out is "Unleash the power of clever prompts for more effective AI problem solving". Pretty nifty. Now let's see what this looks like if we give a few examples to capture the style of subtitle we're looking for, along the lines of the sketch below.
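The example pairs themselves aren't reproduced here, but a few-shot version of the prompt might look something like this, where the first two title-subtitle pairs are hypothetical stand-ins for real past articles:

```
Given the title of a Towards Data Science blog article, write a subtitle for it.

Title: A Practical Introduction to Large Language Models
Subtitle: 3 levels of using LLMs in practice

Title: Cracking Open the OpenAI (Python) API
Subtitle: A complete beginner-friendly introduction with example code

Title: Prompt Engineering: How to Trick AI into Solving Your Problems
Subtitle:
```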
I kind of prefer the few-shot result over the first one, mainly because, again, I don't like verbose text and it's more concise. I suspect that's what ChatGPT picked up on: the example subtitles have a certain length, so it made the new subtitle roughly the same length. I'm just speculating, but regardless, that's how you can incorporate examples into your prompt.
Trick 3: Using Structured Text for Consistent Outputs
The next trick is to use structured text. First, suppose our prompt to ChatGPT has no structured text: we just ask it to "write me a recipe for chocolate chip cookies." It gives a pretty good response: ingredients, instructions, and some tips. If ChatGPT were not fine-tuned, it may not have spit out such a neat structure for a chocolate chip cookie recipe.
This is another indication of why the large language model you're working with matters: I could be perfectly happy with this response, so there may not even be a need for structured text. But still, let's see what this could look like if we did use structured text in our prompt. The prompt is a little different: "Create a well-organized recipe for chocolate chip cookies. Use the following formatting elements," followed by the elements we want, roughly as sketched below.
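The full list of formatting elements isn't reproduced here, but based on the sections discussed below (title, ingredients, instructions, and tips), the structured prompt might look roughly like this:

```
Create a well-organized recipe for chocolate chip cookies. Use the following formatting elements:

Title: a descriptive name for the recipe
Ingredients: a bulleted list of ingredients with quantities
Instructions: numbered, step-by-step directions
Tips: a short list of helpful baking and storage tips
```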
The key difference is that we're now asking it specifically to follow a particular format, and we're giving it a description of each section we want. One subtle difference in the result is that in the completion where we used structured text, it just gives the title, the ingredients, and so on, something you could easily copy and paste onto a web page without any alterations.
In the unstructured version, there's no title, which could be fine, but it opens with a conversational "Certainly!", which might require some extra cleanup steps if this fits into a larger automated pipeline. Other than that, there doesn't seem to be much difference between the two completions. One interesting thing is that in the structured version the tips are clearer and bolded, while in the other they're just some quick bullet points.
Trick 4: Implementing Chain of Thought for Complex Tasks
Next, we have trick four, chain of thought. The basic idea is to give the LLM time to think by breaking a complex task down into smaller pieces, so it's easier for the model to give good completions. Without chain of thought, the prompt might look like: "Write me a LinkedIn post based on the following Medium blog," and then we copy and paste the Medium blog text in. With chain of thought, we instead spell out the intermediate steps, roughly as sketched below.
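The chain-of-thought version of this prompt isn't reproduced here, but it might look something like the following sketch, which breaks the task into explicit steps:

```
Write me a LinkedIn post based on the following Medium blog. Think step by step:
Step 1: Read the blog below and summarize its key takeaways.
Step 2: Pick the one or two takeaways most relevant to a LinkedIn audience.
Step 3: Write a short, engaging LinkedIn post based on those takeaways.

Blog: {paste the Medium blog text here}
```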
Trick 5: Utilizing Chatbot Personas for Tailored Interactions
Trick five is to use chatbot personas. For this example, I asked for a travel itinerary. Asking straight up, it spits out something that looks pretty good. Now let's see what this could look like with a persona, where instead of just asking for an itinerary directly, we first assign the model a role, along the lines of the sketch below.
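The persona prompt itself isn't reproduced here either; given the bagel-heavy New York itinerary it produced, it was presumably something along these lines (the wording below is a hypothetical reconstruction):

```
Act as a quirky New York City local who is obsessed with bagels.
Plan me an itinerary for a long weekend in NYC, written in your own voice.
```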
With the persona, the itinerary says to start with a bagel, stroll Central Park, visit a museum, and grab a bagel again; it basically has you eating bagels every single day. I like how it injected a bit of humor: "Yep, you guessed it, another bagel to fuel your final day." You really have to read through these to get a sense of the subtle differences, but even just from this bagel example, you get two different flavors of itinerary, and maybe one matches your interests a bit more than the other.
Trick 6: The Flipped Approach: Letting the Model Ask Questions
Trick six is the flipped approach. Instead of you asking all the questions, you prompt the chatbot to ask you questions, so it can better help you with whatever you're trying to do. Let's see this without the trick. Say you just want an idea for an LLM-based application. You give it that prompt, and it generates an idea for you: "EduBot Pro, an intelligent educational platform that harnesses the power of LLMs to offer personalized learning and tutoring experiences for students of all ages and levels."
This could be a great product idea, but maybe it isn't something you're passionate about, or it isn't tailored to your interests and skill set as someone who wants to build an app. Let's see how the flipped approach can help. Instead of asking for an idea straight up, we can say: "I want you to ask me questions to help me come up with an LLM-based application idea."
Right off the bat it asks, "What are your areas of expertise and interest that you'd like to incorporate into your LLM-based application idea?" I didn't think to tell the chatbot what I know and what I'm interested in so it could better serve me, and there are probably a bunch of other questions critical to making a good recommendation that I just wouldn't think of. That's where the flipped approach is helpful: the chatbot will ask you what it needs to know to give a good response, and those questions may not be something you can think of all up front.
Trick 7: Reflect, Review, and Refine for Enhanced Accuracy
The seventh and final trick is to reflect, review, and refine. This is where we prompt the chatbot to look back at previous responses and evaluate them, whether we're asking it for improvements or to identify potential mistakes. For example, take the EduBot Pro response from before and prompt the model to review it: "Review your previous response, pinpoint areas for enhancement, and offer an improved version."
The improved version looks pretty similar, but since we asked it to explain how it improved the response, it gave us an extra "reasoning for enhancements" section covering clarity and conciseness, emphasized personalization, enhanced language, and monetization strategies, with the monetization section providing more detail on viable strategies. I'm not going to read through all of it, but you can basically copy and paste this prompt, or something like it, as needed to potentially improve any chat completion.
I know that was a ton of content and I flew through it, but if you want to dive into any particular trick a bit more, check out the Towards Data Science blog, where I discuss each of these in more depth and cite resources where you can learn more.
Demo: Building an Automatic Grader with Python and LangChain
Setting Up Your Environment and Imports
So far we've talked about both the easy way and the less easy way of prompt engineering, but now I want to focus on the less easy way and demonstrate the power of prompt engineering by building out the automatic grader example we discussed earlier, using the LangChain Python library. First, as always, we do some imports. Here we import what we need from LangChain, and we'll be using the OpenAI API.
The OpenAI API requires a secret key. If you haven't worked with it before, check out the previous article in this series, where I explain what an API is, walk through OpenAI's API, and give some example Python code for using it. Here we just import our secret key, which allows us to make API calls. Next, we make our first chain. The main utility of LangChain is that it provides a ton of boilerplate code that makes it easy to incorporate calls to large language models into your Python code or a larger piece of software, and it does this through things called chains, which are essentially sequences of steps you can modularize.
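The exact import cell isn't shown in this article, but a minimal setup might look something like the sketch below, assuming the classic langchain package; the my_keys module is a hypothetical stand-in for however you store your OpenAI secret key:

```python
# LangChain building blocks used in this demo
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.schema import BaseOutputParser

# hypothetical local module holding the OpenAI secret key
from my_keys import OPENAI_API_KEY
```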
Creating and Using the Prompt Template
Let's see what that looks like. The first thing we need is our chat model; here we're going to use OpenAI's GPT-3.5 Turbo. The next thing we need is a prompt template, which is essentially a chunk of text with placeholders that we can dynamically update with new information. For the automatic grader, it's the same grading prompt we saw earlier: we'll be able to pass the question, the correct answer, and the student's answer into our chain, and it will fill in the prompt template, send it to the chat model, and get back the response.
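A minimal sketch of those two pieces, building on the imports above; the template text mirrors the grading prompt sketched earlier and is illustrative rather than the verbatim prompt from the repo:

```python
# chat model: OpenAI's GPT-3.5 Turbo via LangChain
chat_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, temperature=0)

# prompt template with placeholders for the question, correct answer, and student answer
prompt_template_text = """You are a high school history teacher grading homework submissions.
Based on the homework question and the correct answer, grade the student's answer as correct or wrong.
Minor misspellings of the correct answer should still be graded as correct.

Question: {question}
Correct answer: {correct_answer}
Student answer: {student_answer}"""

prompt = PromptTemplate(
    input_variables=["question", "correct_answer", "student_answer"],
    template=prompt_template_text,
)
```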
Putting the chain together is super simple: you create an LLMChain, passing the llm (the chat model we instantiated earlier) and the prompt (the prompt template we just created), and that gives you your chain. Then we define the inputs.
Here the question is "Who was the 35th president of the United States of America?", the correct answer is "John F. Kennedy", and the student's answer is "FDR". We pass all these inputs to the chain as a dictionary with question, correct answer, and student answer keys, plug in the values we defined, and the large language model spits out "Student's answer is wrong." So it correctly grades the student's answer as wrong, because FDR was not the 35th president of the United States.
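Putting that together, the first chain might look roughly like this:

```python
# combine the chat model and prompt template into a chain
chain = LLMChain(llm=chat_model, prompt=prompt)

# define the inputs and run the chain
inputs = {
    "question": "Who was the 35th president of the United States of America?",
    "correct_answer": "John F. Kennedy",
    "student_answer": "FDR",
}
result = chain.run(inputs)
print(result)  # e.g. "Student's answer is wrong."
```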
Implementing Output Parsers for Refined Results
However, there's a small problem with our chain right now: its output is a piece of text that may or may not fit nicely into the larger data or software pipeline we're putting together. It would make a lot more sense for the chain to output a true or false indicating whether the student's answer was correct. With a numerical or Boolean output, it's much easier for downstream tasks to process that information, for example summing up all the correct and incorrect answers on the homework to generate a final grade for the entire worksheet.
We can do this via output parsers. These are another component we can include in our chains; they take the output text of the large language model and format it in a certain way, extract some piece of information, or convert it into another format, as we'll see here.
Here I define an output parser that determines whether the grade was correct or wrong, using a simple piece of logic: it returns a Boolean indicating whether the word "wrong" is absent from the text completion. In the example before, the completion was "Student's answer is wrong", so the word "wrong" appears in the text; the check is true, the not flips it, and the parser returns False, meaning the answer was graded incorrect. So as you can see, we haven't automated all the logic out of programming.
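A minimal sketch of such a parser, assuming LangChain's BaseOutputParser interface:

```python
class GradeOutputParser(BaseOutputParser):
    """Parse the LLM's grading text into a True/False grade."""

    def parse(self, text: str) -> bool:
        # graded as correct only if the word "wrong" does not appear in the completion
        return "wrong" not in text.lower()
```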
You still need some problem-solving and programming skills here. But once the parser is defined, we can just add it to our chain: same llm as before, same prompt template as before, plus the grade output parser we just defined. Then we can apply this chain in a for loop, with the same question and correct answer as before: "Who was the 35th president of the United States?", with the correct answer "John F. Kennedy".
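Attaching the parser to the chain might look like this (depending on your LangChain version, you may need chain.predict_and_parse() instead of run() for the parser to be applied):

```python
# same chain as before, now with the grade output parser attached
chain = LLMChain(
    llm=chat_model,
    prompt=prompt,
    output_parser=GradeOutputParser(),
)
```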
Testing the Automatic Grader: Results and Analysis
Now we define a list of student answers we might have received: "John F. Kennedy", "JFK", "FDR", "John F. Kenedy" with only one n, "John Kennedy", "Jack Kennedy", "Jacqueline Kennedy", and "Robert F. Kenedy", also with one n. We run through this list in a for loop, run our chain just like before, and print the result. Here we can see that "John F. Kennedy" is True, indicating a correct response, "JFK" is True, and "FDR" is False.
"John F. Kenedy", spelled incorrectly, is True because we specifically said misspellings are okay. "John Kennedy" is True because it just drops the middle initial, and "Jack Kennedy" is True since it's a common nickname. "Jacqueline Kennedy" is False because that was his wife, and "Robert F. Kenedy" is False because that was his brother. As always, the code is available at the GitHub repo for this article series, which is linked below.
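For reference, the grading loop might look roughly like this, reusing the chain defined above:

```python
# the question and correct answer stay fixed; we grade a list of student answers
question = "Who was the 35th president of the United States of America?"
correct_answer = "John F. Kennedy"
student_answers = [
    "John F. Kennedy", "JFK", "FDR", "John F. Kenedy",
    "John Kennedy", "Jack Kennedy", "Jacqueline Kennedy", "Robert F. Kenedy",
]

for student_answer in student_answers:
    grade = chain.run(
        question=question,
        correct_answer=correct_answer,
        student_answer=student_answer,
    )
    print(f"{student_answer} - {grade}")
```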
Limitations of Prompt Engineering
Model Dependency and Optimal Prompt Strategies
Feel free to take this code and adapt it, or just use it for ideas about what's possible with prompt engineering. That said, I would be remiss if I did not talk about the limitations of prompt engineering. First, as I said before, optimal prompt strategies are model-dependent: the optimal prompt for ChatGPT can be completely different from the optimal prompt for GPT-3.
Context Window Constraints and Information Limitations
Another downside is that not all pertinent information may fit into the context window. Only so much text can be passed into a large language model at once, so if you're working with a large knowledge base, prompt engineering alone may not handle it effectively.
General-Purpose vs. Specialized Models
Another limitation is that the models we typically use for prompt engineering are huge general-purpose models, which can be cost-inefficient, or simply overkill, for the particular problem you're trying to solve.
Looking Ahead: Fine-Tuning and Future Content
Introduction to Model Fine-Tuning
A related point is that a smaller specialized model can outperform a larger general-purpose model. OpenAI demonstrated an example of this when comparing their smaller InstructGPT model to a much larger version of GPT-3. This brings up the idea of model fine-tuning.
Upcoming Articles and Tutorials
Fine-tuning is going to be the topic of the next article in this series, where we'll break down some key fine-tuning concepts, and then I'll share concrete example code for fine-tuning your very own large language model using the Hugging Face software ecosystem.
Conclusion and Further Reading
Additional Resources and References
I hope this article was helpful to you. If you enjoyed it, please consider liking, subscribing, and sharing it with others. If you have any questions or suggestions for future content, feel free to drop those in the comments section below.