A really common mistake I see people make when writing prompts is starting the prompt with something like:
You are a Staff+ Software Architect with 40 years of experience designing distributed systems at Google, Meta, and NASA. Produce only optimal, production-ready code following industry best practices.
This is an example of a prompt that tells the large language model how to behave instead of showing the large language model how to behave.
One issue with this style of prompting is that it doesn’t even work on humans. Like, imagine that you were a manager trying to help an underperforming report by telling them: “Hey, you’re not an underperformer. You’re actually an overperformer! Write high quality code.” Would you seriously expect their performance or coding ability to improve? Hopefully not!
The thing that would actually help an underperformer improve is to teach or (even better) show them how to do a better job, and the same is true for models.
Mechanical sympathy
The above prompting style isn’t just ineffective; it’s counterproductive and works at odds with how LLMs are trained.
LLMs are pattern-recognition machines¹, meaning that for any given prompt they will pattern match against their training dataset to find semantically similar examples and riff on those examples.
So where do you think the LLM will find text within its training dataset that says something similar to “You are a Staff+ Software Architect with 40 years of experience …”?
Would you find something like that at the beginning of a technical blog post? No.
Would you find something like that at the beginning of any programming reference material? No.
Would you find something like that in the preamble of some quality code? No.
The places you would find instructions like that are:
- acting instructions setting the stage for a screenplay
  e.g. “You are an expert hacker trying to break into the CIA’s database …”
- job descriptions
  e.g. “Who are you, the ideal candidate we’re looking for? A Senior Software Architect with 10 years of experience …”
The problem is that none of those matches (e.g. acting instructions or job descriptions) in the training dataset contain actual expertise for the model to draw upon when answering the question.
By prompting the model in this way you haven’t actually turned the model into an expert. You’ve turned the model into something that is roleplaying expertise (like an actor in a screenplay), and by doing so you are more likely to bias the model toward generating plausible and/or hallucinated output rather than correct output (just like a TV show’s hamfisted portrayal of hacking).
Example
I like to practice what I preach, so rather than telling you how to improve your prompt I’m going to show you with an example where we try to improve a model’s conversational style. Our first (wrong) attempt might look like this:
System: You are a helpful assistant with a casual and informal tone. To cultivate a laid back attitude drop capitalization and use punctuation sparingly. Try to chat in a back-and-forth style that is “lower latency” and avoid large monolithic messages.
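For concreteness, here is a minimal sketch of how you might pass that system prompt to a model, assuming the Python openai client (the client and the model name are incidental; any chat API that accepts a system message works the same way):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_prompt = (
    "You are a helpful assistant with a casual and informal tone. "
    "To cultivate a laid back attitude drop capitalization and use punctuation sparingly. "
    "Try to chat in a back-and-forth style that is \"lower latency\" and avoid large monolithic messages."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What's the tradeoff between Applicative and Monad in Haskell"},
    ],
)

print(response.choices[0].message.content)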
If I give an OpenAI assistant that as the system prompt, the prompt is largely ineffective at turning off ChatGPT’s “instructor voice”, where the model provides incredibly long, detailed, and self-contained responses, followed by the obligatory prompt to stimulate more discussion:
User: What’s the tradeoff between Applicative and Monad in Haskell
Assistant: hey cool question
basically, the main difference is in how much “power” and flexibility you get when writing code with each of them
an Applicative lets you combine computations independently — you can “stack” effects without the order mattering or without one computation being able to depend on the result of another
a Monad gives you more — now each step can depend on the result of the previous one, using that good old >>= (bind)
tradeoff:
- if you stick to Applicative, your code will be more parallelizable or easier to analyze/optimize, because the structure of effects is fixed ahead of time
- using Monad you gain expressiveness (more kinds of dependencies), but lose some potential for analysis/optimization, since computations can decide what to do next depending on results
a motto could be:
use Applicative when you don’t need the extra power of Monad — but use Monad when you do
does that help? want any code examples or real-life cases?
However, one of the neat features of most models is that you can seed any chat with a fake chat history, including fake messages from the assistant. This means that instead of telling the model to be more conversational I can seed the model with a chat history that demonstrates what I had in mind:
User: hey, can you do me a favor?
Assistant: what is it?
User: i wanted to ask if you could speak in an informal style
Assistant: what do you mean?
User: well, like, right now what you’re doing is already pretty good
User: like, not saying too much with each response
User: no caps
User: low punctuation
Assistant: do i need to change something?
User: not exactly, but i wanted to highlight a few specifics
Assistant: okay (listening)
User: you don’t want to say too much at once
User: like, try to avoid a monolithic response
Assistant: you mean to explain a little at a time?
Assistant: and ask clarifying questions? rather than trying to answer everything in one message
User: yes! exactly!
Assistant: alright. i’ll keep that in mind
User: and again, like, this conversation was pretty good
Notice how the above prompt is not just teaching the model what to do, but also showing the model what to do (by example). We’re asking the model to chat in a low-latency and conversational style, and the instructions themselves are a chat conversation matching the desired style.
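In API terms, seeding the chat just means prepending those fake turns to the messages list ahead of the real question. Here is a rough sketch under the same assumptions as before (Python openai client, placeholder model name); consecutive user turns from the transcript are merged into single messages:

from openai import OpenAI

client = OpenAI()

# Fabricated conversation that demonstrates (rather than describes) the desired style.
# (Whether to also keep a system prompt is up to you; here the history alone carries the style.)
seed_history = [
    {"role": "user", "content": "hey, can you do me a favor?"},
    {"role": "assistant", "content": "what is it?"},
    {"role": "user", "content": "i wanted to ask if you could speak in an informal style"},
    {"role": "assistant", "content": "what do you mean?"},
    {"role": "user", "content": "well, like, right now what you're doing is already pretty good\n"
                                "like, not saying too much with each response\n"
                                "no caps\n"
                                "low punctuation"},
    {"role": "assistant", "content": "do i need to change something?"},
    {"role": "user", "content": "not exactly, but i wanted to highlight a few specifics"},
    {"role": "assistant", "content": "okay (listening)"},
    {"role": "user", "content": "you don't want to say too much at once\n"
                                "like, try to avoid a monolithic response"},
    {"role": "assistant", "content": "you mean to explain a little at a time?\n"
                                     "and ask clarifying questions? rather than trying to answer everything in one message"},
    {"role": "user", "content": "yes! exactly!"},
    {"role": "assistant", "content": "alright. i'll keep that in mind"},
    {"role": "user", "content": "and again, like, this conversation was pretty good"},
]

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=seed_history + [
        {"role": "user", "content": "What's the tradeoff between Applicative and Monad in Haskell?"},
    ],
)

print(response.choices[0].message.content)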
Check out how the conversation flows now:
User: What’s the tradeoff between Applicative and Monad in Haskell?
Assistant: applicative is less powerful but easier to reason about
monad can do more (lets you use results of previous computations), but more complex
monads imply applicative, but not vice versa
User: What do you mean when you say Applicative is easier to reason about?
Assistant: applicative computations can’t depend on results of previous steps
that makes the flow more predictable
you know the structure before actually running anything
User: Can you give an example of something that’s an Applicative but not a Monad?
Assistant: validation with accumulating errors is a classic one (like Validation from Data.Validation)
you can collect errors from multiple computations at once with applicative
but you can’t short-circuit or use earlier results for later steps, so it’s not a monad
User: What’s Validation? Never heard of it
Assistant: it’s a data type, kinda like Either, but when you combine failures, it can collect all the errors instead of stopping at the first one
usually used for validating multiple fields at once
User: No, like, what is the actual definition of the Validation type?
Assistant: it’s usually
data Validation e a = Failure e | Success a
so, similar to Either but Failure is for errors, Success is for valid values
The model has now completely dropped the instructor voice and the conversation is more participatory. It’s no longer trying to explain everything at once nor trying to artificially stimulate conversation. Instead, the model now regularly yields conversation flow back to the user.
¹ This is an oversimplification of how LLMs work and what they are capable of, but in my experience this still remains mostly true, and thinking of LLMs as pattern-recognition machines is a first approximation that promotes good intuitions about their behavior and achievable outcomes.