Home
Nefe Tech LTD
a i ,

5 Prompt Engineering Patterns That Actually Work in Production

klement Gunndu

klement Gunndu

@klement_gunndu

March 09, 2026 7 min read 22 18
5 Prompt Engineering Patterns That Actually Work in Production

Most prompt engineering guides teach you to write "Act as a senior developer" and call it a day.

That works in ChatGPT. It fails in production. The moment your prompt runs inside an automated pipeline — no human reviewing outputs, no chance to retry manually — you need patterns that enforce correctness structurally, not hopefully.

These 5 patterns come from running LLM calls in automated systems where bad outputs mean broken pipelines, not just awkward chat responses. Each one includes working Python code you can copy into your project today.

1. Separate System Prompts From User Input

The most common production bug: stuffing instructions and user data into the same message. The model treats everything equally, and your carefully crafted instructions get diluted by the user's input.

The fix is structural. Every major LLM API separates system-level instructions from user messages. Use that separation.

Here's how it works with the Anthropic Python SDK (as of v0.49+, March 2026):

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a code reviewer. Return only: PASS, FAIL, or NEEDS_REVIEW. "
           "Include a one-line reason. No other text.",
    messages=[
        {"role": "user", "content": f"Review this function:\n\n{code_snippet}"}
    ],
)

verdict = response.content[0].text
Enter fullscreen mode Exit fullscreen mode

The system parameter is not a suggestion — it is a separate instruction channel that shapes the model's behavior before it sees the user message. When you put review criteria in the system prompt and the code in the user message, the model treats them differently. Instructions stay instructions. Data stays data.

Why this matters in production: When your system prompt and user input live in the same string, a sufficiently long user input pushes your instructions out of the model's attention window. Separating them prevents prompt injection by design, not by hope.

The pattern: Put constraints, output format, and role definition in system. Put variable data in messages. Never mix them.

2. Force Structured Output With Pydantic

Parsing free-text LLM responses with regex is the production equivalent of catching rain with your hands. It works sometimes. It fails at 3 AM on a Saturday.

Structured output forces the model to return data that matches your exact schema. No parsing. No "the model forgot to include the field." The schema is the contract.

Here's how it works with the OpenAI Python SDK using the Responses API (as of v1.66+, March 2026):

from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()

class CodeReview(BaseModel):
    verdict: str = Field(description="PASS, FAIL, or NEEDS_REVIEW")
    reason: str = Field(description="One-line explanation")
    severity: int = Field(ge=1, le=5, description="1=minor, 5=critical")

response = client.responses.parse(
    model="gpt-4o",
    input=[
        {"role": "system", "content": "Review the code. Return structured output."},
        {"role": "user", "content": f"Review:\n\n{code_snippet}"},
    ],
    text_format=CodeReview,
)

review = response.output_parsed  # CodeReview instance
print(review.verdict)   # "FAIL"
print(review.severity)  # 4
Enter fullscreen mode Exit fullscreen mode

The text_format parameter takes a Pydantic BaseModel class. The SDK handles JSON schema generation and response deserialization automatically. Your downstream code receives a typed Python object, not a string.

If you're using the Chat Completions API instead:

completion = client.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Review the code. Return structured output."},
        {"role": "user", "content": f"Review:\n\n{code_snippet}"},
    ],
    response_format=CodeReview,
)

review = completion.choices[0].message.parsed
Enter fullscreen mode Exit fullscreen mode

Why this matters in production: Without structured output, you write regex to extract fields, handle edge cases where the model wraps its response in markdown, and debug silent failures when a field is missing. With Pydantic, the contract is enforced at the API level. If the response doesn't match your schema, you get an error — not corrupted data.

The pattern: Define your output as a Pydantic model. Pass it to the API. Never parse free text in an automated pipeline.

3. Use Few-Shot Examples to Lock In Format

System prompts define the rules. Few-shot examples show the rules in action.

When you need consistent formatting across thousands of calls — extracting data, classifying inputs, generating reports — few-shot examples reduce variance more than any instruction ever will. Models learn by imitation, not just instruction.

Here's how it works with the Anthropic SDK:

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    system="Extract the action items from meeting notes. "
           "Return each as: - [OWNER] Action description (DEADLINE)",
    messages=[
        # Few-shot example 1
        {
            "role": "user",
            "content": "Notes: Sarah will update the API docs by Friday. "
                       "Mike needs to fix the login bug before sprint end.",
        },
        {
            "role": "assistant",
            "content": "- [Sarah] Update API docs (Friday)\n"
                       "- [Mike] Fix login bug (sprint end)",
        },
        # Few-shot example 2
        {
            "role": "user",
            "content": "Notes: Team agreed to defer the redesign. "
                       "Jake to send the Q3 report to finance by EOD Tuesday.",
        },
        {
            "role": "assistant",
            "content": "- [Jake] Send Q3 report to finance (EOD Tuesday)",
        },
        # Actual input
        {
            "role": "user",
            "content": f"Notes: {meeting_notes}",
        },
    ],
)
Enter fullscreen mode Exit fullscreen mode

Notice: the "defer the redesign" note produced no action item. That second example teaches the model to skip non-actionable statements. Without it, models tend to generate phantom action items from every sentence.

Why this matters in production: Instructions describe what you want. Examples describe what it looks like. The gap between "extract action items" and "extract action items formatted exactly like this, omitting non-actionable statements" is where production bugs live. Few-shot examples close that gap.

The pattern: Include 2-3 examples that cover your edge cases. At least one example should show what the model should NOT include. Alternate user/assistant roles to simulate a real conversation.

4. Chain-of-Thought for Complex Reasoning

When a model needs to classify, compare, or decide — not just extract — asking for the answer directly produces unreliable results. Chain-of-thought prompting forces the model to show its reasoning before committing to an answer.

This is the difference between "guess the answer" and "work through the problem."

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=(
        "You are a security reviewer analyzing code for vulnerabilities.\n\n"
        "For each code snippet, follow these steps:\n"
        "1. Identify what the code does (2-3 sentences)\n"
        "2. List potential security issues (be specific)\n"
        "3. For each issue, state the attack vector\n"
        "4. Give your final verdict: SAFE, REVIEW, or VULNERABLE\n\n"
        "Always complete all 4 steps before giving the verdict."
    ),
    messages=[
        {"role": "user", "content": f"Review this code:\n\n{code_snippet}"}
    ],
)
Enter fullscreen mode Exit fullscreen mode

The key is step 4: "Always complete all 4 steps before giving the verdict." Without that constraint, the model often jumps to the verdict after step 1 and rationalizes backward. Forcing sequential reasoning produces more accurate classifications.

Here's a tighter version using XML tags for structure (works well with Claude models):

system_prompt = """Analyze the code for security issues.

<thinking>
Step 1: What does this code do?
Step 2: What security issues exist?
Step 3: What are the attack vectors?
</thinking>

<verdict>SAFE | REVIEW | VULNERABLE</verdict>
<reason>One sentence explaining the verdict</reason>

Always complete <thinking> before writing <verdict>."""
Enter fullscreen mode Exit fullscreen mode

XML tags give you parseable structure in the output. You can extract the verdict with a simple string search instead of hoping the model puts it in the right place.

Why this matters in production: On classification tasks, chain-of-thought prompting improves accuracy. Anthropic's own prompt engineering documentation recommends this pattern for complex analytical tasks. The model catches edge cases during reasoning that it would miss when jumping to conclusions.

The pattern: Number the reasoning steps. Put the final answer last. Tell the model to complete all steps before answering. Use XML tags if you need to parse specific sections from the output.

5. Template Variables With LangChain

When you're running the same prompt structure across different inputs — different customers, different documents, different code files — hardcoded strings become unmaintainable. Prompt templates separate the structure from the data.

Here's how it works with LangChain's ChatPromptTemplate (as of langchain-core 0.3+, March 2026):

from langchain_core.prompts import ChatPromptTemplate

review_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a {role} reviewing {artifact_type}. "
     "Apply these standards: {standards}. "
     "Return: verdict (PASS/FAIL), issues found, suggestions."),
    ("human", "Review this {artifact_type}:\n\n{content}"),
])

# Reuse across different review types
code_messages = review_prompt.format_messages(
    role="senior Python developer",
    artifact_type="pull request",
    standards="PEP 8, type hints required, no bare exceptions",
    content=pr_diff,
)

doc_messages = review_prompt.format_messages(
    role="technical writer",
    artifact_type="API documentation",
    standards="all endpoints documented, examples for each, error codes listed",
    content=api_docs,
)
Enter fullscreen mode Exit fullscreen mode

One template. Two completely different review contexts. The structure stays consistent; the variables change per call.

For more complex workflows where you need conversation history:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

multi_turn_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a {role}. Be concise and specific."),
    MessagesPlaceholder("history"),
    ("human", "{input}"),
])
Enter fullscreen mode Exit fullscreen mode

MessagesPlaceholder injects a list of previous messages, letting you build multi-turn conversations without string concatenation.

Why this matters in production: When prompt logic lives in f-strings scattered across your codebase, changing the output format means finding and updating every instance. Templates centralize prompt logic. You version them, test them, and swap them without touching business logic.

The pattern: Define templates once. Pass variables at call time. Use MessagesPlaceholder for conversation history. Store templates as constants or in config files — not inline in business logic.

The Meta-Pattern: Combine Them

These 5 patterns are not alternatives. They stack.

A production-ready LLM call typically uses 3-4 of these together: system prompt separation (Pattern 1) + structured output (Pattern 2) + few-shot examples (Pattern 3), all wrapped in a reusable template (Pattern 5). Add chain-of-thought (Pattern 4) when the task requires reasoning.

The difference between a prompt that works in development and one that works in production is not cleverness. It is structure. Structured prompts produce predictable outputs. Predictable outputs don't break pipelines at 3 AM.

Start with one pattern. Add the next when your current approach fails. By the time you're using all five, your LLM calls behave more like function calls — inputs in, typed outputs out, no surprises.


Follow @klement_gunndu for more AI engineering content. We're building in public.

Share this article:
View on Dev.to