Driving the Engine — From Prompts to Reliable AI Outputs
Harper Carroll AI · AI User to AI Builder · Cohort 1
Three ideas that underpin everything today.
LLMs predict the most likely next token given everything before it. It's pattern matching at scale.
Models see subword pieces, not whole words.
This is why letter-counting can fail.
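A toy sketch of the idea (the vocabulary and greedy matching here are made up for illustration; real tokenizers learn subword pieces from data, but the effect is the same):

```python
# Toy subword tokenizer, illustration only. The model sees pieces,
# not letters, which is why letter-counting questions can fail.
TOY_VOCAB = ["straw", "berry", "count", "ing"]

def toy_tokenize(word):
    pieces, rest = [], word
    while rest:
        # Greedily match the longest known piece, else emit one character.
        for piece in sorted(TOY_VOCAB, key=len, reverse=True):
            if rest.startswith(piece):
                pieces.append(piece)
                rest = rest[len(piece):]
                break
        else:
            pieces.append(rest[0])
            rest = rest[1:]
    return pieces

print(toy_tokenize("strawberry"))  # two pieces, not ten letters
```

Asked "how many r's are in strawberry?", the model is reasoning over two opaque pieces, not ten characters.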
The model can sound confident even when it's wrong. It predicts plausible text, not verified facts.
Models don't pay equal attention to everything in the context window. (Liu et al., Stanford, 2024)
Models pay the most attention to the beginning and end of the context window.
Information buried in the middle gets overlooked — even critical details like "my partner is allergic to shellfish."
The key insight: put critical details at the beginning or end of your prompt, never buried in the middle.
Today: what's universal, what changed, and what's still essential.
How to think about prompting
How reasoning models changed the way we prompt
The habits behind effective results
Essential techniques for standard models
Chaining prompts into systems you can automate
A thinking discipline AND a prompting technique.
The model tends to fill every gap you leave.
The model has to guess what you actually want.
Who should the model be? Frames everything that follows.
What specific action should the model perform?
Background, audience, relevant data — what would you tell a smart new hire?
What should the output look like? Be specific.
What should the model NOT do? Boundaries and limits.
1–3 input/output pairs. Show, don't just tell.
Repeat Task + Constraints at the end so the model sees them right before responding.
Task: Extract action items from meeting transcripts.
Without the framework: missing owners, vague deadlines, harder to hand off or automate.
With the framework: consistent, actionable, reliable every time.
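The framework can be sketched as a small template builder (the field names and sample values below are illustrative, not a required syntax):

```python
# Assemble a prompt from the framework fields, with the Task and
# Constraints repeated at the end so the model sees them last.
def build_prompt(role, task, context, fmt, constraints, examples):
    parts = [
        f"Role: {role}",
        f"Task: {task}",
        f"Context: {context}",
        f"Format: {fmt}",
        f"Constraints: {constraints}",
        "Examples:\n" + "\n".join(examples),
        f"Reminder of your task: {task} Constraints: {constraints}",
    ]
    return "\n\n".join(parts)

prompt = build_prompt(
    role="You are a meticulous project manager.",
    task="Extract action items from the meeting transcript below.",
    context="Transcript of a 30-minute product sync with the design team.",
    fmt="Bulleted list: owner, action, deadline.",
    constraints="If an owner or deadline is missing, write UNSPECIFIED.",
    examples=["Input: 'Sam will ship the fix by Friday.'\n"
              "Output: Sam, ship the fix, Friday"],
)
```

Note the task string appears twice: once up front, once right before the model responds.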
What reasoning models changed about prompting
Prompting trick to get better answers
Reasoning as a built-in model capability
Ask the model to show its work before answering. (Wei et al., 2022)
All of these techniques share one thing in common:
They are humans compensating for models
that couldn't reason on their own.
What if the model could learn to do this itself?
Models that "think" before they answer.
Prompt in → answer out.
One forward pass. No thinking step.
Think first, then answer.
Generates a hidden chain-of-thought.
Training-time compute: How much you spent training the model ($100M+)
Inference-time compute: How much you spend each time the model answers
Reasoning models trade more inference-time compute for better answers.
The model thinks before it answers. You can see the trace.
The burden shifted from you to the model.
Before reasoning models, it was important to spell out the reasoning steps yourself.
Now the model handles reasoning internally. You focus on clear goals, context, and constraints.
| Factor | Standard Model | Reasoning Model | + Tool-Enabled Model (either) |
|---|---|---|---|
| Best for | Simple, well-defined | Complex, multi-step analysis | Tasks requiring external facts or actions |
| Factual reliability | Moderate without extra controls | Better reasoning, still can hallucinate facts | Highest when grounded in approved sources/tools |
| Latency | Fastest | Seconds to minutes (adaptive reasoning can be fast) | Variable (depends on tool calls) |
| Cost profile | Lowest per request | Higher token cost | Model + tool/runtime cost |
| Examples | Email triage, first drafts, tagging | Contract analysis, planning, hard coding/debugging | Policy Q&A with retrieval (e.g. RAG), pricing from APIs, workflow actions |
The skills that matter more than syntax
Outperformers treat AI like a teammate, not a one-shot tool.
Average users take the first acceptable answer and stop.
Outperformers ask the model to run a short interview before drafting, which improves quality.
Use two workflows: volume first, rigor second. (Utley, "Stop Fighting AI Glazing")
| Task | Diverge | Converge |
|---|---|---|
| Naming a product | "20 name ideas for a freelancer PM app." | "I like #4 and #11. Give me 15 more in that direction." |
| Presentation | "15 angles for a talk on AI adoption." | "Angle #7 is strongest. 10 more in that zone." |
RLHF teaches models to agree with you — even when you're wrong.
RLHF rewards responses humans prefer. Humans prefer responses that agree with them. So the model tends to learn:
"Agreement = reward"
Reasoning models sound more authoritative. The reasoning trace makes agreement feel more justified. This makes sycophancy harder to detect.
Four techniques to get honest answers.
Instead of "I think X is better, what do you think?" ask "Compare X and Y on these criteria." Wang et al., 2025
"Now tell me why this is a bad idea" — this often reveals the real answer the model was suppressing.
When the model says "Great idea!" or "You're absolutely right!" — that's a red flag, not a signal of quality.
Even reasoning models lose track of multi-part instructions.
Four tasks in one sentence. The model may do 2–3 well and quietly skip the rest.
Same work, but numbered and sequenced. Nothing gets lost.
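A minimal sketch of the numbering step (the wrapper text is illustrative, not a fixed recipe):

```python
# Turn a run-on multi-task request into an explicit numbered sequence
# so no sub-task gets silently dropped.
def number_steps(tasks):
    lines = ["Complete the following steps in order. Do not skip any:"]
    lines += [f"{i}. {t}" for i, t in enumerate(tasks, 1)]
    lines.append(f"Before finishing, confirm all {len(tasks)} steps are done.")
    return "\n".join(lines)

request = number_steps([
    "Summarize the report in 5 bullets",
    "List the top 3 risks",
    "Draft a follow-up email",
    "Suggest 5 subject lines",
])
```

The closing confirmation line gives the model a reason to check its own coverage.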
What to do when the output isn't right.
What specifically is wrong? Wrong format? Wrong content? Hallucination? Too long?
Score the output on accuracy, completeness, format, tone, and cost before iterating.
Don't rewrite the whole prompt. Change one element and re-run.
One good output isn't enough. Run it 3–5 times. Does it consistently work?
How to decide if an output is ready to use.
| Dimension | Pass condition | Typical fix if it fails |
|---|---|---|
| Accuracy | Claims match verified sources | Improve context, add retrieval, enforce citations |
| Completeness | All required items are covered | Clarify task and checklist in format spec |
| Format compliance | Matches schema/template exactly | Provide strict output template or JSON schema |
| Tone and audience | Appropriate for end user | Add examples and role/tone constraints |
| Cost and latency | Within budget and response SLA | Route to cheaper/faster model when possible |
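The rubric above can be automated for the mechanical dimensions. A minimal sketch, where each check is a stand-in for a real validator (schema validation, citation verification, and so on):

```python
# Pass/fail checks for the mechanical rubric dimensions. Accuracy and
# tone still need a human or a stronger evaluator; these catch the rest.
def evaluate(output, required_items, max_chars):
    return {
        "completeness": all(item in output for item in required_items),
        "format": output.strip().startswith("-"),  # expects a bulleted list
        "length": len(output) <= max_chars,
    }

result = evaluate("- Sam: ship the fix by Friday", ["Sam", "Friday"], 200)
```

Run it on every output before iterating, so you change the prompt based on which dimension actually failed.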
You verify claims and verify the sources behind those claims.
Could be hallucinated, incomplete, or overconfident.
Could be misquoted, fabricated, or not actually supporting the claim.
Even real sources can be outdated, biased, or wrong.
Decide what can and cannot enter an AI tool.
System sets durable rules. User sets the per-turn task.
Persistent instructions: role, rules, constraints.
Specific request for this turn only.
Shaped by both system + user prompt
Set project-level instructions in the UI.
Personalization settings across chats.
System prompts as files for coding agents.
Full control for product workflows.
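The split is easiest to see in the chat-message shape most provider APIs share (the exact client call varies by provider and is omitted; the rule strings are illustrative):

```python
# System message: durable rules that persist across turns.
# User message: the specific request for this turn only.
def build_messages(system_rules, user_task):
    return [
        {"role": "system", "content": system_rules},
        {"role": "user", "content": user_task},
    ]

messages = build_messages(
    "You are a support assistant. Never reveal internal pricing. "
    "Answer in under 100 words.",
    "A customer asks why their invoice doubled this month.",
)
```

Every turn reuses the same system message; only the user message changes.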
Where to find them in the tools you already use.
Personalization → Custom Instructions
Create a Project → Set Instructions
Essential techniques for standard models.
Standard models have structural limitations. These techniques compensate.
Standard models read your prompt sequentially. Earlier tokens can't see later instructions.
They don't have a dedicated "thinking" phase to reason through the problem.
The structure you provide in your prompt shapes how the model approaches the problem.
A dead-simple technique backed by research. (Leviathan et al., 2024)
Session 1 callback: The model encodes meaning left-to-right. Each token's representation is built from the tokens before it — never after.
If the task comes last, earlier tokens were encoded without knowing what you're asking for. Repetition fixes this: the second copy of your instructions sees the full context from the first.
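A minimal sketch of the sandwich (the separator is arbitrary):

```python
# Place the instructions before AND after a long document. The second
# copy is encoded with the full context already in view.
def sandwich_prompt(instructions, long_context):
    return f"{instructions}\n\n---\n{long_context}\n---\n\n{instructions}"

prompt = sandwich_prompt(
    "Summarize the contract below in 3 bullets, flagging any auto-renewal clauses.",
    "FULL CONTRACT TEXT HERE...",
)
```

The cost is a few dozen extra tokens; the benefit is that the model's final pass over the context knows exactly what it is looking for.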
Assigning a persona to activate domain-specific patterns.
The model learned patterns from financial analysts writing online. Assigning the role activates those learned distributions.
The model mirrors the register of your input.
You get: vague, breezy, surface-level output that matches the energy you gave it.
You get: structured, detailed, professional output.
Session 1 callback: The model learned from the internet. Casual inputs were followed by casual responses. Professional inputs were followed by professional responses.
You're influencing which region of the training distribution the model draws from.
The most powerful technique most people skip.
Without examples: decent, but inconsistent on edge cases.
With 1–3 examples: dramatically more accurate and consistent.
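Few-shot prompts follow a fixed shape: task, a few input/output pairs, then the real query with its output left blank. A minimal builder (the sentiment task is just a sample):

```python
# Build a few-shot prompt: show the model what "good" looks like,
# then leave the final Output: open for it to complete.
def few_shot_prompt(task, pairs, query):
    blocks = [task]
    for inp, out in pairs:
        blocks.append(f"Input: {inp}\nOutput: {out}")
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

prompt = few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Loved it, would buy again", "positive"),
     ("Broke after two days", "negative")],
    "Does the job, I guess",
)
```

Ending on a bare `Output:` nudges the model to continue the pattern rather than explain it.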
How wording choices accidentally bias the model.
Leads the model: it encodes "slide 12 should be deleted" as the premise and argues from there instead of evaluating the slide on its merits.
Sycophancy trigger: you stated your opinion
Vague criteria: no way for the model to evaluate objectively
Open question: asks the model to evaluate, not confirm
No opinion stated: the model must reason independently
Specific criteria: gives the model something concrete to assess
Use AI to sharpen the Role in your Prompt Framework.
Start simple, then refine with AI:
Your first prompt
"You are an expert in child psychology and child education methodologies and an expert software engineer."
"How would you refine this prompt to make it so that the LLM is an expert at teaching children how to learn in the best ways?"
AI-refined result
"You are an elite educational architect and an expert in child developmental psychology, specializing in metacognition. You have deep expertise in diverse, evidence-based pedagogical methodologies — including Montessori, Vygotsky's ZPD, Bloom's Taxonomy, and neuroplasticity-driven learning.
You are also an expert software engineer. Your unique skillset allows you to translate these complex methodologies into scalable algorithms and engaging digital experiences."
What to use when — and whether reasoning models need it.
| Technique | What it does | Standard model | Reasoning model |
|---|---|---|---|
| Prompt Framework (R+TCFCE+T) | Role first, define the task, re-state it at the end | Essential | Good practice — helps you think clearly |
| Tone matching | Your register shapes output quality | Essential | Less important — reasoning normalizes output |
| Don't state your opinion | Avoids triggering sycophancy | Essential | Essential |
| Explicit steps | Number each task so nothing gets skipped | Essential | Essential |
| Role prompting | Activates domain-specific patterns | High impact | Helpful — can infer from context |
| Few-shot examples | Shows the model what "good" looks like | High impact | Helpful for complex formats |
| Task-first + repetition | Puts instructions where encoding sees them | High impact | Saves tokens — less rethinking needed |
A single prompt is a tool. A chain is a system.
Before you build anything, answer these five questions.
What starts the workflow? (new email, scheduled time, manual click, new data)
What data does the AI need? Where does it come from?
What does the AI do at each stage? Standard model or reasoning model for each?
What's the final deliverable? Where does it go?
Where does a human need to review before the workflow continues?
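The five questions map directly onto a pipeline. A minimal sketch, with `call_model` as a stub for whatever LLM API you use and a keyword check standing in for a real human-review gate (all names here are illustrative):

```python
# Stub standing in for any LLM API call; swap in your provider's client.
def call_model(prompt):
    return f"[model output for: {prompt[:40]}...]"

# Human-in-the-loop gate: a real one might check for policy-sensitive
# topics, low confidence, or dollar amounts. Here: a keyword flag.
def needs_review(text):
    return "refund" in text.lower()

def run_chain(new_email):                 # Trigger: a new email arrives
    # Stage 1 (standard model): extract what the customer is asking for
    summary = call_model(f"Summarize this email:\n{new_email}")
    # Gate: hold for a human before anything leaves the system
    if needs_review(summary):
        return ("HOLD_FOR_REVIEW", summary)
    # Stage 2 (standard model): produce the deliverable, a draft reply
    draft = call_model(f"Draft a reply to:\n{summary}")
    return ("SEND", draft)
```

Each stage is a small, testable prompt; the gate decides where the workflow pauses for a human.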
Take your use case from homework and apply today's tools.
Use this before homework. It turns today's concepts into an execution checklist.
Download Claude Code and/or Codex. We'll build with them in Session 3 — come ready.
Come up with 2–3 project ideas. Feel free to use AI as a collaborative brainstorming partner using today's techniques.
As you use AI this week, notice: Are you being clearer? Reviewing plans? Checking sources? That awareness is the skill.
Questions, ideas, and "wait, what?" moments welcome.
Harper Carroll AI · AI User to AI Builder · Session 2: The Method · Cohort 1