Are You Using AI Agents Wrong?

AI agents can search, build, and execute entire workflows on their own. So most people set them up, step back, and let them run.

That’s a mistake.

Agentic AI tools such as Claude Code make dozens of non-binary decisions you never see: what content is relevant, what matters, and what counts as “good.” Every one of those invisible decisions is a place where a simple prompting technique could dramatically improve the output.

Let me show you what I mean.

The Problem with “Just Let It Run”

I’ve been tinkering with a LinkedIn content pipeline that I built in Claude Code. It searches the internet for articles relevant to my audience, then uses them as source material for LinkedIn posts. I fed it knowledge files: audience persona documents, brand guidelines, and a detailed copywriting handbook. Everything it would need to make good decisions.

The searching worked. The relevance didn’t.

Claude kept finding fear-mongering pieces with no practical value, biased tool roundups written by companies selling the tools they recommended, and articles with outdated advice dressed up as “ultimate guides for 2026.” One article recommended DALL-E 3 for image generation in 2026, when hundreds of better models exist. Another was a “best AI productivity tools” list from Zapier, which shockingly included Zapier.

The instructions were detailed. The output was wrong. So I almost did what most people do: write even more instructions.

Then I remembered a technique I used heavily in the pre-ChatGPT era and still use inside ZimmWriter’s code to this day.

The Fix: Show, Don’t Describe

Instead of writing more rules, I showed Claude what good and bad looked like and told it why.

I set up a training session and had Claude fetch 100 articles in batches of 10. For each batch, I marked every article as relevant or not and explained my reasoning.

  • “This is a tool roundup from Zapier, so they’ll recommend their own product, making it biased. Discard it.”
  • “This is by IBM and relates to practical prompting advice with real implementation examples. Keep it.”
  • “This is from Psychology Today about AI eroding critical thinking, but it includes actionable steps to rebuild it. That’s the difference between this and the doom-and-gloom articles. Keep it.”
  • “This one’s from a personal blog. Not reputable enough. For prompting content, I only want sources like MIT, IBM, Anthropic, or peer-reviewed research. Discard it.”

And here’s where it got interesting.

Each batch Claude generated was better than the last because it was fed the feedback I’d given it up to that point. I could watch it learn in real time. By batch six or seven, it was surfacing articles I would have picked myself.
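
As a rough sketch, that loop could look like this in Python. It is an illustration, not the code behind the pipeline: `fetch_batch` stands in for whatever the agent does to find articles, `review` stands in for the human marking each one, and both names (along with `Feedback` and `run_session`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Feedback:
    title: str
    verdict: str  # "keep" or "discard"
    reason: str


def feedback_context(history: list[Feedback]) -> str:
    """Render every judgment so far as text the agent reads before its next batch."""
    if not history:
        return "No feedback yet."
    lines = [f"- {f.title}: {f.verdict.upper()}. Reason: {f.reason}" for f in history]
    return "Feedback so far:\n" + "\n".join(lines)


def run_session(
    fetch_batch: Callable[[str, int], list[str]],  # hypothetical agent search step
    review: Callable[[str], tuple[str, str]],      # hypothetical human review step
    rounds: int = 10,
    batch_size: int = 10,
) -> list[Feedback]:
    """Alternate between agent batches and human review, carrying feedback forward."""
    history: list[Feedback] = []
    for _ in range(rounds):
        # The agent sees all earlier verdicts and reasons before it searches again.
        batch = fetch_batch(feedback_context(history), batch_size)
        for title in batch:
            verdict, reason = review(title)
            history.append(Feedback(title, verdict, reason))
    return history
```

The point of the design is that nothing is thrown away between rounds: every verdict and its reason gets passed back in, which is why later batches can improve on earlier ones.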

At the end, Claude generated a summary of all my feedback along with the patterns it had extracted. It included rules like:

  • Prompting guides only qualify from top-tier sources (MIT, IBM, Anthropic), not random bloggers writing “ultimate guides.”
  • Doom-and-gloom articles about AI must include a positive takeaway to qualify for inclusion.
  • Tool roundups from companies that sell tools are automatically rejected since they’re guaranteed to be biased.
  • Feature announcements about AI platforms should come from the platforms themselves, not third-party rehashes.
  • Negative articles about AI only get a pass if they come from top-echelon sources like HBR or MIT. Otherwise they’re just noise.

I didn’t write any of those rules. Claude extracted them from my feedback.

This is Few-Shot Prompting

What I just described has a name: few-shot prompting. You show the AI multiple inputs with their expected outputs and explain the reasoning.

It’s a technique that’s been around since before ChatGPT existed, and it works best when decisions are nuanced: when you’re not asking something binary like “Is this in English?” but something more like “Would my specific audience actually care about this?”
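
To make that concrete, here is a minimal sketch of how a few-shot prompt might be assembled in Python. The `Example` class, the `build_few_shot_prompt` function, and the exact prompt wording are assumptions for illustration. The two examples paraphrase the feedback quoted earlier.

```python
from dataclasses import dataclass


@dataclass
class Example:
    article: str  # the input: a short description of the article
    verdict: str  # the expected output: "KEEP" or "DISCARD"
    reason: str   # why, in the reviewer's own words


def build_few_shot_prompt(examples: list[Example], new_article: str) -> str:
    """Place worked examples ahead of the new case so the model can imitate them."""
    parts = ["Decide whether each article is relevant to my audience."]
    for ex in examples:
        parts.append(
            f"Article: {ex.article}\nVerdict: {ex.verdict}\nReason: {ex.reason}"
        )
    # The final entry is left open for the model to complete.
    parts.append(f"Article: {new_article}\nVerdict:")
    return "\n\n".join(parts)


examples = [
    Example("Tool roundup from Zapier", "DISCARD",
            "They will recommend their own product, so it is biased."),
    Example("IBM piece on practical prompting with real implementation examples",
            "KEEP", "Practical advice from a reputable source."),
]
prompt = build_few_shot_prompt(examples, "Personal blog post: ultimate prompting guide")
print(prompt)
```

Each example carries its reason alongside the verdict. That is the part a plain list of rules tends to lose.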

The iterative version, where each round of feedback improves the next round, is close to what researchers call active learning. There’s a paper from EMNLP 2024 called CoverICL that describes a very similar loop: use uncertainty to identify where the AI is weakest, get human feedback on those cases, then repeat.
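
The “use uncertainty” step can be sketched with plain uncertainty sampling. To be clear, this is a generic illustration and not the CoverICL algorithm. It assumes you can get a keep-probability for each article from somewhere, and the scores and the `most_uncertain` name below are made up for the example.

```python
def most_uncertain(keep_probability: dict[str, float], k: int = 3) -> list[str]:
    """Return the k items whose keep-probability is closest to 0.5."""
    ranked = sorted(
        keep_probability,
        key=lambda title: abs(keep_probability[title] - 0.5),
    )
    return ranked[:k]


# Hypothetical scores: near 0 means "surely discard", near 1 means "surely keep".
scores = {
    "Vendor tool roundup": 0.05,
    "University prompting guide": 0.95,
    "Op-ed on AI and critical thinking": 0.55,
    "Third-party feature recap": 0.40,
}
print(most_uncertain(scores, k=2))
```

With these scores, the two borderline articles come back first. Those are the ones where human feedback teaches the AI the most.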

I built the same thing by accident.

Why This Matters Beyond My Pipeline

When people moved from chatting with AI to building with it (agents, automated workflows, Claude Code projects), most of them left their prompting techniques behind. The tools felt more autonomous. It made sense to stop.

But it’s the wrong decision.

If you’re not using prompting techniques for non-binary decisions, then you’re treating agentic AI like a slot machine. Put in the instructions, get out the result, hope it’s good. But agents make dozens of judgment calls you never see, and every one of those is a place where a few examples with clear reasoning would produce dramatically better output than another paragraph of instructions.

Think about your own setup. Wherever you’ve written long, detailed instructions and the output still isn’t right, that’s the signal. Stop describing what you want. Show the AI several examples of what good looks like instead. Tell it why they’re good. Then let it use that feedback.

The bar is sitting on the floor. Very few people are applying prompting techniques inside their AI agents and automated workflows. Which means doing even a little of this puts you ahead of almost everyone.

Try it this week. Pick one task. Show instead of tell.