Vibe Coding Without Guardrails is a Disaster


You open your editor. You type a fuzzy one-liner to Claude. "Build me a user settings page." Ninety minutes later you have code. Some of it works. You're not sure which parts. There are no tests. The git log reads "wip wip fix". You don't know why the AI made any of its decisions, and neither does the AI, because that part of the chat scrolled out of context ten prompts ago.

That's vibe coding. It feels productive. It isn't.

Why it feels good and breaks anyway

Vibe coding is a dopamine loop. Prompt, output, prompt, output. No friction. No resistance. No structure. The AI happily writes whatever you suggest, and it suggests back whatever you seemed to want. You're both improvising. Neither of you is really thinking.

This works fine for a demo or a scratch script. It falls apart the moment the work has to live somewhere. You end up with:

Code that passes no tests, because you never wrote any
Scope creep, because the brief was never pinned down
Rewrites of things the AI already wrote, because it forgot
Silent assumptions baked in, because nobody named them
A commit history that tells you nothing about what happened or why

The real problem isn't the AI. It's that the brief is doing all the work, and the brief is usually wrong.

You typed "user settings page." You meant: email change with verification, password change with confirmation, old sessions invalidated, a toast on success, and tests that cover the happy and broken paths. The AI only heard the three words you typed. So did you, while you were typing them.

What guardrails actually are

Guardrails aren't "better prompts." They're not "use agents" either. A guardrail is scaffolding that forces the work to become traceable and disciplined before it becomes code.

Three things have to happen:

The brief gets interrogated, not executed. Before a single line gets written, the assumptions come out. The gaps get named. The risks get listed.
The work gets broken down before it gets done. A task you can commit is a task you can think about. One big blob is a task nobody can audit.
Every change is tested and committed on its own. No batch-dumps. No "I'll fix it later." The git log becomes the record of what happened.

None of this is new. Good engineers have done it forever. The new part is this: the AI doesn't do any of it unless you force it to.

The pipeline, stage by stage

I built a tool called do-work to force these guardrails. It's a Claude Code skill. You point it at a brief and it walks the work through a pipeline. Here's what each stage does and what it protects you from.

Intake. Your brief gets saved word-for-word. Your exact words, not a paraphrase. Why this matters: a week from now when the code doesn't match your head, you can re-read what you actually asked for. You'd be shocked how often the drift starts right there.

Question. The tool grills you about the brief, one question at a time. What are the success criteria? What's out of scope? What happens when a user tries to change their email to one that's already taken? You can skip this step. You usually shouldn't. The questions feel boring, and that's the point. Boring questions surface the assumptions you didn't know you had.

Ideate. A creative review runs over the brief. It lists assumptions, risks, and connections you missed. This is the part most people skip in their heads, because thinking is hard and typing is easy.

Capture. The brief gets broken into REQ files. Each REQ is a small, discrete piece of work. One REQ, one test, one commit. If a task is too big to fit in one REQ, it's too big to commit cleanly, which means it's too big to reason about.
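To make "one REQ, one test, one commit" concrete, here is a minimal sketch of what a REQ could look like as data. The field names, the `is_committable` check, and the three-criteria limit are all assumptions made for illustration, not do-work's actual file format.

```python
from dataclasses import dataclass, field


@dataclass
class Req:
    """One small unit of work (hypothetical shape, not do-work's real format)."""
    req_id: str
    title: str
    acceptance: list[str] = field(default_factory=list)  # concrete, testable criteria
    test_name: str = ""  # the single test that proves this REQ is done


def is_committable(req: Req, max_criteria: int = 3) -> bool:
    """Rough heuristic (an assumption): a REQ needs a named test and a
    small, non-empty set of acceptance criteria to fit in one commit."""
    return bool(req.test_name) and 0 < len(req.acceptance) <= max_criteria


email_change = Req(
    req_id="REQ-001",
    title="Change email with verification",
    acceptance=["Changing email sends a verification link to the new address"],
    test_name="test_email_change_sends_verification",
)
# The whole brief as one blob: no criteria, no test, nothing to audit.
whole_page = Req(req_id="REQ-000", title="Build the user settings page")

print(is_committable(email_change))
print(is_committable(whole_page))
```

The blob fails the check for the same reason it fails in practice: there is no single test that could prove it done.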

Verify. The tool scores the REQs against the original brief. If the coverage is under 90 percent, you don't move to code. You fix the gap first. This is the step that stops you from building the wrong thing at full speed. Every REQ's acceptance criteria get checked for vagueness. "Should work" becomes "passes this specific test." Soft spots get auto-fixed. This is where the AI stops saving its own skin with weasel words.
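The gate itself is simple arithmetic. Here is a sketch of the idea, assuming the brief has already been split into discrete items and each REQ lists the items it covers. How do-work actually scores coverage isn't described here, so exact string matching stands in for it, and the list of vague phrases is invented for the example.

```python
COVERAGE_THRESHOLD = 0.90  # from the text: under 90 percent, you don't move to code
VAGUE_PHRASES = ("should work", "works correctly", "as expected")  # assumed word list


def coverage(brief_items, reqs):
    """Fraction of brief items that at least one REQ claims to cover."""
    if not brief_items:
        return 0.0  # nothing to score against; treat as not covered
    claimed = {item for items in reqs.values() for item in items}
    return sum(item in claimed for item in brief_items) / len(brief_items)


def vague_criteria(criteria):
    """Return the acceptance criteria that contain a known weasel phrase."""
    return [c for c in criteria if any(p in c.lower() for p in VAGUE_PHRASES)]


def may_proceed(brief_items, reqs):
    """The gate: code only starts once coverage reaches the threshold."""
    return coverage(brief_items, reqs) >= COVERAGE_THRESHOLD


brief_items = [
    "email change with verification",
    "password change with confirmation",
    "old sessions invalidated",
    "toast on success",
    "tests for happy and broken paths",
]
reqs = {
    "REQ-001": ["email change with verification"],
    "REQ-002": ["password change with confirmation"],
    "REQ-003": ["old sessions invalidated"],
    "REQ-004": ["toast on success"],
}

score = coverage(brief_items, reqs)
print(f"coverage: {score:.0%}, proceed: {may_proceed(brief_items, reqs)}")
print(vague_criteria(["Should work on mobile", "Returns 409 when the email is taken"]))
```

Four of five brief items are covered, which is 80 percent, so the gate stays shut until a REQ picks up the missing tests.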

Run. One REQ at a time. Write the failing test first. Build the thing. Watch the test pass. Commit. Move on. This is TDD, which isn't new either, but the AI doesn't do it unless the pipeline forces it.
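Here is what one pass through that loop might look like for the "email already taken" question from earlier. The `change_email` function and its behavior are hypothetical, made up for the example. The tests sit first in the file because they get written first, and they fail until the function below them exists.

```python
import unittest


# Step 1: write the tests. Run them now and they fail, because
# change_email does not exist yet.
class ChangeEmailTest(unittest.TestCase):
    def test_rejects_taken_email(self):
        with self.assertRaises(ValueError):
            change_email({"a@example.com"}, "A@example.com ")

    def test_accepts_new_email(self):
        result = change_email({"a@example.com"}, "b@example.com")
        self.assertEqual(result, "b@example.com")


# Step 2: build the smallest thing that makes the tests pass.
def change_email(taken, new_email):
    """Hypothetical function under test: normalize the address and
    reject one that another account already uses."""
    normalized = new_email.strip().lower()
    if normalized in taken:
        raise ValueError("email already taken")
    return normalized


# Step 3: watch the tests pass, then commit.
def run_tests():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ChangeEmailTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` returns a result object, and `wasSuccessful()` on it is the signal that the REQ is ready to commit.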

Every stage is a place vibe coding skips. That's why vibe coding fails.

The commit is the artifact

Here's the piece I care about most.

When you work with AI, the natural unit feels like the conversation. You talk to it, it talks back, you get code, you move on. But conversations evaporate. The chat history rolls off. The context window clears. Six months from now, you can't answer the question "why does this code look like this?" because the reasoning lived in a transcript that doesn't exist anymore.

Commits don't evaporate. A commit has a message, a diff, a parent, and a trail. If the AI did the work and the pipeline produced a commit, then git blame and git log can always tell you:

What was changed
Which REQ authorized it
Which brief the REQ came from
When it happened and in what order

That's not bureaucracy. That's the minimum price of trusting autonomous work. If you can't audit what the AI did, you didn't do real work. You did a magic trick.

do-work writes every commit in the same shape, with the REQ file and the original brief referenced in the message. The git log becomes the audit trail. You can walk back from any line of code to the original request that caused it. That's what guardrails actually buy you.
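As an illustration of that walk-back, here is a sketch of a commit message that carries its REQ and brief as trailer lines, plus a parser that reads them out again. The `REQ:` and `Brief:` keys and the file paths are assumptions made for the example. do-work's real message shape may differ.

```python
def build_commit_message(summary, req_path, brief_path):
    """Summary line, blank line, then one trailer line per reference."""
    return f"{summary}\n\nREQ: {req_path}\nBrief: {brief_path}\n"


def parse_trailers(message):
    """Pull the REQ and Brief references back out of a commit message."""
    trailers = {}
    for line in message.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key in ("REQ", "Brief"):
            trailers[key] = value.strip()
    return trailers


# Hypothetical paths, for illustration only.
message = build_commit_message(
    "Reject email change when address is taken",
    "reqs/REQ-001.md",
    "briefs/user-settings-page.md",
)
print(message)
print(parse_trailers(message))
```

With a shape like this, the commit that git blame points to for any line leads straight to the REQ and the brief behind it.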

If you're doing real work

Vibe coding is fine for throwaway scripts and sandbox experiments. If you're shipping code that someone else will depend on, or that you'll be looking at next year, you need guardrails.

do-work is one shape of them. There are others. The point isn't the tool. The point is that AI output without structure is a mess waiting to happen, and the fix isn't "prompt better." The fix is to put the scaffolding around the work before the work starts.

Install it. Point it at a brief. Watch what happens when the brief gets interrogated before the code gets written. You'll either love it or hate it. Either way, you won't vibe code again.

github.com/rawphp/do-work
