How snag works

Snag applies FMEA (Failure Mode and Effects Analysis, the discipline behind failure analysis in aerospace and manufacturing) to a change. Instead of reading a diff line by line, it asks a structural question: what is supposed to happen here, and in what ways could it silently fail to happen?

Why a review misses these

Line-by-line review is good at what's on the screen: this null check, that off-by-one, this awkward name. But the failures that hurt most are about what isn't there — the unhandled disconnect, the retry that runs twice, the log that's lost on a hard kill. They don't live on any single line, so a reader scanning lines has no place to notice them.

Snag reframes the change as a Happy Path → Failure Map. The blue spine is what should happen, step by step. Everything branching off it is a way it can go wrong — enumerated deliberately, the way an FMEA walks each step and asks how it fails. The result is a map you can read, argue with, and commit.

Anatomy of a map

Happy path: the intended flow
Failure mode: how a step silently goes wrong

Each map is a failure-map/v1 JSON document. It renders as an interactive graph in the viewer, and it lives in git next to the code it describes.
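This page does not show the failure-map/v1 schema, so the following is only a minimal sketch of what such a document might contain. Every field name here (schema, happy_path, failure_modes, branches_from) is an assumption made for illustration, as is the choice of which step each failure mode hangs off. The point is the shape: a list of spine steps, a list of failure modes, and a reference tying each failure mode back to the step it branches from.

```python
import json

# Hypothetical failure-map/v1 document. Only the "failure-map/v1" identifier
# and the labels come from the page; all field names are assumptions.
EXAMPLE_MAP = {
    "schema": "failure-map/v1",
    "title": "Runner Protocol v1",
    "happy_path": [
        {"id": "s1", "label": "Runner boots"},
        {"id": "s2", "label": "join (boot token)"},
        {"id": "s3", "label": "Validate boot token"},
    ],
    "failure_modes": [
        # Which step each mode branches from is guessed for this sketch.
        {"id": "f1", "branches_from": "s2", "label": "Boot token reused / leaked"},
        {"id": "f2", "branches_from": "s1", "label": "Boot timeout (~60s) -- never joins"},
    ],
}


def dangling_failure_modes(doc: dict) -> list[str]:
    """Return ids of failure modes whose branches_from names no happy-path step."""
    step_ids = {step["id"] for step in doc["happy_path"]}
    return [
        mode["id"]
        for mode in doc["failure_modes"]
        if mode["branches_from"] not in step_ids
    ]


# Round-trip through JSON text, the form in which the map would sit in git.
loaded = json.loads(json.dumps(EXAMPLE_MAP, indent=2))
print(dangling_failure_modes(loaded))  # prints [] because every mode has a parent step
```

A check like this is the kind of structural rule a committed map makes possible: a failure mode that points at no step is an error in the map itself, independent of the code it describes.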

A worked example

Here's a real map from the demo: Runner Protocol v1, generated from a protocol spec. The spine is the intended lifecycle; below it are the failure modes Snag surfaced off that spine, the kind a spec review nods past.

Runner Protocol v1

happy path → failure map

Runner boots
join (boot token)
Validate boot token
Issue session token; verdict: continue
job:assign (def, steps, env)
job:ack (CP stamps ts)
job:started (assigned -> running)
log:chunk seq 1..N, acked
job:finished (after all acked)
Seal: logs contiguous 1..N?
Job success
Channel disconnect -> rejoin (session token)
job:cancel (user / pipeline / timeout)
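The seal step on the spine asks whether the acked log chunks form a contiguous run 1..N. A minimal sketch of that check follows, assuming the control plane keeps the set of acked sequence numbers and knows N. The function names and the set-based bookkeeping are hypothetical; the protocol spec itself is not reproduced on this page.

```python
def missing_chunks(acked_seqs: set[int], n: int) -> list[int]:
    """Return the sequence numbers in 1..n that were never acked, in order."""
    return [seq for seq in range(1, n + 1) if seq not in acked_seqs]


def can_seal(acked_seqs: set[int], n: int) -> bool:
    """A job's logs can be sealed only if every chunk 1..n was acked."""
    return not missing_chunks(acked_seqs, n)


# Chunk 3 was lost in transit and never resent, so the seal must fail.
print(missing_chunks({1, 2, 4, 5}, 5))  # prints [3]
print(can_seal({1, 2, 4, 5}, 5))        # prints False
print(can_seal({1, 2, 3, 4, 5}, 5))     # prints True
```

Returning the missing numbers, not just a boolean, would let a runner resend exactly the gaps instead of discovering the problem only at seal time.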

15 failure modes branching off that spine

↘ Boot token reused / leaked

↘ Boot-token TTL window too wide

↘ Boot timeout (~60s) -- never joins

↘ Max boot attempts (3) exceeded

↘ protocol_version mismatch

↘ Duplicate job:started not idempotent

↘ Re-sent job:assign -> double execution

↘ Grace period (30s) expiry -> runner_lost

↘ Resumes work after Job already terminal

↘ Lost chunk never resent -> seal fails 1..N

↘ Chunk-name collision -> log corruption

↘ Integrity checked only at seal, not on stream

↘ SIGKILL before flush -> lost final logs

↘ Process group not fully killed -> orphans

↘ Cancel-drain deadline (10s) -> force destroy

Each one is a node you can open in the interactive viewer — with the path it branches from and why it matters. None of them is a single line you'd circle in review.
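To make one of these concrete, take "Duplicate job:started not idempotent". The sketch below shows one way a handler could make that transition idempotent. The state names beyond assigned and running, the JobState enum, and the on_job_started function are all assumptions for illustration, not the protocol's actual design.

```python
from enum import Enum


class JobState(Enum):
    ASSIGNED = "assigned"
    RUNNING = "running"
    FINISHED = "finished"  # stands in for any terminal state


def on_job_started(state: JobState) -> JobState:
    """Apply a job:started message to the current state."""
    if state is JobState.ASSIGNED:
        return JobState.RUNNING  # the happy-path transition
    if state is JobState.RUNNING:
        return state  # duplicate delivery: a no-op, not a second start
    # A terminal job must not be revived by a late or replayed message.
    raise ValueError(f"job:started not valid in state {state.value}")


state = on_job_started(JobState.ASSIGNED)
state = on_job_started(state)  # same message delivered twice
print(state.value)  # prints running
```

The missing piece in a non-idempotent handler is the middle branch. No single line is wrong, which is why this kind of gap is hard to spot in a line-by-line read.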

What makes it different

Design altitude, not just diffs

Point it at a PRD or spec — not only a diff. Snag finds failure modes before the code exists, which a diff-only reviewer structurally can't.

Bring your own model

Your key, your provider — Anthropic, OpenAI, or OpenRouter. Nothing is proxied through a Snag service; there are no Snag servers in the loop.

A committed, diffable artifact

Maps are JSON in git. They travel with the change in review, and diffs across snapshots show how a change's failure surface moves over time.
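Because a map is plain JSON, comparing two snapshots takes only a few lines. The sketch below assumes a hypothetical layout in which each map has a failure_modes list of objects with a label field; the real failure-map/v1 field names may differ.

```python
def diff_failure_modes(old: dict, new: dict) -> dict[str, list[str]]:
    """Report failure-mode labels that appear in only one of two map snapshots."""
    old_labels = {mode["label"] for mode in old["failure_modes"]}
    new_labels = {mode["label"] for mode in new["failure_modes"]}
    return {
        "added": sorted(new_labels - old_labels),
        "removed": sorted(old_labels - new_labels),
    }


# Two hypothetical snapshots of the same map, before and after a change.
before = {"failure_modes": [
    {"label": "protocol_version mismatch"},
    {"label": "Chunk-name collision -> log corruption"},
]}
after = {"failure_modes": [
    {"label": "protocol_version mismatch"},
    {"label": "SIGKILL before flush -> lost final logs"},
]}

print(diff_failure_modes(before, after))
```

Here the change closes one failure mode and opens another, which is the kind of movement in the failure surface that a code diff alone would not show.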


Contact: dave@teeang.net
CLI: @snag-run/cli on npm