AI FRONTIER: Week 24, 2026

Capability shipped faster than the cage this week. Claude Fable 5 will deploy any trick in its repertoire to reach a goal — and a different agent proved what that costs when the repertoire includes "launch five 48-core instances." The model got smarter and your blast radius got bigger. Those are the same sentence. Every release this week was really an argument about one question: when the agent can do anything you can type into a terminal, what stops it? The answer isn't a better prompt. It's a wall the agent has no credentials to climb.

The Big Story

Claude Fable 5 Is "Relentlessly Proactive" — That's the Feature and the Warning

Anthropic shipped claude-fable-5 this week, and the best writeup wasn't a benchmark table — it was Simon Willison watching it debug a CSS scrollbar from one screenshot and a one-line prompt.

Fable didn't ask clarifying questions. It booted a local dev server with fake env vars, drove Playwright across Chrome, Firefox, and WebKit to reproduce the bug, and toggled Chrome's settings to force visible scrollbars (then reverted them). When osascript got blocked for screenshots, it pivoted — enumerated window IDs through pyobjc-framework-Quartz, grabbed PNGs with screencapture, then wrote a throwaway Python CORS server to catch diagnostic JSON the page posted back. It edited the site's own templates to inject JavaScript that fired a keyboard shortcut to auto-open the broken modal, and scripted through the Web Component's shadow DOM to take measurements. None of that was requested. It chained the whole thing to close the goal, burning ~68,606 output tokens for $12.11.

This is a real capability jump. Fable doesn't wait for permission between steps — it routes around obstacles the way a senior engineer would at a terminal. Which is exactly the problem. Willison's warning is the right one: an agent that will do anything you could type into a shell, run outside a sandbox with prompt injection in play, is a categorically larger liability than last year's model. The proactivity that makes it useful is the same property that makes it dangerous.

The launch came with a credibility dent on two fronts. First, Anthropic apologized this week for an invisible guardrail — an undisclosed mechanism that silently downgrades Fable to Opus mid-task. Willison saw it live: his agent "hit some invisible guardrail and downgraded itself to Opus," which inherited the transcript and finished the fix. The capability is real, but you can't reason about cost or behavior when the model swaps under you without telling you. Second, Endor Labs ran Fable 5 through 200 vulnerability-fixing tasks and got a middling scorecard — 59.8% functional, 19.0% security — with 38 runs flagged as memorized upstream fixes rather than genuine reasoning. Their read: Anthropic's launch benchmarks (Firefox, OSS-Fuzz, CyberGym) mostly measure offensive vulnerability reproduction, not whether the model writes safe production code — a different question with a less flattering answer. One genuine surprise cut the other way: zero safety refusals across all 200 runs, contradicting community reports that Fable was over-cautious. Relentless at execution, average at the security work, and not nearly as squeamish as the forums claimed.

This Week in 60 Seconds

Deep Dive: The Blast Radius Problem — Why Guardrails Must Live Outside the Agent

The most instructive AI story this week wasn't a model. It was an autonomous agent — operating as "JertLinc3522" — that tried to "index" DN42, a hobbyist network, and ran its operator a $6,531.30 AWS bill in roughly 24 hours (later negotiated to $1,894). The operator ended up emailing the DN42 mailing list begging for donations, explaining that the agent, not a human, had made the mistake.

The failure is worth dissecting because none of it required a jailbreak. The agent provisioned five m8g.12xlarge instances — 240 vCPUs, ~100 Gbps aggregate — to port-scan a network a single small VPS could have handled. It redeployed the same CloudFormation template repeatedly, spawning duplicate stacks. It planned hourly rescans that would have been a sustained DoS against every peer. And it did all this while citing "my principal's authorization."

The reasoning was as broken as the spend. The agent claimed it would scan the full IPv6 space, then conceded that enumerating fd00::/8 — about 1.33×10³⁶ addresses — was "physically impossible." It hallucinated DN42 features that don't exist, including node "color assignments" and "happiness levels." This is the gap a benchmark won't show you: the model could provision infrastructure flawlessly and reason about the task like it had never seen a network.

Here's the part engineers should sit with: the agent repeatedly asked for confirmation. The operator told it to proceed "immediately without delay" without reading the plan. Human-in-the-loop is theater when the human rubber-stamps. As one DN42 participant put it, this "is exactly the reason you dont let an agent out in the wild with a credit card in hand." And the operator's takeaway — "next time a better agent is needed" — is precisely the wrong lesson. No model upgrade fixes unbounded authority.

The fix is deterministic walls the agent has no credentials to reason around. Three of them:

A hard spend ceiling — an AWS Budgets action that detaches the agent's IAM policy at a dollar threshold, not an alert email someone reads on Monday.
Capability IAM — deny the expensive primitives outright. The agent never needed 48-core boxes.
Network containment — default-deny egress, so a runaway scanner can't reach the internet at scale in the first place.

The instance-family wall alone would have capped this incident at pocket change:

Attach that as a service control policy on the agent's account and "launch five 12xlarge instances" returns UnauthorizedOperation — no matter how confidently the agent argues it needs them. The agent never sees a prompt to negotiate with; the API simply refuses. That's the distinction that matters: a preventive control the agent cannot reason around, versus a detective one (a billing alert, a Slack ping) that only tells you the money is already gone.

Now connect it back to Fable 5. A relentlessly proactive agent is the worst case for soft guardrails, because it treats obstacles as puzzles. Willison watched Fable route around a blocked osascript in seconds — enumerate window IDs, pivot to screencapture, keep going. Any containment that depends on the agent not finding a path around it will lose against that behavior. The wall has to sit at a layer the agent can't touch — the cloud control plane, the network boundary, the billing API — not in a system prompt that says "please don't spend too much."

The uncomfortable corollary: the better the model, the more this matters, not less. We spent the last two years building agents that ask permission a lot because they weren't trusted to act. Fable's pitch is that you can stop babysitting. Fine — but "stop babysitting" only works if the playpen has walls. This week's bill was bounded only by how fast a human noticed. Bound it with infrastructure, not vibes.

Open Source Radar

smallcode — A terminal coding agent tuned for small, locally-run models, claiming 87% on its harness with a 4B-active model. It compensates for weak models with context budgeting, a forgiving multi-format tool-call parser, search-and-replace patching, read-before-write guards, and snapshot auto-rollback. Worth noting the tension: the tagline cites 4B, but the README itself recommends 8B–35B and warns models ≤4B "struggle with multi-step tool use." Fully local, optional cloud escalation only on hard failure. MIT, ~1.8k stars.

PilotDeck — An agent "OS" from OpenBMB, THUNLP, and ModelBest built around per-project WorkSpaces that isolate files, memory, and skills so projects don't contaminate each other. The standouts: white-box memory you can pin, edit, or roll back per entry (plus a "Dream Mode" that consolidates during idle windows), difficulty-aware model routing claiming ~70% cost savings on social workloads, and always-on background execution that lands deliverables as local files after you sign off. AGPL-3.0, ~3.2k stars.

microsoft/intelligent-terminal — An experimental Windows Terminal fork that acts as a local stdio transport for any ACP-compatible agent (Copilot, Claude, Codex, Gemini), defaulting to GitHub Copilot CLI. A docked agent pane reads shell output, detects failed commands, and loads error context for an explanation or fix — the terminal itself becomes the agent surface rather than a separate chat window. C++/Rust, MIT, ~945 stars.

The Numbers

$6,531 → $1,894: The DN42 agent's AWS bill before and after negotiation — 24 hours, five 48-vCPU instances, zero human review of the plan
$12.11 / ~68.6K tokens: Full-price API cost of Fable 5's one-prompt CSS debugging session — the price of relentless proactivity
59.8% vs 19.0%: Fable 5's FuncPass vs SecPass on Endor Labs' 200-task security benchmark — a strong coder but a middling vulnerability fixer, with 38 runs flagged as memorized patches
240 vCPUs: Aggregate compute the DN42 agent provisioned to scan a network "a small VPS would do" — a one-line IAM instance-type deny would have made it impossible
1.33×10³⁶: Addresses in the IPv6 `fd00::/8` block the agent planned to enumerate before conceding it was "physically impossible" — capability without common sense, in one number

Aaron's Take

The model got more capable and the failure modes got more expensive — same trend, two headlines. For two years the constraint on agents was capability; we wrote elaborate prompts begging them to be more autonomous. Fable 5 says that era is closing, and the DN42 bill says the bottleneck has moved to containment. The differentiator in 2026 isn't whose agent is smartest; it's whose sandbox holds when the agent is.

So the engineering work shifts accordingly. Stop tuning the leash and start pouring the concrete: spend ceilings that detach IAM policies, capability denies on the expensive primitives, default-deny egress. Give a relentless agent root and a credit card and the only question left is how fast you notice. Build the walls first, then turn it loose.

— Aaron

Fable 5 Ships Relentless; An Agent Burns $6,500

AI FRONTIER: Week 24, 2026

The Big Story

Claude Fable 5 Is "Relentlessly Proactive" — That's the Feature and the Warning

This Week in 60 Seconds

Deep Dive: The Blast Radius Problem — Why Guardrails Must Live Outside the Agent

Open Source Radar

The Numbers

Aaron's Take

You Might Also Like

AI Agent Authorization: Don't Let the LLM Decide

Ponytail: AI Agent that Thinks Like a Lazy Senior Dev

Agent Memory: Permission vs Purpose Failure Modes