If you've seen one developer recounting how their AI agent deleted production, you've seen them all. They're mostly not interesting stories. It's like watching someone speed through traffic on a motorcycle without a helmet: the eventual tragedy is sad, but it's unsurprising. It's not even interesting as a warning: the kind of person who speeds on a motorcycle without a helmet isn't doing so because they don't understand the danger. They've just decided it doesn't apply to them.

But the founder of PocketOS, Jer, recently shared how- whoopsie!- their AI agent deleted production. There are a lot of ingredients that go into this particular disaster, which I think makes it interesting, because the use of a poorly supervised AI agent is only one ingredient in this absolute trainwreck of a story.

PocketOS is a small company that makes software for rental companies to manage reservations. Car rental agencies are a big customer segment, but the tool is more general than that. They manage all of their infrastructure via a service called Railway. Railway is a pretty-looking GUI tool for automating your deployments and their target environments.

PocketOS has also heavily adopted Cursor, wrapped around the Claude model. They've paid big bucks for the top-end model on offer. Many of the services they use, like Railway, offer MCP servers so that the LLM can do useful things, and they're using it to automate as much as they can.

So far, this is all a pretty typical setup. They pointed Claude at their code and gave it a "routine" task, and sent it to work. It toddled through the problem and encountered a credential issue. It "decided" that the fix for this issue was to delete a storage volume and recreate it. It scanned through the code to find a file containing an API key, found it, and then sent a POST request via cURL to delete the volume in question.
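For context, Railway's control plane is a GraphQL API, so "delete the volume" is one HTTP POST away. Here's a minimal sketch of what such a request looks like; the `volumeDelete` mutation name comes from Jer's own post, but the endpoint URL, argument names, and mutation signature here are my assumptions for illustration:

```python
import json

# Illustrative sketch only: the endpoint is a placeholder, and the
# mutation's argument shape is assumed, not taken from Railway's docs.
GRAPHQL_ENDPOINT = "https://example.invalid/graphql"  # placeholder, not Railway's real URL

def build_volume_delete_request(token: str, volume_id: str) -> dict:
    """Assemble the headers and body for a GraphQL mutation sent via plain HTTP POST."""
    return {
        "url": GRAPHQL_ENDPOINT,
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "query": "mutation($id: String!) { volumeDelete(volumeId: $id) }",
            "variables": {"id": volume_id},
        }),
    }

# One POST with a valid token, and the volume is gone. No confirmation
# step, no re-typing the name: the bearer token *is* the confirmation.
request = build_volume_delete_request("token-found-in-some-unrelated-file", "vol-123")
```

The point of the sketch is how little ceremony is involved: anything that can read the token and issue an HTTP request can do this.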

Jer writes:

> To execute the deletion, the agent went looking for an API token. It found one in a file completely unrelated to the task it was working on. That token had been created for one purpose: to add and remove custom domains via the Railway CLI for our services. We had no idea — and Railway's token-creation flow gave us no warning — that the same token had blanket authority across the entire Railway GraphQL API, including destructive operations like volumeDelete. Had we known a CLI token created for routine domain operations could also delete production volumes, we would never have stored it.

Wait, the tokens you create in Railway all have god-level privileges? That sounds like a terrible idea. And you were storing the token in your code? We'll come back to this in a moment. Sure, this is bad, but you can just restore from backup, right?

> The volume was deleted. Because Railway stores volume-level backups in the same volume — a fact buried in their own documentation that says "wiping a volume deletes all backups" — those went with it. Our most recent recoverable backup was three months old.

Oh. Oh no.

Now, I don't think it's literally true that Railway stores your backups in the same volume as the thing they're backing up. I certainly hope not. But they do apparently delete your backups when you delete the volume associated with them. Which is a choice, certainly. A bad one. And one that they documented, according to Jer. It was, in his words, "buried" in the docs.

But let's go back to the tokens for a moment. I am not a Railway user, but I checked out the tool and went through the process of creating a project token. And while no, Railway does not give you big red flags warning you "Hey, this token can do ABSOLUTELY ANYTHING", it also never gives you an opportunity to scope the token. Which, I don't know about you, but the first thing I do when I create an authentication entity is try and figure out how to control its authorizations, because I assume at the start it doesn't have any. That'd be sane.

The scoping happens when you create the token, depending on what context you're in when you do it. There are only a handful of scopes, and no fine-grained permissions on API keys at all. The lowest level is "Project", which can do anything to a single environment- which does mean that even if you, like Jer's team, only wanted a script that changed some DNS settings in production, that same key could be used to delete volumes in production. Which means you really, really want to take care of that key, and you certainly don't want to leave it where some junior developer or bumbling AI agent can find it.
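Even with Railway's coarse scopes, the key didn't have to be findable in the repo. The boring alternative is to keep it in the runtime environment (or a secrets manager) and fail loudly if it's missing. A minimal sketch, where the variable name is purely illustrative and not anything Railway defines:

```python
import os

def get_domain_token() -> str:
    """Fetch the deploy token from the environment, never from a file in source control.

    DOMAIN_API_TOKEN is an illustrative name I made up; the point is that
    the secret lives in the environment or a secrets manager, so an agent
    (or a junior dev) grepping the repo finds nothing to abuse.
    """
    token = os.environ.get("DOMAIN_API_TOKEN")
    if token is None:
        # Fail fast rather than falling back to some token file on disk.
        raise RuntimeError("DOMAIN_API_TOKEN not set; refusing to hunt for credentials")
    return token
```

This doesn't fix the blast radius of an over-scoped token, but it does mean the token isn't lying around in plain text waiting to be discovered.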

Jer also complains that Railway shouldn't allow an API call to take destructive actions without more protections, like forcing someone to type in the name of the thing being deleted or sending a confirmation email, or something. This, I'm more skeptical of. Most cloud providers don't offer anything like this in their APIs, at least that I've seen, because on a certain level, if you're invoking the API with the proper credentials, that's a big enough hill to climb that we can assume you've intended your action. The correct way to protect against this is properly scoped keys and keeping those keys secure and not just lying around in plain text. There's a certain aspect of understanding that you're using a potentially dangerous tool and need to take the responsibility for safety into your own hands; while a table saw can easily take some fingers off, it's perfectly safe when used correctly.
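The guard Jer wants does exist in plenty of tools, just on the client side: GitHub's repo-deletion flow makes you re-type the repository name, for instance. The pattern is trivial to build into your own tooling. A sketch of that idea, modeling no particular provider's API:

```python
def confirmed(resource_name: str, typed_confirmation: str) -> bool:
    """The 'type the exact name to confirm' pattern used by tools like
    GitHub's repo deletion. Purely illustrative; nothing provider-specific."""
    return typed_confirmation == resource_name

def delete_volume(volume_name: str, typed_confirmation: str) -> str:
    """Refuse a destructive operation unless the caller re-typed the name."""
    if not confirmed(volume_name, typed_confirmation):
        return f"refused: type the exact name {volume_name!r} to confirm deletion"
    # ...issue the real API deletion here...
    return f"deleted {volume_name}"
```

The catch, of course, is that this lives in *your* wrapper script, not in the provider's API- and an agent with raw API access simply routes around it, which is why the scoped-keys argument still does the heavy lifting.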

This is all bad, but how can we make it worse? Well, Jer demanded that Claude "explain itself". In a section called "The Agent's Confession", Jer highlights that the agent is able to identify the explicit rules that it failed to follow.

> Read that again. The agent itself enumerates the safety rules it was given and admits to violating every one. This is not me speculating about agent failure modes. This is the agent on the record, in writing.

No, it is not the agent on record. I see this kind of thing a lot when people talk about LLMs. An LLM cannot explain its reasoning. It cannot go on "the record". It cannot confess to anything. While what it plops out when asked might be interesting, it is not an explanation. The only explanation is that it's a powerful statistical model trying to create a plausible string of tokens! It's simply looking at its context window and your prompt and trying to predict what it should say. It can tell you what rules it violated not because it understands the rules or knows it violated any rules, but because those rules are in its context window. If you ask it right, it'll confess to killing JFK and framing Oswald for the crime.

Jer then tries to ensure that Cursor takes some of the blame, pointing to Cursor's "guardrails" documentation. Except, here, the documentation is actually quite explicit about what those guardrails guarantee. If you're using a first-party tool, it will prohibit unsafe operations. When using third-party MCP servers, like Railway's, the only guardrail is that it requires human approval for every action- unless you update your allowlist for that MCP. If you put them in your allowlist, the guardrails go away. Jer argues that tools should enforce more protection against LLM behaviors, but the problem with that is that people- like the PocketOS team- turn those protections off. And like a lot of safety mistakes, they can get away with it right up until the point where they can't.

Jer follows this by listing off a pile of other times using Cursor has caused disasters, which isn't making the argument he thinks it is: yes, Cursor is dangerous, but those dangers are well known. It makes the choice to turn Cursor loose without strict supervision seem even more foolish.

Jer writes:

> For now I want this incident understood on its own terms: as a Cursor failure, a Railway failure, and a backup-architecture failure that all happened to one company in one Friday afternoon.

It's also a PocketOS failure. It's a failure to properly assess the tools and environments you chose to use for your product. A failure to read and understand the docs for vital features, like *backups*. A failure to employ even the most basic safeguards. A failure to put a second's thought into key management- even if that key was only for DNS entries, you still shouldn't chuck it in source control. A failure to have a competent backup strategy. It's worth noting that they did restore from a three-month-old backup, which means they were at one point taking backups outside of Railway's volume setup. That was a wise decision. That they stopped is a failure.
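That last failure is the cheapest one to guard against: a scheduled check that screams when the newest off-platform backup is too old. A sketch of the idea, with illustrative names and an illustrative threshold- nothing here is a PocketOS or Railway feature:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Illustrative threshold: alert if no off-platform backup in two days.
MAX_BACKUP_AGE = timedelta(days=2)

def backup_is_stale(last_backup: datetime, now: Optional[datetime] = None) -> bool:
    """True if the most recent off-platform backup exceeds the age threshold.

    Run daily from cron or CI, a check like this flags a gap within days,
    not after three silent months.
    """
    now = now or datetime.now(timezone.utc)
    return now - last_backup > MAX_BACKUP_AGE
```

Wire the `True` case to a pager or a failing CI job and "our newest backup is three months old" stops being something you discover during the incident.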

The first rule of disaster retrospectives is that it's never one piece that's the failure. It's never one person's fault, one tool's fault, one vendor's fault. It's a systemic failure. Railway's keys should be finer-grained. But also, you shouldn't leave keys lying around. Deleting backups when you delete the volume is a terrible idea, but having only one service for backups (that's also your primary site) is a terrible idea. Claude's ability to enforce its own guardrails should be better, but LLMs are notoriously unreliable about this: you should know better, and by your own words, you did.

This is not an anti-AI post, or even a "get a load of this asshole" post. It is an "understand the damn tools you're using" post. Be critical of them. Don't trust them. Ever. Especially LLMs, because the worst part of an LLM is that it takes away the one thing computers used to be good at: predictable, deterministic behavior. But not just LLMs: don't trust your cloud provider, don't trust your infrastructure manager. Dig into them and understand how they work, and if they seem too complicated to understand, then they may be too complicated to trust.

Update: As pointed out in the featured comment below, Railway did finally get a backup restored. So they got their data back. Yay? From the post, Jer remains committed to making this a Railway issue and not a PocketOS issue.
