TL;DR - Every "look what my AI agent did" demo quietly skips the part that matters for your security: the harness around the model runs with your keys, on your machine, as you. The good news is that the same shift making that risky - cheap open-weights and local models you can actually own - is also the fix. Own the harness, keep the blast radius small, and run the sensitive work on a model that never leaves your building. What to do: find out what your agents can actually touch, put the dangerous tools behind a wall, and stop treating a vendor's sandbox as your security plan.
By The Numbers
| Thing | Figure |
|---|---|
| What my home assistant agent runs as | Me (full host access, no container) |
| Safe default tool surface for an agent | 3 (read, write, edit) - shell access opt-in |
| Memory for a full local voice assistant (speech in, model, speech out) | ~14 GB of unified memory (per Pi's creator) |
| Share of real coding issues a local model could handle (estimate) | 60-70% (DeepSeek V4 Flash, on a laptop) |
| AI-generated pull requests one open-source maintainer now gets | 50-60 per day (was 1-2 a week) |
| Cost of a local model seat | $0 per million tokens |
The assistant that runs as me
I have an assistant. I call her Meli. She lives on my workstation, reads my messages, writes to my files, runs commands, and remembers what she learns between sessions. She runs on an open agent runtime called Hermes, from Nous Research, with an open-weights model doing the thinking. She is genuinely useful, most days.
She also runs natively on my machine, as my user account, with no container around her. When Meli runs a command, the operating system cannot tell the difference between her and me. Same permissions. Same access to my files, my keys, my history. For a while, the door was open wider than I had realised: the chat channel she listens on was set to accept anyone in a particular server, which means anyone who could post in that server could, in effect, type a command and have it run on my computer.
I set that up myself. I am a security professional, and I did the convenient thing first and the careful thing later, which is exactly what I spend my working life telling other people not to do.
I am telling you this because it is the part of the AI-agent story that the demos leave out. Everyone shows you the output. Almost nobody shows you what the agent could touch to produce it.
The harness is the product, and it runs with your keys
When people say "AI agent," they usually mean the model. The model is the least interesting part. The interesting part is the harness: the code around the model that owns your files, your tools, your memory, your shell, and the loop that keeps going until the job is done. The model just decides what to do next. The harness is what actually does it.
Whoever controls that harness controls your results, and controls what happens to your machine when the model gets something wrong. That is not a hypothetical. A model that "helpfully" runs a cleanup command, or reads a file it should not, or follows an instruction hidden inside a web page it was asked to summarise, does real damage through a real harness with real permissions. The model is rented. The harness, and the blast radius, are yours.
So the first security question of the agent era is not "which model is safest." It is "what can this thing actually touch, and as whom." Most people have never asked it, because the tools are built to make the convenient answer, everything, feel normal.
Why the best agent I know ships with no safety rails
I have been reading interviews with Mario Zechner, the developer behind Pi, a coding agent he built after Claude Code stopped fitting his workflow and which has since taken off in open source. Pi is unusual: it runs in what everyone else would call "YOLO mode" by default. No permission prompts. It just does the thing.
That sounds reckless. His reasoning is the opposite, and it changed how I think about this. He does not add a half-built safety prompt because he does not want to hand you a false sense of security that you then misconfigure. Instead, by labelling it plainly as dangerous, he tries to make you stop and go find, in his words, "that quiet, dark place called security awareness" and decide for yourself how to contain the thing inside your own environment.
His own answer, and mine, is boring and correct: if you do not want the machine wrecked, do not run the agent's tools directly on the machine. Put them in a container. Give the agent a sandbox to play in, not your home directory. He is blunt that the safety theatre bolted onto some mainstream tools, including asking a language model whether a shell command is dangerous before running it, is not real containment. It is a vibe. A model that can be talked into writing malware can be talked into approving it.
The uncomfortable version of that, for the rest of us: your agent is only as contained as the box you put it in, and right now most people have not put it in a box at all.
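For what "a box" can look like in practice, here is a minimal sketch that builds a `docker run` command for a single agent tool call. It assumes Docker is installed and uses only standard `docker run` flags. The function name, the `agent-scratch` folder, the resource limits, and the `python:3.12-slim` image are my illustrative choices, not anything Pi or Hermes ships.

```python
import shlex
from pathlib import Path


def sandboxed_argv(workdir: Path, command: list[str],
                   image: str = "python:3.12-slim") -> list[str]:
    """Build a `docker run` argv that confines one agent tool call.

    Only `workdir` is mounted, so the host home directory and keys are
    not visible from inside the container. Image and limits are assumptions.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",                     # no network from inside the sandbox
        "--read-only",                           # container root filesystem is read-only
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation via setuid
        "--memory", "1g", "--pids-limit", "256", # cap memory and process count
        "-v", f"{workdir.resolve()}:/work:rw",   # the only writable host path
        "-w", "/work",
        image,
        *command,
    ]


# Build (but do not run) the command for a harmless tool call.
argv = sandboxed_argv(Path("./agent-scratch"), ["python", "-c", "print('hello')"])
print(shlex.join(argv))
```

The harness would hand `argv` to `subprocess.run(argv, check=True)` instead of running the tool on the host. The worst case then is a wrecked `agent-scratch` folder, not a wrecked home directory.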
The shell tool is a ticking time bomb
When I built out my own harness, the single most important design decision was the tool surface. The safe default is three tools: read a file, write a file, edit a file. That covers most real work. The shell, the thing that can run any command on your system, is not in the default set. You have to switch it on deliberately, and when you do, you allowlist the exact commands you will permit rather than trying to blocklist the ones you will not. Blocklists are a losing game. Allowlists are a decision.
There is a sensible ladder of options here, each rung a different way to limit what a command can do:
- No shell at all. Read, write, edit. Surprisingly capable, and nearly nothing can go badly wrong.
- Allowlisted shell. Only pre-approved command patterns run. Everything else is refused.
- Pre-execution checks. A guard that inspects each command and blocks dangerous patterns and access to sensitive paths (your keys, your credential files, anything under a secrets folder) before it runs.
- Scripts only. No general shell. The agent can only call audited, version-controlled scripts you wrote.
- Sandboxed. The whole thing runs in a container or a throwaway virtual machine, so the worst case is a wiped sandbox, not a wiped laptop.
You do not need all five. You need to have made a choice, on purpose, about which rung you are standing on. The failure mode I see everywhere is people standing on rung zero and not knowing it.
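The middle rungs of that ladder, an approved-commands list plus a pre-execution check, can be sketched in a few lines. This is a minimal illustration and not my production guard: the `ALLOWED` policy, the `SENSITIVE` fragments, and the function name are all hypothetical, and the substring matching is deliberately crude. A guard like this narrows the surface. It is not containment on its own.

```python
import shlex

# Hypothetical policy: executable -> allowed first arguments (None = any arguments).
ALLOWED = {
    "git": {"status", "diff", "log"},
    "ls": None,
    "pytest": None,
}

# Path fragments the agent must never touch (illustrative, not exhaustive).
SENSITIVE = (".ssh", ".aws", ".env", "secrets", "id_rsa")

# Shell metacharacters that would let one approved command smuggle in another.
METACHARS = set(";|&`$<>\n")


def check_command(line: str) -> tuple[bool, str]:
    """Return (allowed, reason). Anything not explicitly allowed is refused."""
    if METACHARS & set(line):
        return False, "shell metacharacters are not allowed"
    try:
        argv = shlex.split(line)
    except ValueError:
        return False, "could not parse command"
    if not argv:
        return False, "empty command"
    exe, args = argv[0], argv[1:]
    if exe not in ALLOWED:
        return False, f"{exe!r} is not on the allowlist"
    subcommands = ALLOWED[exe]
    if subcommands is not None and (not args or args[0] not in subcommands):
        return False, f"{exe} subcommand is not on the allowlist"
    for arg in args:
        if any(fragment in arg for fragment in SENSITIVE):
            return False, f"argument {arg!r} touches a sensitive path"
    return True, "ok"


for line in ("git status", "git push origin main", "ls ~/.ssh", "ls; rm -rf /"):
    print(line, "->", check_command(line))
```

If a command passes, the harness should run the parsed argument list with `subprocess.run(argv, shell=False)`, so the string is never handed to a shell for reinterpretation. Note the default: a command nobody thought about is refused, which is the whole difference between an allowlist and a blocklist.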
Local and open-weights models are not just cheap, they are a boundary
Here is the part that surprised me, and it is the reason this is a hopeful story rather than a scary one.
You do not need a frontier model for most of this. Zechner makes the case better than I can. He runs real work on models small enough to sit on an ordinary laptop. In one interview he describes a full local voice assistant, speech in, a language model, speech out, running comfortably in about 14 gigabytes of unified memory, on hardware normal people already own. He reckons a heavily optimised open-weights model on a good laptop could handle 60 to 70 percent of the actual issues he works on in Pi. Not toys. Real tasks.
For security, cheap and local is not the headline. The boundary is. A model running on hardware you control is a model your data never has to leave the building to reach. The code, the customer records, the half-finished contract, the thing you would never paste into a public chat box, can be worked on by a capable model that runs entirely on your own machine, at zero cost per token, with nothing crossing the network.
That is the data-sovereignty control that "just move to the cloud" quietly took away from us, handed back. For anyone in this country dealing with privacy obligations, or defence, or critical infrastructure, that is not a nice-to-have. It is the difference between a tool you are allowed to use and one you are not.
So the security answer and the cost answer point the same way: run the trivial, high-volume, sensitive 80 percent on a small model you own, and only reach for a big rented model when a task genuinely needs it.
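That routing rule is simple enough to write down. The sketch below is an assumption about how one might encode it, not a description of any particular runtime: `Task`, its two flags, and the backend names are hypothetical, and how you decide a task is sensitive is the hard part that stays with you.

```python
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    sensitive: bool        # touches private code, customer records, contracts
    needs_frontier: bool   # judged too hard for the local model


def choose_backend(task: Task) -> str:
    """Return 'local' or 'rented'. Sensitive work never leaves the building."""
    if task.sensitive:
        return "local"     # the boundary wins, even when the task is hard
    if task.needs_frontier:
        return "rented"    # rent the big model only when the job needs it
    return "local"         # default: cheap, private, good enough


print(choose_backend(Task("summarise this customer contract", True, True)))
print(choose_backend(Task("design a new public API", False, True)))
```

The design choice that matters is the order of the checks: sensitivity is tested first, so a hard task on private data stays local and gets a worse answer rather than a leak.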
The honest part: code is never free
I want to be straight, because the hype in this space is exhausting and I will not add to it.
Zechner has a line I have not been able to shake: code is never free, because the consequences of your actions eventually catch up with you, and if you think a big pile of AI-generated code is fine now, you have just delayed the punishment. He has watched people generate hundreds of thousands of lines in a week and calls the outcome exactly what you would expect.
The security version of that is simple. Complexity you cannot review is risk you cannot see. If an agent produces more code, or more configuration, or more infrastructure than any human on your team can actually read, you have not saved work. You have hidden it, and you have hidden the vulnerabilities in it too. "We have too much code to review now" is not a productivity win. It is a security incident with a delay on it.
You can see the strain already. That same maintainer went from one or two human pull requests a week to fifty or sixty machine-generated ones a day, each one padded out to the length of a novel, each one touching anywhere from ten to a thousand files, most of them solving nothing. His fix is telling: before you can send code, you have to write a short issue in your own human words explaining what you want and why. Prove you are a person. Prove you understand the problem. Only then are you trusted to contribute. That is a trust boundary, drawn by hand, against a flood of automated noise. Expect to be drawing a lot of those.
What to do right now
If you are running, building, or buying anything with an AI agent in it, here is the checklist I would run this week.
- Find out what each agent can touch, and as whom. If the answer is "everything, as me, with no container," that is your highest-priority finding, not a footnote.
- Shrink the tool surface. Default to read, write, edit. Turn the shell off unless a task truly needs it, and allowlist commands rather than trying to blocklist the bad ones.
- Put the dangerous work in a box. Run agent tools in a container or a throwaway VM, so the worst case is a wiped sandbox. Do not rely on a vendor asking a model whether a command is safe.
- Lock the front door. If your agent listens on a chat channel or an inbox, restrict it to named, authenticated people. "Anyone who can message it" means anyone who can message it.
- Keep the sensitive 80 percent local. Stand up one small open-weights model on hardware you control for the private, high-volume work. It is cheaper and it never leaves the building.
- Review what you ship. If no human can read what the agent produced, do not ship it. Volume you cannot review is risk you cannot see.
- Add model and tool availability to your risk register. The model can be revoked, repriced, or restricted. The harness is the part you own. Plan for the day the rented brain disappears.
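For the "lock the front door" item, the check I should have had on day one is a deny-by-default sender filter in front of the agent. This is a minimal sketch under stated assumptions: the `Message` shape and the ID value are hypothetical, and it presumes your chat platform gives you a stable, authenticated user ID rather than just a display name.

```python
from dataclasses import dataclass

# Hypothetical: stable platform user IDs. Display names can be changed or spoofed.
AUTHORISED_SENDERS = frozenset({"user-id-of-owner"})


@dataclass(frozen=True)
class Message:
    sender_id: str
    channel_id: str
    text: str


def should_handle(msg: Message,
                  allowed: frozenset[str] = AUTHORISED_SENDERS) -> bool:
    """Deny by default: only explicitly named senders ever reach the agent."""
    return msg.sender_id in allowed


print(should_handle(Message("user-id-of-owner", "general", "run the backup")))
print(should_handle(Message("someone-else", "general", "run the backup")))
```

Being a member of the server or channel is deliberately not part of the decision. That was exactly the hole in my original setup.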
Key Takeaways
- The harness runs as you. The model is rented; the permissions, the files, and the blast radius are yours. Secure the harness, not just the model.
- Contain before you trust. The safest default is no shell and a sandbox. A vendor's "is this command safe" prompt is not containment.
- Local is a boundary, not just a bargain. A model on hardware you own is data that never has to leave the building.
- Cheap and safe point the same way. Run the sensitive, high-volume 80 percent on a small model you control.
- Volume you cannot review is risk you cannot see. More code is not less work if nobody can read it.
FAQ
Q: I just use ChatGPT in a browser. Does any of this apply to me?
Less of it, and that is the point. A chat box in a browser has almost no reach into your machine. The risk climbs the moment you install an "agent" that can read your files, run commands, or act on your behalf. The convenience and the danger arrive together. Know which one you have.
Q: Isn't a local model just worse?
For the hardest reasoning, the frontier models are still ahead. For the bulk of ordinary work, the gap is much smaller than the price and privacy difference. A capable open-weights model on your own machine handles a large share of real tasks, at zero cost per token, with your data never leaving the room. Match the model to the job.
Q: Aren't open-weights models a security risk because anyone can tamper with them?
Same threat model as any software you download. You pin the version, you check the hash against the publisher, you get it from the source. That is a solved, manageable problem. A single vendor that can revoke or restrict your model on a government's say-so is a less manageable one.
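"Check the hash against the publisher" is a few lines of standard-library Python. A minimal sketch, assuming the publisher lists a SHA-256 digest for the weights file. The function names and the file path in the usage note are placeholders of mine.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-gigabyte weights never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path: Path, expected_sha256: str) -> bool:
    """True only if the file matches the digest pinned from the publisher."""
    expected = expected_sha256.strip().lower()
    return hmac.compare_digest(sha256_of(path), expected)
```

In use, you would pin the expected digest in version control next to the model version and refuse to load the file when `verify_weights(Path("model.gguf"), pinned_digest)` returns `False`.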
Q: Do I really need to containerise? It's a hassle.
If the agent can only read, write, and edit files in one project folder, you can probably skip it. The moment you hand it a shell on your real machine, a container is the difference between "the agent made a mess in a sandbox" and "the agent made a mess of my laptop." The hassle is smaller than the recovery.
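"One project folder" only means something if the file tools enforce it. Here is a minimal sketch of the check each read, write, and edit tool could run before touching a path. The function name is hypothetical, and it needs Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_inside(root: Path, requested: str) -> Path:
    """Resolve `requested` against `root`, refusing anything that escapes it."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks; an absolute
    # `requested` replaces `root` entirely and is caught by the check below.
    target = (root / requested).resolve()
    if not target.is_relative_to(root):
        raise PermissionError(f"{requested!r} is outside the project folder")
    return target
```

A request such as `../../.ssh/id_rsa` or `/etc/passwd` raises `PermissionError` instead of being read. A symlink swapped in between the check and the actual file access can still race past it, which is one more reason this complements a sandbox rather than replacing one.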
Q: What about at work, where I don't control the tools?
Ask the two questions anyway. What can this agent access, and where does our data go. If nobody can answer, that is your answer. Push for the sensitive workloads to run on infrastructure you control, and for agents to have the smallest access that still lets them do the job.
My Take
The last time we had a shift this big, we handed the keys to the cloud and told ourselves someone else would worry about security. It took years and a lot of breaches to claw back the habit of asking where our data actually lives and who can reach it.
We are about to do the same thing with agents, faster, and with tools that run as us on our own machines. The demos are dazzling and the defaults are dangerous, and the two things are related: the defaults are dangerous because dangerous is convenient, and convenient demos well.
The hopeful part is that the fix is already here. Small, capable, open-weights models you can run yourself mean you no longer have to choose between "useful" and "contained." You can own the harness, keep the sensitive work on your own hardware, put the sharp tools behind a wall, and still get most of the productivity everyone is shouting about. You just have to make the boring choices on purpose instead of falling into the convenient ones by accident.
Own the loop. Rent the intelligence when you need it. And know, always, what the thing that runs as you is allowed to touch.
Mathew Clark
Founder, SecureInSeconds
Currently: keeping the sharp tools in a box.
Further Reading
- Harness over frontier: running production AI agents on open-weights models - the cost and availability side of the same shift.
- Claude Fable 5 recalled: the government pulled it overnight - why "the model you rely on could just disappear" stopped being hypothetical.
- Claude Mythos preview: the security angle - early warning signs on frontier model concentration risk.
- The Microsoft Copilot security disaster nobody saw coming - what happens when vendor convenience quietly becomes your architecture.
- Free security tools that actually work - open-source options for teams that want to own their stack.
- Zero trust for normal people - the same "what can this actually reach" thinking, applied to identity and access.



