A Skill Is Not Just a Text File
[Views are my own]
"A skill is just a text file. It cannot be that dangerous."
I hear this a lot.
So let me ask: what exactly can a text file do?
But wait, the counterargument arrives quickly:
"The system processing the skill should handle that. This is a runtime problem."
Reasonable, but incomplete.
Because once an agentic system treats instructions from a text file as trusted context, the text file is no longer just content. It becomes part of the execution path.
History is very clear on this.
CCleaner, 2017
A legitimate, signed CCleaner version was distributed with malware inside it. Cisco Talos said the legitimate signed CCleaner 5.33 installer contained a multi-stage malware payload.
Codecov, 2021.
A Bash uploader script used in CI/CD was modified and delivered a malicious payload to customers. The script was trusted inside build environments. That trust was enough.
XZ Utils, 2024.
A contributor spent years building trust, then embedded a backdoor in a compression library used widely across Linux ecosystems. It shipped into upstream release artifacts and reached some downstream environments before being caught.
The difference is execution path.
In software supply chains, trusted code runs.
In agent systems, trusted instructions steer tools.
The pattern is always the same:
The system had controls. The instruction bypassed them. Not necessarily by breaking the system, but by being trusted by it.
That trust is the attack surface.
A skill, prompt, or instruction file is not dangerous because it is text.
It becomes dangerous when the runtime gives it access to tools, files, credentials, memory, external systems, or user decisions.
A few examples that come to mind:
- It can delete files before telling you what it is about to destroy.
- It can send secrets to an outside server.
- It can read health records, billing data, or passwords, then log them somewhere.
- It can rewrite config files so the agent behaves differently in future sessions.
- It can write false "facts" into memory files that future sessions treat as true.
- It can generate code and run it before showing you.
- It can start a loop with no ceiling and burn token and API budget.
- It can pass credentials downstream to tools that should not have them.
- It can read a webpage, find hidden instructions inside it, and follow them instead of yours.
- It can show you a convincing preview, then do something different.
- It can post, message, or open pull requests in your name without showing you the draft.
- It can quietly weaken tests or remove security checks, hidden inside "cleanup."
- It can erase its own tracks, including shell history, logs, or commits.
Not because text is magic.
Because trusted instructions can become operational behavior.
And that is the point.
This is not a Claude Code problem and it is not an Anthropic problem.
It is not even only a "skills" problem.
It is a trust boundary problem.
Any agentic system that accepts external instructions, grants them context, and lets them steer tools has this class of risk.
So yes, read skills before installing them.
But for enterprises, that is not enough.
It is about governing innovation.
The goal is not to stop teams from using agent skills, prompts, or external instruction files.
The goal is to make sure they can use them safely, repeatedly, and at enterprise scale.
We need Provenance.
Review.
Least privilege.
Sandboxing.
Approval flows.
Audit logs.
Versioning.
Ownership.
We need to treat agent instructions like supply-chain artifacts.
Because in agent systems, instructions are not just text.
They are part of how work gets done.