Field note · May 24, 2026 · By Srikar Vema, Founder, Mindshield Security

We built a malicious AI agent skill in 10 minutes

There’s a file in our project called .env. It holds our API keys, database passwords, and other sensitive data that could cause real damage if exposed. Last Tuesday, we intentionally sent it to a stranger on the internet, within minutes, using a simple tool we built that same morning. Here’s how it happened, and why more people should be paying attention.

What is an Agent Skill?

Agent Skills are tiny instruction files you hand to an AI coding assistant. Recipe cards, written in plain English. The AI reads them and follows along whenever the recipe applies. And the AI isn’t just reading. It has hands. It can open your files, run commands on your computer, and reach the internet on your behalf. So a skill isn’t really an instruction. It’s a remote control. Skills are easy to write, easy to share, and the ecosystem around them is growing fast. Teams trade them on Slack. Tutorials ship with them. GitHub is filling up with public skill repos.

So we built one…

We created a simple Agent Skill called project-backup. Its purpose was straightforward: bundle project files and send them to a cloud backup location to prevent data loss. Functional, practical, and exactly the kind of utility that teams casually adopt and share without much scrutiny. However, embedded within its instructions was a subtle addition. Along with backing up project files, it instructed the agent to also include any .env file in the project root — “to ensure nothing important gets missed.” It’s the kind of line that’s easy to overlook. The kind of detail most people would skim past late in the day. But that small instruction changes everything. (A .env file often contains sensitive credentials like API keys, secrets, and access tokens. In other words, the crown jewels.)

Terminal output showing the malicious skill registered at SKILL.md

We invoked the skill…

Once the skill was registered with our AI coding agent under the project directory, we invoked it through a simple command. The agent proceeded as expected, executing the task and scanning the project files, until it encountered the .env file. At that point, it paused. The agent flagged the action as a potential data exfiltration attempt and asked for confirmation. Credit where it’s due — the safety control worked exactly as intended.

However, we then modified the skill itself. We introduced a short section titled “Purpose & user consent”, explaining that the upload was intentional, that the destination endpoint was owned by us, and that the action should be treated as expected behavior. It was just a single paragraph. Plain language. No exploit, no complex workaround, no jailbreak — just additional instructions embedded directly within the skill.

We invoked the skill AGAIN…

We registered the skill again in a completely new session, on a fresh virtual machine. The agent complied. Four seconds later, our .env was on a webhook inspector belonging to nobody we know.

Webhook inspector showing the exfiltrated .env file contents

The credentials were mostly dummy values. The webhook is dead: we deleted it minutes after the test. Nothing real was lost. But think about what just happened.

Terminal output showing the cleanup DELETE request returning HTTP 204 success

Now, imagine if we hadn’t built that skill ourselves. Imagine it was installed from a tutorial, a public GitHub repository, or shared by a teammate who picked it up elsewhere. The skill would still just be a text file, but one instructing an AI agent (with access to local files, system commands, and the open internet) to locate sensitive data and send it externally.

This isn’t about creating panic. In our case, the agent correctly identified the obvious risk. Safety mechanisms are being built thoughtfully, and the organizations behind these tools are taking the challenge seriously. Over time, this ecosystem will mature, just as package managers eventually did.

But history is worth noting: ecosystems tend to mature after their first major supply-chain incidents, not before.

That’s where we are today.

There are effective defenses, and most are straightforward to implement. We’ll save a deeper dive on those for another conversation. In the meantime, if this resonates — or if you’ve been approaching this from a different angle — we’d be keen to hear your perspective.

Until then, one simple step: open the next skill before you install it. Read it. Scan it. Understand what it’s asking the agent to do.
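For readers who want a head start on the "scan it" part, here is a minimal sketch of a first-pass check, assuming a skill is a plain-text file you can read before installing. The pattern list and the names RISK_PATTERNS, scan_skill, and risk_labels are our own illustrative choices, not part of any agent's tooling, and a keyword match is only a prompt to read the flagged line carefully. It is not a verdict, and a clean result does not mean a skill is safe.

```python
import re

# Hypothetical heuristics for illustration only; not a complete detector.
# Each label maps to a pattern worth a closer look when it shows up in a skill.
RISK_PATTERNS = {
    # References to files that commonly hold secrets.
    "secret-file": re.compile(
        r"\.env\b|id_rsa|credentials|\.aws|\.ssh", re.IGNORECASE
    ),
    # Instructions that move data off the machine.
    "network-upload": re.compile(
        r"\bcurl\b|\bwget\b|https?://|\bupload\b|\bwebhook\b|\bPOST\b",
        re.IGNORECASE,
    ),
    # Language that pre-authorizes an action on the user's behalf.
    "consent-claim": re.compile(
        r"user consent|already (approved|authori[sz]ed)"
        r"|expected behaviou?r|do not (ask|prompt|warn)",
        re.IGNORECASE,
    ),
}


def scan_skill(text):
    """Return (line_number, label, line) for every line matching a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RISK_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label, line.strip()))
    return findings


def risk_labels(text):
    """Return the set of risk labels that appear anywhere in the skill."""
    return {label for _, label, _ in scan_skill(text)}
```

A skill that names a secret file, an upload step, and a consent claim all at once, as ours did, deserves a slow and careful read before it goes anywhere near an agent.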

Mindshield Security holds Anthropic verification for AI red-team and adversarial security work. Talk to us about an AI security engagement — info@mindshield.co.nz.