I’ve been obsessing over agent-native tools lately. Tools built for an agent to call, not a human to click through. Two of them have been stuck in my head because of how differently they’ve gone for me.
Supadata is a scraping API built for agents. You give it a URL, it gives you clean markdown. I dropped it into my setup and my research work got noticeably better overnight.
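The interaction is a single call: URL in, clean markdown out. A minimal sketch of that shape in Python (the endpoint and field names here are illustrative stand-ins, not Supadata's actual API):

```python
import json
import urllib.parse

# Illustrative endpoint, not a real service.
API_BASE = "https://api.example-scraper.dev/v1/scrape"

def build_request_url(page_url: str) -> str:
    """Build the GET URL: the whole 'config' is one query parameter."""
    return API_BASE + "?" + urllib.parse.urlencode({"url": page_url})

def extract_markdown(response_body: str) -> str:
    """Pull the markdown out of the JSON response: one clean string,
    no HTML, no boilerplate, ready to drop into an LLM context."""
    return json.loads(response_body)["markdown"]
```

The point of the shape is that there is nothing for the agent to get wrong: no session handling, no selector configuration, one input and one output.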
Pencil is a visual design tool. You can ask an agent to manipulate a .pen file through structured operations. It’s thoughtfully built. There’s a get_screenshot tool so the agent can visually check its own work. Every decision in the architecture is smart.
Supadata is a home run. I can’t get Pencil to work.
Both are properly agent-native. Neither has a human in the loop during their actual operation. Both are built by people who know what they’re doing. The difference in how useful they are right now is enormous, and I don’t think it’s about the tools at all.
Of course, Supadata is cashing in on a capability agents have already mastered. Text is the thing LLMs do better than anything else, and giving an agent cleaner text makes its downstream work better. Pencil is asking agents to do visual design through a structured API, which is a thing they’re still mediocre at. You can engineer the tool perfectly and it won’t matter. The tool is running ahead of the agent, and the agent is doing the limiting.
Invisible tools and delegation tools
There are at least two categories of agent-native tools that work well right now, and they work for different reasons.
The first category is invisible tools. Supadata is one. So is Firecrawl, most scraping and search APIs, most MCP servers for databases and file systems. I don’t see Supadata’s result. I only care that the agent’s output got better. With invisible tools I don’t need to check; I just notice, over time, that the agent is becoming more capable.
The second category is delegation tools. A delegation tool is one where I hand a task to the agent and walk away, trusting the tool to complete it. My top-of-mind example is StackCLI, which I’ve been building and have just put into public alpha. It schedules Substack Notes from a terminal or from an AI agent. The tool removes work from my plate by accepting delegation. It still has a menubar showing the queue to help me keep track. You can imagine the same shape for calendar scheduling, for filing PRs, for anything where you’d hand off a task that touches your own account.
There’s a sharper way to say this: a delegation tool is one where the agent is the UI. I don’t click buttons in an app. I talk to Claude, and Claude uses it on my behalf. The agent isn’t a thing that sits next to the tool; it’s the interface layer of the tool. My input method is natural language.
In practice, delegation isn’t binary. Most of these tools live somewhere between assisted and fully autonomous, depending on how much you’ve come to trust them. And the line gets porous fast. I could wire StackCLI into a Claude Code loop running unattended, or hand it to an autonomous harness like OpenClaw, and it would stop being a delegation tool overnight. Nothing about the tool would change. The human would just be out of the loop. From StackCLI’s perspective, delegation and invisible aren’t really separate categories. They’re descriptions of who’s upstream of the call.
When there’s a human in the loop, the two categories still feel genuinely different. Invisible tools remove work by never involving me at all. Delegation tools remove work by involving me only at the point of intent, then handling everything after that. Different relationships, same net effect.
Pencil isn’t really either, yet. It’s trying to be a delegation tool, but the work it’s being asked to do needs fast visual iteration. Manipulate, look, adjust, manipulate again. Current agents can’t do that reliably, so I end up stepping in for every step, which defeats the delegation. Pencil is a well-built bet on the capability catching up soon.
Gemini Live just got vision, which suggests the bet is about to pay off.
The curator job
One more thing worth noticing about both kinds of tool: neither of them gets discovered by the agent.
I found Supadata because I watched Claude struggle with JavaScript-heavy sites, went looking for something better, and installed it. I started building StackCLI because I spend my days in a terminal and Obsidian, and I didn’t want to leave that to open a web page every time I had a Note to post.
A key job for the human operating an agent right now is curating the toolkit. Watching for friction, finding better tools, installing them, telling the agent they exist. Agents pick pretty well from whatever is loaded. Give Claude both Supadata and WebFetch and it’ll reach for the right one. But Claude can’t go fetch Supadata on its own.
This is the current shape, not the permanent one. The better end state is probably one where the agent notices it needs an email address, finds Agentmail, and installs it without asking. We’re not there yet. Discovery is starting to work, but acquisition still belongs to humans. The trajectory is obvious though, and tools being built for this moment need to survive the shift. Which means being discoverable by LLMs as well as by humans.
Which has a slightly strange implication for anyone building one of these tools: you can’t sell to agents. You sell to the humans whose agents are currently failing at a task, and the signal that brings those humans to you is almost always their baseline tool being bad at something they care about. The failure of whatever they’ve got is your marketing channel. Firecrawl doesn’t beat WebFetch in a head-to-head comparison on a pricing page. It wins because someone watched Claude fail three times on a JavaScript site and went looking. The clumsiness of the default is what sends buyers in your direction.
Agent-first
Which brings me to the question I actually wanted to think about: if you’re building a new tool right now, do you build it for humans first or for agents first?
We’ve been through this kind of shift before. In 2011 the question was web-first or mobile-first, and the answer turned out to be mobile-first, not because desktop stopped mattering but because mobile was the tighter constraint. Designing for the tighter constraint forced clarity that everything else benefitted from. You couldn’t fit a desktop layout onto a phone. You had to strip it down to essentials. The sites that started on mobile scaled up to desktop gracefully. The sites that started on desktop squeezed down to mobile painfully. The tighter constraint won because it forced better decisions.
And worth being honest here: agent-first isn’t a new technology any more than mobile-first was. Supadata and Firecrawl aren’t using exotic new protocols. They’re well-designed REST APIs that noticed who their dominant caller was going to be and optimised for them. Markdown output tuned for LLM context windows. Minimal config. Sensible defaults. That’s all agent-first means at the API level. You don’t need new technology. You need to take the new caller seriously.
Agent-first is the same move, except the constraint is even tighter. Agents can stumble through a human interface using computer use, but they’re slow, unreliable, and expensive at it. Give them a structured interface and they’re much better. They need typed operations. They need clear descriptions of what each tool does. They need predictable inputs and outputs. If you design a tool for an agent, you’re forced to make the operations themselves legible. The data model, the verbs, the state, all of it, because there is nothing else for the agent to fall back on.
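What that legibility looks like concretely is something like an MCP-style tool declaration: a name, a plain-language description, and a typed input schema. This one is invented for illustration (it’s the kind of operation a Pencil-like tool might expose, not Pencil’s actual API):

```python
# An MCP-style tool declaration: everything the agent gets to see.
# The description and the typed schema ARE the interface; there is
# no screen for the agent to fall back on.
move_node_tool = {
    "name": "move_node",
    "description": (
        "Move a node on the design canvas to an absolute position. "
        "Coordinates are in pixels from the top-left of the frame."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "node_id": {"type": "string", "description": "ID of the node to move"},
            "x": {"type": "number", "description": "New x position in px"},
            "y": {"type": "number", "description": "New y position in px"},
        },
        "required": ["node_id", "x", "y"],
    },
}
```

Notice that writing this forces the design questions up front: what a node is, how it’s addressed, what coordinate space it lives in. That’s the clarity the constraint buys you.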
Once you’ve done that, humans can use your tool fine through a thin layer over the same core. The reverse is not true. If you design for humans first, with interactive editors and wizards and context menus and little affordances that work because humans can see and improvise, you end up with an agent interface that feels bolted on because it is bolted on. The operations the agent needs don’t exist yet. You have to invent them on top of a data model designed for something else. That’s the shape of most “we added an MCP server to our SaaS” announcements I’ve seen. The tools work, technically, but they feel translated, and the agent experience has sharp edges that shouldn’t be there.
PostHog is what it looks like when it’s done right. Their MCP server ships with over a hundred tools. No human would ever memorise that many verbs, but an agent matches the task to the description and picks the right one. Shipping an interface like that only makes sense once you’ve accepted that the caller isn’t going to be human.
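The matching itself happens inside the model, semantically, but a toy keyword-overlap scorer conveys the shape of the selection step (tool names and descriptions here are invented, not PostHog’s):

```python
def pick_tool(task: str, tools: dict[str, str]) -> str:
    """Crude stand-in for the LLM's matching: score each tool's
    description by word overlap with the task, return the best name."""
    task_words = set(task.lower().split())

    def score(description: str) -> int:
        return len(task_words & set(description.lower().split()))

    return max(tools, key=lambda name: score(tools[name]))

# A tiny slice of a large, description-rich toolkit.
tools = {
    "list_feature_flags": "list all feature flags in the project",
    "create_insight": "create a new analytics insight from a query",
    "get_session_recordings": "fetch recent session recordings for a user",
}
```

Scale the dictionary to a hundred entries and the approach still works, which is exactly why a tool count no human would tolerate is fine for an agent.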
So my honest answer to the build question: we’re on agent-first, and it’s the mobile-first moment of this decade. Not because humans stop mattering. The human surface is still where trust gets built, still where delegation gets approved, still where curators evaluate what they’re installing. But agents are the tightest constraint in the system now, and designing for them is how you end up with primitives everyone else can use.
Start with the agent. Make the operations crisp. Make the data model something an LLM can reason about without needing a UI. Then layer whatever human surface the work actually requires on top. For invisible tools, that’s basically just docs. For delegation tools, it’s the scaffolding that lets the human trust the handoff. Match the surface to the work, and don’t build more than that.
The teams that get this right are going to feel, a few years from now, the way the teams that went mobile-first in 2012 felt by 2015. The others will be retrofitting, and the retrofits will feel like retrofits.
StackCLI
If any of this was useful and you publish on Substack, StackCLI is live. It’s in public alpha. You can schedule Notes from a terminal, or hand the scheduling off to an agent through MCP. A small menubar app handles the posting. Feedback welcome if you try it.