We need to re-learn what AI agent development tools are in 2026
This article was written by Andrew Green, technical writer and industry analyst. We pay Andrew, but he refuses to write anything but his own opinion.

The big boys entered the market, OpenClaw appropriated the MCP security strategy, and everyone started vibe coding (but only if they already knew how to code). It really feels like 2025 was the year of agents, mainly because the industry came to a consensus about how we expect an agent to behave. That, and because we found we can sidestep context window limits by spawning sub-agents (a minimal sketch of that pattern appears at the end of this piece).

When we first wrote the Enterprise AI Agent Development Tools report, we focused heavily on the building blocks of agents: RAG, memory, tools, and evaluations. One year later, all of these capabilities appear to have been commoditized to some degree. We now expect most vendors to let customers use a document as context and grounding, or to integrate with Promptfoo (now acquired by OpenAI) for evaluations. Granted, some niche capabilities, like reranking RAG documents by semantic similarity (also sketched at the end), are still differentiators. However, a lot of agent work today doesn't even need RAG. Even web search, which you previously had to orchestrate explicitly, is now natively available in most vanilla LLM services like ChatGPT and Claude.

MCP had a meteoric rise and then fizzled out. I appreciated Anthropic's attempts at adding security features such as auth around MCP, but then OpenClaw threw all of that out the window. OpenClaw is not in the cards for any sensible organization, given its tendency to delete data and expose ALL the vulnerabilities.

With this in mind, we need a rather drastic update to our framework for evaluating AI agent builders. So, I have a set of questions that I want to answer for myself to understand how a 2026 version of the report should look:

- What got commoditized or natively implemented in vanilla models or LLM services?
- What still stands from last year?
- What is still relevant from last year but underappreciated?
- What should change in our evaluation today?
- What did the vendors do over the past year?
- What about coding agents?

What got commoditized or natively implemented in vanilla models or LLM services?

Today, even basic LLM-as-a-service products come close to being agents. I mentioned web search above, but some of the other capabilities include:

- Claude's and ChatGPT's Projects, which let users upload docs, code, and files to create themed collections that can be referenced repeatedly.
- Claude Connectors and ChatGPT apps, which connect to apps, files, and services. These connectors are built by third parties.
- Native Skills.md files, which are glorified prompt templates, but they still replace some of the extra work that agent builders required last year.
- Honorable mentions go to Claude Code and Codex, which are not really in scope but need to be acknowledged.

This means all these capabilities are now table stakes, and we expect every agent builder to…
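
To make the sub-agent point concrete, here is a minimal sketch of the pattern, assuming a generic `call_llm(prompt)` callable standing in for whatever LLM client you use, plus a crude four-characters-per-token estimate. Both are illustrative assumptions, not any vendor's API.

```python
# Sketch of the sub-agent trick for sidestepping context limits.
# `call_llm` is a hypothetical stand-in for your LLM client.
from typing import Callable

CONTEXT_BUDGET_TOKENS = 8_000  # pretend the parent's window is this small


def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly four characters per token."""
    return len(text) // 4


def run_subagent(call_llm: Callable[[str], str], task: str, material: str) -> str:
    """A sub-agent gets a fresh context holding only its slice of the material.

    It returns a compact summary, so the parent never holds the raw text.
    """
    prompt = (
        f"Task: {task}\n\nSource material:\n{material}\n\n"
        "Reply with a terse summary of your findings."
    )
    return call_llm(prompt)


def run_parent(call_llm: Callable[[str], str], task: str, documents: list[str]) -> str:
    """Parent agent: delegate oversized material to sub-agents, then synthesize."""
    findings = []
    for doc in documents:
        if estimate_tokens(doc) > CONTEXT_BUDGET_TOKENS:
            # Too big to hold ourselves: split it across two sub-agents.
            half = len(doc) // 2
            for chunk in (doc[:half], doc[half:]):
                findings.append(run_subagent(call_llm, task, chunk))
        else:
            findings.append(run_subagent(call_llm, task, doc))
    # The parent's context now contains only short summaries, not raw documents.
    synthesis_prompt = f"Task: {task}\n\nSub-agent findings:\n" + "\n".join(
        f"- {f}" for f in findings
    )
    return call_llm(synthesis_prompt)
```

The design choice that matters is that each sub-agent starts with a fresh context and hands back only a compact summary, so the parent's window holds findings rather than raw documents.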
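
And here is the semantic-similarity reranking mentioned above, in miniature. The `embed(text)` callable is an assumption standing in for your embedding model; production systems more often use a cross-encoder or a hosted reranker, but the cosine version shows the idea.

```python
# Sketch of semantic-similarity reranking over retrieved RAG documents.
# `embed` is a hypothetical stand-in for your embedding model.
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(
    embed: Callable[[str], list[float]],
    query: str,
    docs: list[str],
    top_k: int = 5,
) -> list[str]:
    """Re-order retrieved docs by semantic similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]
```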

