DeepSeek-V4: a million-token context that agents can actually use
Focusing on long-running agentic workloads.

Running a frontier open model as an agent today breaks in predictable ways. The model stops. You reprompt. The trace blows past the context budget, the KV cache fills the GPU, or tool-call round trips degrade halfway through a long task. V4 is built to fix these known failures and to point the way for the community to follow.

This post covers three things: what the architecture does differently to make long-context inference cheap, the agent-specific post-training decisions that compound on top of it, and some takeaways from the paper that help in reasoning about these changes.

The KV cache problem for agents

A 1M-token context window is just capacity, not performance. Whether you can use it depends on the cost of every forward pass at…