Break the context window barrier with Amazon Bedrock AgentCore
Artificial Intelligence Break the context window barrier with Amazon Bedrock AgentCore When you analyze documents that span millions of characters, you hit the context window barrier and even the largest context windows fall short. Your model either rejects the input or produces answers based on incomplete information. How do you reason over documents that don’t fit? In this post, you will learn how to implement Recursive Language Models (RLM) using Amazon Bedrock AgentCore Code Interpreter and the Strands Agents SDK. By the end, you will know how to: - Process documents of varying lengths, with no upper bound on context size. - Use Bedrock AgentCore Code Interpreter as persistent working memory for iterative document analysis. - Orchestrate sub-large language model (sub-LLM) calls from within a sandboxed Python environment to analyze specific document sections. Why context windows aren’t enough Consider a typical financial analysis task of comparing metrics across two years of annual reports from a single company. Each report runs 300–500 pages. Add analyst reports, SEC filings, and supplementary materials, and the total reaches millions of characters. When you send these documents directly to a model, either the input exceeds the model’s context window limit and the request fails, or the input fits but the model has difficulty attending to information in the middle of long inputs, often referred to as the “lost in the middle” problem. Both failure modes exist because context window size is a hard limit that prompt engineering alone can’t solve. You need an approach that decouples document size from the model’s context window. RLMs: Treating context as an environment RLMs, introduced by Zhang et al. in arXiv:2512.24601, reframe the problem. Instead of feeding an entire document into the model’s context window, an RLM treats the input as an external environment that the model interacts with programmatically. Figure 1. Recursive language models operate as an iterative loop: the root LLM generates code to explore the document environment, delegates semantic analysis to sub-LLMs on selected chunks, and accumulates results in working memory before refining the next step. The model receives only the query and a description of the available environment. It then writes code to search, slice, and analyze the document iteratively. When the model needs semantic understanding of a specific section, it delegates that analysis to a sub-LLM call, keeping the results in working memory as Python variables rather than consuming context window space. This creates a recursive structure: the root LLM orchestrates the analysis through code, calling sub-LLMs as needed for semantic tasks, while the full document never enters the model’s context window. Architecture Here, we show how to implement RLM using Amazon Bedrock AgentCore Code Interpreter as the execution environment. Amazon Bedrock AgentCore Code Interpreter provides a sandboxed Python runtime with persistent state across executions. The architecture has three components working together. A root LLM agent, built with the Strands Agents SDK, receives the user’s query and decides what code to execute. An Amazon Bedrock AgentCore Code Interpreter session runs in PUBLIC network mode,…

