AWS Machine Learning Blog·Agents·5d ago·by Shreyas Subramanian·~3 min read

Implementing programmatic tool calling on Amazon Bedrock

Programmatic tool calling (PTC) is a paradigm shift in how large language models (LLMs) interact with external tools. In a traditional tool-calling workflow, each tool invocation requires a full round trip back to the model. The model calls a tool, receives the result, reasons about it, calls the next tool, and so on. For workflows that involve multiple tool calls, this creates compounding latency and token consumption because every intermediate result must pass through the model's context window.

PTC takes a different approach. Instead of orchestrating tool calls one at a time, the model writes code, typically Python, that invokes multiple tools programmatically within a sandboxed execution environment. The code can include loops, conditionals, filtering, and aggregation logic. The model is only sampled once to produce the code. The execution environment then handles tool invocations, and only the final processed result is returned to the model's context. This dramatically reduces both latency and token usage for multi-tool workflows.

PTC is particularly effective for large data processing, precise numerical calculations, multi-step process orchestration, and privacy-sensitive scenarios where raw data shouldn't enter the model's context.

PTC originated as a provider-specific feature, but the underlying pattern (model generates code, sandbox executes it, only final output returns to context) is model-agnostic. In this post, we show three ways to implement PTC on Amazon Bedrock: a self-hosted Docker sandbox on ECS for maximum control, a managed solution using Amazon Bedrock AgentCore Code Interpreter, and an Anthropic SDK-compatible path through a proxy for teams that prefer that developer experience.
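
The pattern described above (model generates code, an execution environment runs it, only the final output returns) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `run_in_sandbox`, `fetch_records`, and the `GENERATED_CODE` string are hypothetical, and plain `exec` is not a real sandbox. The implementations discussed in the post rely on an isolated container or a managed code interpreter instead.

```python
# Minimal sketch of the PTC control flow. NOT a secure sandbox: plain exec()
# is used only to show the shape of the pattern. All names are hypothetical.

def run_in_sandbox(generated_code, tools):
    """Run model-generated code with tools injected; return only the final result."""
    call_log = []

    def wrap(name, fn):
        # Record each tool invocation so we can see how many calls happened
        # without any of them passing back through the model.
        def wrapped(*args, **kwargs):
            call_log.append(name)
            return fn(*args, **kwargs)
        return wrapped

    namespace = {name: wrap(name, fn) for name, fn in tools.items()}
    exec(generated_code, namespace)
    # By convention in this sketch, the generated code stores its answer in `result`.
    return namespace.get("result"), len(call_log)


def fetch_records(page):
    """Hypothetical tool returning 100 records per page."""
    return [{"page": page, "value": page * 10 + i} for i in range(100)]


# Stand-in for code the model would produce in a single sampling step.
GENERATED_CODE = """
total = 0
for page in range(5):
    for record in fetch_records(page):
        if record["value"] % 2 == 0:
            total += record["value"]
result = total
"""

final_output, tool_calls = run_in_sandbox(GENERATED_CODE, {"fetch_records": fetch_records})
print(final_output, tool_calls)
```

In this sketch the 500 intermediate records stay inside the execution environment; only `final_output` (a single number) would be handed back to the model, even though five tool calls were made.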

Bottlenecks in traditional tool calling

Consider this example: “Which engineering team members exceeded their Q3 travel budget?”

With traditional tool calling (assuming no parallel function calling), the model must:

- Call a tool to get the team member list – 20 people.
- Call a tool to get expense records for each person – 20 separate tool calls, each returning 50–100 line items.
- Call additional tools to retrieve budget thresholds.
- Receive over 2,000 expense records into its context window.
- Reason over the full dataset in natural language to filter, compare, and summarize.

Each of those tool calls requires a full round trip through the model. The model generates a tool call, pauses, receives the result, reasons about it, generates the next tool call, and so on. This creates three compounding problems:

- Token consumption: Every intermediate result, including thousands of expense line items the model will ultimately discard, passes through the context window.
- Latency: Each tool invocation requires a full model inference cycle. 20 sequential tool calls means 20 inference round trips.
- Accuracy: Asking a language model to filter, aggregate, and compare thousands of records in natural language is error-prone. These are operations that a few lines of Python would handle precisely.

How PTC solves this

PTC flips the pattern. The model writes a single Python code block that orchestrates the tool calls, processes the results, and returns only the final output. Using the same expense…
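
For the travel-budget question, the single code block a model writes under PTC might look roughly like the sketch below. This is an illustration under stated assumptions, not the article's own code: the tool names (`get_team_members`, `get_expenses`, `get_travel_budget`), their signatures, and the stub data are all hypothetical stand-ins for real tool endpoints.

```python
# Hypothetical stub tools. In a real deployment the execution environment
# would route these calls to actual tool endpoints; names and data are invented.

def get_team_members(department: str) -> list[dict]:
    return [{"id": f"E{i:02d}", "name": f"Employee {i}"} for i in range(1, 21)]


def get_expenses(employee_id: str, quarter: str) -> list[dict]:
    n = int(employee_id[1:])
    # 50 line items per person, alternating travel and meals.
    return [
        {"category": "travel" if k % 2 == 0 else "meals", "amount": 10.0 * n}
        for k in range(50)
    ]


def get_travel_budget(employee_id: str, quarter: str) -> float:
    return 4000.0


# The kind of code the model might generate in one sampling step:
def find_over_budget(department: str, quarter: str) -> list[dict]:
    over = []
    for member in get_team_members(department):
        # Filtering and aggregation happen in code, not in the model's context.
        travel_total = sum(
            item["amount"]
            for item in get_expenses(member["id"], quarter)
            if item["category"] == "travel"
        )
        budget = get_travel_budget(member["id"], quarter)
        if travel_total > budget:
            over.append(
                {"name": member["name"], "spent": travel_total, "budget": budget}
            )
    return over


result = find_over_budget("engineering", "Q3")
print(result)
```

With these stubs, the loop reads 1,000 expense line items across 20 people, but only the short list of over-budget employees in `result` would need to reach the model.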



