LLM 0.32a0 is a major backwards-compatible refactor
29th April 2026

I just released LLM 0.32a0, an alpha release of my LLM Python library and CLI tool for accessing LLMs, with some consequential changes that I’ve been working towards for quite a while.

Previous versions of LLM modeled the world in terms of prompts and responses. Send the model a text prompt, get back a text response:

```python
import llm

model = llm.get_model("gpt-5.5")
response = model.prompt("Capital of France?")
print(response.text())
```

This made sense when I started working on the library back in April 2023. A lot has changed since then!

LLM provides an abstraction over thousands of different models via its plugin system. The original abstraction, text input that returns text output, was no longer able to represent everything I needed it to. Over time LLM itself has grown attachments to handle image, audio, and video input, then schemas for outputting structured JSON, then tools for executing tool calls. Meanwhile LLMs kept evolving, adding reasoning support, the ability to return images, and all kinds of other interesting capabilities.

LLM needs to evolve to better handle the diversity of input and output types that today’s frontier models can process.

The 0.32a0 alpha has two key changes: model inputs can be represented as a sequence of messages, and model responses can be composed of a stream of differently typed parts.

Prompts as a sequence of messages

LLMs accept input as text, but ever since ChatGPT demonstrated the value of a two-way conversational interface, the most common way to prompt them has been to treat that input as a sequence of conversational turns. The first turn might look like this:

```
user: Capital of France?
assistant:
```

(The model then gets to fill out the reply from the assistant.)

But each subsequent turn needs to replay the entire conversation up to that point, as a sort of screenplay:

```
user: Capital of France?
assistant: Paris
user: Germany?
assistant:
```

Most of the JSON APIs from the major vendors follow this pattern. Here’s what the above looks like using the OpenAI chat completions API, which has been widely imitated by other providers:

```bash
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      { "role": "user", "content": "Capital of France?" },
      { "role": "assistant", "content": "Paris" },
      { "role": "user", "content": "Germany?" }
    ]
  }'
```

Prior to 0.32, LLM modeled these as conversations:

```python
import llm

model = llm.get_model("gpt-5.5")
conversation = model.conversation()

r1 = conversation.prompt("Capital of France?")
print(r1.text())  # Outputs "Paris"

r2 = conversation.prompt("Germany?")
print(r2.text())  # Outputs "Berlin"
```

This worked if you were building a conversation with the model from scratch, but it didn’t provide a way to feed in a previous conversation from the start. This made tasks like building an emulation of the OpenAI chat completions API much harder than they should have been. The llm CLI tool worked around this through a custom mechanism for persisting and inflating conversations using SQLite, but that never became a stable part of the LLM API, and…
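To give a rough sense of the first of those two changes, here is a purely hypothetical sketch of what prompting with a replayed conversation as a sequence of messages could look like. The messages= keyword argument and the dictionary format are assumptions for illustration only, not the documented 0.32a0 interface:

```python
import llm

# Hypothetical sketch only: the exact 0.32a0 API may differ.
model = llm.get_model("gpt-5.5")

# A previous conversation replayed as OpenAI-style message dictionaries
messages = [
    {"role": "user", "content": "Capital of France?"},
    {"role": "assistant", "content": "Paris"},
    {"role": "user", "content": "Germany?"},
]

# Assumed keyword argument for passing an entire message history at once
response = model.prompt(messages=messages)
print(response.text())  # Expected to output "Berlin"
```

Something along these lines would let an OpenAI-compatible endpoint hand an incoming messages array straight to the model, rather than rebuilding a conversation one turn at a time.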

