
What it does

Haiku RAG is an agentic retrieval-augmented generation system for indexing documents and answering questions with citations. It combines LanceDB for vector storage, Pydantic AI for multi-agent orchestration, and Docling for document parsing. The system supports hybrid search (vector plus full-text), multimodal retrieval (embedding both text and figures in a shared vector space), and vision-aware QA when documents contain images. Beyond simple question answering, it provides research agents for iterative planning and synthesis, analysis agents for complex computational tasks via sandboxed Python, and conversational interfaces with multi-turn memory. Indices are local-first via embedded LanceDB, though cloud and object-storage backends are available.
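
To make the hybrid search idea concrete, here is a minimal sketch of one common way to merge a vector ranking with a full-text ranking: reciprocal rank fusion. It is an illustration only and is not taken from the Haiku RAG code base; the function name, the chunk IDs, and the constant k=60 (a conventional default) are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    with rank starting at 1; a higher total score ranks first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the result is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical chunk IDs returned by the two retrievers (illustrative only).
vector_hits = ["chunk-3", "chunk-1", "chunk-7"]
fulltext_hits = ["chunk-1", "chunk-9", "chunk-3"]

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
```

Chunks that appear in both lists rise to the top, which is why hybrid search tends to favor results that match both semantically and by keyword.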

Who it's for

Document analysts and researchers who need to extract structured insights from large collections. Teams building conversational document search features. Engineers integrating RAG capabilities into Claude Desktop or other AI assistants without running a separate backend.

Common use cases

Index PDFs and web documents, then search by keyword or semantic similarity with page-specific citations
Run multi-turn research workflows using agentic planning: decompose a research question into steps, execute searches, and synthesize results
Analyze document collections programmatically: count mentions, compute aggregations, compare claims across sources
Build conversational chatbots over proprietary documents with session memory and visual grounding
Expose document search tools to Claude via MCP for use within Claude Desktop or API calls
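
As a rough illustration of the programmatic analysis use case above, the following sketch shows the kind of small script that could be run over document text to count mentions. It uses only the Python standard library; the function name, file names, and sample texts are hypothetical, and it does not call any Haiku RAG API.

```python
import re
from collections import Counter


def count_mentions(documents, terms):
    """Count case-insensitive whole-word mentions of each term per document."""
    counts = {}
    for name, text in documents.items():
        per_doc = Counter()
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            per_doc[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
        counts[name] = per_doc
    return counts


# Hypothetical document texts standing in for indexed content.
docs = {
    "report_a.txt": "LanceDB stores vectors. LanceDB also supports full-text search.",
    "report_b.txt": "Docling parses PDFs before lancedb indexes them.",
}

mentions = count_mentions(docs, ["LanceDB", "Docling"])
# Aggregate across documents; Counter addition drops zero counts.
totals = sum(mentions.values(), Counter())
```

Per-document counts stay available in `mentions`, while `totals` gives the aggregate across the collection.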

Setup pitfalls

Requires Python 3.12 or newer; existing Python 3.11 environments will not work
Needs an embedding provider configured (Ollama, OpenAI, VoyageAI, LM Studio, or vLLM); indexing will fail if none is available
Reads and writes to the filesystem for document cache and LanceDB indices; requires appropriate permissions and disk space for large document collections
Makes network calls to fetch remote documents and to reach embedding APIs; it carries a high risk classification and should be sandboxed in security-sensitive environments
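
Some of the pitfalls above can be checked before indexing starts. The sketch below is a hypothetical preflight helper, not part of Haiku RAG: it checks the interpreter version, that the data directory is readable and writable, and that some disk space is free. The function name, the `min_free_gb` threshold, and the `version` parameter are assumptions made for illustration, and it does not check the embedding provider.

```python
import os
import shutil
import sys
import tempfile


def preflight(data_dir, min_free_gb=1.0, version=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    major, minor = version if version is not None else sys.version_info[:2]
    if (major, minor) < (3, 12):
        problems.append(f"Python 3.12 or newer is required, found {major}.{minor}")
    if not os.path.isdir(data_dir):
        problems.append(f"{data_dir} is not an existing directory")
        return problems  # The remaining checks need a real directory.
    if not os.access(data_dir, os.R_OK | os.W_OK):
        problems.append(f"{data_dir} is not readable and writable")
    free_gb = shutil.disk_usage(data_dir).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GiB free in {data_dir}")
    return problems


# Example run against a throwaway directory (hypothetical data location).
with tempfile.TemporaryDirectory() as demo_dir:
    issues = preflight(demo_dir, min_free_gb=0)
```

Passing `version` explicitly is only there to make the check easy to exercise; in normal use the running interpreter's version is read from `sys.version_info`.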