What it does
Knowledge-rag is a local document retrieval system that integrates with Claude Code via 12 MCP tools. It uses hybrid search, combining BM25 (keyword matching), semantic vector similarity, and cross-encoder reranking, to find relevant passages in your documents. The system supports 20+ file formats, including PDFs, markdown, code, and Jupyter notebooks. All processing runs locally via ONNX embeddings with optional NVIDIA GPU acceleration: no cloud APIs, no external servers, and no data leaving your machine.
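To make the hybrid search idea concrete, here is a minimal sketch of how a keyword ranking and a vector ranking can be merged into one list. It uses reciprocal rank fusion (RRF), a common fusion technique; this description does not state which fusion method knowledge-rag actually uses, so treat the approach as an assumption. The function name, the file names, and the constant k=60 are all hypothetical, and the cross-encoder reranking step that would follow is omitted.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a BM25 pass and a vector-similarity pass.
bm25_hits = ["runbook.md", "api.md", "notes.md"]
vector_hits = ["api.md", "design.md", "runbook.md"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

In this example, api.md and runbook.md appear in both lists and therefore outrank the documents found by only one retriever. A reranker would then typically rescore only the top few fused candidates, since cross-encoders are more expensive per document.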
Who it's for
Developers building AI-assisted workflows who have local documentation, codebases, or knowledge bases they want Claude to search without uploading them. This includes teams with proprietary docs, security-conscious organizations, and engineers who want fast, local-only retrieval without managing a database server.
Common use cases
- Index internal documentation (API docs, architecture guides, runbooks) and search them natively from Claude Code prompts.
- Build code-aware workflows by indexing your codebase and having Claude reference relevant files during development.
- Create AI agents that ground their responses in your local knowledge without sharing data with cloud services.
- Search meeting notes, research papers, or project notebooks to inform code generation or documentation writing.
Setup pitfalls
- ONNX model loading: Versions prior to v3.8.0 loaded the embedding model (~200 MB) at startup. Upgrade to v3.8.0+ for lazy loading, and to v3.8.1+ for a critical hotfix that prevents model load failures from silently producing zero-vector corruption.
- Filesystem access: The server reads and writes to your data directory for indexing and caching. Ensure appropriate file permissions and sufficient disk space for embedding caches, which scale with document count.
- Multi-process conflicts: Without the opt-in KNOWLEDGE_RAG_SINGLE_INSTANCE environment variable, multiple Claude Code windows or IDE extensions can spawn parallel instances sharing the same data_dir, potentially causing index contention.
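The multi-process pitfall comes down to several processes writing to one data directory at once. The sketch below shows one generic way a process can claim a directory exclusively, using a lock file created with O_EXCL. It is only an illustration of the idea: how knowledge-rag implements KNOWLEDGE_RAG_SINGLE_INSTANCE is not described here, and the class name, the lock file name, and the temporary directory standing in for data_dir are all hypothetical.

```python
import os
import tempfile


class SingleInstanceLock:
    """Advisory lock backed by an exclusively created file in data_dir."""

    def __init__(self, data_dir, name=".instance.lock"):
        self.path = os.path.join(data_dir, name)
        self.fd = None

    def acquire(self):
        """Return True if this process now owns the directory."""
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        # Record the owner's PID to help diagnose a stale lock.
        os.write(self.fd, str(os.getpid()).encode())
        return True

    def release(self):
        """Close and delete the lock file if this object holds it."""
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)
            self.fd = None


# A temporary directory stands in for the shared data_dir.
data_dir = tempfile.mkdtemp()
first = SingleInstanceLock(data_dir)
second = SingleInstanceLock(data_dir)

first_ok = first.acquire()    # first instance claims the directory
second_ok = second.acquire()  # second instance is refused
first.release()
third_ok = second.acquire()   # succeeds once the first has released
second.release()
```

A lock file like this is left behind if the owning process crashes, so a real implementation would need some way to detect and clear stale locks, for example by checking whether the recorded PID is still running.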