What it does
Semble is a code search library that returns exact code snippets from natural-language or code-symbol queries. Rather than grepping and reading entire files, it indexes codebases in ~250 ms and answers queries in ~1.5 ms with high retrieval accuracy (0.854 NDCG@10, comparable to transformer-based models). All computation runs on CPU with no API keys, GPUs, or external services. It integrates as an MCP server for Claude Code, Cursor, Codex, and OpenCode, or as a bash tool for CLI workflows, reducing token usage by ~98% compared to grep-and-read patterns.
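For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how NDCG@10 is conventionally computed for one query. The relevance labels are hypothetical, and the source does not say how the benchmark behind the 0.854 figure labels relevance, so treat this as the textbook definition and not the project's evaluation code.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (ranks start at 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k=10):
    """DCG divided by the DCG of the ideal (descending-relevance) ordering.

    Assumption: `relevances` holds every labeled candidate for the query,
    so sorting it yields the ideal ranking.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical labels for one query's ranked results (1 = relevant, 0 = not).
ranked = [1, 0, 1, 0, 0]
score = ndcg_at_k(ranked)
print(f"NDCG@10 = {score:.3f}")  # prints "NDCG@10 = 0.920"
```

A score of 1.0 means every relevant result is ranked ahead of every irrelevant one; a benchmark figure is typically this value averaged over many queries.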
Who it's for
Agent developers and engineers running AI agents that need to explore codebases efficiently. Anyone building Claude Code workflows, Cursor extensions, Codex agents, or OpenCode integrations gets lower-latency code discovery without the token overhead of grep-and-read.
Common use cases
- Query a codebase with natural language (e.g., "How is authentication handled?") and retrieve only relevant snippets without reading full files
- Locate specific functions or symbols (e.g., save_pretrained) by name instead of pattern matching
- Discover code semantically similar to a known location using find_related with file path and line number
- Reduce token usage in agent interactions by returning only necessary context rather than entire files
- Explore unfamiliar repositories quickly without deep grep knowledge or manual file inspection
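
To make the "snippets instead of whole files" idea in the list above concrete, here is a toy sketch that is not Semble's retrieval method and does not use its API. It swaps in plain keyword overlap over a sliding window of lines, and uses whitespace-separated words as a crude stand-in for tokens. The function name, sample file, and query are all hypothetical.

```python
def best_snippet(lines, query, window=3):
    """Return (1-based start line, snippet lines) for the window that
    contains the most query terms (case-insensitive substring match)."""
    terms = set(query.lower().split())
    best_start, best_score = 0, -1
    for start in range(max(1, len(lines) - window + 1)):
        chunk = " ".join(lines[start:start + window]).lower()
        score = sum(term in chunk for term in terms)
        if score > best_score:  # ties keep the earliest window
            best_start, best_score = start, score
    return best_start + 1, lines[best_start:best_start + window]

# Hypothetical file contents, one string per line.
file_lines = [
    "import os",
    "",
    "def load_config(path):",
    "    return open(path).read()",
    "",
    "def check_token(token):",
    "    # authentication: validate the caller token",
    "    return token == os.environ.get('SECRET')",
    "",
    "def render(page):",
    "    return '<html>' + page + '</html>'",
]

start, snippet = best_snippet(file_lines, "authentication secret token")

# Compare how much text an agent would read in each approach.
file_words = sum(len(line.split()) for line in file_lines)
snippet_words = sum(len(line.split()) for line in snippet)
print(f"snippet starts at line {start}: {snippet_words} of {file_words} words")
```

The saving on a file this small is modest; the ~98% figure quoted earlier refers to real agent workflows, where grep-and-read pulls in many full files per question.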
Setup pitfalls
- Requires uv (the Python package manager) before setting up the MCP server; install it first if it is not present
- Git URLs are cloned on demand and cached for the session; the first run on a large repository may incur clone latency
- Filesystem write access is required for index caching and remote repo cloning; in restricted environments, scope sandbox permissions so that those writes are allowed
- Configuration format varies by agent harness (Claude Code, Cursor, Codex, OpenCode); follow the specific setup instructions for your tool to ensure tool discovery
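
The first and third pitfalls above can be checked before wiring up an agent. The sketch below uses only the Python standard library. The cache directory is a hypothetical placeholder, since the source does not say where Semble stores indexes or cloned repositories; substitute whatever location your setup actually uses.

```python
import os
import shutil
import tempfile

def preflight(cache_dir):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # Pitfall 1: uv must be discoverable on PATH.
    if shutil.which("uv") is None:
        problems.append("uv not found on PATH")
    # Pitfall 3: the process needs write access for caches and clones.
    try:
        os.makedirs(cache_dir, exist_ok=True)
        # Creating a temp file probes real write access; it is deleted on close.
        with tempfile.NamedTemporaryFile(dir=cache_dir):
            pass
    except OSError as exc:
        problems.append(f"cannot write to {cache_dir}: {exc}")
    return problems

# Hypothetical directory used only for this demonstration.
demo_dir = os.path.join(tempfile.gettempdir(), "semble-preflight-demo")
issues = preflight(demo_dir)
for issue in issues:
    print("preflight:", issue)
```

Running this inside the same sandbox the agent will use is the point: a check that passes in your shell says little about a restricted agent environment.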