What it does
Semble is a code search library optimized for agents. It returns exact code snippets with semantic understanding and uses 98% fewer tokens than traditional grep-and-read approaches. It indexes codebases in ~250ms on CPU and answers queries in ~1.5ms, with no external services or API keys required. The library achieves a retrieval quality of 0.854 NDCG@10, on par with code-specialized transformer models at a fraction of the size.
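For readers unfamiliar with the NDCG@10 figure quoted above, the following minimal sketch shows how that metric is generally computed for a single query. It is not Semble's evaluation code: the relevance labels are hypothetical, and the ideal ranking is derived from the retrieved list only, which is a simplification.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (rank 1 first)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )

def ndcg_at_k(relevances, k=10):
    """DCG divided by the DCG of the same labels sorted best-first."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance labels for one query's ranked results (1 = relevant).
ranked = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
score = ndcg_at_k(ranked)
print(f"NDCG@10 = {score:.3f}")
```

A benchmark score such as the one quoted is typically the mean of this value over many queries; a score of 1.0 means every query's relevant results were ranked first.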
Who it's for
Agents and developers building semantic code search into Claude Code, Cursor, or other MCP-compatible tools. Useful for anyone indexing and searching large codebases efficiently or integrating AI-powered code analysis into existing workflows.
Common use cases
- Semantic code search across a codebase without grepping individual files or reading full file contents.
- Integrating code search into Claude Code or Cursor as an MCP server for instant agent access to relevant snippets.
- Finding similar code patterns and implementations using the find-related command.
- Searching documentation and configuration files alongside code with the --content flag.
- Reducing token usage in LLM prompts by returning only semantically relevant code chunks.
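The token-reduction use case can be sanity-checked on your own codebase with a rough estimate like the one below. This is only a sketch: it assumes about 4 characters per token (a common rule of thumb, not a real tokenizer), the helper names are hypothetical, and the input sizes are illustrative rather than measured.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate, assuming ~4 characters per token."""
    return max(1, len(text) // 4)

def token_savings(full_text: str, snippets: list[str]) -> float:
    """Fraction of tokens saved by sending snippets instead of the full text."""
    full = estimate_tokens(full_text)
    used = sum(estimate_tokens(s) for s in snippets)
    return 1.0 - used / full

# Illustrative sizes only: a 4,000-character file versus one 400-character snippet.
full_file = "x" * 4000
snippets = ["y" * 400]
savings = token_savings(full_file, snippets)
print(f"Estimated savings: {savings:.0%}")
```

For a real measurement, swap `estimate_tokens` for the tokenizer of the model you are prompting.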
Setup pitfalls
- High risk classification: reads and writes to the filesystem — ensure it only indexes codebases you intend to expose.
- Respects .gitignore and .sembleignore files to filter indexed content; review these rules carefully to avoid unintended exclusions.
- MCP server mode requires separate configuration per agent (Claude Code, Cursor, etc.), with manual setup needed if semble install doesn't detect your agent.
- When searching git URLs, repositories are cloned on demand and cached locally; ensure sufficient disk space and network access.
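One way to review ignore rules before indexing is to preview which files they would exclude. The sketch below does not use Semble's API and is not how Semble itself filters files. It is a simplified approximation built on Python's fnmatch: it matches each pattern against the relative path or any single path component, and it does not handle negation (!) or anchored (/) patterns the way git does. The function names are hypothetical.

```python
import fnmatch
from pathlib import Path

def load_patterns(root: Path, names=(".gitignore", ".sembleignore")):
    """Collect non-empty, non-comment lines from the ignore files in root."""
    patterns = []
    for name in names:
        ignore_file = root / name
        if ignore_file.is_file():
            for line in ignore_file.read_text().splitlines():
                line = line.strip()
                if line and not line.startswith("#"):
                    patterns.append(line)
    return patterns

def is_excluded(rel_path: str, patterns) -> bool:
    """Simplified check: match against the whole path or any one component."""
    parts = rel_path.split("/")
    for pattern in patterns:
        pattern = pattern.rstrip("/")  # treat "build/" like "build"
        if fnmatch.fnmatch(rel_path, pattern):
            return True
        if any(fnmatch.fnmatch(part, pattern) for part in parts):
            return True
    return False

def preview(root: Path):
    """Return (kept, dropped) lists of relative file paths under root."""
    patterns = load_patterns(root)
    kept, dropped = [], []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            (dropped if is_excluded(rel, patterns) else kept).append(rel)
    return kept, dropped

if __name__ == "__main__":
    kept, dropped = preview(Path("."))
    print(f"{len(kept)} files kept, {len(dropped)} excluded")
    for rel in dropped[:20]:
        print("excluded:", rel)
```

For results that follow git's exact semantics, `git check-ignore` is the more reliable check for .gitignore rules; the script above is mainly useful for spotting surprising exclusions at a glance.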