What it does

Semble is a code search library built for agents. It returns exact code snippets with semantic understanding while using 98% fewer tokens than traditional grep-and-read approaches. It indexes codebases in ~250ms on CPU and answers queries in ~1.5ms, with no external services or API keys required. The library achieves a retrieval quality of 0.854 NDCG@10, on par with code-specialized transformer models at a fraction of the size.
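
For readers unfamiliar with the metric, here is a minimal sketch of how NDCG@10 is commonly computed. It is not Semble's evaluation code. The function names and the input format (a list of graded relevance labels in ranked order) are assumptions made for illustration, and the ideal ranking is taken from the same list rather than from a full set of judged documents.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (linear gain).

    `relevances` is a hypothetical list of relevance labels, one per
    returned result, in the order the search engine ranked them.
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG divided by the DCG of the best possible ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A score of 1.0 means the relevant results appear in the best possible order; placing a relevant result lower in the ranking reduces the score.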

Who it's for

Agents and developers building semantic code search into Claude Code, Cursor, or other MCP-compatible tools. Useful for anyone indexing and searching large codebases efficiently or integrating AI-powered code analysis into existing workflows.

Common use cases

Semantic code search across a codebase without grepping individual files or reading full file contents.
Integrating code search into Claude Code or Cursor as an MCP server for instant agent access to relevant snippets.
Finding similar code patterns and implementations using the find-related command.
Searching documentation and configuration files alongside code with the --content flag.
Reducing token usage in LLM prompts by returning only semantically relevant code chunks.

Setup pitfalls

High risk classification: reads and writes to the filesystem. Make sure it only indexes codebases you intend to expose.
Respects .gitignore and .sembleignore files to filter indexed content; review these rules carefully to avoid unintended exclusions.
MCP server mode requires separate configuration for each agent (Claude Code, Cursor, etc.); set it up manually if semble install doesn't detect your agent.
When searching git URLs, repositories are cloned on demand and cached locally; ensure sufficient disk space and network access.
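
As a rough sketch of the disk-space check mentioned above, the following uses only the Python standard library. It does not call Semble. The cache location is not stated here, so the path argument and the helper name are hypothetical, and the threshold is whatever you estimate the cloned repository will need.

```python
import shutil

def has_free_space(path, required_bytes):
    """Return True if the filesystem containing `path` has at least
    `required_bytes` free. `path` is a hypothetical cache directory."""
    return shutil.disk_usage(path).free >= required_bytes
```

For example, has_free_space(".", 2 * 1024**3) checks for 2 GiB free on the current directory's filesystem before a large repository is searched by URL.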