A local MCP server that packages LLM evaluation gates as reusable CI/CD primitives, enabling AI agents to run datasets against models, score responses, and enforce quality thresholds.