Measure LLM performance on your own equipment. Run a 25-task benchmark against any model using your own API key, and get a deterministic score, a tier badge, and a place on the public leaderboard.