Reliability and audit-evidence testing for LLM agents: wrap any agent, assert its behavior, measure determinism, check grounding, and emit an audit-grade report.
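The five steps above form one pipeline. A minimal sketch of that pipeline follows. It assumes an agent is a plain callable that maps a prompt string to a response string. The names `audit`, `measure_determinism`, `assert_contains`, `check_grounding`, `CheckResult`, and `AuditReport` are hypothetical and chosen for illustration; they are not this project's API. Determinism is scored as the share of repeated runs that match the most common output. Grounding is a naive verbatim-substring check against the supplied sources, and a real tool would likely need something stronger.

```python
import hashlib
import json
from collections import Counter
from dataclasses import asdict, dataclass, field
from typing import Callable, List, Tuple

# Hypothetical sketch, not this project's API.
# Assumption: an "agent" is any callable mapping a prompt string to a response string.
Agent = Callable[[str], str]


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


@dataclass
class AuditReport:
    prompt: str
    runs: int
    determinism: float
    checks: List[CheckResult] = field(default_factory=list)

    def to_json(self) -> str:
        body = asdict(self)
        # Hash the canonical (sorted-key) JSON so later edits to the report are detectable.
        canonical = json.dumps(body, sort_keys=True)
        body["sha256"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        return json.dumps(body, sort_keys=True, indent=2)


def measure_determinism(agent: Agent, prompt: str, runs: int) -> Tuple[float, List[str]]:
    """Run the agent `runs` times; score = share of runs matching the most common output."""
    outputs = [agent(prompt) for _ in range(runs)]
    top_count = Counter(outputs).most_common(1)[0][1]
    return top_count / runs, outputs


def assert_contains(answer: str, expected: str) -> CheckResult:
    """Behavioral assertion: the answer must mention `expected`."""
    ok = expected in answer
    return CheckResult("contains:" + expected, ok, "found" if ok else "missing")


def check_grounding(answer: str, sources: List[str], claims: List[str]) -> CheckResult:
    """Naive grounding: each claim must appear in the answer AND verbatim in some source."""
    missing = [c for c in claims if c not in answer]
    ungrounded = [c for c in claims if c in answer and not any(c in s for s in sources)]
    ok = not missing and not ungrounded
    return CheckResult("grounding", ok, f"missing={missing} ungrounded={ungrounded}")


def audit(agent: Agent, prompt: str, sources: List[str], claims: List[str],
          expected: str, runs: int = 5) -> AuditReport:
    """Wrap the agent, measure determinism, then run checks on its most common answer."""
    score, outputs = measure_determinism(agent, prompt, runs)
    answer = Counter(outputs).most_common(1)[0][0]  # audit the modal answer
    report = AuditReport(prompt=prompt, runs=runs, determinism=score)
    report.checks.append(assert_contains(answer, expected))
    report.checks.append(check_grounding(answer, sources, claims))
    return report


if __name__ == "__main__":
    # A stub agent that always gives the same answer, so determinism is 1.0.
    def stub_agent(prompt: str) -> str:
        return "The Eiffel Tower is in Paris."

    report = audit(stub_agent, "Where is the Eiffel Tower?",
                   sources=["The Eiffel Tower is in Paris, France."],
                   claims=["Eiffel Tower", "Paris"], expected="Paris", runs=3)
    print(report.to_json())
```

Auditing the modal answer, rather than each individual run, keeps the report small. A nondeterministic agent still shows up in the report through its `determinism` score, which falls below 1.0.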