Verify model outputs using granular feedback and test-time trajectory rewards to improve performance on complex coding and logic benchmarks.
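
One way to picture test-time trajectory rewards is best-of-N selection: sample several candidate solutions, score each step of each candidate, and keep the candidate with the highest aggregate score. The following is a minimal sketch under stated assumptions, not the method described above. The names `trajectory_reward`, `select_best`, and `toy_step_scorer` are hypothetical, the min-over-steps aggregation is one choice among several, and the keyword-based scorer is only a stand-in for a learned verifier.

```python
from typing import Callable, List, Sequence

# One candidate solution, split into reasoning or code steps (assumed representation).
Trajectory = List[str]


def trajectory_reward(steps: Sequence[str], step_scorer: Callable[[str], float]) -> float:
    """Aggregate per-step scores into one trajectory-level reward.

    Uses the minimum step score, so a single bad step sinks the whole trajectory.
    This aggregation is an assumption; a mean or product would also be plausible.
    """
    if not steps:
        return float("-inf")
    return min(step_scorer(step) for step in steps)


def select_best(candidates: Sequence[Trajectory], step_scorer: Callable[[str], float]) -> Trajectory:
    """Return the candidate with the highest trajectory reward (best-of-N selection)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda steps: trajectory_reward(steps, step_scorer))


def toy_step_scorer(step: str) -> float:
    """Toy stand-in for a learned step verifier: penalize steps marked as uncertain."""
    return 0.0 if "TODO" in step or "guess" in step else 1.0


if __name__ == "__main__":
    candidates = [
        ["parse input", "guess the answer"],
        ["parse input", "compute sum", "return sum"],
    ]
    print(select_best(candidates, toy_step_scorer))
```

Scoring individual steps rather than only the final answer is what makes the feedback granular: the verifier can reject a trajectory because of one flawed intermediate step even when the final output looks plausible.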