Capture screen regions, perform OCR, and chat with a multimodal AI model through a persistent desktop overlay.