Vision-based desktop automation MCP server that controls any application through screenshots and AI vision, enabling UI automation via natural-language commands.
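
The description implies an observe-decide-act loop: capture a screenshot, ask a vision model what to do next for the given command, then perform that action. The sketch below illustrates that general pattern only and is not taken from this project's code. Every name in it (`Action`, `run_command`, and the `capture`, `vision`, and `execute` callables) is hypothetical, and the screenshot, model, and input backends are left as pluggable functions rather than tied to any real library.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One UI step proposed by the vision model (hypothetical schema)."""
    kind: str        # "click", "type", or "done"
    x: int = 0       # screen coordinates, used by "click"
    y: int = 0
    text: str = ""   # text to enter, used by "type"


# Assumed interfaces for the pluggable backends:
Capture = Callable[[], bytes]            # returns the current screenshot as image bytes
Vision = Callable[[bytes, str], Action]  # (screenshot, command) -> next action
Execute = Callable[[Action], None]       # performs the action on the desktop


def run_command(
    command: str,
    capture: Capture,
    vision: Vision,
    execute: Execute,
    max_steps: int = 10,
) -> List[Action]:
    """Run an observe-decide-act loop for one natural-language command.

    Stops when the vision backend returns a "done" action or when
    max_steps actions have been executed. Returns the executed actions.
    """
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                # observe
        action = vision(screenshot, command)  # decide
        if action.kind == "done":
            break
        execute(action)                       # act
        history.append(action)
    return history
```

Passing the backends in as functions keeps the loop testable with stubs. The `max_steps` cap is an assumed safeguard so that a model that never reports completion cannot drive the desktop indefinitely.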