Multi-Modal
[2025-2] 백승우 - UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action
BaekDaBang 2025. 10. 29. 14:29

UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action (arxiv.org)
Multimodal agents for computer use rely exclusively on primitive actions (click, type, scroll) that require accurate visual grounding and lengthy execution chains, leading to cascading failures and performance bottlenecks. While other agents leverage rich ...