[2025-2] 백승우 - UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action

BaekDaBang 2025. 10. 29. 14:29
 

UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action

Multimodal agents for computer use rely exclusively on primitive actions (click, type, scroll) that require accurate visual grounding and lengthy execution chains, leading to cascading failures and performance bottlenecks. While other agents leverage rich...

arxiv.org