[2025-2] 백승우 - UI-TARS: Pioneering Automated GUI Interaction with Native Agents

BaekDaBang 2025. 7. 30. 13:29
 

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wrapped commercial...

arxiv.org