Multi-Modal

[2026-1] 백승우 - Agentic Reward Modeling: Verifying GUI Agent via Online Proactive Interaction

BaekDaBang, 2026. 3. 24. 18:24

Agentic Reward Modeling: Verifying GUI Agent via Online Proactive Interaction
Reinforcement learning with verifiable rewards (RLVR) is pivotal for the continuous evolution of GUI agents, yet existing evaluation paradigms face significant limitations. Rule-based methods suffer from poor scalability and cannot handle open-ended tasks,
arxiv.org