Multi-Modal

[2026-1] 백승우 - Agentic Reward Modeling: Verifying GUI Agent via Online Proactive Interaction

by BaekDaBang 2026. 3. 24.
 

Agentic Reward Modeling: Verifying GUI Agent via Online Proactive Interaction

Reinforcement learning with verifiable rewards (RLVR) is pivotal for the continuous evolution of GUI agents, yet existing evaluation paradigms face significant limitations. Rule-based methods suffer from poor scalability and cannot handle open-ended tasks,

arxiv.org