Multi-Modal (32)

[2025-2] 백승우 - GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning
With the rapid development of Large Vision Language Models, the focus of Graphical User Interface (GUI) agent tasks shifts from single-screen tasks to complex screen navigation challenges. However...
openreview.net
2025. 11. 26.

[2025-2] 백승우 - UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action
Multimodal agents for computer use rely exclusively on primitive actions (click, type, scroll) that require accurate visual grounding and lengthy execution chains, leading to cascading failures and performance bottlenecks. While other agents leverage rich
arxiv.org
2025. 10. 29.

[2025-2] 박제우 - ANOMALYCLIP: OBJECT-AGNOSTIC PROMPT LEARNING FOR ZERO-SHOT ANOMALY DETECTION
https://arxiv.org/abs/2310.18961
Zero-shot anomaly detection (ZSAD) requires detection models trained using auxiliary data to detect anomalies without any training sample in a target dataset. It is a crucial task when training data is not accessible due to various concerns, eg, data priva
arxiv.org
0. Abstract: Zero-shot anomaly detection (ZS..
2025. 9. 27.

[2025-2] 백승우 - Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have sparked significant interest in developing GUI visual agents. We introduce MONDAY (Mobile OS Navigation Task Dataset for Agents from YouTube), a large-scale dataset
arxiv.org
2025. 8. 20.