  • We provide a Playground for experiencing the world beyond the desk, and propose a new lifestyle that shifts from passive learning to a life of creation.

Multi-Modal (21)

[2025-2] 백승우 - Toward Autonomous UI Exploration: The UIExplorer Benchmark https://arxiv.org/abs/2506.17779 2025. 12. 3.
[2025-2] 백승우 - GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning (openreview.net) 2025. 11. 26.
[2025-2] 백승우 - UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action (arxiv.org) 2025. 10. 29.
[2025-2] 박제우 - AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection https://arxiv.org/abs/2310.18961 2025. 9. 27.