
Multi-Modal (17)

[2025-2] 백승우 - UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action
"Multimodal agents for computer use rely exclusively on primitive actions (click, type, scroll) that require accurate visual grounding and lengthy execution chains, leading to cascading failures and performance bottlenecks. While other agents leverage rich…" (arxiv.org) · 2025. 10. 29.
[2025-2] 박제우 - AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection
https://arxiv.org/abs/2310.18961
"Zero-shot anomaly detection (ZSAD) requires detection models trained using auxiliary data to detect anomalies without any training sample in a target dataset. It is a crucial task when training data is not accessible due to various concerns, e.g., data priva…" (arxiv.org)
0. Abstract: Zero-shot anomaly detection (ZS… · 2025. 9. 27.
[2025-2] 백승우 - Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
"Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have sparked significant interest in developing GUI visual agents. We introduce MONDAY (Mobile OS Navigation Task Dataset for Agents from YouTube), a large-scale dataset…" (arxiv.org) · 2025. 8. 20.
[2025-2] 백승우 - UI-TARS: Pioneering Automated GUI Interaction with Native Agents
"This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wrapped commercial…" (arxiv.org) · 2025. 7. 30.