Multi-Modal (25)

[2025-2] 백승우 - Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have sparked significant interest in developing GUI visual agents. We introduce MONDAY (Mobile OS Navigation Task Dataset for Agents from YouTube), a large-scale dataset... (arxiv.org)
2025. 8. 20.

[2025-2] 백승우 - UI-TARS: Pioneering Automated GUI Interaction with Native Agents
This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wrapped commercial... (arxiv.org)
2025. 7. 30.

[2025-1] 임재열 - Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks is a paper presented at ICML 2017 that proposes a model-agnostic meta-learning algorithm. [MAML] https://arxiv.org/abs/1703.03400
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a vari...
2025. 7. 5.

[2025-1] 백승우 - GUI Agent by Script-based Automation
2025. 7. 4.