Multi-Modal

[2025-2] 백승우 - Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents

BaekDaBang 2025. 8. 20. 20:30

Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents

Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have sparked significant interest in developing GUI visual agents. We introduce MONDAY (Mobile OS Navigation Task Dataset for Agents from YouTube), a large-scale dataset

arxiv.org