Multi-Modal
[2025-2] 백승우 - Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
BaekDaBang 2025. 8. 20. 20:30

Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have sparked significant interest in developing GUI visual agents. We introduce MONDAY (Mobile OS Navigation Task Dataset for Agents from YouTube), a large-scale dataset
arxiv.org