Multi-Modal (32)

[2026-1] 백승우 - How Mobile World Model Guides GUI Agents?
Recent advances in vision-language models have enabled mobile GUI agents to perceive visual interfaces and execute user instructions, but reliable prediction of action consequences remains critical for long-horizon and high-risk interactions. Existing mobi...
arxiv.org
2026. 5. 19.

[2026-1] 백승우 - Agent+P: Guiding UI Agents via Symbolic Planning
Large Language Model (LLM)-based UI agents show great promise for UI automation but often hallucinate in long-horizon tasks due to their lack of understanding of the global UI transition structure. To address this, we introduce AGENT+P, a novel framework t...
arxiv.org
2026. 5. 19.

[2026-1] 정재훈 - Multimodal Unsupervised Image-to-Image Translation
https://arxiv.org/pdf/1804.04732
1. Introduction
- Limitations of existing models: existing models such as CycleGAN are limited in that they require a one-to-one correspondence between input and output.
- Failure to reflect real-world multimodality: modeling reality can admit many valid answers rather than a single correct one, yet current models mostly take the form of deterministic functions.
The research team proposes MUNIT to overcome these limitations.
2. Multimodal Unsupervised Image-to-image Translation
1. Assumption
Let x_i ∈ X_i, with x_1 = G_1(c, s_1) and x_2 = G_2(c, s_2), where c is the content code and s is the style code...
2026. 5. 16.

[2026-1] 백승우 - Agentic Reward Modeling: Verifying GUI Agent via Online Proactive Interaction
Reinforcement learning with verifiable rewards (RLVR) is pivotal for the continuous evolution of GUI agents, yet existing evaluation paradigms face significant limitations. Rule-based methods suffer from poor scalability and cannot handle open-ended tasks,...
arxiv.org
2026. 3. 24.