NLP (94)

[2026-1] 정재훈 - An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
https://arxiv.org/abs/1803.01271
For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a... (arxiv.org)
2026. 2. 7.

[2026-1] 백승우 - Self-Improving Pretraining: using post-trained models to pretrain better models
Ensuring safety, factuality and overall quality in the generations of large language models is a critical challenge, especially as these models are increasingly deployed in real-world applications. The prevailing approach to addressing these issues involve... (arxiv.org)
2026. 2. 4.

[2026-1] 백승우 - UICOMPASS: UI Map Guided Mobile Task Automation via Adaptive Action Generation
Yuanzhang Lin, Zhe Zhang, He Rui, Qingao Dong, Mingyi Zhou, Jing Zhang, Xiang Gao, Hailong Sun. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. (aclanthology.org)
2026. 1. 28.

[2025-2] 최민서 - SimPO: Simple Preference Optimization with a Reference-Free Reward
[Paper link] https://arxiv.org/abs/2405.14734
Direct Preference Optimization (DPO) is a widely used offline preference optimization algorithm that reparameterizes reward functions in reinforcement learning from human feedback (RLHF) to enhance simplicity and training stability. In this work, we propos... (arxiv.org)
If you are not familiar with DPO, you may find it hard to understand the paper...
2025. 12. 31.