Natural Language Processing (63)

[2025-1] 이재호 - Titans: Learning to Memorize at Test Time
https://arxiv.org/abs/2501.00663
Ali Behrouz, Peilin Zhong, and Vahab Mirrokni - Google Research
"Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to…"
2025. 2. 8.

[2025-1] 염제원 - RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
This post briefly summarizes the paper "RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs". Rather than attaching a separate ranking model to conventional RAG (Retrieval-Augmented Generation), the paper proposes a new method in which a single LLM judges the relevance between the question and the retrieved documents, selects (reranks) the top documents, and generates the answer. 1. Background and problem setting: Large language models (LLMs) can answer a wide range of queries thanks to their enormous parameter counts, but internalizing all knowledge in the parameters is, in practice…
2025. 2. 5.

[2025-1] 김학선 - Secrets of RLHF in Large Language Models Part I: PPO
https://arxiv.org/abs/2307.04964
"Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Its primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount sign…"
Abstract: The goal of LLMs (large language models) is to function as a human-centric assistant…
2025. 2. 2.

[2025-1] 김은서 - Direct Preference Optimization: Your Language Model is Secretly a Reward Model (2023)
"While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gain…"
2025. 2. 2.