We provide a Playground for experiencing the world beyond the desk, and propose a new lifestyle for moving from passive learning to a life of creation.

NLP (91)

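[2025-2] 한영웅 - Investigating Data Contamination for Pre-training Language Models (Arxiv 2024)
1. Introduction
Problem background: The strong performance of LLMs is generally attributed to model size and data scale, as in major LLM work such as GPT-3, PaLM, and LLaMA. But are size and data really the only causes of that performance? One critical aspect remains "under-explored": data contamination, the phenomenon in which evaluation data leaks into the pre-training corpus. This means LLM performance evaluations to date may be fundamentally unreliable, and most pre-training corpora are not public.
Problems with existing approaches: Evaluation-level analysis examines an already-trained model after the fact, splitting the evaluation data into a clean portion and a contaminated portion and comparing performance on each. Limitation: in the actual training process..
2025. 8. 23.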

[2025-2] 백승우 - Theory of Mind
Reference
Machine Theory of Mind. Neil C. Rabinowitz et al. (2018). https://arxiv.org/abs/1802.07740
Theory of Mind May Have Spontaneously Emerged in Large Language Models. Michal Kosinski (2023). https://arxiv.org/abs/2302.02083
Theory of Mind for Multi-Agent Collaboration via Large Language Models. Huao Li, Yu Quan Chong, Simon Stepputtis et al. (2024). https://arxiv.org/abs/2310.10701
Theory of Mind in Large..
2025. 8. 7.

[2025-2] 백승우 - ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
While reasoning models (e.g., DeepSeek R1) trained with reinforcement learning (RL), excel in textual reasoning, they struggle in scenarios requiring structured problem-solving, such as geometric reasoning, concise computation, or complex equation solving-
arxiv.org
2025. 7. 29.

[2025-2] 박지원 - QLORA
QLORA: https://arxiv.org/abs/2305.14314
QLoRA: Efficient Finetuning of Quantized LLMs
We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quan
arxiv.org
1. Introduction: The challenges of fine-tuning large language models (LLMs) and QLORA's..
2025. 7. 17.
