  • We provide a Playground where you can experience the world beyond the desk, and propose a new lifestyle for the shift from passive learning to a life of creation.

All posts (344)

[2025-1] 백승우 - Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
"In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. While existing work has shown that pruning based on the perplexity of a large…" (arxiv.org)
1. Methods: Using a portion of the data from the full dataset, perplexity is… 2025. 3. 3.
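The excerpt cuts off at the scoring step. As a rough illustration of perplexity-based pruning with a small reference model, the sketch below scores each document with GPT-2 and keeps a mid-perplexity band; the model choice, truncation length, and 25–75% selection band are assumptions for illustration, not the paper's exact setup.

```python
# Hedged sketch: perplexity-based data pruning with a small reference model.
# gpt2 and the "keep the middle 50%" rule are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Token-level perplexity of `text` under the small reference model.
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

documents = ["first raw corpus document ...", "second raw corpus document ..."]
ranked = sorted(documents, key=perplexity)
# Assumed selection rule: keep a middle band of perplexities.
lo, hi = int(0.25 * len(ranked)), int(0.75 * len(ranked))
pruned_subset = ranked[lo:hi]
```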
[2025-1] 백승우 - Data Selection for Language Models via Importance Resampling
"Selecting a suitable pretraining dataset is crucial for both general-domain (e.g., GPT-3) and domain-specific (e.g., Codex) language models (LMs). We formalize this problem as selecting a subset of a large raw unlabeled dataset to match a desired target di…" (arxiv.org)
1. Method: DSIR Framework. From a large raw dataset, data matching the distribution of the target data is… 2025. 3. 3.
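A minimal sketch of DSIR-style importance resampling, assuming smoothed unigram models in place of the paper's hashed n-gram features: each raw document gets the importance weight log p_target(x) − log p_raw(x), and the Gumbel-top-k trick draws k documents without replacement in proportion to those weights. All function names here are illustrative.

```python
# Hedged DSIR-style sketch: importance resampling from a raw corpus toward a
# target distribution, with simple add-one-smoothed unigram models.
import math
import random
from collections import Counter

def unigram_logprob(doc: str, counts: Counter, total: int, vocab: int) -> float:
    # Add-one-smoothed unigram log-likelihood of a document.
    return sum(math.log((counts[w] + 1) / (total + vocab)) for w in doc.split())

def dsir_select(raw_docs, target_docs, k):
    tgt = Counter(w for d in target_docs for w in d.split())
    raw = Counter(w for d in raw_docs for w in d.split())
    vocab = len(set(tgt) | set(raw))
    t_tot, r_tot = sum(tgt.values()), sum(raw.values())
    # Log importance weight: log p_target(x) - log p_raw(x).
    logw = [unigram_logprob(d, tgt, t_tot, vocab) - unigram_logprob(d, raw, r_tot, vocab)
            for d in raw_docs]
    # Gumbel-top-k: sampling k items without replacement, proportional to weights.
    keys = [lw - math.log(-math.log(random.random())) for lw in logw]
    top = sorted(range(len(raw_docs)), key=keys.__getitem__, reverse=True)[:k]
    return [raw_docs[i] for i in top]

# Usage: subset = dsir_select(raw_docs, target_docs, k=1000)
```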
[2025-1] 임재열 - Playing Atari with Deep Reinforcement Learning
Published by Google DeepMind in 2013, this paper is regarded as the work that launched deep reinforcement learning. [Playing Atari with DRL] https://arxiv.org/abs/1312.5602
"We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-…" 2025. 3. 1.
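For readers following the excerpt, a compact sketch of the core loop the paper introduces: a Q-network trained from an experience-replay buffer with the one-step TD target r + γ·max_a Q(s′, a). An MLP stands in for the paper's convolutional network over Atari frames, and all sizes and hyperparameters are assumed values.

```python
# Hedged DQN sketch: Q-network + experience replay + TD-target update.
# The MLP, CartPole-like dimensions (4 states, 2 actions), and hyperparameters
# are simplifying assumptions, not the paper's Atari setup.
import random
from collections import deque
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)  # stores (state, action, reward, next_state, done)
gamma = 0.99

def act(state, eps=0.1):
    # Epsilon-greedy action selection over Q-values.
    if random.random() < eps:
        return random.randrange(2)
    with torch.no_grad():
        return q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = zip(*random.sample(replay, batch_size))
    s = torch.tensor(s, dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.int64)
    r = torch.tensor(r, dtype=torch.float32)
    s2 = torch.tensor(s2, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # One-step TD target; terminal states contribute only the reward.
        target = r + gamma * (1 - done) * q_net(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```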
[2025-1] 이재호 - Deep Reinforcement Learning with Double Q-learning
Hado van Hasselt, Arthur Guez, David Silver - Google DeepMind. https://arxiv.org/abs/1509.06461
"The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevent…" 2025. 2. 28.
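The overestimation the abstract refers to comes from using the same value estimates both to select and to evaluate the bootstrap action. A minimal sketch of the fix this paper proposes (Double DQN), assuming existing `q_net` (online) and `target_net` torch modules: the online network picks the argmax action, and the target network evaluates it.

```python
# Hedged sketch contrasting the standard DQN target with the Double DQN target.
# `q_net` and `target_net` are assumed pre-built torch modules mapping a batch
# of states to per-action Q-values; names and shapes are illustrative.
import torch

def dqn_target(r, s2, done, target_net, gamma=0.99):
    # Standard target: the SAME network both selects and evaluates the max
    # action, which is the source of the overestimation bias.
    with torch.no_grad():
        return r + gamma * (1 - done) * target_net(s2).max(dim=1).values

def double_dqn_target(r, s2, done, q_net, target_net, gamma=0.99):
    # Double DQN: the online network SELECTS the argmax action,
    # the target network EVALUATES it.
    with torch.no_grad():
        best = q_net(s2).argmax(dim=1, keepdim=True)
        return r + gamma * (1 - done) * target_net(s2).gather(1, best).squeeze(1)
```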