LLM1 [2025-2] 박지원 - Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts

Original paper: https://openreview.net/forum?id=WdA5H9ARaa#discussion

Public benchmarks are compromised, as the training data for many Large Language Models (LLMs) is contaminated with test data, suggesting a performance gap between benchmark scores and actual...

Intro - On the problem of LLM score inflation on benchmark datasets, public benchmark datasets in the train data ..

2025. 9. 4.