All posts (409)

[2026-1] 이루가 - What does CLIP know about a red circle? Visual prompt engineering for VLMs
Paper: https://arxiv.org/abs/2304.06712
arXiv abstract excerpt: "Large-scale Vision-Language Models, such as CLIP, learn powerful image-text representations that have found numerous applications, from zero-shot classification to text-to-image generation. Despite that, their capabilities for solving novel discriminative ..."
Post excerpt: ABSTRACT — This paper ... CLIP ..
2026. 4. 25.

[2026-1] 정유림 - DataComp: In search of the next generation of multimodal datasets
Paper: https://arxiv.org/abs/2304.14108
arXiv abstract excerpt: "Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecos..."
Post excerpt: Machine learning benchmarks usually fix the dataset ..
2026. 4. 25.

[2026-1] 김지원 - Learning Transferable Visual Models From Natural Language Supervision
Paper: Learning Transferable Visual Models From Natural Language Supervision (OpenAI, 2021)
Authors: Alec Radford, Jong Wook Kim, Chris Hallacy, et al.
Links: arXiv | GitHub
Post excerpt: Introduction. Conventional image classification models (ResNet, EfficientNet, etc.) can only predict within a predefined set of classes. A model trained on ImageNet knows only its 1,000 classes, and adding a new class requires yet another round of large-scale labeled data. This constrained form of supervised learning severely limits a model's generalization ability and usefulness. CLIP (Contrastive Language-Image Pre-training) ..
2026. 4. 18.

[2026-1] 정재훈 - An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
Paper: https://arxiv.org/abs/1803.01271
arXiv abstract excerpt: "For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a..."
Post excerpt: 1. In..
2026. 3. 28.
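The CLIP post above contrasts fixed-class supervised models with CLIP's zero-shot approach: instead of a classifier head over 1,000 fixed labels, CLIP scores an image against a text prompt per candidate class by cosine similarity of embeddings. Below is a minimal sketch of that scoring step, assuming the image and text embeddings have already been produced by CLIP's encoders (here replaced by toy vectors); the function name and the temperature value are illustrative, not from the posts or papers.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, class_names, temperature=100.0):
    """CLIP-style zero-shot classification: cosine similarity between one
    image embedding and one text embedding per class prompt (e.g.
    "a photo of a {class}"), scaled by a temperature and softmaxed."""
    # L2-normalize so the dot product equals cosine similarity
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (txt @ img)          # one logit per class
    probs = np.exp(logits - logits.max())       # numerically stable softmax
    probs /= probs.sum()
    return class_names[int(np.argmax(probs))], probs

# Toy embeddings; in practice these come from CLIP's image/text encoders.
rng = np.random.default_rng(0)
cat_text = rng.normal(size=512)
image_emb = cat_text + 0.1 * rng.normal(size=512)   # image near the "cat" text
text_embs = np.stack([cat_text, rng.normal(size=512), rng.normal(size=512)])
label, probs = zero_shot_classify(image_emb, text_embs, ["cat", "dog", "car"])
```

Because new classes are added simply by writing new text prompts, no retraining or relabeling is needed — which is exactly the limitation of ImageNet-style classifiers that the post highlights.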