Multi-Modal

[2025-1] 백승우 - GUI Agent by Script-based Automation (2025. 7. 4.)

[2025-1] 박제우 - Scaling Language-Image Pre-training via Masking (2025. 5. 17.)
https://arxiv.org/abs/2212.00794
"We present Fast Language-Image Pre-training (FLIP), a simple and more efficient method for training CLIP. Our method randomly masks out and removes a large portion of image patches during training. Masking allows us to learn from more image-text pairs given…"
https://blog.outta.ai/284
This paper builds on the natural-language supervision model reviewed previously…

[2025-1] 박제우 - CLIP: Learning Transferable Visual Models From Natural Language Supervision (2025. 5. 6.)
https://arxiv.org/abs/2103.00020
"State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text…"

[2025-1] 유경석 - Bag of Tricks for Developing Diabetic Retinopathy Analysis Framework to Overcome Data Scarcity (2025. 5. 2.)
https://arxiv.org/pdf/2210.09558
Abstract
- DR screening: early DR diagnosis is possible using UW-OCTA.
- Difficulty of data collection and the absence of public datasets make it hard to build deep-learning-based DR analysis systems (performance remains sub-par) → a model robust to scarce data is needed.
- An empirical study for DR analysis covering lesion segmentation, quality assessment, and DR grading → achieved 1st place in the DR analysis challenge.
- A robust training scheme applied per model: ensemble learning…
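The FLIP abstract above hinges on one operation: randomly removing a large fraction of image patches before they reach the encoder, so each step is cheaper and more image-text pairs fit in a batch. A minimal NumPy sketch of that masking step (the function name, shapes, and 50% ratio are illustrative assumptions, not taken from the paper's code):

```python
import numpy as np

def random_patch_mask(patches, mask_ratio=0.5, rng=None):
    """Keep a random subset of image patches, FLIP-style.

    patches: array of shape (num_patches, patch_dim).
    Returns (kept_patches, kept_indices); only the kept patches
    would be fed to the image encoder during pre-training.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_patches = patches.shape[0]
    # Number of patches surviving the mask.
    num_keep = int(num_patches * (1 - mask_ratio))
    # Random permutation, then truncate: an unbiased random subset.
    kept_indices = rng.permutation(num_patches)[:num_keep]
    return patches[kept_indices], kept_indices

# Example: a 224x224 image as 14x14 = 196 patches of dim 768.
patches = np.zeros((196, 768))
kept, idx = random_patch_mask(patches, mask_ratio=0.5)
print(kept.shape)  # (98, 768)
```

With half the patches dropped, the encoder processes half the tokens per image, which is where FLIP's training-throughput gain comes from.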