  • We provide a Playground for experiencing the world beyond the desk, and propose a new lifestyle for the transition from passive learning to a life of creation.

Multi-Modal (7)

[2025-1] 백승우 - LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
"Conversational generative AI has demonstrated remarkable promise for empowering biomedical practitioners, but current investigations focus on unimodal text. Multimodal conversational AI has seen rapid progress by leveraging billions of image-text pairs from…" (arxiv.org)
1. Introduction: Current investigations focus on un… 2025. 3. 4.
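LLaVA-Med builds on the LLaVA recipe of bridging a vision encoder and a language model with a learned projection. As a rough orientation, here is a minimal sketch of that bridge; the tiny stand-in "LLM", the class name, and all dimensions are illustrative assumptions, not the paper's actual modules:

```python
import torch
import torch.nn as nn

class LlavaStyleModel(nn.Module):
    """Frozen vision features -> linear projection -> language model (sketch)."""
    def __init__(self, vis_dim=1024, llm_dim=512, vocab=32000):
        super().__init__()
        self.proj = nn.Linear(vis_dim, llm_dim)         # map image features into the LLM token space
        self.tok_emb = nn.Embedding(vocab, llm_dim)
        layer = nn.TransformerEncoderLayer(llm_dim, nhead=8, batch_first=True)
        self.llm = nn.TransformerEncoder(layer, num_layers=2)  # toy stand-in for the LLM
        self.lm_head = nn.Linear(llm_dim, vocab)

    def forward(self, image_feats, text_ids):
        img_tokens = self.proj(image_feats)             # (B, N_img, llm_dim)
        txt_tokens = self.tok_emb(text_ids)             # (B, N_txt, llm_dim)
        x = torch.cat([img_tokens, txt_tokens], dim=1)  # image tokens are prepended
        return self.lm_head(self.llm(x))                # next-token logits

model = LlavaStyleModel()
logits = model(torch.randn(2, 16, 1024),                # e.g. 16 patch features per image
               torch.randint(0, 32000, (2, 8)))         # -> (2, 24, 32000)
```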
[2025-1] 정인아 - CoCa: Contrastive Captioners are Image-Text Foundation Models
https://arxiv.org/abs/2205.01917
"Exploring large-scale pretrained foundation models is of significant interest in computer vision because these models can be quickly transferred to many downstream tasks. This paper presents Contrastive Captioner (CoCa), a minimalist design to pretrain an…" (arxiv.org)
Intro. Problem: Captioning and Contrastive Learnin… 2025. 1. 25.
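CoCa trains a single model with both objectives the preview names: a contrastive loss aligning pooled image and text embeddings, and a captioning loss on a text decoder. A minimal sketch of how the two losses combine, with made-up encoder outputs and a plain sum where the paper uses weighting coefficients:

```python
import torch
import torch.nn.functional as F

def coca_losses(img_emb, txt_emb, caption_logits, caption_targets, temp=0.07):
    # Contrastive term: symmetric InfoNCE over the batch, CLIP-style.
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    sim = img @ txt.t() / temp                     # (B, B) cosine similarities
    labels = torch.arange(sim.size(0))
    l_con = (F.cross_entropy(sim, labels) + F.cross_entropy(sim.t(), labels)) / 2
    # Captioning term: teacher-forced next-token cross-entropy.
    l_cap = F.cross_entropy(caption_logits.flatten(0, 1), caption_targets.flatten())
    return l_con + l_cap                           # the paper weights the two terms

B, T, V = 4, 12, 1000                              # toy batch, caption length, vocab
loss = coca_losses(torch.randn(B, 256), torch.randn(B, 256),
                   torch.randn(B, T, V), torch.randint(0, V, (B, T)))
```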
[2024-2] 박서형 - DETR, Deformable DETR
https://arxiv.org/abs/2005.12872
"We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation…" (arxiv.org)
DETR uses a transformer to perform object detection without any post-processing… 2024. 12. 28.
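The reason DETR needs no NMS is its set-prediction training: a fixed set of queries each predicts one object, and Hungarian (bipartite) matching assigns predictions to ground truth. A simplified sketch of that matching step; the cost weight is a placeholder and the paper's GIoU term is omitted:

```python
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_logits, pred_boxes, gt_labels, gt_boxes):
    # pred_logits: (N, C), pred_boxes: (N, 4); gt_labels: (M,), gt_boxes: (M, 4)
    prob = pred_logits.softmax(-1)
    cost_cls = -prob[:, gt_labels]                      # (N, M) classification cost
    cost_box = torch.cdist(pred_boxes, gt_boxes, p=1)   # (N, M) L1 box cost
    cost = (cost_cls + 5.0 * cost_box).numpy()          # 5.0 is a placeholder weight
    rows, cols = linear_sum_assignment(cost)            # optimal one-to-one assignment
    return rows, cols                                   # prediction rows[i] <-> object cols[i]

N, C, M = 100, 92, 3                                    # 100 queries, 92 classes, 3 objects
rows, cols = hungarian_match(torch.randn(N, C), torch.rand(N, 4),
                             torch.tensor([1, 7, 7]), torch.rand(M, 4))
```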
[2024-1] 백승우 - VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
"We present a framework for learning multimodal representations from unlabeled data using convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer (VATT) takes raw signals as inputs and extracts multimodal representations t…" (arxiv.org)
1. Abstract: VATT uses a Transformer architecture to learn from unlabeled… 2024. 3. 4.
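VATT's core idea is that raw video, audio, and text differ only in their tokenization layer, after which one (optionally shared) Transformer backbone produces embeddings for contrastive learning. A minimal sketch under that reading; the patch sizes, widths, and pooling here are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 256
backbone = nn.TransformerEncoder(                    # one shared backbone for all modalities
    nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=4)

to_tokens = {                                        # per-modality tokenization layers
    "video": nn.Linear(3 * 4 * 16 * 16, dim),        # flattened 4x16x16 RGB patch "tubes"
    "audio": nn.Linear(128, dim),                    # chunks of 128 raw waveform samples
    "text":  nn.Embedding(30000, dim),               # word ids -> embeddings
}

def encode(modality, x):
    tokens = to_tokens[modality](x)                  # (B, T, dim)
    pooled = backbone(tokens).mean(dim=1)            # average-pool into one vector
    return F.normalize(pooled, dim=-1)               # ready for a contrastive (NCE) loss

v = encode("video", torch.randn(2, 8, 3 * 4 * 16 * 16))
a = encode("audio", torch.randn(2, 8, 128))
t = encode("text", torch.randint(0, 30000, (2, 8)))
```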