Category: All posts (90)

[2024-1] 박태호 - Visual Question Answering
Paper: VQA: Visual Question Answering (https://arxiv.org/abs/1505.00468)
"We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer."
Abstract: VQA is proposed as an approach to the free-form and open-ended task. VQA takes an image and a na..
2024. 3. 19.

[2024-1] 김경훈 - SAM (Segment Anything Model)
Original paper link: https://arxiv.org/abs/2304.02643
"We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far)."
Code: https://github.com/facebookresearch/segment-anything
2024. 3. 19.

[2024-1] 염제원 - HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models
Paper: https://arxiv.org/abs/2309.02706
"Large Language Models (LLMs) trained on massive corpora demonstrate impressive capabilities in a wide range of tasks."
Abstract: There have been attempts to apply LLMs to non-English languages ..
2024. 3. 18.

[2024-1] 백승우 - VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
"We present a framework for learning multimodal representations from unlabeled data using convolution-free Transformer architectures."
1. Abstract: Using a Transformer architecture, VATT learns from unlabel..
2024. 3. 4.