All posts (354)

[2025-1] 박지원 - You Said That?: Synthesising Talking Faces from Audio
Original paper: https://arxiv.org/abs/1705.02966
"We present a method for generating a video of a talking face. The method takes as inputs: (i) still images of the target face, and (ii) an audio speech segment; and outputs a video of the target face lip synched with the audio. The method runs in real time.." (arxiv.org)
1. INTRODUCTION i) Overview and key idea: take images of the target face and an audio speech segment as input -> the face is synchronized with the audio..
2025. 5. 20.

[2025-1] 박서형 - PSGAN (Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond)
https://arxiv.org/abs/1804.02047
"State-of-the-art pedestrian detection models have achieved great success in many benchmarks. However, these models require lots of annotation information and the labeling process usually takes much time and efforts. In this paper, we propose a method to ge.." (arxiv.org)
1. Introduction Pedest..
2025. 5. 17.

[2025-1] 임재열 - DRÆM – A discriminatively trained reconstruction embedding for surface anomaly detection
DRAEM is a paper presented at ICCV 2021 that proposes a new unsupervised model which learns anomaly detection from pairs of reconstructed and original images.
[DRAEM] https://arxiv.org/abs/2108.07610
"Visual surface anomaly detection aims to detect local image regions that significantly deviate from normal appearance. Recent surface anomaly detection methods rely on .." (arxiv.org)
2025. 5. 17.

[2025-1] 유경석 - FlexiViT: One Model for All Patch Sizes
https://arxiv.org/pdf/2212.08013
https://github.com/google-research/big_vision
"GitHub - google-research/big_vision: Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more." (github.com)
Abstract: A ViT's patch size is a factor that determines both its speed and its accuracy, but changing the patch size..
2025. 5. 17.
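The FlexiViT preview says a ViT's patch size governs its speed/accuracy trade-off. As a rough illustration (a hypothetical sketch, not code from the FlexiViT paper or the big_vision repository), the arithmetic below assumes a square 224x224 input split into non-overlapping square patches and ignores the class token. It shows how the number of patch tokens, and with it the number of pairwise self-attention scores, changes with patch size.

```python
def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Tokens produced by splitting a square image into non-overlapping square patches.

    Assumption for illustration: square image, square patches, no class token.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def attention_pairs(num_tokens: int) -> int:
    """Self-attention scores every token against every token, so this grows as N^2."""
    return num_tokens * num_tokens


if __name__ == "__main__":
    # Compare a few common ViT patch sizes on a 224x224 input.
    for p in (8, 16, 32):
        n = num_patch_tokens(224, p)
        print(f"patch {p:2d}: {n:4d} tokens, {attention_pairs(n):7d} attention pairs")
```

For patch sizes 8, 16, and 32 this gives 784, 196, and 49 tokens. Halving the patch side quadruples the token count and multiplies the attention-pair count by 16, which is why a smaller patch size makes the model considerably more expensive to run.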