  • We provide a Playground for experiencing the world beyond the desk, and propose a new lifestyle for the transition from passive learning to a life of creation.

Computer Vision (107)

[2025-1] 김유현 - Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets (https://arxiv.org/abs/1606.03657). This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation… 2025. 1. 24.
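As a rough illustration of the mutual-information idea mentioned in the preview above, the sketch below (hypothetical PyTorch names, not the authors' code) adds an auxiliary head Q that predicts the sampled latent code c from generated images; minimizing the cross-entropy of Q's prediction is the usual variational lower bound on I(c; G(z, c)) that gets added to the generator loss.

```python
# Minimal InfoGAN-style sketch (illustrative only; G, DQ, LAMBDA are assumed names).
import torch
import torch.nn as nn
import torch.nn.functional as F

NOISE_DIM, CODE_DIM, IMG_DIM, LAMBDA = 62, 10, 784, 1.0  # categorical code c with 10 classes

G = nn.Sequential(nn.Linear(NOISE_DIM + CODE_DIM, 256), nn.ReLU(), nn.Linear(256, IMG_DIM), nn.Tanh())

class DQ(nn.Module):
    """Discriminator and Q share a trunk; Q outputs logits over the categorical code."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(IMG_DIM, 256), nn.LeakyReLU(0.2))
        self.d_head = nn.Linear(256, 1)          # real/fake logit
        self.q_head = nn.Linear(256, CODE_DIM)   # posterior over the latent code c
    def forward(self, x):
        h = self.trunk(x)
        return self.d_head(h), self.q_head(h)

dq = DQ()
z = torch.randn(32, NOISE_DIM)
c = torch.randint(0, CODE_DIM, (32,))            # structured code c sampled per example
x_fake = G(torch.cat([z, F.one_hot(c, CODE_DIM).float()], dim=1))

d_logit, q_logit = dq(x_fake)
g_adv = F.binary_cross_entropy_with_logits(d_logit, torch.ones_like(d_logit))
# Variational lower bound on I(c; G(z, c)): maximize log Q(c | x_fake), i.e. minimize cross-entropy.
mi_loss = F.cross_entropy(q_logit, c)
(g_adv + LAMBDA * mi_loss).backward()
```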
[2025-1] 주서영 - Towards Robust Vision Transformer (CVPR 2022). Recent advances on Vision Transformer (ViT) and its improved variants have shown that self-attention-based networks surpass traditional Convolutional Neural Networks (CNNs) in most vision tasks. However, existing ViTs focus on the standard accuracy and computational cost… Citation count as of 2025.01.18: 226. Introduction: Existing Vision Transform… 2025. 1. 18.
[2025-1] 김유현 - Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (https://arxiv.org/abs/1511.06434). In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between… 0. Abstract… 2025. 1. 18.
[2025-1] 전연주 - LDM: High-Resolution Image Synthesis with Latent Diffusion Models. Paper link: 2112.10752. 1. Abstract: Diffusion Models (DMs) excel at high-quality image generation, but training directly in pixel space requires enormous computation and time. This paper proposes first compressing images into a latent space with a powerful Autoencoder and then training the Diffusion Model in that space (Latent Diffusion Model, LDM). Compared to pixel-space diffusion, this approach greatly reduces both training and inference (sample generation) cost while flexibly supporting various conditions (e.g., text, segmentation maps). Inpainting, Super-Resolution, Text-to-Image Synthesis, etc.… 2025. 1. 17.
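As a rough sketch of the two-stage LDM recipe summarized in the preview above (illustrative PyTorch, not the paper's implementation; the encoder and denoiser here are toy stand-ins, and conditioning such as text cross-attention is omitted), the code freezes a pretrained autoencoder, encodes images to latents, and trains an epsilon-prediction diffusion objective entirely in latent space.

```python
# Hypothetical latent-diffusion training step (assumed shapes and module names).
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # standard DDPM noise schedule

# Stage 1 (assumed pretrained and frozen): encoder compresses 3x256x256 images to 4x32x32 latents.
encoder = nn.Conv2d(3, 4, kernel_size=8, stride=8)
for p in encoder.parameters():
    p.requires_grad_(False)

# Stage 2: denoiser operating purely in latent space (a tiny stand-in for the U-Net).
denoiser = nn.Sequential(nn.Conv2d(4, 64, 3, padding=1), nn.SiLU(), nn.Conv2d(64, 4, 3, padding=1))

x = torch.randn(8, 3, 256, 256)                  # a batch of images (random data for the sketch)
with torch.no_grad():
    z0 = encoder(x)                              # clean latents

t = torch.randint(0, T, (8,))
a_bar = alphas_bar[t].view(-1, 1, 1, 1)
noise = torch.randn_like(z0)
z_t = a_bar.sqrt() * z0 + (1 - a_bar).sqrt() * noise   # forward diffusion applied in latent space

loss = F.mse_loss(denoiser(z_t), noise)          # epsilon-prediction objective
loss.backward()
```

Because the latent is much smaller than the pixel grid (here 4x32x32 instead of 3x256x256), every diffusion step touches far fewer activations, which is the source of the cost savings the post describes.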