
[2024-1] 염제원 - Siamese Neural Networks for One-Shot Image Recognition

by Scuttie 2024. 5. 6.

https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf

 

1-1. Upsides of this approach

  • Capable of learning generic image features useful for making predictions about unknown class distributions, even when very few examples are available.
  • Easily trained using standard optimization techniques on pairs sampled from the source data.
  • Provides a competitive approach that does not rely on domain-specific knowledge, instead exploiting deep learning techniques.

1-2. Learning Strategy

  • Learn a neural network that can discriminate between the class identities of image pairs (a standard verification task); pair sampling for training is sketched below.
  • The model's output is the probability that the two input images belong to the same class.
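
A minimal sketch of how such verification pairs could be sampled for training; the `images_by_class` mapping and the 50/50 same-vs-different split are illustrative assumptions, not details taken from the paper or this post.

```python
import random

def sample_pair(images_by_class):
    """Sample one (img1, img2, label) verification pair.

    images_by_class: dict mapping class id -> list of images (assumed structure).
    label is 1.0 for a same-class pair and 0.0 for a different-class pair.
    """
    classes = list(images_by_class.keys())
    if random.random() < 0.5:
        # Same-class ("genuine") pair.
        c = random.choice(classes)
        img1, img2 = random.sample(images_by_class[c], 2)
        return img1, img2, 1.0
    # Different-class ("impostor") pair.
    c1, c2 = random.sample(classes, 2)
    return random.choice(images_by_class[c1]), random.choice(images_by_class[c2]), 0.0
```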

1-3. Test Phase

  • Given one image for each novel class and a set of test images for evaluation, compute the probability that each test image belongs to the same class as each novel-class example (see the sketch below).
  • Predict the class with the highest probability.
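
A sketch of this K-way 1-shot decision rule; `model` here is a hypothetical callable that returns the same-class probability for an image pair, as described in 1-2.

```python
import torch

@torch.no_grad()
def one_shot_predict(model, test_image, support_images):
    """K-way 1-shot prediction.

    support_images: list of K images, one per novel class (list index = class id).
    Returns the index of the class whose single support example the model judges
    most likely to share a class with test_image.
    """
    probs = torch.stack([model(test_image, support).squeeze()
                         for support in support_images])
    return int(torch.argmax(probs))
```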

Visualization of the approach

2-1. Model Architecture

  • Siamese Network: two identical networks that share weights; the distance between their embeddings measures how similar the two inputs are (a code sketch follows the figure below).
  • Verification Stage: train the ConvNet so that it outputs an appropriate embedding for each input image.
  • Classification Stage: given a K-way 1-shot classification problem, classify a test image as the class of the support image with the highest similarity.

Model Architecture
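
A minimal PyTorch sketch of the twin architecture above; the convolutional stack only roughly mirrors the paper's Omniglot network, and the exact layer sizes here (for 105x105 single-channel inputs) should be read as assumptions.

```python
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    """Twin ConvNet with shared weights; similarity = sigmoid of a weighted L1 distance."""

    def __init__(self, embedding_dim=4096):
        super().__init__()
        # Shared convolutional encoder (roughly following the paper's
        # conv/ReLU/max-pool stack for 105x105 Omniglot images).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=10), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=7), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 128, kernel_size=4), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, kernel_size=4), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, embedding_dim), nn.Sigmoid(),  # 6x6 assumes 105x105 inputs
        )
        # Learned per-component weights alpha_j applied to the L1 distance.
        self.head = nn.Linear(embedding_dim, 1)

    def forward(self, x1, x2):
        h1 = self.encoder(x1)  # both inputs go through the SAME encoder,
        h2 = self.encoder(x2)  # so the twin networks share every weight
        return torch.sigmoid(self.head(torch.abs(h1 - h2)))
```

Calling `SiameseNet()(x1, x2)` on a batch of image pairs returns the same-class probabilities used in both the verification and classification stages above.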

2-2. Prediction Vector

  • The prediction is the sigmoid of a weighted L1 distance between the two embeddings (written out below).

Prediction Vector and L1, L2 Distances
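
Written out (following the paper's notation, where $h_{1,L-1}$ and $h_{2,L-1}$ are the final hidden vectors of the two twins and the $\alpha_j$ are learned weights):

$$ p = \sigma\left( \sum_{j} \alpha_j \left| h_{1,L-1}^{(j)} - h_{2,L-1}^{(j)} \right| \right) $$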

2-3. Loss Function

  • Regularized BCE loss: binary cross-entropy on the pair label plus an L2 weight-decay penalty (see the formula below).

Loss Function
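
A hedged reconstruction of the objective from the paper, with $y(x_1^{(i)}, x_2^{(i)})$ the same-class label of the $i$-th pair and $p$ the predicted probability:

$$ \mathcal{L}\bigl(x_1^{(i)}, x_2^{(i)}\bigr) = y\bigl(x_1^{(i)}, x_2^{(i)}\bigr)\,\log p\bigl(x_1^{(i)}, x_2^{(i)}\bigr) + \bigl(1 - y(x_1^{(i)}, x_2^{(i)})\bigr)\,\log\bigl(1 - p(x_1^{(i)}, x_2^{(i)})\bigr) + \boldsymbol{\lambda}^{\top}\,|\mathbf{w}|^{2} $$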

2-4. Optimization

  • Mini-batch gradient descent with momentum and weight regularization (a sketch of these settings follows this list)
  • Learning-rate decay and a momentum schedule
  • Weight initialization from a normal distribution
  • Hyperparameter optimization
  • Affine distortions for data augmentation
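
A minimal PyTorch sketch of these training choices; the concrete values (learning rate, momentum, decay factor, distortion ranges) are illustrative assumptions rather than the paper's tuned hyperparameters, and `SiameseNet` refers to the sketch in 2-1.

```python
import torch
import torch.nn as nn
from torchvision import transforms

def init_weights(m):
    # Normal-distribution weight initialization, as listed above.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.01)
        if m.bias is not None:
            nn.init.normal_(m.bias, mean=0.5, std=0.01)

model = SiameseNet()
model.apply(init_weights)

# Mini-batch gradient descent with momentum; weight_decay adds the L2 penalty.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.5, weight_decay=1e-4)

# Per-epoch learning-rate decay (the paper also ramps momentum up over epochs,
# which would need a custom schedule; only LR decay is shown here).
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)

# Random affine distortions used to augment the training pairs.
augment = transforms.RandomAffine(degrees=10, translate=(0.1, 0.1),
                                  scale=(0.8, 1.2), shear=10)
```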

3-1. Experiments

  • Used the Omniglot dataset, which contains 1623 characters from 50 different alphabets, each hand-drawn by 20 different people.

Accuracy on Omniglot Verification Task

  • Hierarchical Bayesian Program Learning (HBPL) requires stroke-order information; unlike HBPL, the convolutional Siamese net needs no domain knowledge.

Comparing best one-shot accuracy from each type of network against baselines

  • Also tested generalization to the MNIST dataset using a model trained only on Omniglot.

Results from the MNIST 10-versus-1 one-shot classification task