BMVC'19: Classification is a Strong Baseline for Deep Metric Learning
Computer Vision paper challenge (1/30)
What problem do they solve?
Large-scale image retrieval.
Background
Deep metric learning aims to learn a function mapping image pixels to embedding feature vectors that model the similarity between images.
Two major applications of metric learning are content-based image retrieval and face verification.
For image retrieval tasks, most current SOTA approaches use triplet-based non-parametric training.
For face verification tasks, recent SOTA approaches have adopted classification-based parametric training.
In this paper, the authors (from Pinterest) instead apply classification-based parametric training to image retrieval datasets.
Model
Class Balanced Sampling
As a first step, they sample a few instances per class.
Why? Classification losses for metric learning usually suffer from poorly approximated examples within a class. Sampling multiple instances per class alleviates this issue.
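The sampling step above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name and parameters are my own.

```python
import random
from collections import defaultdict

def class_balanced_batch(labels, num_classes_per_batch, num_samples_per_class, seed=0):
    """Build one batch by first picking classes, then a few instances per class.

    `labels` maps example index -> class id. Names are illustrative only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)
    # Only classes with enough examples can contribute a full group.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= num_samples_per_class]
    chosen_classes = rng.sample(eligible, num_classes_per_batch)
    batch = []
    for c in chosen_classes:
        batch.extend(rng.sample(by_class[c], num_samples_per_class))
    return batch
```

Each batch then contains exactly `num_samples_per_class` examples from each of `num_classes_per_batch` classes, so every class in the batch has within-class neighbors.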
Layer Normalization
After sampling and the ConvNet (GoogLeNet pool5 layer), they apply Layer Normalization along the feature dimension of the embeddings so that each embedding's values are centered at zero. This makes it easy to binarize embeddings by thresholding at zero.
Incorporating Layer Normalization in training also makes them robust to poor weight initialization of new parameters across model architectures.
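A minimal numpy sketch of the two steps above (normalize each embedding across its feature dimension, then threshold at zero); function names are mine, and this omits the learnable scale/shift that a full Layer Normalization layer would have:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Center each embedding to zero mean and unit variance along the feature axis."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def binarize(x):
    """Because the embedding is centered at zero, thresholding at zero
    yields a compact binary code."""
    return (x > 0).astype(np.int8)
```

Since roughly half the dimensions land above zero after centering, the binary codes stay informative rather than collapsing to all-ones or all-zeros.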
Normalized Softmax Loss
They remove the bias term in the last linear layer and L2-normalize both the inputs and the weights before the softmax loss, so that the model optimizes for cosine similarity.
The normalized softmax loss fits into the proxy paradigm when we view each class weight vector as a proxy and choose cosine distance as the distance metric.
More Detail: No Fuss Distance Metric Learning using Proxies
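The loss described above can be sketched as follows: L2-normalize embeddings and class weights, drop the bias, and take cross-entropy over the scaled cosine similarities. The function name and the temperature value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def normalized_softmax_loss(embeddings, class_weights, labels, temperature=0.05):
    """Cross-entropy over cosine similarities between embeddings and
    per-class weight vectors (the proxies); no bias term."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    logits = x @ w.T / temperature               # cosine similarity, sharpened
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

Dividing by a small temperature sharpens the softmax; without it, cosine logits live in [-1, 1] and the loss saturates before the embeddings separate well.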
Other related blogs:
COLING’14: Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts
NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence
NIPS’17: Attention Is All You Need (Transformer)
NIPS’13: Distributed Representations of Words and Phrases and their Compositionality
Conference
ICCV: International Conference on Computer Vision
http://iccv2019.thecvf.com/submission/timeline
CVPR: Conference on Computer Vision and Pattern Recognition
ECCV: European Conference on Computer Vision
Top Conference Paper Challenge:
https://medium.com/@arthurlee_73761/top-conference-paper-challenge-2d7ca24115c6
My Website: