BMVC'19: Classification is a Strong Baseline for Deep Metric Learning
Computer Vision paper challenge (1/30)
What problem do they solve?
Large-scale image retrieval.
Background
Deep metric learning aims to learn a function mapping image pixels to embedding feature vectors that model the similarity between images.
Two major applications of metric learning are content-based image retrieval and face verification.
For image retrieval tasks, most current SOTA approaches use triplet-based non-parametric training.
For face verification tasks, recent SOTA approaches have adopted classification-based parametric training.
In this paper, the authors (from Pinterest) instead apply classification-based parametric training to image retrieval datasets.
Model
Class Balanced Sampling
As a first step, they sample a few instances per class.
Why? Classification losses for metric learning usually suffer from poorly approximated examples within a class. Sampling multiple instances per class alleviates this issue.
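The sampling step above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name and parameters are my own.

```python
import random
from collections import defaultdict

def class_balanced_batch(labels, num_classes_per_batch, num_samples_per_class, seed=0):
    """Build one batch by first picking classes, then a few instances per class.

    `labels` maps example index -> class id. Names are illustrative only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)
    # Only classes with enough examples can contribute a full group.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= num_samples_per_class]
    chosen_classes = rng.sample(eligible, num_classes_per_batch)
    batch = []
    for c in chosen_classes:
        batch.extend(rng.sample(by_class[c], num_samples_per_class))
    return batch
```

Each batch then contains exactly `num_samples_per_class` examples from each of `num_classes_per_batch` classes, so every class in the batch has within-class neighbors.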
Layer Normalization
After sampling and the ConvNet (GoogLeNet pool5 layer), they apply Layer Normalization along the feature dimension of the embeddings so that each embedding's values are centered at zero. This makes it easy to binarize embeddings by thresholding at zero.
Incorporating Layer Normalization in training also makes them robust to poor weight initialization of new parameters across model architectures.
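A minimal numpy sketch of the two steps above (normalize each embedding across its feature dimension, then threshold at zero); function names are mine, and this omits the learnable scale/shift that a full Layer Normalization layer would have:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Center each embedding to zero mean and unit variance along the feature axis."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def binarize(x):
    """Because the embedding is centered at zero, thresholding at zero
    yields a compact binary code."""
    return (x > 0).astype(np.int8)
```

Since roughly half the dimensions land above zero after centering, the binary codes stay informative rather than collapsing to all-ones or all-zeros.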
Normalized Softmax Loss
They remove the bias term in the last linear layer and L2-normalize both the inputs and the weights before the softmax loss, so that the model optimizes for cosine similarity.
The normalized softmax loss fits into the proxy paradigm when we view each class weight vector as a proxy and choose cosine distance as the distance metric.
More Detail: No Fuss Distance Metric Learning using Proxies
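The loss described above can be sketched as follows: L2-normalize embeddings and class weights, drop the bias, and take cross-entropy over the scaled cosine similarities. The function name and the temperature value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def normalized_softmax_loss(embeddings, class_weights, labels, temperature=0.05):
    """Cross-entropy over cosine similarities between embeddings and
    per-class weight vectors (the proxies); no bias term."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    logits = x @ w.T / temperature               # cosine similarity, sharpened
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

Dividing by a small temperature sharpens the softmax; without it, cosine logits live in [-1, 1] and the loss saturates before the embeddings separate well.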
Other related blogs:
COLING’14: Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts
NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence
NIPS’17: Attention Is All You Need (Transformer)
NIPS’13: Distributed Representations of Words and Phrases and their Compositionality
Conference
ICCV: International Conference on Computer Vision
http://iccv2019.thecvf.com/submission/timeline
CVPR: Conference on Computer Vision and Pattern Recognition
ECCV: European Conference on Computer Vision
Top Conference Paper Challenge:
https://medium.com/@arthurlee_73761/top-conference-paper-challenge-2d7ca24115c6
My Website: