WWW’17: Visual Discovery

Arthur Lee
4 min read · Aug 23, 2020

Recommendation system paper challenge (20/50)

paper link

Engineering blog related to this paper (Introducing a new way to visually search on Pinterest)

My review of that engineering blog

What problem do they solve?

This paper introduces how Pinterest designs its machine learning systems for visual discovery.

FEATURE REPRESENTATION

The VGG-16 fc6 layer performs best as a feature representation. Besides that, binarizing the features reduces noise and gives a further improvement.
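To make binarization concrete, here is a minimal sketch (my own illustration, not Pinterest's exact recipe) that thresholds each embedding dimension and compares the resulting codes with Hamming distance:

```python
import numpy as np

def binarize(features: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Binarize convnet embeddings (e.g., 4096-d VGG-16 fc6) by thresholding.

    Each dimension becomes 1 if it exceeds the threshold, else 0. Binary
    codes are cheap to store and support fast Hamming-distance matching.
    """
    return (features > threshold).astype(np.uint8)

# Two fake fc6 embeddings.
fc6 = np.random.randn(2, 4096).astype(np.float32)
codes = binarize(fc6)

# Hamming distance between the two binary codes.
print(int(np.sum(codes[0] != codes[1])))
```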

OBJECT DETECTION

Faster R-CNN gives state-of-the-art detection performance and scales favorably to a large number of categories.

Single Shot Detection (SSD) is faster than Faster R-CNN, saving serving cost.

Instead of random negative sampling, they applied Online Hard Example Mining (OHEM) to pick training examples, though it does not give a win on small datasets.
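The core of OHEM fits in a few lines; this sketch (a simplification of mine, not the paper's training code) ranks the candidates in a mini-batch by loss and keeps only the hardest ones:

```python
import numpy as np

def ohem_select(losses: np.ndarray, keep_ratio: float = 0.25) -> np.ndarray:
    """Online Hard Example Mining: keep only the hardest examples.

    Instead of sampling negatives at random, rank all candidates by their
    current loss and backpropagate only through the top fraction.
    """
    k = max(1, int(len(losses) * keep_ratio))
    return np.argsort(losses)[::-1][:k]  # indices of the k largest losses

losses = np.array([0.1, 2.3, 0.05, 1.7, 0.4])
print(ohem_select(losses, keep_ratio=0.4))  # -> [1 3]
```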

PINTEREST RELATED PINS (visual features matter most in visual categories)

How to measure user engagement?

closeups (viewing pin details), visiting the associated web link, long clicks, and saving pins onto their own boards.

They focus on Related Pins Save Propensity: the number of saves divided by the number of views of Related Pins.

Convnet Features for Recommendations

model: Rank-SVM

feature:

control: existing features

treatment: fine-tuned VGG fc6 and fc8 visual similarity features + existing features

=> a 4% engagement improvement

Using AlexNet instead gains only 0.8% engagement.

They noted that the engagement gain was stronger in predominantly visual categories, such as art (8.8%), tattoos (8.0%), illustrations (7.9%), and design (7.7%), and lower in categories which primarily rely on text, such as quotes (2.0%) and fitness planning (0.2%).
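The paper doesn't include training code; a standard way to fit a Rank-SVM is the pairwise transform, training a linear SVM on feature differences between engaged and non-engaged pins. A minimal sketch with scikit-learn (data and dimensions are made up):

```python
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X_pos: np.ndarray, X_neg: np.ndarray):
    """Build a Rank-SVM training set from (engaged, non-engaged) pairs.

    Each difference x_pos - x_neg is labeled +1 and its negation -1, so a
    linear SVM learns weights w with w . x_pos > w . x_neg.
    """
    diffs = X_pos - X_neg
    X = np.vstack([diffs, -diffs])
    y = np.hstack([np.ones(len(diffs)), -np.ones(len(diffs))])
    return X, y

rng = np.random.default_rng(0)
X_pos = rng.normal(size=(100, 16))  # features of engaged pins
X_neg = rng.normal(size=(100, 16))  # features of non-engaged pins

X, y = pairwise_transform(X_pos, X_neg)
ranker = LinearSVC(C=1.0).fit(X, y)

# Rank new candidates by the learned linear score.
candidates = rng.normal(size=(5, 16))
print(np.argsort(candidates @ ranker.coef_.ravel())[::-1])
```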

Object Detection for Recommendation

Users sometimes focus on a few specific objects in an image rather than the whole image => they wanted to test whether object detection helps.

Variant C: the same visual features as the control, but when a dominant visual object is detected, visual similarity is weighted more heavily. The presence of a visual object by itself signals that visual similarity should count for more.
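A hedged sketch of the Variant C idea (the blend function and weights below are my placeholders, not the paper's values):

```python
def related_pins_score(visual_sim: float, other_score: float,
                       has_dominant_object: bool) -> float:
    """Blend visual similarity with the other ranking signals.

    When object detection finds a dominant visual object in the query pin,
    visual similarity gets a higher weight; otherwise use the default blend.
    """
    w_visual = 0.7 if has_dominant_object else 0.3  # illustrative weights
    return w_visual * visual_sim + (1.0 - w_visual) * other_score

print(related_pins_score(0.9, 0.5, has_dominant_object=True))   # 0.78
print(related_pins_score(0.9, 0.5, has_dominant_object=False))  # 0.62
```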

PINTEREST FLASHLIGHT

input: detected objects or user crops

output: image results + clickable tf-idf weighted annotations
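To make "tf-idf weighted annotations" concrete, here is a rough sketch (my own, not Flashlight's pipeline) that aggregates annotations from the retrieved images and scores each term by term frequency times inverse document frequency:

```python
import math
from collections import Counter

def aggregate_annotations(result_annotations, doc_freq, n_docs):
    """Score annotations from visual search results with tf-idf weights.

    result_annotations: one annotation list per retrieved image.
    doc_freq / n_docs: corpus statistics, assumed precomputed offline.
    """
    tf = Counter(term for anns in result_annotations for term in anns)
    scores = {term: count * math.log(n_docs / (1 + doc_freq.get(term, 0)))
              for term, count in tf.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

results = [["denim jacket", "outfit"], ["denim jacket", "fashion"], ["outfit"]]
df = {"denim jacket": 100, "outfit": 5000, "fashion": 8000}
print(aggregate_annotations(results, df, n_docs=100_000))
```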

Convnet Features for Retrieval

Because a user-cropped region has no annotations, they apply the deep learning embedding only to re-rank the candidate pool (retrieved first by nearest-neighbor search) -> small computation cost.
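A minimal sketch of that two-stage setup (the nearest-neighbor candidate stage is faked with random data; only the re-rank step is shown):

```python
import numpy as np

def rerank(query_emb: np.ndarray, candidate_embs: np.ndarray, top_k: int = 10):
    """Re-rank a candidate pool by cosine similarity to the query crop.

    The deep embedding is applied only to the small candidate pool, not to
    the whole index, which keeps the extra compute cost small.
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = c @ q
    order = np.argsort(sims)[::-1][:top_k]
    return order, sims[order]

rng = np.random.default_rng(1)
order, sims = rerank(rng.normal(size=128), rng.normal(size=(100, 128)))
print(order[:5], sims[:5])
```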

Object Detection for Retrieval

They apply Faster R-CNN to do real-time object detection.

Manual cropping is replaced with clickable dots over the objects Faster R-CNN detects.

Advantages:

  1. easier to measure user engagement (clicks)
  2. a simpler Flashlight user interface

The problems:

  1. irrelevant retrieval results
  2. bad feature detection

Solutions: set a threshold on each of the following signals:

  1. visual Hamming distance (over the 4096-dimensional binarized convnet features)
  2. top annotation score (aggregated tf-idf-scored annotations from the visual search results)
  3. category conformity (the maximum fraction of the visual search results falling in the same category) => the most important signal
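A rough sketch of gating results with those three signals (the threshold values and field names are placeholders of mine):

```python
import numpy as np

def passes_quality_gates(query_code, result, topk_results,
                         max_hamming=1024, min_annotation=5.0,
                         min_conformity=0.5):
    """Drop low-quality visual search results using the three thresholds.

    query_code / result["code"]: 4096-d binarized convnet features.
    result["annotation_score"]: aggregated tf-idf annotation score.
    Category conformity is the largest fraction of the top-k results that
    share one category (the most important signal per the paper).
    """
    hamming = int(np.sum(query_code != result["code"]))
    categories = [r["category"] for r in topk_results]
    conformity = max(categories.count(c) for c in set(categories)) / len(categories)
    return (hamming <= max_hamming
            and result["annotation_score"] >= min_annotation
            and conformity >= min_conformity)

rng = np.random.default_rng(2)
query = (rng.random(4096) > 0.5).astype(np.uint8)
topk = [{"code": (rng.random(4096) > 0.5).astype(np.uint8),
         "annotation_score": 6.0, "category": "fashion"} for _ in range(5)]
print(passes_quality_gates(query, topk[0], topk))
```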

PINTEREST LENS

It returns a diverse set of engaging results that are semantically relevant to the query.

The components:

  1. query understanding layer: visual and semantic features
  2. result blending (a simple sketch follows below)
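The paper doesn't spell out the blending algorithm; a simple hedged sketch is round-robin interleaving of the ranked lists from each result source, deduplicating along the way:

```python
def blend_results(sources, limit=10):
    """Interleave ranked result lists from several sources (round-robin).

    sources: ranked lists, e.g. from visual search, object search, and
    text search. A duplicate keeps its first (highest) position.
    """
    blended, seen = [], set()
    for rank in range(max(len(s) for s in sources)):
        for source in sources:
            if rank < len(source) and source[rank] not in seen:
                seen.add(source[rank])
                blended.append(source[rank])
                if len(blended) == limit:
                    return blended
    return blended

visual = ["p1", "p2", "p3"]
objects = ["p2", "p4"]
textual = ["p5", "p1", "p6"]
print(blend_results([visual, objects, textual], limit=5))
# -> ['p1', 'p2', 'p5', 'p4', 'p3']
```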

Object Search

They index not only the whole image but also the detected objects.

They apply the SSD object detector.
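A hedged sketch of object-level indexing: embed the whole image plus each detected object crop, and store all of them with metadata. The detector and embedding below are stubs standing in for SSD and the convnet:

```python
import numpy as np

def embed(image: np.ndarray) -> np.ndarray:
    """Stub for a convnet embedding of an image or crop."""
    rng = np.random.default_rng(int(image.sum()) % (2**32))
    return rng.normal(size=128)

def detect_objects(image: np.ndarray):
    """Stub for an SSD detector; returns (x0, y0, x1, y1) boxes."""
    h, w = image.shape[:2]
    return [(0, 0, w // 2, h // 2)]  # placeholder: one fake box

def index_image(image: np.ndarray, image_id: str, index: list) -> None:
    """Index the whole image and each detected object crop."""
    index.append({"id": image_id, "box": None, "emb": embed(image)})
    for (x0, y0, x1, y1) in detect_objects(image):
        crop = image[y0:y1, x0:x1]
        index.append({"id": image_id, "box": (x0, y0, x1, y1),
                      "emb": embed(crop)})

index = []
index_image(np.zeros((256, 256, 3)), "pin_123", index)
print(len(index))  # whole image + 1 object crop -> 2
```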

Other related blogs:

KDD’15 Visual Search at Pinterest

RecSys16: Adaptive, Personalized Diversity for Visual Discovery

NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence

RecSys ’17: Translation-based Recommendation

RecSys ’18: Causal Embeddings for Recommendation

Best paper in RecSys:

https://recsys.acm.org/best-papers/

KDD:

https://www.kdd.org/kdd2020/

My Website:

https://light0617.github.io/#/

Written by Arthur Lee

A machine learning engineer in the Bay Area in the United States.