WWW’17: Visual Discovery at Pinterest
Recommendation system paper challenge (20/50)
Engineering blog related to this paper (Introducing a new way to visually search on Pinterest)
My review link for the engineering blog
What problem do they solve?
This paper introduces how Pinterest designs the machine learning systems behind its visual discovery products (Related Pins, Flashlight, and Lens).
FEATURE REPRESENTATION
The VGG-16 fc6 layer performs best as the feature representation. On top of that, binarizing the features reduces noise and gives a further improvement.
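As a rough illustration, here is a minimal sketch of binarizing an fc6 embedding. The thresholding rule (activation > 0 after ReLU) is my assumption; the paper only says the features are binarized, not exactly how.

```python
import numpy as np

def binarize_fc6(fc6_activations, threshold=0.0):
    # Turn a 4096-d fc6 activation vector into bits.
    # Assumption: an activation is "on" if it is above 0 (post-ReLU).
    return (np.asarray(fc6_activations) > threshold).astype(np.uint8)
```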
OBJECT DETECTION
Faster R-CNN gives state-of-the-art detection performance and scales favorably to a large number of categories.
The Single Shot Detector (SSD) is faster than Faster R-CNN, which saves serving cost.
Instead of random negative sampling, they apply Online Hard Example Mining (OHEM) to pick training examples, though it does not give a win on a small data set.
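A minimal sketch of the OHEM idea, assuming a PyTorch-style detection training loop: compute per-RoI losses, then backpropagate only through the highest-loss (hardest) examples. The number of hard examples kept per step is a hyperparameter I picked for illustration.

```python
import torch

def ohem_loss(per_roi_losses, num_hard=128):
    # Online Hard Example Mining (sketch): keep only the num_hard
    # highest-loss RoIs so gradients focus on hard examples.
    num_hard = min(num_hard, per_roi_losses.numel())
    hard_losses, _ = torch.topk(per_roi_losses, num_hard)
    return hard_losses.mean()

# usage: per_roi_losses = classification + box-regression losses computed
# with reduction="none"; loss = ohem_loss(per_roi_losses); loss.backward()
```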
PINTEREST RELATED PINS (visual features matter most in visual categories)
How to measure user engagement?
closeups (viewing details), visiting the associated web link, long clicks, and saving pins onto their own boards.
They focus on Related Pins Save Propensity: the number of users who save a Related Pin divided by the number of users who see it.
Convnet Features for recommendations
model: RankSVM
features:
- control: existing features
- treatment: fine-tuned VGG fc6 and fc8 visual similarity features + existing features
=> ~4% improvement in engagement (a pairwise-ranking sketch follows below)
Using AlexNet features instead gains only 0.8% in engagement.
They note that the engagement gain was stronger in predominantly visual categories, such as art (8.8%), tattoos (8.0%), illustrations (7.9%), and design (7.7%), and lower in categories that rely primarily on text, such as quotes (2.0%) and fitness planning (0.2%).
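For intuition, here is a sketch of a RankSVM-style pairwise setup on the treatment features. The use of scikit-learn's LinearSVC, the variable names, and the feature layout are assumptions for illustration, not the paper's production pipeline.

```python
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X, y, groups):
    # Build pairwise difference examples for a RankSVM-style model.
    # X: feature matrix (existing features, optionally concatenated with
    # fc6/fc8 visual-similarity features); y: engagement label;
    # groups: subject-pin id for each candidate row.
    X_pairs, y_pairs = [], []
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        for i in idx:
            for j in idx:
                if y[i] > y[j]:
                    X_pairs.append(X[i] - X[j]); y_pairs.append(1)
                    X_pairs.append(X[j] - X[i]); y_pairs.append(-1)
    return np.array(X_pairs), np.array(y_pairs)

# treatment run (illustrative): concatenate existing and visual features
# X = np.hstack([existing_features, visual_similarity_features])
# Xp, yp = pairwise_transform(X, engagement, subject_pin_ids)
# ranker = LinearSVC(C=1.0).fit(Xp, yp)  # score = ranker.decision_function
```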
Object detection for recommendation
Users sometimes focus on a few specific items in an image rather than the whole image, so they test whether object detection helps.
Variant C: same visual features as the control, but when a dominant visual object is detected, more weight is given to visual similarity. The presence of a visual object alone indicates that visual similarity should be weighed more heavily.
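A toy sketch of how Variant C's reweighting might look; the weight values are made up for illustration, since the paper only says visual similarity gets more weight when a dominant object is present.

```python
def blended_score(visual_sim, other_score, has_dominant_object,
                  w_visual_default=0.3, w_visual_object=0.7):
    # When the detector finds a dominant visual object in the subject
    # image, weigh visual similarity more heavily in the final score.
    # The specific weights here are illustrative assumptions.
    w = w_visual_object if has_dominant_object else w_visual_default
    return w * visual_sim + (1.0 - w) * other_score
```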
PINTEREST FLASHLIGHT
input: detected objects or a user-specified crop
output: image results + clickable tf-idf weighted annotations
Convnet Features for Retrieval
Because a user-cropped region has no annotations, they apply the deep embedding only to re-rank the candidate pool already retrieved by nearest-neighbor search -> small computation cost.
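A minimal sketch of re-ranking the existing candidate pool with the crop's embedding, assuming cosine similarity; the candidate-pool format is an assumption.

```python
import numpy as np

def rerank_by_embedding(crop_embedding, candidates):
    # candidates: list of (pin_id, embedding) pairs already retrieved by
    # the nearest-neighbor stage; only this small pool is scored with the
    # convnet embedding, which keeps the cost low.
    q = crop_embedding / (np.linalg.norm(crop_embedding) + 1e-8)
    scored = []
    for pin_id, emb in candidates:
        e = emb / (np.linalg.norm(emb) + 1e-8)
        scored.append((float(q @ e), pin_id))
    scored.sort(reverse=True)
    return [pin_id for _, pin_id in scored]
```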
Object Detection for Retrieval
They apply Faster R-CNN for real-time object detection, replacing manual cropping with a clickable dot on each detected object.
Advantages:
- easy to measure user engagement (clicks)
- simplifies the Flashlight user interface
The problems:
- irrelevant retrieval results
- bad object detections
Solutions: set thresholds on three signals (a filtering sketch follows this list)
1. visual Hamming distance (on the 4096-dimensional binarized convnet features)
2. top annotation score (aggregated tf-idf scores of annotations from the visual search results)
3. category conformity (the maximum fraction of visual search results falling in the same category) => the most important signal
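To make the three thresholds concrete, here is a sketch of filtering one detected object's visual search results; the numeric thresholds and result fields are illustrative assumptions.

```python
import numpy as np

def keep_detection_results(query_bits, results, max_hamming=1024,
                           min_annotation_score=5.0, min_conformity=0.5):
    # results: list of dicts with binarized embedding bits, an aggregated
    # tf-idf annotation score, and a category label (field names assumed).

    # 1. visual Hamming distance on 4096-d binarized features
    kept = [r for r in results
            if np.count_nonzero(query_bits != r["bits"]) <= max_hamming]

    # 2. top annotation score (aggregated tf-idf over result annotations)
    top_annotation = max((r["annotation_score"] for r in kept), default=0.0)
    if top_annotation < min_annotation_score:
        return []

    # 3. category conformity: largest fraction of results in one category
    if kept:
        categories = [r["category"] for r in kept]
        conformity = max(categories.count(c) for c in set(categories)) / len(kept)
        if conformity < min_conformity:
            return []
    return kept
```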
PINTEREST LENS
It returns a diverse set of engaging results that are semantically relevant to the query.
The components:
- query understanding layer: visual and semantic features
- result blending (a blending sketch follows below)
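The paper does not detail the blending rule, so this is only a hedged sketch: round-robin blending of ranked result lists from different candidate sources, with de-duplication.

```python
def blend_results(sources, limit=20):
    # sources: list of ranked pin-id lists (e.g. visually similar results,
    # semantically related results). Round-robin interleaving is an
    # illustrative assumption, not the paper's actual rule.
    blended, seen = [], set()
    position = 0
    while len(blended) < limit and any(position < len(s) for s in sources):
        for s in sources:
            if position < len(s) and s[position] not in seen:
                blended.append(s[position])
                seen.add(s[position])
                if len(blended) >= limit:
                    break
        position += 1
    return blended
```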
Object Search
They index not only the whole image but also the objects detected by an SSD object detector.
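A sketch of what indexing objects alongside whole images might look like; the entry fields and key format are assumptions for illustration.

```python
def index_entries_for_image(image_id, image_embedding, detections):
    # One entry for the whole image, plus one entry per detected object,
    # each with its own embedding extracted from the cropped region.
    entries = [{"key": image_id, "embedding": image_embedding, "box": None}]
    for i, det in enumerate(detections):  # det: {"box": (x, y, w, h), "embedding": ...}
        entries.append({
            "key": f"{image_id}#obj{i}",
            "embedding": det["embedding"],
            "box": det["box"],
        })
    return entries
```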
Other related blogs:
KDD’15 Visual Search at Pinterest
RecSys16: Adaptive, Personalized Diversity for Visual Discovery
NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence
RecSys ’17: Translation-based Recommendation
RecSys ’18: Causal Embeddings for Recommendation
Best paper in RecSys:
https://recsys.acm.org/best-papers/
KDD:
My Website: