WWW’17: Visual Discovery at Pinterest
Recommendation system paper challenge (20/50)
Engineering blog related to this paper (Introducing a new way to visually search on Pinterest)
My review link for the engineering blog
What problem do they solve?
This paper introduces how Pinterest designs the machine learning systems behind its visual discovery products (Related Pins, Flashlight, and Lens).
FEATURE REPRESENTATION
The VGG-16 fc6 layer performs best as the feature representation. On top of that, binarizing the features reduces noise and gives a further improvement.
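As a rough illustration, here is a minimal sketch of binarizing an fc6 embedding. The thresholding rule (activation > 0 after ReLU) is my assumption; the paper only says the features are binarized, not exactly how.

```python
import numpy as np

def binarize_fc6(fc6_activations, threshold=0.0):
    # Turn a 4096-d fc6 activation vector into bits.
    # Assumption: an activation is "on" if it is above 0 (post-ReLU).
    return (np.asarray(fc6_activations) > threshold).astype(np.uint8)
```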
OBJECT DETECTION
Faster R-CNN gives state-of-the-art detection performance and scales favorably to a large number of categories.
The Single Shot Detector (SSD) is faster than Faster R-CNN, which saves serving cost.
Instead of random negative sampling, they apply Online Hard Example Mining (OHEM) to pick training examples, though it does not give a win on a small data set.
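A minimal sketch of the OHEM idea, assuming a PyTorch-style detection training loop: compute per-RoI losses, then backpropagate only through the highest-loss (hardest) examples. The number of hard examples kept per step is a hyperparameter I picked for illustration.

```python
import torch

def ohem_loss(per_roi_losses, num_hard=128):
    # Online Hard Example Mining (sketch): keep only the num_hard
    # highest-loss RoIs so gradients focus on hard examples.
    num_hard = min(num_hard, per_roi_losses.numel())
    hard_losses, _ = torch.topk(per_roi_losses, num_hard)
    return hard_losses.mean()

# usage: per_roi_losses = classification + box-regression losses computed
# with reduction="none"; loss = ohem_loss(per_roi_losses); loss.backward()
```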
PINTEREST RELATED PINS (visual features matter most in visual categories)
How to measure user engagement?
closeups (viewing details), visiting the associated web link, long clicks, and saving pins onto their own boards.
They focus on Related Pins Save Propensity: the number of users who save a Related Pin divided by the number of users who see it.
Convnet Features for recommendations
model: RankSVM
features:
- control: existing features
- treatment: fine-tuned VGG fc6 and fc8 visual similarity features + existing features
=> ~4% improvement in engagement (a pairwise-ranking sketch follows below)
Using AlexNet features instead gains only 0.8% in engagement.
They note that the engagement gain was stronger in predominantly visual categories, such as art (8.8%), tattoos (8.0%), illustrations (7.9%), and design (7.7%), and lower in categories that rely primarily on text, such as quotes (2.0%) and fitness planning (0.2%).
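For intuition, here is a sketch of a RankSVM-style pairwise setup on the treatment features. The use of scikit-learn's LinearSVC, the variable names, and the feature layout are assumptions for illustration, not the paper's production pipeline.

```python
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X, y, groups):
    # Build pairwise difference examples for a RankSVM-style model.
    # X: feature matrix (existing features, optionally concatenated with
    # fc6/fc8 visual-similarity features); y: engagement label;
    # groups: subject-pin id for each candidate row.
    X_pairs, y_pairs = [], []
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        for i in idx:
            for j in idx:
                if y[i] > y[j]:
                    X_pairs.append(X[i] - X[j]); y_pairs.append(1)
                    X_pairs.append(X[j] - X[i]); y_pairs.append(-1)
    return np.array(X_pairs), np.array(y_pairs)

# treatment run (illustrative): concatenate existing and visual features
# X = np.hstack([existing_features, visual_similarity_features])
# Xp, yp = pairwise_transform(X, engagement, subject_pin_ids)
# ranker = LinearSVC(C=1.0).fit(Xp, yp)  # score = ranker.decision_function
```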
Object detection for recommendation
Users sometimes focus on a few specific items in an image rather than the whole image, so they test whether object detection helps.
Variant C: same visual features as the control, but when a dominant visual object is detected, more weight is given to visual similarity. The presence of a visual object alone indicates that visual similarity should be weighed more heavily.
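A toy sketch of how Variant C's reweighting might look; the weight values are made up for illustration, since the paper only says visual similarity gets more weight when a dominant object is present.

```python
def blended_score(visual_sim, other_score, has_dominant_object,
                  w_visual_default=0.3, w_visual_object=0.7):
    # When the detector finds a dominant visual object in the subject
    # image, weigh visual similarity more heavily in the final score.
    # The specific weights here are illustrative assumptions.
    w = w_visual_object if has_dominant_object else w_visual_default
    return w * visual_sim + (1.0 - w) * other_score
```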
PINTEREST FLASHLIGHT
input: detected objects or a user-specified crop
output: image results + clickable tf-idf weighted annotations
Convnet Features for Retrieval
Because a user-cropped region has no annotations, they apply the deep embedding only to re-rank the candidate pool already retrieved by nearest-neighbor search -> small computation cost.
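A minimal sketch of re-ranking the existing candidate pool with the crop's embedding, assuming cosine similarity; the candidate-pool format is an assumption.

```python
import numpy as np

def rerank_by_embedding(crop_embedding, candidates):
    # candidates: list of (pin_id, embedding) pairs already retrieved by
    # the nearest-neighbor stage; only this small pool is scored with the
    # convnet embedding, which keeps the cost low.
    q = crop_embedding / (np.linalg.norm(crop_embedding) + 1e-8)
    scored = []
    for pin_id, emb in candidates:
        e = emb / (np.linalg.norm(emb) + 1e-8)
        scored.append((float(q @ e), pin_id))
    scored.sort(reverse=True)
    return [pin_id for _, pin_id in scored]
```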
Object Detection for Retrieval
They apply Faster R-CNN for real-time object detection, replacing manual cropping with a clickable dot on each detected object.
Advantages:
- easy to measure user engagement (clicks)
- simplifies the Flashlight user interface
The problems:
- irrelevant retrieval results
- bad object detections
Solutions: set thresholds on three signals (a filtering sketch follows this list)
1. visual Hamming distance (on the 4096-dimensional binarized convnet features)
2. top annotation score (aggregated tf-idf scores of annotations from the visual search results)
3. category conformity (the maximum fraction of visual search results falling in the same category) => the most important signal
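To make the three thresholds concrete, here is a sketch of filtering one detected object's visual search results; the numeric thresholds and result fields are illustrative assumptions.

```python
import numpy as np

def keep_detection_results(query_bits, results, max_hamming=1024,
                           min_annotation_score=5.0, min_conformity=0.5):
    # results: list of dicts with binarized embedding bits, an aggregated
    # tf-idf annotation score, and a category label (field names assumed).

    # 1. visual Hamming distance on 4096-d binarized features
    kept = [r for r in results
            if np.count_nonzero(query_bits != r["bits"]) <= max_hamming]

    # 2. top annotation score (aggregated tf-idf over result annotations)
    top_annotation = max((r["annotation_score"] for r in kept), default=0.0)
    if top_annotation < min_annotation_score:
        return []

    # 3. category conformity: largest fraction of results in one category
    if kept:
        categories = [r["category"] for r in kept]
        conformity = max(categories.count(c) for c in set(categories)) / len(kept)
        if conformity < min_conformity:
            return []
    return kept
```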
PINTEREST LENS
It returns a diverse set of engaging results that are semantically relevant to the query.
The components:
- query understanding layer: visual and semantic features
- result blending (a blending sketch follows below)
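The paper does not detail the blending rule, so this is only a hedged sketch: round-robin blending of ranked result lists from different candidate sources, with de-duplication.

```python
def blend_results(sources, limit=20):
    # sources: list of ranked pin-id lists (e.g. visually similar results,
    # semantically related results). Round-robin interleaving is an
    # illustrative assumption, not the paper's actual rule.
    blended, seen = [], set()
    position = 0
    while len(blended) < limit and any(position < len(s) for s in sources):
        for s in sources:
            if position < len(s) and s[position] not in seen:
                blended.append(s[position])
                seen.add(s[position])
                if len(blended) >= limit:
                    break
        position += 1
    return blended
```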
Object Search
They index not only the whole image but also the objects detected by an SSD object detector.
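A sketch of what indexing objects alongside whole images might look like; the entry fields and key format are assumptions for illustration.

```python
def index_entries_for_image(image_id, image_embedding, detections):
    # One entry for the whole image, plus one entry per detected object,
    # each with its own embedding extracted from the cropped region.
    entries = [{"key": image_id, "embedding": image_embedding, "box": None}]
    for i, det in enumerate(detections):  # det: {"box": (x, y, w, h), "embedding": ...}
        entries.append({
            "key": f"{image_id}#obj{i}",
            "embedding": det["embedding"],
            "box": det["box"],
        })
    return entries
```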
Other related blogs:
KDD’15 Visual Search at Pinterest
RecSys16: Adaptive, Personalized Diversity for Visual Discovery
NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence
RecSys ’17: Translation-based Recommendation
RecSys ’18: Causal Embeddings for Recommendation
Best paper in RecSys:
https://recsys.acm.org/best-papers/
KDD:
My Website: