KDD’15 Visual Search at Pinterest
Recommendation system paper challenge (19/50)
Engineering blog post related to this paper: Introducing a new way to visually search on Pinterest
What problem do they solve?
They developed and maintain a cost-effective, large-scale visual search system.
It consists of two features: Related Pins and Similar Looks.
Related Pins
Given an image, the application extracts the items it contains and recommends related pins to users.
For example, if the image contains a bag and jeans, it recommends both.
Disadvantage: sometimes it is hard to detect all of the items in an image, and sometimes a user wants to focus on one specific item.
The current model:
The current model combines user-to-board features (users save and annotate the image on boards) with content-based features. The model is trained offline.
However, newly created and less popular images lack user-to-board information. For example, if the model is trained on 07/01 and retrained monthly, images created during [07/01–07/31] have no user-to-board features (the feature values are null). As a result, newly created images remain less popular.
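To make this cold-start issue concrete, here is a minimal sketch (my own illustration, not the paper's code) of how an offline feature builder retrained monthly ends up with empty user-to-board features for any image created after the training snapshot:

```python
from datetime import date

# Hypothetical feature builder: the offline model is retrained monthly, so an
# image created after the training snapshot has no user-to-board signal and
# falls back to content features only. All names and dimensions are assumed.
TRAINING_SNAPSHOT = date(2015, 7, 1)  # model trained on 07/01

def build_features(image, user_to_board_counts, content_embedding):
    """Concatenate user-to-board and content-based features (illustrative)."""
    if image["created_at"] > TRAINING_SNAPSHOT:
        # New image: no engagement existed at training time, so the
        # user-to-board part of the vector is effectively null/zero.
        u2b = [0.0] * 8
    else:
        u2b = user_to_board_counts.get(image["id"], [0.0] * 8)
    return u2b + list(content_embedding)
```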
The new model:
They employ a local token index to detect near-duplicate images (which users do not want to see) and then use the FC6 features of the VGG 16-layer model as the visual embedding to return the top-k similar images.
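Below is a minimal sketch of the visual-embedding step, assuming a torchvision VGG-16 and cosine similarity for ranking; the paper's production pipeline (and its near-duplicate token index) is more involved than this:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Load a pretrained VGG-16 and extract the 4096-d FC6 activation as the
# visual embedding, then rank a candidate matrix by cosine similarity.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def fc6_embedding(img: Image.Image) -> torch.Tensor:
    x = preprocess(img).unsqueeze(0)        # 1 x 3 x 224 x 224
    x = vgg.features(x)                     # conv feature maps
    x = vgg.avgpool(x).flatten(1)           # 1 x 25088
    return vgg.classifier[0](x).squeeze(0)  # FC6 linear layer -> 4096-d

@torch.no_grad()
def top_k_similar(query: torch.Tensor, index: torch.Tensor, k: int = 10):
    """index: N x 4096 matrix of precomputed FC6 embeddings."""
    sims = torch.nn.functional.cosine_similarity(query.unsqueeze(0), index)
    return torch.topk(sims, k)
```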
Similar Looks
In the example image there are two red dots, and each dot maps to an item. When a user clicks a dot, the application recommends products related to that item.
This addresses the issues mentioned above for Related Pins.
It localizes and classifies fashion objects and recommends matching products to users.
The model:
Two-step Object Detection and Localization
For each image, the user-to-board information provides many annotations, and the application extracts keywords from them.
The application then applies an object-detection algorithm to the image.
Text filters and image filters are applied together to find the top-k recommendations.
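A hedged sketch of that two-step flow is below: the text filter narrows the annotations to candidate fashion categories, and category-specific detectors are run only for those candidates. The detector and recommender interfaces here are hypothetical placeholders, not the paper's API:

```python
# Text filter + image filter, as described above. `detectors` is assumed to
# map a category name to an object with a detect(image) -> [(box, score)]
# method; `recommend_fn` is a placeholder for the product retrieval step.
FASHION_KEYWORDS = {"bag", "shoe", "dress", "sunglasses", "jeans"}

def candidate_categories(annotations):
    """Text filter: keep only annotations that look like fashion objects."""
    return {a.lower() for a in annotations} & FASHION_KEYWORDS

def detect_and_recommend(image, annotations, detectors, recommend_fn, k=10):
    results = []
    for category in candidate_categories(annotations):
        # Image filter: run the detector only for text-matched categories.
        for box, score in detectors[category].detect(image):
            products = recommend_fn(image, box, category)[:k]
            results.append({"category": category, "box": box,
                            "score": score, "products": products})
    return results
```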
Next, I will introduce the infrastructure that supports these applications.
Search Infrastructure
Scenarios:
- Similar Looks
- near-duplicate detection
- content recommendation
Visually similar results:
First, they distribute visual joins (visualjoins) across machines with Hadoop; each machine holds a subset of the image information.
Each machine holds two types of keys (sketched below):
- token indices with vector-quantized features and image doc-id hashes as posting lists (mostly on disk, partly in memory)
- memory-cached features (visual features plus metadata such as annotations)
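As a rough illustration of those two key types (names are mine, not the paper's), each leaf machine can be thought of as an inverted index from vector-quantized visual tokens to posting lists of doc-id hashes, plus an in-memory cache of full features and annotations:

```python
from collections import defaultdict

class LeafIndex:
    """Sketch of one leaf machine's index; structure is assumed, not exact."""
    def __init__(self):
        self.postings = defaultdict(list)  # token -> [doc_id, ...] posting list
        self.feature_cache = {}            # doc_id -> (embedding, annotations)

    def add(self, doc_id, tokens, embedding, annotations):
        for t in tokens:
            self.postings[t].append(doc_id)
        self.feature_cache[doc_id] = (embedding, annotations)

    def candidates(self, query_tokens):
        """Union of posting lists for the query's quantized tokens."""
        seen = set()
        for t in query_tokens:
            seen.update(self.postings.get(t, ()))
        return seen
```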
Each machine then computes the K nearest neighbors with a leaf ranker.
Finally, the merge node returns top candidates by computing a score between the query image and each of the top candidate images based on additional metadata such as annotations.
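A minimal sketch of that scatter-gather flow, reusing the LeafIndex above; the blend of visual similarity and annotation overlap in the merge step is my assumption, not the paper's exact scoring formula:

```python
import heapq
import numpy as np

def leaf_rank(leaf, query_emb, query_tokens, k=20):
    """Each leaf ranks its local candidates by cosine similarity."""
    scored = []
    for doc_id in leaf.candidates(query_tokens):
        emb, _ = leaf.feature_cache[doc_id]
        sim = float(np.dot(query_emb, emb) /
                    (np.linalg.norm(query_emb) * np.linalg.norm(emb)))
        scored.append((sim, doc_id))
    return heapq.nlargest(k, scored)

def merge_node(query_annotations, leaf_results, doc_annotations, k=10):
    """Rescore the pooled leaf results using metadata such as annotations."""
    rescored = []
    for sim, doc_id in (hit for hits in leaf_results for hit in hits):
        overlap = len(set(doc_annotations[doc_id]) & set(query_annotations))
        rescored.append((sim + 0.1 * overlap, doc_id))  # assumed score blend
    return heapq.nlargest(k, rescored)
```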
How are new features and newly uploaded images handled?
They use the Incremental Fingerprinting Service to keep features up to date.
Scenarios:
- a new image is uploaded (compute features only for the new image)
- a feature changed (backfill that feature for all images)
Approach:
Data is sharded into epochs grouped by upload date and stored in S3.
For example: 2020/08/01, 2020/08/02, 2020/08/03.
There are three types of features (global, local, deep), and each one has its own version.
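A small sketch of how an incremental job can decide what to (re)compute under that layout; the epoch keys and version numbers below are made up for illustration:

```python
# Each image epoch (grouped by upload date) needs every feature type at its
# current version. A new epoch only adds a handful of jobs; bumping one
# feature's version backfills that feature for every epoch.
FEATURE_VERSIONS = {"global": 3, "local": 2, "deep": 5}  # assumed versions

def pending_work(epochs, existing):
    """existing: set of (epoch, feature_type, version) already stored in S3."""
    work = []
    for epoch in epochs:  # e.g. "2020/08/01", "2020/08/02", "2020/08/03"
        for feature, version in FEATURE_VERSIONS.items():
            key = (epoch, feature, version)
            if key not in existing:
                work.append(key)
    return work
```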
How the features are stored for access:
These features are merged to form a fingerprint containing all available features of an image.
The fingerprints are copied into sharded sorted files for random access by image signature (MD5 hash).
These joined fingerprint files are regularly re-materialized, but the expensive feature computation only needs to be done once per image.
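As a rough sketch of that storage layout (the on-disk format is assumed, not documented here), the fingerprints can be merged per image and written into sorted shards keyed by the MD5 image signature, so that a single fingerprint can be found with a binary search:

```python
import bisect
import hashlib

def signature(image_bytes: bytes) -> str:
    """Image signature: MD5 hash of the image bytes."""
    return hashlib.md5(image_bytes).hexdigest()

def merge_fingerprint(global_f, local_f, deep_f) -> dict:
    """Join all available features of an image into one fingerprint."""
    return {"global": global_f, "local": local_f, "deep": deep_f}

def shard_for(sig: str, num_shards: int) -> int:
    """Route a signature to a shard, e.g. by its leading hex digits."""
    return int(sig[:4], 16) % num_shards

class SortedShard:
    """One shard: fingerprints sorted by signature for random access."""
    def __init__(self, records):            # records: [(sig, fingerprint), ...]
        self.records = sorted(records)
        self.keys = [sig for sig, _ in self.records]

    def get(self, sig):
        i = bisect.bisect_left(self.keys, sig)
        if i < len(self.keys) and self.keys[i] == sig:
            return self.records[i][1]
        return None
```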
The pipeline for incremental fingerprint updating
Other related blogs:
RecSys16: Adaptive, Personalized Diversity for Visual Discovery
NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence
RecSys ’17: Translation-based Recommendation
RecSys ’18: Causal Embeddings for Recommendation
Best papers at RecSys:
https://recsys.acm.org/best-papers/