KDD’15 Visual Search at Pinterest
Recommendation system paper challenge (19/50)
Engineering blog post related to this paper: Introducing a new way to visually search on Pinterest
What problem do they solve?
They developed and maintain a cost-effective, large-scale visual search system.
It consists of two features: Related Pins and Similar Looks.
Related Pins
Given an image, the application extracts the items it contains and recommends related pins to users.
For example, if the image contains a bag and jeans, it recommends both.
Disadvantage: sometimes it is hard to detect all of the items in an image, and sometimes a user wants to focus on one specific item.
The current model:
The current model combines user-to-board features (users save and annotate the image on boards) with content-based features. The model is trained offline.
However, newly created and less popular images lack user-to-board information. For example, if the model is trained on 07/01 and retrained monthly, images created during [07/01–07/31] have no user-to-board features (the feature values are null). As a result, newly created images remain less popular.
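To make this cold-start issue concrete, here is a minimal sketch (my own illustration, not the paper's code) of how an offline feature builder retrained monthly ends up with empty user-to-board features for any image created after the training snapshot:

```python
from datetime import date

# Hypothetical feature builder: the offline model is retrained monthly, so an
# image created after the training snapshot has no user-to-board signal and
# falls back to content features only. All names and dimensions are assumed.
TRAINING_SNAPSHOT = date(2015, 7, 1)  # model trained on 07/01

def build_features(image, user_to_board_counts, content_embedding):
    """Concatenate user-to-board and content-based features (illustrative)."""
    if image["created_at"] > TRAINING_SNAPSHOT:
        # New image: no engagement existed at training time, so the
        # user-to-board part of the vector is effectively null/zero.
        u2b = [0.0] * 8
    else:
        u2b = user_to_board_counts.get(image["id"], [0.0] * 8)
    return u2b + list(content_embedding)
```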
The new model:
They employ a local token index to detect near-duplicate images (which users do not want to see) and then use the FC6 features of the VGG 16-layer model as the visual embedding to return the top-k similar images.
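Below is a minimal sketch of the visual-embedding step, assuming a torchvision VGG-16 and cosine similarity for ranking; the paper's production pipeline (and its near-duplicate token index) is more involved than this:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Load a pretrained VGG-16 and extract the 4096-d FC6 activation as the
# visual embedding, then rank a candidate matrix by cosine similarity.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def fc6_embedding(img: Image.Image) -> torch.Tensor:
    x = preprocess(img).unsqueeze(0)        # 1 x 3 x 224 x 224
    x = vgg.features(x)                     # conv feature maps
    x = vgg.avgpool(x).flatten(1)           # 1 x 25088
    return vgg.classifier[0](x).squeeze(0)  # FC6 linear layer -> 4096-d

@torch.no_grad()
def top_k_similar(query: torch.Tensor, index: torch.Tensor, k: int = 10):
    """index: N x 4096 matrix of precomputed FC6 embeddings."""
    sims = torch.nn.functional.cosine_similarity(query.unsqueeze(0), index)
    return torch.topk(sims, k)
```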
Similar Looks
In the example image there are two red dots, and each dot maps to an item. When a user clicks a dot, the application recommends products related to that item.
This addresses the issues mentioned above for Related Pins.
It localizes and classifies fashion objects and recommends matching products to users.
The model:
Two-step Object Detection and Localization
For each image, the user-to-board information provides many annotations, and the application extracts keywords from them.
The application then applies an object-detection algorithm to the image.
Text filters and image filters are applied together to find the top-k recommendations.
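A hedged sketch of that two-step flow is below: the text filter narrows the annotations to candidate fashion categories, and category-specific detectors are run only for those candidates. The detector and recommender interfaces here are hypothetical placeholders, not the paper's API:

```python
# Text filter + image filter, as described above. `detectors` is assumed to
# map a category name to an object with a detect(image) -> [(box, score)]
# method; `recommend_fn` is a placeholder for the product retrieval step.
FASHION_KEYWORDS = {"bag", "shoe", "dress", "sunglasses", "jeans"}

def candidate_categories(annotations):
    """Text filter: keep only annotations that look like fashion objects."""
    return {a.lower() for a in annotations} & FASHION_KEYWORDS

def detect_and_recommend(image, annotations, detectors, recommend_fn, k=10):
    results = []
    for category in candidate_categories(annotations):
        # Image filter: run the detector only for text-matched categories.
        for box, score in detectors[category].detect(image):
            products = recommend_fn(image, box, category)[:k]
            results.append({"category": category, "box": box,
                            "score": score, "products": products})
    return results
```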
Next, I will introduce the infrastructure that supports these applications.
Search Infrastructure
Scenarios:
- Similar Looks
- near-duplicate detection
- content recommendation
Visually similar results:
First, they distribute visual joins (visualjoins) across machines with Hadoop; each machine holds a subset of the image information.
Each machine holds two types of keys (sketched below):
- token indices with vector-quantized features and image doc-id hashes as posting lists (mostly on disk, partly in memory)
- memory-cached features (visual features plus metadata such as annotations)
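As a rough illustration of those two key types (names are mine, not the paper's), each leaf machine can be thought of as an inverted index from vector-quantized visual tokens to posting lists of doc-id hashes, plus an in-memory cache of full features and annotations:

```python
from collections import defaultdict

class LeafIndex:
    """Sketch of one leaf machine's index; structure is assumed, not exact."""
    def __init__(self):
        self.postings = defaultdict(list)  # token -> [doc_id, ...] posting list
        self.feature_cache = {}            # doc_id -> (embedding, annotations)

    def add(self, doc_id, tokens, embedding, annotations):
        for t in tokens:
            self.postings[t].append(doc_id)
        self.feature_cache[doc_id] = (embedding, annotations)

    def candidates(self, query_tokens):
        """Union of posting lists for the query's quantized tokens."""
        seen = set()
        for t in query_tokens:
            seen.update(self.postings.get(t, ()))
        return seen
```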
Each machine then computes the K nearest neighbors with a leaf ranker.
Finally, the merge node returns top candidates by computing a score between the query image and each of the top candidate images based on additional metadata such as annotations.
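A minimal sketch of that scatter-gather flow, reusing the LeafIndex above; the blend of visual similarity and annotation overlap in the merge step is my assumption, not the paper's exact scoring formula:

```python
import heapq
import numpy as np

def leaf_rank(leaf, query_emb, query_tokens, k=20):
    """Each leaf ranks its local candidates by cosine similarity."""
    scored = []
    for doc_id in leaf.candidates(query_tokens):
        emb, _ = leaf.feature_cache[doc_id]
        sim = float(np.dot(query_emb, emb) /
                    (np.linalg.norm(query_emb) * np.linalg.norm(emb)))
        scored.append((sim, doc_id))
    return heapq.nlargest(k, scored)

def merge_node(query_annotations, leaf_results, doc_annotations, k=10):
    """Rescore the pooled leaf results using metadata such as annotations."""
    rescored = []
    for sim, doc_id in (hit for hits in leaf_results for hit in hits):
        overlap = len(set(doc_annotations[doc_id]) & set(query_annotations))
        rescored.append((sim + 0.1 * overlap, doc_id))  # assumed score blend
    return heapq.nlargest(k, rescored)
```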
How are new features and newly uploaded images handled?
They use the Incremental Fingerprinting Service to keep features up to date.
Scenarios:
- a new image is uploaded (compute features only for the new image)
- a feature changed (backfill that feature for all images)
Approach:
Data is sharded into epochs grouped by upload date and stored in S3.
For example: 2020/08/01, 2020/08/02, 2020/08/03.
There are three types of features (global, local, deep), and each one has its own version.
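A small sketch of how an incremental job can decide what to (re)compute under that layout; the epoch keys and version numbers below are made up for illustration:

```python
# Each image epoch (grouped by upload date) needs every feature type at its
# current version. A new epoch only adds a handful of jobs; bumping one
# feature's version backfills that feature for every epoch.
FEATURE_VERSIONS = {"global": 3, "local": 2, "deep": 5}  # assumed versions

def pending_work(epochs, existing):
    """existing: set of (epoch, feature_type, version) already stored in S3."""
    work = []
    for epoch in epochs:  # e.g. "2020/08/01", "2020/08/02", "2020/08/03"
        for feature, version in FEATURE_VERSIONS.items():
            key = (epoch, feature, version)
            if key not in existing:
                work.append(key)
    return work
```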
How the features are stored for access:
These features are merged to form a fingerprint containing all available features of an image.
The fingerprints are copied into sharded sorted files for random access by image signature (MD5 hash).
These joined fingerprint files are regularly re-materialized, but the expensive feature computation only needs to be done once per image.
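As a rough sketch of that storage layout (the on-disk format is assumed, not documented here), the fingerprints can be merged per image and written into sorted shards keyed by the MD5 image signature, so that a single fingerprint can be found with a binary search:

```python
import bisect
import hashlib

def signature(image_bytes: bytes) -> str:
    """Image signature: MD5 hash of the image bytes."""
    return hashlib.md5(image_bytes).hexdigest()

def merge_fingerprint(global_f, local_f, deep_f) -> dict:
    """Join all available features of an image into one fingerprint."""
    return {"global": global_f, "local": local_f, "deep": deep_f}

def shard_for(sig: str, num_shards: int) -> int:
    """Route a signature to a shard, e.g. by its leading hex digits."""
    return int(sig[:4], 16) % num_shards

class SortedShard:
    """One shard: fingerprints sorted by signature for random access."""
    def __init__(self, records):            # records: [(sig, fingerprint), ...]
        self.records = sorted(records)
        self.keys = [sig for sig, _ in self.records]

    def get(self, sig):
        i = bisect.bisect_left(self.keys, sig)
        if i < len(self.keys) and self.keys[i] == sig:
            return self.records[i][1]
        return None
```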
The pipeline for incremental fingerprint updating
Other related blogs:
RecSys16: Adaptive, Personalized Diversity for Visual Discovery
NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence
RecSys ’17: Translation-based Recommendation
RecSys ’18: Causal Embeddings for Recommendation
Best papers at RecSys:
https://recsys.acm.org/best-papers/