
KDD 17': Visual Search at eBay

Arthur Lee

Nov 16, 2020

🤗 Recommendation system paper challenge (26/50)

paper link

🤔 What problem do they solve?

Given an image uploaded by a user, the system should find and recommend visually similar items from eBay's inventory.

😮 What are the challenges?

Four factors make this application different from others:

Volatile inventory: numerous items are listed and sold every minute, so listings are short-lived.
Scale: the inventory is very large, so search must be fast and memory-efficient.
Data quality: image quality varies widely across eBay's inventory, because the platform serves both high-volume and occasional sellers.
Query image quality: eBay ShopBot lets users upload their own query images, so their quality varies as well.

😎 Overview of the system

Instead of extracting features from the DNN and exhaustively searching the entire database, they search only within the top predicted categories and then rank candidates with semantic binary hashes compared by Hamming distance. For speed and a low memory footprint, they use a shared network topology for both category prediction and binary hash extraction, so inference needs only a single forward pass.
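To make the two-stage idea concrete, here is a minimal sketch. It assumes binary hashes are stored as Python ints, that category probabilities come from the classifier, and that the index is a plain dict from category to (item_id, hash) pairs. The names `search`, `index`, `top_k_categories` and `top_n` are hypothetical and not eBay's code.

```python
from typing import Dict, List, Tuple

# Hypothetical index layout: category -> list of (item_id, binary hash as int)
Index = Dict[str, List[Tuple[str, int]]]

def hamming(a: int, b: int) -> int:
    """Number of bit positions where two binary hashes differ."""
    return bin(a ^ b).count("1")

def search(query_hash: int, category_probs: Dict[str, float], index: Index,
           top_k_categories: int = 2, top_n: int = 5) -> List[Tuple[str, int]]:
    # Stage 1: keep only the top-k categories predicted for the query image.
    top_cats = sorted(category_probs, key=category_probs.get, reverse=True)[:top_k_categories]
    # Stage 2: candidates come only from those categories, not the whole database.
    candidates = [item for cat in top_cats for item in index.get(cat, [])]
    # Stage 3: rank candidates by Hamming distance (smaller means more similar).
    scored = [(item_id, hamming(query_hash, h)) for item_id, h in candidates]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_n]

index = {
    "shoes": [("s1", 0b1010), ("s2", 0b1111)],
    "bags": [("b1", 0b1011)],
    "watches": [("w1", 0b1010)],
}
probs = {"shoes": 0.6, "bags": 0.3, "watches": 0.1}
print(search(0b1010, probs, index))  # [('s1', 0), ('b1', 1), ('s2', 2)]
```

Note that `w1` has an identical hash but is never scored, because its category falls outside the top two. This is the accuracy-for-speed trade-off that category filtering makes.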

😎 Modeling

Category Recognition

They train a ResNet-50 network from scratch on their own data, focusing on fine-grained categories.

Aspect Prediction

They train an XGBM (gradient-boosted machine) model on image embeddings and product attributes to predict item aspects. This integrates visual appearance and categorical information into one representation.

The model allows fast inference with minimal resources, on CPU only. They train one model for each aspect that can be inferred from an image.
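The sketch below illustrates two points from this section: the joint input (image embedding concatenated with a one-hot category) and one independent model per aspect. To stay dependency-free it swaps in a simple nearest-centroid classifier as a stand-in for the gradient-boosted model, so it is not the paper's learner. The aspect names, labels and helper names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def joint_features(embedding: Sequence[float], category: str,
                   categories: Sequence[str]) -> List[float]:
    """Concatenate the image embedding with a one-hot encoding of the category."""
    one_hot = [1.0 if c == category else 0.0 for c in categories]
    return list(embedding) + one_hot

class NearestCentroid:
    """Stand-in for the gradient-boosted model: predicts the label with the closest mean vector."""

    def fit(self, X: List[List[float]], y: List[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] += 1
        self.centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
        return self

    def predict(self, x: List[float]) -> str:
        def sq_dist(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))

def train_per_aspect(X: List[List[float]],
                     labels_by_aspect: Dict[str, List[str]]) -> Dict[str, NearestCentroid]:
    """Train one independent model per aspect."""
    return {aspect: NearestCentroid().fit(X, y) for aspect, y in labels_by_aspect.items()}

categories = ["shoes", "bags"]
X = [
    joint_features([0.9, 0.1], "shoes", categories),
    joint_features([0.8, 0.2], "shoes", categories),
    joint_features([0.1, 0.9], "bags", categories),
]
labels = {"color": ["red", "red", "black"], "style": ["sneaker", "sneaker", "tote"]}
models = train_per_aspect(X, labels)
query = joint_features([0.85, 0.15], "shoes", categories)
print({aspect: m.predict(query) for aspect, m in models.items()})  # {'color': 'red', 'style': 'sneaker'}
```

Keeping one small model per aspect means each aspect can be added, retrained or dropped independently, which fits the CPU-only, low-resource goal.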

Deep Semantic Binary Hash

Scalability is a key factor when designing a real world large-scale visual search system.

They represent images as binary signatures instead of real-valued vectors, which greatly reduces storage requirements and computational overhead.
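A minimal sketch of why binary signatures are cheap follows. It assumes the hash layer outputs values in [0, 1] (for example, sigmoid activations) that are thresholded at 0.5. The post does not state this detail, so the threshold and the 4096 dimension are illustrative assumptions. Comparing two codes then reduces to an XOR and a bit count.

```python
from typing import Sequence, Tuple

def binarize(activations: Sequence[float], threshold: float = 0.5) -> int:
    """Pack one bit per dimension: bit i is 1 if activation i exceeds the threshold."""
    code = 0
    for i, a in enumerate(activations):
        if a > threshold:
            code |= 1 << i
    return code

def hamming(a: int, b: int) -> int:
    """Compare two packed codes with XOR and a bit count."""
    return bin(a ^ b).count("1")

def storage_bytes(dim: int) -> Tuple[int, int]:
    """Bytes for a float32 vector vs. a packed binary code of the same dimension."""
    return dim * 4, (dim + 7) // 8

q = binarize([0.9, 0.2, 0.7, 0.1])   # bits 0 and 2 set -> 5
x = binarize([0.8, 0.4, 0.3, 0.1])   # bit 0 set -> 1
print(q, x, hamming(q, x))           # 5 1 1
print(storage_bytes(4096))           # (16384, 512)
```

For the same dimensionality, a packed binary code is 32 times smaller than a float32 vector.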

Aspect-based Image Re-ranking

They give different aspects different weights.

How do they know the weights?

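Many listings carry seller-provided aspects, which serve as ground truth. They check whether the aspects predicted from the query image exactly match a listing's ground-truth aspects, and assign a “reward point” w_i to each predicted aspect that has an exact match.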

Finally, they apply a blending formula that combines visual similarity and the aspect score to determine the final rank.
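The sketch below shows one way such a re-ranker could work. It sums the reward points w_i of exactly matching aspects and blends that sum with a visual similarity derived from Hamming distance. The post does not give the actual blending formula, so the linear blend, the weight `alpha`, the reward values and the distance-to-similarity mapping are all hypothetical.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, int, Dict[str, str]]  # (item_id, hamming_distance, listing_aspects)

def aspect_score(predicted: Dict[str, str], listing: Dict[str, str],
                 rewards: Dict[str, float]) -> float:
    """Sum the reward point w_i of every predicted aspect that exactly matches the listing."""
    return sum(rewards.get(name, 0.0)
               for name, value in predicted.items()
               if listing.get(name) == value)

def rerank(candidates: List[Candidate], predicted: Dict[str, str],
           rewards: Dict[str, float], n_bits: int,
           alpha: float = 0.7) -> List[Tuple[str, float]]:
    """Hypothetical linear blend of visual similarity and aspect score."""
    results = []
    for item_id, dist, listing_aspects in candidates:
        visual = 1.0 - dist / n_bits  # map Hamming distance to a similarity in [0, 1]
        score = alpha * visual + (1 - alpha) * aspect_score(predicted, listing_aspects, rewards)
        results.append((item_id, score))
    results.sort(key=lambda r: r[1], reverse=True)
    return results

predicted = {"color": "red", "brand": "acme"}  # aspects predicted from the query image
rewards = {"color": 0.6, "brand": 0.4}          # hypothetical w_i values
candidates = [
    ("a", 1, {"color": "blue", "brand": "acme"}),
    ("b", 2, {"color": "red", "brand": "acme"}),
]
print([item for item, _ in rerank(candidates, predicted, rewards, n_bits=8)])  # ['b', 'a']
```

Here item `a` is visually closer, but `b` matches both predicted aspects, so it is promoted to the top.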

🤨 SYSTEM ARCHITECTURE

Image Ingestion and Indexing

The ingestion pipeline (Figure 6) detects image updates in near-real-time and maintains them in cloud storage. To reduce storage requirements, duplicate images (about a third) across listings are detected and cross-linked with MD5 hashes over image bits.
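The deduplication step can be sketched with Python's standard `hashlib`. An in-memory dict stands in for cloud storage, and the listing IDs and byte strings are made up. Each distinct image is stored once under its MD5 digest, and every listing is cross-linked to that digest.

```python
import hashlib
from typing import Dict, List, Tuple

def dedupe(images: List[Tuple[str, bytes]]) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Store each distinct image once and cross-link every listing to it by MD5."""
    store: Dict[str, bytes] = {}   # MD5 hex digest -> image bytes (stored once)
    links: Dict[str, str] = {}     # listing id -> MD5 hex digest of its image
    for listing_id, data in images:
        key = hashlib.md5(data).hexdigest()
        store.setdefault(key, data)  # keep only the first copy of identical bytes
        links[listing_id] = key
    return store, links

images = [("L1", b"<jpeg bytes A>"), ("L2", b"<jpeg bytes B>"), ("L3", b"<jpeg bytes A>")]
store, links = dedupe(images)
print(len(images), "images,", len(store), "stored")  # 3 images, 2 stored
print(links["L1"] == links["L3"])                      # True
```

MD5 is used here only as a fast content fingerprint for byte-identical files, not for security.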

As new images arrive, they compute image hashes for the main listing image in micro-batches. The hashes are stored in a distributed database (Google Bigtable), keyed by the image identifier.

For indexing, they generate daily image hash extracts from Bigtable for all available listings in the supported categories.

The batch extraction process runs as a parallel Spark job.

The job writes a separate file for each category in each job partition and stores these intermediate extracts in cloud storage. After all job partitions are complete, the intermediate extracts for each category are downloaded and concatenated across all job partitions, and the concatenated extracts are uploaded back to cloud storage.
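Here is a minimal sketch of the per-category extract logic. Plain Python lists stand in for job partitions and cloud-storage files, so this shows only the data flow, not Spark or any storage API. The record layout `(category, image_id, image_hash)` is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, int]          # (category, image_id, image_hash), hypothetical layout
Extracts = Dict[str, List[Tuple[str, int]]]

def partition_extracts(partition: Iterable[Record]) -> Extracts:
    """One intermediate 'file' (here a list) per category for a single job partition."""
    files: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for category, image_id, image_hash in partition:
        files[category].append((image_id, image_hash))
    return dict(files)

def concatenate(partition_outputs: List[Extracts]) -> Extracts:
    """After all partitions finish, concatenate each category's extracts across partitions."""
    merged: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for files in partition_outputs:
        for category, rows in files.items():
            merged[category].extend(rows)
    return dict(merged)

p1 = partition_extracts([("shoes", "i1", 3), ("bags", "i2", 5)])
p2 = partition_extracts([("shoes", "i3", 7)])
print(concatenate([p1, p2]))  # {'shoes': [('i1', 3), ('i3', 7)], 'bags': [('i2', 5)]}
```

Writing one file per category lets each ranking node later load only the categories it serves.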

They update their DNN models frequently. To handle these updates, a separate parallel job scans all active listings in batches and recomputes image hashes from the stored images.

For each image, they keep up to two image hashes in Bigtable, one for the older and one for the newer DNN model version, so the older hash version can still be used in extracts while re-computation is running.
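The two-version scheme can be sketched with an in-memory stand-in for a Bigtable row. This is not the Bigtable client API. It assumes model version numbers increase monotonically, so the smallest version is always the oldest. `HashStore` and its methods are hypothetical names.

```python
from typing import Dict, Optional

class HashStore:
    """Keeps at most two hash versions per image, keyed by DNN model version."""

    MAX_VERSIONS = 2

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[int, int]] = {}  # image_id -> {model_version: hash}

    def put(self, image_id: str, model_version: int, image_hash: int) -> None:
        versions = self.rows.setdefault(image_id, {})
        versions[model_version] = image_hash
        while len(versions) > self.MAX_VERSIONS:
            del versions[min(versions)]  # drop the oldest model version

    def get(self, image_id: str, model_version: int) -> Optional[int]:
        return self.rows.get(image_id, {}).get(model_version)

store = HashStore()
store.put("img1", model_version=1, image_hash=0b1010)
store.put("img1", model_version=2, image_hash=0b1100)  # re-computation with the new model
print(store.get("img1", 1), store.get("img1", 2))      # 10 12
store.put("img1", model_version=3, image_hash=0b0111)
print(store.get("img1", 1))                             # None (oldest version evicted)
```

While version 2 is being computed, extracts keep reading version 1. Once the new model is fully rolled out, the next update evicts the stale hash.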

Image Ranking

They build an image ranking service and deploy it on a Kubernetes cluster. Because the data is so large, the image hashes for all images are split across the cluster's multiple nodes.

They use Hazelcast (an open-source in-memory data grid) for cluster awareness, so each node knows about the other nodes and can decide which part of the data to serve.
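One simple way nodes can agree on "which part of the data to serve" is sketched below. Given the same membership list (which, in the described system, cluster awareness would provide), every node deterministically hashes each partition key to one owner. This is plain modulo hashing chosen for illustration. It is not Hazelcast's API or partitioning scheme, and the node and key names are made up.

```python
import hashlib
from typing import List

def owner(key: str, members: List[str]) -> str:
    """Pick the node responsible for a partition key.

    Every node sorts the same membership list, so all nodes agree on the owner.
    """
    ordered = sorted(members)
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return ordered[digest % len(ordered)]

def keys_to_serve(node: str, keys: List[str], members: List[str]) -> List[str]:
    """The subset of partition keys (e.g. categories) this node loads and serves."""
    return [k for k in keys if owner(k, members) == node]

members = ["node-a", "node-b", "node-c"]  # hypothetical membership from the cluster layer
keys = ["shoes", "bags", "watches", "hats"]
for node in members:
    print(node, keys_to_serve(node, keys, members))
```

Every key is served by exactly one node. A known drawback of plain modulo hashing is that a membership change reassigns many keys, and consistent hashing reduces that churn.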

To guarantee that all nodes see the same data, they use Kubernetes to share a single disk, in read-only mode, across multiple pods.

🥳 Qualitative Results in Applications

EBAY SHOPBOT

Users can take a photo (with the camera or from the photo album) and find similar products in eBay’s massive inventory.

APPLICATION: CLOSE5

Users can photograph products they want to sell and add descriptions to create a listing on Close5, which nearby users can view.

They apply auto-categorization and “Similar items on eBay” to make the product more engaging.

🙃 Other related blogs:

KDD 18': Real-time Personalization using Embeddings for Search Ranking at Airbnb

KDD 18': Notification Volume Control and Optimization System at Pinterest

KDD 19': PinText: A Multitask Text Embedding System in Pinterest

CVPR19' Complete the Look: Scene-based Complementary Product Recommendation

COLING’14: Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts

NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence

NIPS’2017: Attention Is All You Need (Transformer)

KDD’19: Learning a Unified Embedding for Visual Search at Pinterest

BMVC19' Classification is a Strong Baseline for Deep Metric Learning

KDD’18: Graph Convolutional Neural Networks for Web-Scale Recommender Systems

WWW’17: Visual Discovery at Pinterest

🤩 Conference

ICCV: International Conference on Computer Vision

http://iccv2019.thecvf.com/submission/timeline

CVPR: Conference on Computer Vision and Pattern Recognition

http://cvpr2019.thecvf.com/

KDD 2020

https://www.kdd.org/kdd2020/

Top Conference Paper Challenge:
https://medium.com/@arthurlee_73761/top-conference-paper-challenge-2d7ca24115c6
My Website:
https://light0617.github.io/#/