KDD '17: Visual Search at eBay
🤗 Recommendation system paper challenge (26/50)
🤔 What problem do they solve?
When users upload their images, the system should recommend similar items.
😮 What are the challenges?
There are four characteristics that make this application different from others.
- Volatile Inventory: numerous items are listed and sold every minute. Thus, listings are short-lived
- Scale: the system must search over hundreds of millions of images, so indexing and ranking have to be distributed
- Data quality: Image quality is diverse in eBay inventory since it is a platform that enables both high volume and occasional sellers
- Quality of query Image: eBay ShopBot allows users to upload query images, so quality is diverse too.
😎 Overview of the system
Instead of extracting features from the DNN and performing an exhaustive search over the entire database, they search only among the top predicted categories, then use a semantic binary hash with Hamming distance for fast ranking. For speed and a low memory footprint, they use a shared topology for both category prediction and binary hash extraction, so a single inference pass produces both outputs.
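The flow above (predict top categories, then rank by Hamming distance within them) can be sketched in Python. All names and the data layout are hypothetical, not the paper's actual index format; hashes are assumed to be bit-packed `uint8` arrays:

```python
import numpy as np

def search(query_probs, query_hash, index, top_k_categories=3, top_n=10):
    """Rank only images from the top predicted categories by Hamming distance.

    query_probs: category probabilities from the shared network
    query_hash: bit-packed binary signature (np.uint8 array)
    index: dict mapping category id -> (image_ids, bit-packed hashes matrix)
    """
    top_cats = np.argsort(query_probs)[::-1][:top_k_categories]
    candidates = []
    for cat in top_cats:
        if cat not in index:
            continue
        ids, hashes = index[cat]
        # Hamming distance = popcount of XOR between packed bit vectors
        dists = np.unpackbits(hashes ^ query_hash, axis=1).sum(axis=1)
        candidates.extend(zip(ids, dists))
    candidates.sort(key=lambda c: c[1])
    return candidates[:top_n]
```

Restricting the search to a few predicted categories is what keeps the candidate set small enough for exhaustive Hamming ranking.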
They train a ResNet-50 network from scratch on eBay data, focusing on fine-grained categories.
They apply an XGBoost model to the image embedding and product attributes for aspect prediction. It integrates visual appearance and categorical information into one representation.
It allows for fast inference with minimal resources, using CPU only. They train a separate model for each aspect that can be inferred from an image.
Deep Semantic Binary Hash
Scalability is a key factor when designing a real world large-scale visual search system.
They represent images as binary signatures instead of real values in order to greatly reduce storage requirement and computation overhead.
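A minimal sketch of the binary signature idea: sigmoid activations are thresholded (the paper uses 0.5) to produce bits, which are bit-packed so that Hamming distance reduces to XOR plus popcount. The function names here are illustrative:

```python
import numpy as np

def binarize(activations):
    """Turn sigmoid activations (values in [0, 1]) into a packed binary
    signature: bit i is 1 iff activation i > 0.5. A 4096-bit signature
    needs only 512 bytes instead of 16 KB of float32 values."""
    bits = (np.asarray(activations) > 0.5).astype(np.uint8)
    return np.packbits(bits)

def hamming(a, b):
    # XOR the packed bytes, then count the differing bits (popcount)
    return int(np.unpackbits(a ^ b).sum())
```

Packed signatures cut both storage and ranking cost: comparing two images is a handful of byte-level XORs and popcounts rather than a dot product over thousands of floats.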
Aspect-based Image Re-ranking
They assign a different weight to each aspect.
How do they know the weights?
They check whether the predicted aspects match the ground-truth aspects and assign a "reward point" w_i to each predicted aspect that has an exact match.
Finally, they apply a blending formula that combines visual similarity and the aspect score to decide the final rank.
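The paper's exact blending formula is not reproduced here; an illustrative weighted combination, where `alpha` is a hypothetical tuning knob rather than a parameter from the paper:

```python
def final_score(visual_similarity, matched_aspect_weights, alpha=0.7):
    """Illustrative re-ranking blend (not the paper's exact formula):
    combine appearance similarity with the summed reward points w_i of
    the aspects that exactly matched. alpha trades off the two signals."""
    aspect_score = sum(matched_aspect_weights)
    return alpha * visual_similarity + (1 - alpha) * aspect_score
```

With no matched aspects the score falls back to pure visual similarity (scaled by `alpha`), so items with matching attributes get promoted without overriding appearance entirely.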
🤨 SYSTEM ARCHITECTURE
Image Ingestion and Indexing
The ingestion pipeline (Figure 6) detects image updates in near-real-time and maintains them in cloud storage. To reduce storage requirements, duplicate images (about a third) across listings are detected and cross-linked with MD5 hashes over image bits.
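The MD5-based cross-linking can be sketched as follows; the `store` mapping is a stand-in for the real cross-link table, and the function names are hypothetical:

```python
import hashlib

def image_key(image_bytes):
    """Identical image bytes hash to the same MD5 digest, so duplicate
    copies across listings can be detected and cross-linked."""
    return hashlib.md5(image_bytes).hexdigest()

# digest -> listings that reference this image (duplicates share one entry)
store = {}

def ingest(image_bytes, listing_id):
    key = image_key(image_bytes)
    store.setdefault(key, []).append(listing_id)
    return key
```

Since roughly a third of images are duplicates, storing each distinct digest once is a substantial saving.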
As new images arrive, they compute image hashes for the main listing image in micro-batches. Image hashes are stored in a distributed database (Google Bigtable), keyed by the image identifier.
For indexing, they generate daily image hash extracts from Bigtable for all available listings in the supported categories.
The batch extraction process runs as a parallel Spark job.
They write a separate file for each category for each job partition and store these intermediate extracts in cloud storage. After all job partitions are complete, they download the intermediate extracts for each category, concatenate them across all job partitions, and upload the concatenated extracts back to cloud storage.
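A toy sketch of the concatenation step; the `<category>_<partition>.bin` naming scheme is hypothetical:

```python
import os

def concatenate_extracts(extract_dir, category, num_partitions, out_path):
    """Concatenate one category's per-partition extract files
    (hypothetical naming: <category>_<partition>.bin) into a single
    extract file ready for upload back to cloud storage."""
    with open(out_path, "wb") as out:
        for p in range(num_partitions):
            part = os.path.join(extract_dir, f"{category}_{p}.bin")
            with open(part, "rb") as f:
                out.write(f.read())
```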
They update their DNN models frequently. To handle frequent updates, they run a separate parallel job that scans all active listings in batches and recomputes image hashes from the stored images.
They keep up to two image hashes in Bigtable for each image, corresponding to the older and newer DNN model versions, so the older hash version can still be used in extracts while re-computation is running.
They create an image ranking service and deploy it in a Kubernetes cluster. Given the huge amount of data, they have to split image hashes for all the images across the cluster containing multiple nodes.
They use Hazelcast (an open-source in-memory data grid) for cluster awareness, so that each node in the cluster knows about the other nodes and can decide which part of the data to serve.
To guarantee that all nodes have the same data, they leverage Kubernetes to share a single disk, in read-only mode, across multiple pods.
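One simple way a cluster-aware node could pick its share of the data, given the membership information Hazelcast provides: a hypothetical modulo scheme, not necessarily the assignment the real service uses:

```python
def my_partitions(node_index, num_nodes, num_partitions):
    """Node i deterministically serves every partition p with
    p % num_nodes == i, so all partitions are covered exactly once
    across the cluster without any central coordinator."""
    return [p for p in range(num_partitions) if p % num_nodes == node_index]
```

Because every node runs the same deterministic rule against the same membership view, no node needs to negotiate its slice.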
🥳 The Qualitative result in applications
Users can freely take a photo (with the camera or from the photo album) and find similar products in eBay's massive inventory.
Users can freely take photos of and add descriptions to their products they want to sell to create a listing on Close5, which can be viewed by nearby users.
They also apply auto-categorization and the "Similar Items" feature on eBay to make product pages more engaging.
🙃 Other related blogs:
WWW’17: Visual Discovery at Pinterest
ICCV: International Conference on Computer Vision
CVPR: Conference on Computer Vision and Pattern Recognition
Top Conference Paper Challenge: