KDD 19': Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations
🤗 Recommendation system paper challenge (29/50)
🤔 What problem do they solve?
Large-scale item recommendation.
Given a query x, we would like to recommend items y from a corpus of M items, and we can observe a reward (watch time) for each pair (x, y).
Label: reward r (watch time)
Training data: triples of (query x, item y, reward r)
Model: a two-tower model trained by optimizing a batch softmax loss with in-batch negatives
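Concretely, the two-tower model scores a pair with the inner product of the two tower outputs, and training minimizes a reward-weighted batch softmax loss where each query's positive item is contrasted against the other items in the same batch. A sketch in the paper's notation (u and v are the query and item towers, B is the batch size):

```latex
% Two-tower score: inner product of query and item embeddings
s(x, y) = \langle u(x; \theta),\, v(y; \theta) \rangle

% Batch softmax: positives contrasted against in-batch negatives
P_B(y_i \mid x_i; \theta) =
  \frac{e^{s(x_i, y_i)}}{\sum_{j \in [B]} e^{s(x_i, y_j)}}

% Reward-weighted loss over the batch
L_B(\theta) = -\frac{1}{B} \sum_{i \in [B]} r_i \cdot \log P_B(y_i \mid x_i; \theta)
```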
😮 What are the challenges?
The data is hugely skewed: when we draw negatives from the batch, popular items are sampled as negatives far more often than rare ones.
The in-batch softmax loss therefore suffers from sampling bias under this skewed distribution, and popular items end up being overly penalized.
😎 Solution: Sampling-Bias-Corrected Batch Softmax
We can replace the batch softmax with a corrected version that subtracts the log of each item's sampling probability from its logit, so that popular items are no longer overly penalized.
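In the paper's notation, where p_j is the estimated probability that item j is sampled into a batch:

```latex
% Corrected logit: subtract the log sampling probability
s^c(x_i, y_j) = s(x_i, y_j) - \log p_j

% The corrected logits replace s in the batch softmax above
P_B^c(y_i \mid x_i; \theta) =
  \frac{e^{s^c(x_i, y_i)}}{\sum_{j \in [B]} e^{s^c(x_i, y_j)}}
```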
How do we get each item's sampling frequency efficiently?
Naively, we could count frequencies over the whole training set and store them in a global hash table, or apply a Count-Min sketch. But we can do better!
We can estimate the frequency on the fly, from the training stream itself.
Instead of counting occurrences directly, they estimate the frequency from the interval between two consecutive hits of the same item.
For example, if an item shows up every 10 steps, we can confidently estimate its frequency as 0.1.
However, since the data arrives as a stream, we cannot see everything at once. Much like Bayesian updating (prior vs. posterior), we start from an initial estimate and gradually refine it as more data arrives, as in the sketch below.
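Here is a minimal Python sketch of this streaming estimator (Algorithm 1 in the paper), assuming a single hash function and a fixed learning rate alpha; the class and variable names are mine, not the paper's:

```python
import numpy as np

class StreamingFrequencyEstimator:
    """Estimate each item's sampling probability from the gap
    between consecutive occurrences in the training stream."""

    def __init__(self, num_buckets: int, alpha: float = 0.01):
        self.alpha = alpha                      # moving-average learning rate
        self.num_buckets = num_buckets
        self.last_step = np.zeros(num_buckets)  # A: step of last occurrence
        self.interval = np.ones(num_buckets)    # B: estimated occurrence interval

    def update(self, item_id: int, step: int) -> None:
        h = hash(item_id) % self.num_buckets
        # Moving average of the observed gap between two hits of this item
        self.interval[h] = ((1 - self.alpha) * self.interval[h]
                            + self.alpha * (step - self.last_step[h]))
        self.last_step[h] = step

    def sampling_prob(self, item_id: int) -> float:
        h = hash(item_id) % self.num_buckets
        # If an item hits every delta steps, its per-step probability is ~1/delta
        return 1.0 / self.interval[h]
```

For instance, if item 42 appears roughly every 10 training steps, sampling_prob(42) converges to about 0.1, matching the intuition above. (The paper also discusses a multi-hash variant to reduce the overestimation caused by hash collisions.)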
The whole algorithm is sketched below.
The basic idea: we only modify the batch softmax function and keep every other part of the two-tower training pipeline the same.
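Putting it together, here is a hedged NumPy sketch of the corrected batch softmax loss; query_emb, item_emb, rewards, and probs are hypothetical inputs (the production system uses TensorFlow, but the math is the same):

```python
import numpy as np

def corrected_batch_softmax_loss(query_emb, item_emb, rewards, probs):
    """Reward-weighted batch softmax loss with sampling-bias correction.

    query_emb: (B, d) query-tower outputs
    item_emb:  (B, d) item-tower outputs (row i is query i's positive item)
    rewards:   (B,)   observed rewards (e.g., watch time)
    probs:     (B,)   estimated sampling probability of each in-batch item
    """
    logits = query_emb @ item_emb.T           # (B, B) all in-batch pair scores
    logits = logits - np.log(probs)[None, :]  # bias correction: s - log(p_j)
    # Log-softmax over each row; the diagonal holds the positive pair
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    positive_log_probs = np.diag(log_probs)
    return -(rewards * positive_log_probs).mean()
```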
The paper also analyzes how good this estimator is by proving its consistency: the estimated interval converges to the true one as training proceeds.
🙃 Other related blogs:
KDD 19': Heterogeneous Graph Neural Network
KDD 19': Applying Deep Learning To Airbnb Search
KDD 18': Real-time Personalization using Embeddings for Search Ranking at Airbnb
KDD 18': Notification Volume Control and Optimization System at Pinterest
KDD 19': PinText: A Multitask Text Embedding System in Pinterest
CVPR19' Complete the Look: Scene-based Complementary Product Recommendation
NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence
NIPS’2017: Attention Is All You Need (Transformer)
KDD’19: Learning a Unified Embedding for Visual Search at Pinterest
BMVC19' Classification is a Strong Baseline for Deep Metric Learning
KDD’18: Graph Convolutional Neural Networks for Web-Scale Recommender Systems
WWW’17: Visual Discovery at Pinterest
🤩 Conference
ICCV: International Conference on Computer Vision
http://iccv2019.thecvf.com/submission/timeline
CVPR: Conference on Computer Vision and Pattern Recognition
KDD 2020
Top Conference Paper Challenge:
https://medium.com/@arthurlee_73761/top-conference-paper-challenge-2d7ca24115c6