RecSys '19: Sampling-bias-corrected neural modeling for large corpus item recommendations
🤗 Recommendation system paper challenge (29/50)
🤔 What problem do they solve?
Large-scale item recommendation
Given a query x, we would like to recommend items y from a corpus of M items, and we can observe a reward (e.g., watch time) for each pair (x, y)
Label: reward r (watch time)
Training data: tuples (query x, item y, reward r)
Model: a two-tower model trained by optimizing a softmax loss over in-batch negatives
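To make the setup concrete, here is a minimal numpy sketch of the in-batch softmax loss for a two-tower model. The function and variable names are mine, not from the paper: each row i of the batch pairs query i with its clicked item i, and the other B-1 items in the batch act as negatives.

```python
import numpy as np

def in_batch_softmax_loss(query_emb, item_emb, temperature=1.0):
    """In-batch softmax loss for a two-tower model (sketch).

    query_emb, item_emb: (B, d) arrays of query/item tower outputs.
    Row i is a positive (query i, item i) pair; the remaining B-1
    items in the batch serve as negatives for query i.
    """
    logits = query_emb @ item_emb.T / temperature      # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                # positives sit on the diagonal
```

In practice the towers would be neural networks trained with SGD; this sketch only shows how the loss treats the batch itself as the negative pool.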
😮 What are the challenges?
Data is hugely skewed: with in-batch negatives, popular items appear as negatives far more often than rare ones.
The in-batch loss therefore suffers from sampling bias under a skewed data distribution, and popular items end up overly penalized.
😎 Solution: Sampling Bias Corrected
We can correct the batch softmax by subtracting the log of each item's sampling probability from its logit, s^c(x_i, y_j) = s(x_i, y_j) - log(p_j), so that popular items are not overly penalized
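A minimal numpy sketch of this correction (names are mine): each logit gets log(p_j) subtracted, where p_j is the estimated probability that item j shows up in a batch. Note that if all items had equal probability, the correction shifts every logit by the same constant and the loss is unchanged, which matches the intuition that the fix only matters under skew.

```python
import numpy as np

def corrected_in_batch_softmax_loss(query_emb, item_emb, item_probs):
    """Sampling-bias-corrected in-batch softmax loss (sketch).

    item_probs: (B,) estimated sampling probability of each item in the
    batch. Subtracting log(p_j) from column j downweights the penalty on
    frequently sampled (popular) items.
    """
    logits = query_emb @ item_emb.T - np.log(item_probs)[None, :]
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                # positives on the diagonal
```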
How do we get the item frequencies efficiently?
Naively, we can count over the whole training set and store the counts in a global hash table, or apply a Count-min sketch. But we can do better!
We can estimate each item's frequency from the stream itself.
Instead of counting occurrences directly, they estimate the frequency from the interval between two consecutive hits of the same item.
For example, if an item hits once every 10 steps, we can confidently estimate its frequency as 0.1.
However, since the data is streaming, we cannot see everything up front. As in Bayesian updating (prior vs. posterior), we start from an initial estimate and gradually refine it as more data arrives.
The whole algorithm is as follows.
The basic idea: we only modify the batch softmax function and keep everything else the same
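The streaming frequency estimator can be sketched like this (a simplified rendering of the paper's idea; class and variable names are mine). Two arrays are kept, indexed by a hash of the item id: one storing the last global step at which an item was seen, the other a moving average of the gap between consecutive hits. The estimated sampling probability is the reciprocal of that average gap.

```python
import numpy as np

class FrequencyEstimator:
    """Streaming item-frequency estimator (sketch).

    A[h(y)] = last global step at which item y was seen
    B[h(y)] = moving average of the interval between consecutive hits
    The sampling probability of y is then estimated as 1 / B[h(y)].
    """
    def __init__(self, num_buckets=2**20, alpha=0.01):
        self.alpha = alpha                 # learning rate of the moving average
        self.num_buckets = num_buckets
        self.A = np.zeros(num_buckets, dtype=np.int64)
        self.B = np.zeros(num_buckets, dtype=np.float64)

    def update(self, item_id, step):
        h = hash(item_id) % self.num_buckets
        if self.A[h] > 0:                  # seen before: blend in the new interval
            self.B[h] = (1 - self.alpha) * self.B[h] + self.alpha * (step - self.A[h])
        self.A[h] = step

    def prob(self, item_id):
        h = hash(item_id) % self.num_buckets
        return 1.0 / self.B[h] if self.B[h] > 0 else 1.0  # crude default before any interval
```

With this estimator plugged into the corrected softmax, the whole pipeline stays streaming-friendly: no global pass over the data is needed, and memory is fixed by the number of hash buckets (at the cost of hash collisions).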
The paper also analyzes how good the estimator is by proving that it is consistent (it converges to the true frequency as more data is seen).
🙃 Other related blogs:
WWW’17: Visual Discovery at Pinterest
ICCV: International Conference on Computer Vision
CVPR: Conference on Computer Vision and Pattern Recognition
Top Conference Paper Challenge: