
KDD 19': Sampling-bias-corrected neural modeling for large corpus item recommendations

🤗 Recommendation system paper challenge (29/50)

paper link

🤔 What problem do they solve?

Large-scale item recommendation

Given a query x, we would like to recommend items y from a corpus of M items, and we can observe a reward (e.g., watch time) for each pair (x, y)

Label: reward r (watch time)

Training data: triples of (query x, item y, reward r)

Model: a two-tower model trained by optimizing a batch softmax loss with in-batch negatives
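As a minimal sketch (my own illustration in PyTorch, not the paper's code; the linear towers stand in for real encoders, and the paper's temperature and reward weighting are omitted), the two-tower setup with in-batch negatives looks like this:

```python
import torch
import torch.nn.functional as F

# Stand-in towers: in practice these are deep encoders over query/item features.
query_tower = torch.nn.Linear(64, 32)
item_tower = torch.nn.Linear(128, 32)

def in_batch_softmax_loss(query_feats, item_feats):
    """Batch softmax with in-batch negatives: for query i, item i is the
    positive and the other B - 1 items in the batch serve as negatives."""
    u = F.normalize(query_tower(query_feats), dim=-1)  # [B, d] query embeddings
    v = F.normalize(item_tower(item_feats), dim=-1)    # [B, d] item embeddings
    logits = u @ v.T                                   # [B, B], s(x_i, y_j)
    labels = torch.arange(logits.size(0))              # positives on the diagonal
    return F.cross_entropy(logits, labels)
```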

😮 What are the challenges?

The data is hugely skewed: in-batch negatives are drawn from the training distribution itself, so popular items get sampled as negatives far more frequently than rare ones.

The in-batch softmax loss therefore suffers from sampling bias under this skewed distribution: popular items end up being overly penalized as negatives.

😎 Solution: Sampling Bias Corrected

We can replace the batch softmax with a corrected version that subtracts log(p_j), the log of item j's sampling probability, from each logit, so popular items are no longer overly penalized.
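Concretely, the corrected logit is s_c(x_i, y_j) = s(x_i, y_j) - log(p_j). A sketch reusing the towers above (item_probs is a tensor of the estimated probabilities p_j, produced by the streaming estimator described next):

```python
def corrected_in_batch_softmax_loss(query_feats, item_feats, item_probs):
    """Sampling-bias-corrected batch softmax: use
    s_c(x_i, y_j) = s(x_i, y_j) - log(p_j), so items that appear often
    in batches (popular items) are not over-penalized as negatives."""
    u = F.normalize(query_tower(query_feats), dim=-1)
    v = F.normalize(item_tower(item_feats), dim=-1)
    # item_probs[j] is the estimated sampling probability p_j of batch item j.
    logits = u @ v.T - torch.log(item_probs).unsqueeze(0)
    labels = torch.arange(logits.size(0))
    return F.cross_entropy(logits, labels)
```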

How do we estimate each item's sampling probability efficiently?

Naively, we could count item frequencies over the whole training set and store them in a global hash table, or use a Count-Min sketch. But we can do better!

We can estimate the frequency on the fly, from the stream of training batches itself.

Instead of counting occurrences directly, they estimate the frequency from the interval between two consecutive hits of the same item.

For example, if an item hits once every 10 steps, we can confidently estimate its per-step sampling probability as 0.1.

However, the data is streaming, so we cannot observe everything up front. Much like a Bayesian prior/posterior update, we start from an initial estimate and gradually refine it as we observe more hits, keeping a moving average of the intervals.

The whole algorithm works as follows.
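Here is a sketch of that streaming estimator in Python (following the paper's moving-average algorithm; the hash-table size, learning rate alpha, and initializing the interval array to ones are my assumptions):

```python
import numpy as np

class StreamingFrequencyEstimator:
    """Estimate each item's sampling probability from the gap between
    consecutive hits, refined with a moving average as more data streams in."""

    def __init__(self, num_slots=2**20, alpha=0.01):
        self.alpha = alpha                      # moving-average learning rate
        self.last_step = np.zeros(num_slots)    # A[h]: step of the previous hit
        self.avg_interval = np.ones(num_slots)  # B[h]: estimated steps between
                                                # hits (ones to avoid 1/0)

    def _slot(self, item_id):
        return hash(item_id) % len(self.last_step)

    def update(self, step, item_id):
        h = self._slot(item_id)
        # B[h] <- (1 - alpha) * B[h] + alpha * (t - A[h]); then A[h] <- t
        self.avg_interval[h] = ((1 - self.alpha) * self.avg_interval[h]
                                + self.alpha * (step - self.last_step[h]))
        self.last_step[h] = step

    def prob(self, item_id):
        # An item hitting once every B steps has sampling probability ~ 1 / B.
        return 1.0 / self.avg_interval[self._slot(item_id)]
```

At each training step t, we would call update(t, item_id) for every item in the batch, then feed the prob(item_id) values for the batch items into the corrected softmax above as item_probs.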

The basic idea: we only modify the batch softmax function and keep all other parts of the training pipeline the same.

The paper also analyzes the quality of this estimator, proving that it is consistent (the estimate converges to the true sampling probability).

🙃 Other related blogs:

KDD 19': Heterogeneous Graph Neural Network

KDD 19': Applying Deep Learning To Airbnb Search

KDD 18': Real-time Personalization using Embeddings for Search Ranking at Airbnb

KDD 18': Notification Volume Control and Optimization System at Pinterest

KDD 19': PinText: A Multitask Text Embedding System in Pinterest

CVPR19' Complete the Look: Scene-based Complementary Product Recommendation

NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence

NIPS’2017: Attention Is All You Need (Transformer)

KDD’19: Learning a Unified Embedding for Visual Search at Pinterest

BMVC19' Classification is a Strong Baseline for Deep Metric Learning

KDD’18: Graph Convolutional Neural Networks for Web-Scale Recommender Systems

WWW’17: Visual Discovery at Pinterest

🤩 Conference

ICCV: International Conference on Computer Vision

http://iccv2019.thecvf.com/submission/timeline

CVPR: Conference on Computer Vision and Pattern Recognition

http://cvpr2019.thecvf.com/

KDD 2020

https://www.kdd.org/kdd2020/

Top Conference Paper Challenge:

https://medium.com/@arthurlee_73761/top-conference-paper-challenge-2d7ca24115c6

My Website:

https://light0617.github.io/#/
