KDD 18': Real-time Personalization using Embeddings for Search Ranking at Airbnb
🤗 Recommendation system paper challenge (25/50)
🤔 What problem do they solve?
Search Ranking and Recommendation.
😮 What are the challenges?
Three things make this application different from typical recommendation settings:
- a two-sided marketplace in which one needs to optimize for host and guest preferences
- a user rarely consumes the same item twice
- one listing can accept only one guest for a certain set of dates
😎 Modeling
Real-time Personalization -> capture very short-term, in-session signals
They implemented a solution where embeddings of the items a user most recently interacted with are combined online to compute similarities to the items that need to be ranked.
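The online combination step can be sketched as a cosine similarity between a candidate listing's embedding and the sum of the embeddings of recently clicked listings (this mirrors the paper's EmbClickSim-style features; the function name and toy vectors below are mine, not Airbnb's):

```python
import numpy as np

def emb_click_sim(candidate_vec, clicked_vecs):
    """Cosine similarity between a candidate listing's embedding and the
    sum of embeddings of listings the user recently clicked.
    Sketch of an EmbClickSim-style real-time feature; name is illustrative."""
    profile = np.sum(clicked_vecs, axis=0)  # short-term user "profile" vector
    denom = np.linalg.norm(candidate_vec) * np.linalg.norm(profile)
    if denom == 0.0:
        return 0.0
    return float(candidate_vec @ profile / denom)

# Hypothetical 4-dim embeddings: two recently clicked listings plus a candidate
clicked = np.array([[0.9, 0.1, 0.0, 0.0],
                    [0.8, 0.2, 0.0, 0.0]])
candidate = np.array([0.7, 0.3, 0.0, 0.0])
score = emb_click_sim(candidate, clicked)  # high score -> ranked higher
```

At serving time this kind of feature only needs the most recent clicked-listing ids and an embedding lookup, which is what makes the personalization "real-time".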
Leveraging Conversions as Global Context
They split the sessions into two groups:
- session ending with booking
- session ending without booking
For the first group, they insert the final booked listing into every context window as a global context, so it is always predicted no matter where the window is.
v_l: embedding of the central (target) listing
v_c: embedding of a context listing (an in-window neighbor or a negative sample)
For the second group, they keep using the original word2vec objective (sliding time window).
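The booked-listing-as-global-context idea can be sketched as a single negative-sampling SGD step, where the booked listing is treated as an extra always-positive context item (a toy numpy sketch under my own assumptions: the listing count, learning rate, and initialization are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy setup: 100 listings, 32-dim embeddings (the dimension used in the paper)
n_listings, dim = 100, 32
V_in = rng.normal(scale=0.1, size=(n_listings, dim))   # "target" vectors v_l
V_out = rng.normal(scale=0.1, size=(n_listings, dim))  # "context" vectors v_c

def sgd_step(l, context, booked, negatives, lr=0.025):
    """One negative-sampling update for target listing l.

    context:   in-window clicked listings (label 1)
    booked:    the session's booked listing, used as global context (label 1)
    negatives: randomly sampled listings (label 0)
    """
    v_l = V_in[l]
    grad_l = np.zeros_like(v_l)
    pairs = ([(c, 1) for c in context] +
             [(booked, 1)] +
             [(n, 0) for n in negatives])
    for c, label in pairs:
        score = sigmoid(v_l @ V_out[c])
        g = label - score              # gradient of the log-sigmoid objective
        grad_l += g * V_out[c]
        V_out[c] += lr * g * v_l       # pull positives toward v_l, push negatives away
    V_in[l] += lr * grad_l

# e.g. target listing 3, in-window clicks 4 and 7, booked listing 42, negatives 10/11
sgd_step(3, context=[4, 7], booked=42, negatives=[10, 11])
```

Because the booked listing appears in every step of a booked session, its embedding is pulled toward all listings clicked on the way to the booking.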
Adapting Training for Congregated Search
On a travel platform, users frequently search only within a certain market. Hence, they add another set of negative samples drawn from the market of the central listing (denoted v_{m_n}).
Combining this term with equation (4) gives equation (5):
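Reconstructed from the paper's notation (D_p: positive pairs, D_n: randomly sampled negatives, v_b: the booked listing used as global context, D_{m_n}: negatives sampled from the central listing's market), equation (5) is roughly:

```latex
\operatorname*{argmax}_{\theta}\;
\sum_{(l,c)\in\mathcal{D}_p} \log\frac{1}{1+e^{-\mathbf{v}_c'\mathbf{v}_l}}
+ \sum_{(l,c)\in\mathcal{D}_n} \log\frac{1}{1+e^{\mathbf{v}_c'\mathbf{v}_l}}
+ \log\frac{1}{1+e^{-\mathbf{v}_b'\mathbf{v}_l}}
+ \sum_{(l,m_n)\in\mathcal{D}_{m_n}} \log\frac{1}{1+e^{\mathbf{v}_{m_n}'\mathbf{v}_l}}
```

The first two terms are standard skip-gram with negative sampling, the third is the booked-listing global context from equation (4), and the last is the extra market-level negative set.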
User Type Embeddings
Why do they consider user type embedding?
Short answer: cold-start users and listings
Long answer:
- positive events are super rare: booking-session data S_b is much smaller than click-session data S, because bookings are far less frequent events.
- the model needs more sessions to learn: learning a meaningful embedding for any entity from contextual information requires at least 5-10 occurrences of that entity in the data, and many listing_ids on the platform were booked fewer than 5-10 times.
- user preferences drift over time: long intervals may pass between two consecutive bookings by the same user, and in that time preferences such as price point may change, e.g. due to a career change.
How do we do it?
They do this by replacing listing ids with types, i.e. swapping v_l for v_user_type and v_listing_type, and training the user-type and listing-type embeddings in the same vector space.
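The replacement relies on mapping each user to a coarse user_type built from bucketed attributes, so many users share one token and the token gets enough training occurrences. A minimal sketch (the attribute choices and bucket boundaries here are illustrative, not the paper's exact rules):

```python
def user_type(market, language, device, n_bookings, avg_price_per_night):
    """Map a user's attributes to a discrete user_type token.

    Many users map to the same token, so the token appears often enough
    in sessions to learn an embedding, which handles cold-start users.
    Buckets below are hypothetical examples.
    """
    price_bucket = ("low" if avg_price_per_night < 60
                    else "mid" if avg_price_per_night < 150
                    else "high")
    booking_bucket = "new" if n_bookings == 0 else "returning"
    return f"{market}_{language}_{device}_{booking_bucket}_{price_bucket}"

# A brand-new user still gets a well-trained embedding via their type token
token = user_type("SF", "en", "mobile", n_bookings=0, avg_price_per_night=52.0)
```

A listing_type token can be built the same way from listing attributes (location, price band, capacity, review counts), and booking sessions are rewritten as sequences of (user_type, listing_type) pairs before training.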
Rejections as Explicit Negatives
In the rental market, some hosts reject certain guests (e.g., because of an unappealing profile). Therefore, rejections should be taken into account as labels.
How to handle a rejection? Label it as an explicit negative!
Final model:
They create embeddings for the different scenarios above, turn them into similarity features, and feed those features into a GBDT ranking model.
🤨 Experiments
Setting up Daily Training
They found that offline performance is better if listing embeddings are re-trained from scratch every day, instead of incrementally continuing training from the existing vectors.
The day-to-day vector differences do not cause discrepancies because their applications use cosine similarity as the primary signal, not the raw vectors themselves.
Hyperparameters configuration
- embedding dimension = 32
- window size = 5
- 10 iterations over the training data
They applied some changes to the original word2vec C code.
The end-to-end daily data generation and training pipeline is implemented using Airflow, Airbnb's open-sourced scheduling platform.
Offline Evaluation of Listing Embeddings
Similar Listings using Embeddings
Online A/B testing: CTR increased by 23%.
Final A/B testing result
DCU (Discounted Cumulative Utility)
NDCU (Normalized Discounted Cumulative Utility)
More detail about DCU: blog
🥳 The Qualitative result of the model
Evaluation tooling:
1. Check that similarities are higher for listings in the same group.
2. Take examples of similar listings and check whether they make sense.
3. Build an internal website to do the evaluation.
🧐 Reference:
word2vec paper:
demo video:
https://www.youtube.com/watch?v=aWjsUEX7B1I&ab_channel=KDD2018video
🙃 Other related blogs:
KDD 18': Notification Volume Control and Optimization System at Pinterest
KDD 19': PinText: A Multitask Text Embedding System in Pinterest
CVPR19' Complete the Look: Scene-based Complementary Product Recommendation
COLING’14: Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts
NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence
NIPS’2017: Attention Is All You Need (Transformer)
KDD’19: Learning a Unified Embedding for Visual Search at Pinterest
BMVC19' Classification is a Strong Baseline for Deep Metric Learning
KDD’18: Graph Convolutional Neural Networks for Web-Scale Recommender Systems
WWW’17: Visual Discovery at Pinterest
🤩 Conference
ICCV: International Conference on Computer Vision
http://iccv2019.thecvf.com/submission/timeline
CVPR: Conference on Computer Vision and Pattern Recognition
KDD 2020
Top Conference Paper Challenge:
https://medium.com/@arthurlee_73761/top-conference-paper-challenge-2d7ca24115c6
My Website: