
KDD 18': Real-time Personalization using Embeddings for Search Ranking at Airbnb

Arthur Lee

--

🤗 Recommendation system paper challenge (25/50)

paper link

🤔 What problem do they solve?

Search Ranking and Recommendation.

😮 What are the challenges?

There are three things that make this application unique from others.

  1. a two-sided marketplace in which one needs to optimize for host and guest preferences
  2. a user rarely consumes the same item twice
  3. one listing can accept only one guest for a certain set of dates

😎 Modeling

Real-time Personalization -> capture super short-term information

They implemented a solution where the embeddings of items that the user most recently interacted with are combined in an online manner to calculate similarities to the items that need to be ranked.
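
A minimal sketch of this idea (not Airbnb's production code; the function and variable names are illustrative assumptions): average the embeddings of the listings the user recently clicked into a short-term profile and rank candidate listings by cosine similarity to it.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def rank_candidates(recent_click_embs, candidate_embs):
    """Rank candidate listings by similarity to the user's short-term profile.

    recent_click_embs: list of embedding vectors of recently clicked listings
    candidate_embs: dict {listing_id: embedding} of listings to be ranked
    """
    # Short-term user profile: average of recently clicked listing embeddings.
    profile = np.mean(recent_click_embs, axis=0)
    scores = {lid: cosine(profile, emb) for lid, emb in candidate_embs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In the paper, such similarity scores are used as features for the ranking model (see the GBDT part below) rather than as the final ranking by themselves.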

Leveraging Conversions as Global Context

They split the sessions into two groups:

  1. session ending with booking
  2. session ending without booking

For the first group, they insert the final booked listing into every context window as a global context.

v_l: embedding of the central (target) listing

v_c: embedding of a context listing (a neighboring listing or a sampled negative)

For the second group, they keep using the original word2vec formulation (sliding context window).
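
A hedged sketch of how the training pairs could be generated under this scheme, assuming a session is simply a list of listing ids (names here are illustrative): for booked sessions the booked listing is paired with every central listing as a global context, while non-booked sessions use only the plain sliding window.

```python
def skip_gram_pairs(session, window=5, booked_listing=None):
    """Yield (central, context) training pairs for one click session.

    If booked_listing is given (a session that ended in a booking), it is added
    as a global context for every central listing, regardless of window position.
    """
    pairs = []
    for i, central in enumerate(session):
        lo, hi = max(0, i - window), min(len(session), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((central, session[j]))   # window context (v_c)
        if booked_listing is not None:
            pairs.append((central, booked_listing))   # global context (v_lb)
    return pairs
```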

Adapting Training for Congregated Search

On a travel platform, users frequently search only within a certain market. Hence, they add another set of negatives sampled from the same market as the central listing (denoted v_mn).

Combining this market-level negative term with the booked-listing global context objective (the paper's Equation 4) gives the final objective (Equation 5):
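
Since the equations themselves are not reproduced here, the following is a best-effort reconstruction of that final objective, using the notation above plus v_lb for the booked listing; D_p, D_n, and D_mn denote the positive pairs, randomly sampled negatives, and same-market negatives. Treat it as a sketch rather than a verbatim copy of the paper.

```latex
\operatorname*{arg\,max}_{\theta}
  \sum_{(l,c)\in\mathcal{D}_p} \log\frac{1}{1+e^{-\mathbf{v}'_c \mathbf{v}_l}}
+ \sum_{(l,c)\in\mathcal{D}_n} \log\frac{1}{1+e^{\mathbf{v}'_c \mathbf{v}_l}}
+ \log\frac{1}{1+e^{-\mathbf{v}'_{l_b} \mathbf{v}_l}}
+ \sum_{(l,m_n)\in\mathcal{D}_{m_n}} \log\frac{1}{1+e^{\mathbf{v}'_{m_n} \mathbf{v}_l}}
```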

User Type Embeddings

Why do they consider user-type embeddings?

Short answer: cold-start.

Long answer:

  • Super rare positive events: booking sessions data S_b is much smaller than click sessions data S because bookings are much less frequent events.
  • Not enough data per entity to learn from: to learn a meaningful embedding for an entity from contextual information, at least 5-10 occurrences of that entity are needed, and many listing_ids on the platform were booked fewer than 5-10 times.
  • Users' preferences drift over time: long time intervals may pass between two consecutive bookings by a user, and in that time user preferences, such as price point, may change (e.g., due to a career change).

How do they do it?

They run the same training procedure over sessions of (user_type, listing_type) tuples instead of listing ids, i.e., replacing v_l with v_user_type and v_listing_type (see the sketch below).
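
A hedged sketch of how users and listings might be bucketed into types; the attributes and bucket boundaries below are illustrative assumptions, not the paper's actual rules.

```python
def listing_type(listing: dict) -> str:
    """Map a listing to a coarse listing_type id built from bucketized attributes."""
    price_bucket = min(int(listing["price_per_night"] // 50), 5)  # assumed buckets
    capacity_bucket = min(int(listing["capacity"]), 4)
    return f"lt_{listing['country']}_{listing['room_type']}_p{price_bucket}_c{capacity_bucket}"

def user_type(user: dict) -> str:
    """Map a user to a coarse user_type id; brand-new users still get a type."""
    bookings_bucket = min(int(user.get("num_bookings", 0)), 5)
    price_bucket = min(int(user.get("avg_booked_price", 0) // 50), 5)
    return f"ut_{user['market']}_{user['device']}_b{bookings_bucket}_p{price_bucket}"
```

Training then runs the same skip-gram procedure over booking sessions of (user_type, listing_type) tuples, so even a first-time user falls into some user_type that already has an embedding.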

Rejections as Explicit Negatives

In a rental marketplace, some hosts reject certain guests (e.g., based on their guest profiles). Therefore, rejections should be taken into account as a label.

How to handle rejections? Label them as explicit negatives!
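
A hedged sketch of how this could enter the training data: each host rejection is turned into an explicit negative (user_type, listing_type) pair that is treated like a sampled negative, pushing the two embeddings apart. The event fields here are assumptions.

```python
def rejection_pairs(rejection_events):
    """Turn host-rejection events into explicit negative training pairs.

    Each event is assumed to carry precomputed user_type / listing_type ids
    (see the bucketing sketch above).
    """
    return [(e["user_type"], e["listing_type"]) for e in rejection_events]
```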

Final model:

Embeddings are created for the different scenarios above, turned into similarity features, and fed into a GBDT ranking model.
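
Purely as an analogous sketch (the paper uses Airbnb's own GBDT search-ranking framework, not LightGBM), embedding-similarity features could be fed to a learning-to-rank GBDT like this; all data below is placeholder.

```python
import numpy as np
import lightgbm as lgb

# One row per (search, candidate listing) impression. Embedding-based features
# (e.g. cosine similarity to recently clicked / wishlisted listings) are
# concatenated with the other ranking features.
X = np.random.rand(1000, 20)            # placeholder feature matrix
y = np.random.randint(0, 2, size=1000)  # placeholder relevance labels (booked = 1)
group = [10] * 100                      # 100 searches with 10 candidates each

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=100)
ranker.fit(X, y, group=group)
scores = ranker.predict(X[:10])         # score the candidates of one search
```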

🤨 Experiments

Setting up Daily Training

They found that they get better offline performance if they re-train listing embeddings from scratch every day, instead of incrementally continuing training on existing vectors.

The day-to-day vector differences do not cause discrepancies in their models, because the applications use cosine similarity as the primary signal and not the actual vectors themselves.

Hyperparameter configuration

Dimension of embedding = 32

window size = 5

10 iterations over the training data

They applied some changes to the original word2vec C code.
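
Purely as a hedged illustration of these hyperparameters (gensim cannot reproduce the custom C-code modifications, such as the booked-listing global context or market-level negatives), an equivalent off-the-shelf configuration would look roughly like this; the toy sessions are placeholders.

```python
from gensim.models import Word2Vec

# sessions: list of click sessions, each a list of listing-id strings
sessions = [["listing_1", "listing_2", "listing_3"], ["listing_4", "listing_5"]]

model = Word2Vec(
    sentences=sessions,
    vector_size=32,   # embedding dimension = 32
    window=5,         # context window size = 5
    epochs=10,        # 10 iterations over the training data
    sg=1,             # skip-gram
    negative=5,       # negative sampling (count not stated above; assumed)
    min_count=1,
    workers=4,
)
vector = model.wv["listing_1"]
```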

The end-to-end daily data generation and training pipeline is implemented using Airflow, which is Airbnb's open-sourced scheduling platform.

Offline Evaluation of Listing Embeddings

Similar Listings using Embeddings

Online A/B testing: CTR increased by 23%.

Final A/B testing result

DCU (Discounted Cumulative Utility)

NDCU (Normalized Discounted Cumulative Utility)

More detail about DCU: blog

🥳 The qualitative results of the model

Evaluation tools:

  1. Check that similarities are higher between listings in the same group (e.g., the same listing type) than across groups.
  2. Take examples of similar listings and check whether they make sense.
  3. Build an internal website to perform the evaluation.

🧐 Reference:

More detail about DCU: blog

word2vec paper:

demo video:

https://www.youtube.com/watch?v=aWjsUEX7B1I&ab_channel=KDD2018video

🙃 Other related blogs:

KDD 18': Notification Volume Control and Optimization System at Pinterest

KDD 19': PinText: A Multitask Text Embedding System in Pinterest

CVPR19' Complete the Look: Scene-based Complementary Product Recommendation

COLING’14: Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts

NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence

NIPS’2017: Attention Is All You Need (Transformer)

KDD’19: Learning a Unified Embedding for Visual Search at Pinterest

BMVC19' Classification is a Strong Baseline for Deep Metric Learning

KDD’18: Graph Convolutional Neural Networks for Web-Scale Recommender Systems

WWW’17: Visual Discovery at Pinterest

🤩 Conference

ICCV: International Conference on Computer Vision

http://iccv2019.thecvf.com/submission/timeline

CVPR: Conference on Computer Vision and Pattern Recognition

http://cvpr2019.thecvf.com/

KDD 2020

https://www.kdd.org/kdd2020/

Top Conference Paper Challenge:

https://medium.com/@arthurlee_73761/top-conference-paper-challenge-2d7ca24115c6

My Website:

https://light0617.github.io/#/

--


Arthur Lee

A machine learning engineer in the Bay Area in the United States