KDD 18': Notification Volume Control and Optimization System at Pinterest
🤗 Recommendation system paper challenge (24/50)
🤔 What problem do they solve?
Push notifications are effective channels for online services to drive user engagement metrics and other business metrics.
One of the most important challenges is deciding how often to notify each user.
Other questions: when to send, and which channel to use?
They propose a machine learning model to decide the frequency and discuss the system design trade-off in this paper.
If the frequency is too high, short-term metrics look good, but in the long term users cancel their subscriptions. Moreover, if too many users unsubscribe, email service providers will treat Pinterest as a potential spammer.
😮 What are the challenges?
Focus on a long-term objective function instead of considering only the short term. But how should the long-term objective function be defined?
Increasing volume has a diminishing return on the utility function, so they need to capture this nonlinear effect.
The optimization algorithm must be efficient and scalable.
Other practical issues:
How to handle multiple notification channels including emails and push notifications?
How to manage the interaction between the volume control component and content ranking components such that new notification types and ranking models can be easily developed and tested?
🤔 Design principles & System Overview
Weekly Notification Budget
Every week, each user has a fixed budget: the number of notifications they can receive.
Notification Service
Every day, the system decides whether to send a notification to each user.
Budget Pacer (who to send)
The role of the budget pacer is to fetch the weekly notification budget for each user and spread it across the days of the week. A user receives notifications on a given day only if there is budget for that day.
Ranker (what to send)
After deciding to send a notification to a user, which notification type is best?
A model predicts CTR from features such as the user's engagement history and the last time the same notification type was sent.
Delivery (when to send)
After the notification content is generated, the delivery time scheduler decides what time to send it. After that, a tracker collects the engagement data.
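To make the budget pacer step more concrete, here is a minimal Python sketch. The summary does not say how the pacer spreads the weekly budget over days, so the even spacing used here is an assumption, and the names `pace_weekly_budget` and `should_send` are hypothetical.

```python
from typing import List


def pace_weekly_budget(weekly_budget: int, days: int = 7) -> List[int]:
    """Spread a weekly budget over the days of the week as evenly as possible.

    Assumption: even spacing. The real pacer may use a different schedule.
    """
    if weekly_budget < 0 or days <= 0:
        raise ValueError("weekly_budget must be >= 0 and days must be > 0")
    plan = [0] * days
    for i in range(weekly_budget):
        # Place the i-th notification at an evenly spaced day index.
        plan[(i * days) // weekly_budget] += 1
    return plan


def should_send(plan: List[int], day_index: int) -> bool:
    """A user is eligible on a given day only if budget is left for that day."""
    return plan[day_index] > 0


plan = pace_weekly_budget(3)
print(plan)                  # per-day budget for a user with 3 notifications per week
print(should_send(plan, 1))  # whether the user can be notified on the second day
```

A ranker and delivery scheduler would only run for users where `should_send` is true on that day.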
Volume Optimization Workflows
Hadoop clusters run several Spark ETL jobs.
Data ETL jobs take notification tracking data and many other data sources as input, and compute the features and labels used for training and scoring the machine learning models.
After training, the models are saved to the model store.
The trained models are then used to compute the budget data.
The budget data is uploaded to the key-value store for online serving.
Design Choices
The first choice is to have overall control of the total notification volume for each user and to decouple the volume control component from the type ranking component, which makes experimental results and launch decisions much more rigorous.
Another idea is to support custom rules and user settings on notification frequency (e.g., at most once a week).
Besides that, monitoring, analytics and diagnostics of overall notification system health are easier since total volume is much more stable and predictable.
From the modeling perspective, it is also more reasonable to directly optimize for the total number of notifications each user gets in a period of time.
In their framework, they do not make an independence assumption and aim to directly estimate the effect of multiple notifications.
😎 Modeling
Given a total number of notifications, they try to find the optimal distribution among users such that a certain objective function is maximized.
Two key factors determine the right objective function.
First, what is the target business metric to improve, and what is the relation between the target metric and notification volume?
Daily active users (DAU)!
Second, how to model the long-term effect of notifications on the target metric, which should consider the possibility of both positive and negative actions from the user.
Utility of Notifications towards Target Business Metric
The utility (or reward) function for sending k notifications to user u is simply p(a|u, k), the probability that user u is active given k notifications.
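Putting the pieces together, the volume allocation problem can be written roughly as the constrained maximization below. This is a sketch in my own notation: k_u is the budget for user u, K is the total notification volume, and [k_min, k_max] is the allowed per-user range.

```latex
\max_{\{k_u\}} \; \sum_{u} p(a \mid u, k_u)
\quad \text{s.t.} \quad
\sum_{u} k_u \le K,
\qquad
k_{\min} \le k_u \le k_{\max} \;\; \text{for every user } u
```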
Modeling the Long Term Effect
To avoid the worst case, the objective function p(a|u, k_u) should consider both the positive and negative effects of notifications.
They found different users should have different features.
With the long-term cost estimation model and the model that predicts the likelihood of unsubscribing, they are able to approximate the long-term effect of notification volume on the activeness of each user.
This way, they have a unified objective function that captures the real goal and balances the weights of positive and negative actions at the user level.
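One way to read this unified objective is as a law-of-total-probability split over whether the user unsubscribes. The event labels "sub" and "unsub" are my own shorthand, and the exact formulation in the paper may differ, so treat this as a sketch:

```latex
p(a \mid u, k_u)
= \bigl(1 - p(\text{unsub} \mid u, k_u)\bigr)\, p(a \mid u, k_u, \text{sub})
+ p(\text{unsub} \mid u, k_u)\, p(a \mid u, k_u, \text{unsub})
```

The first term rewards volume that keeps a subscribed user active, and the second carries the long-term cost of pushing the user to unsubscribe.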
Models
They apply XGBoost to train the model.
Optimization Algorithm
For each user, they first compute the optimal budget i_max in the allowed range [k_min, k_max], then gradually increase the budget from k_min toward i_max until the average incremental value of the remaining notifications falls below the threshold. This algorithm can be easily implemented using Map-Reduce to scale.
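Here is a minimal Python sketch of the per-user step, based on my reading of the description above rather than the paper's exact pseudocode. It assumes the model's predictions are available as a list `utility`, where `utility[k]` stands for p(a|u, k). The function name `user_budget` is hypothetical.

```python
from typing import Sequence


def user_budget(utility: Sequence[float], k_min: int, k_max: int, theta: float) -> int:
    """Pick a budget for one user given a threshold theta.

    utility[k] is the predicted p(a | u, k) for k = 0 .. len(utility) - 1.
    """
    # Budget with the highest predicted utility inside the allowed range.
    i_max = max(range(k_min, k_max + 1), key=lambda k: utility[k])
    k = k_min
    while k < i_max:
        # Average gain per notification over the notifications not yet granted.
        avg_gain = (utility[i_max] - utility[k]) / (i_max - k)
        if avg_gain < theta:
            break
        k += 1
    return k


# Toy utility curve with diminishing returns and a drop after k = 5.
curve = [0.10, 0.30, 0.45, 0.55, 0.60, 0.62, 0.61, 0.58]
print(user_budget(curve, k_min=1, k_max=7, theta=0.05))
```

Because each user is handled independently, this function maps naturally onto a parallel map step, with a sum over users as the reduce step.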
How do they find the threshold θ such that the total number of notifications is below K?
For each candidate threshold, run Algorithm 1 on the users (or a sample of them) to compute the total number of notifications, then choose the minimum threshold whose total is below K. The candidate thresholds can be evaluated in parallel.
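The threshold search can be sketched in Python as follows. Assumptions: a fixed list of candidate thresholds is scanned sequentially (the real system can run them in parallel), "below K" is read as "at most K", and the compact `budget_for` helper is my own reading of the per-user step, not the paper's pseudocode. All names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Sequence


def budget_for(utility: Sequence[float], k_min: int, k_max: int, theta: float) -> int:
    """Per-user step: raise the budget while the average remaining gain >= theta."""
    i_max = max(range(k_min, k_max + 1), key=lambda k: utility[k])
    k = k_min
    while k < i_max and (utility[i_max] - utility[k]) / (i_max - k) >= theta:
        k += 1
    return k


def total_volume(users: Dict[str, Sequence[float]], k_min: int, k_max: int, theta: float) -> int:
    """Total notifications across all users for one threshold."""
    return sum(budget_for(u, k_min, k_max, theta) for u in users.values())


def pick_threshold(users: Dict[str, Sequence[float]], k_min: int, k_max: int,
                   candidates: Iterable[float], K: int) -> Optional[float]:
    """Smallest candidate threshold whose total volume is at most K, else None."""
    for theta in sorted(candidates):
        if total_volume(users, k_min, k_max, theta) <= K:
            return theta
    return None


# Toy utility curves for two users (made-up numbers for illustration only).
users = {
    "a": [0.10, 0.30, 0.45, 0.55, 0.60, 0.62, 0.61, 0.58],
    "b": [0.50, 0.60, 0.62, 0.63, 0.63, 0.63, 0.63, 0.63],
}
print(pick_threshold(users, 1, 7, [0.0, 0.012, 0.03, 0.05, 0.1], K=5))
```

A larger threshold can only shrink each user's budget, so scanning candidates in increasing order and stopping at the first feasible one gives the minimum feasible threshold.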
🤨 Experiments
🥳 The Qualitative result of the model
The model shifts notifications from more active users to less active users.
This makes sense: extra volume is most valuable for users with moderate activity, not for users who are already the most active.
🧐 Reference:
[7] Rupesh Gupta, Guanfeng Liang, and Rómer Rosales. 2017. Optimizing Email Volume For Sitewide Engagement. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM 2017, Singapore, November 06–10, 2017. 1947–1955.
[8] Rupesh Gupta, Guanfeng Liang, Hsiao-Ping Tseng, Ravi Kiran Holur Vijay, Xiaoyu Chen, and Rómer Rosales. 2016. Email Volume Optimization at LinkedIn. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13–17, 2016. 97–106.