[Eng blog: Doordash-22] Using a Multi-Armed Bandit with Thompson Sampling to Identify Responsive Dashers

Arthur Lee

2 min readAug 8, 2022

🤗 Machine learning Engineer blog challenge (2/100)

Doordash — 2022

Using a Multi-Armed Bandit with Thompson Sampling to Identify Responsive Dashers

🤔 What problem do they solve?

他們想要找到一個model去決定要對哪些dasher(運送員)發訊息去送貨,特別在尖峰時刻,supply 不夠多時候,如下圖所示

🤔 What alternative do they consider but not use?

方法A: supervised classifier

基本概念:我們有歷史data,我們就直接跑一個supervised classifier對每個dasher得到一個”接受dash機率”,rank機率從高到低

缺點

non-stationary: 每個dasher偏好會一直隨時間改變
太exploit: 總是對hot dasher去要求他運送

😎 Proposal solution: bandit algorithm with Thompson Sampling

這邊作者又提到了三個方法

Epsilon-Greedy algorithm

一開始直接選最高的rank/接受率
有ε機率random選其他人;有1-ε機率選最佳的人

優點

簡單易懂

缺點

ε 是固定的,無法隨時間變動

Upper Confidence Bound (UCB) bandit algorithm

樣本p的機率會隨著樣本數N增加而接近真實的p
樣本p與真實p的差距在於一個delta

delta

多選中一次,delta會慢慢變小,會慢慢收斂

優點

很快找到最好的 Dashers
當有足夠data,很容易exploit instead of explore,既然已經收斂到p了,就完全沒有randomness

缺點

黑盒子,不好解釋
很愛選new dasher,因為delta很大

Thompson Sampling takes a Bayesian approach

一開始有個prior, 經由實驗每次都update posterior

prior (historical data, weight decay)+ update rule(gamma) + hyperparameters

優點

easy implement
很快找到最好的 Dashers
好解釋
根據prior, update rule, 未來彈性更大

缺點

需要設定prioir

如何挑參數？

length of each observation: 太長計算太久;太短不夠data去計算
What prior for new dasher: historical data, weight decay
imbalance in data: how much weight should we give success vs. failure?

🤔心得

本文提到的classifier的缺點,其實還是可以解決的

對於”non-stationary”, 可以用更頻繁的retrain model再加上一些time weight decay還是可以解決

對於exploit, 其實加一點randomness 還是可以解決

但我個人認為更重要的是cold-start problem, (limited data)

由於現在要針對每個dasher去預測他們接單的機率

每個dasher第一點,非常獨特性,很難用global feature去預測他們是否會接單,或是說沒有更仔細的features (總不能拿他們的出生到現在的所有data),所以用我們可以用到的feature去預測很困難
獨特性也算惹,如果有大量data還是可以跑個”dasher embedding”,但可以想見的,每個dasher可能加入時間很短,也不是每天都接單,data非常limited

在預測的目標非常獨特性,還有data limited下, bandit algorithm 就是個很好的解決方案

其他心得

之前在做Ads要放哪個image時候,我們也用bandit algorithm, 主要是那時後NLP, CV還不夠發達,現在可能不需要惹,藉由圖片的資訊跟圖片內的文字資訊可以推斷機率
我還有一個project做call rerouting是用supervised learning, 看到這文章發現問題非常相似,當使用者打電話打不通,會自動轉接成其他服務商,而如何rank這些服務商的問題,但後來想想,服務商其實有限,沒有dashers這麼多,又小眾,頻率更低,所以其實supervised learning可能就足夠了

🙃 Other related blogs:

[Eng blog: Doordash-21] Using Triplet Loss and Siamese Neural Networks to Train Catalog Item Embeddings

KDD 21':Learning to Embed Categorical Features without Embedding Tables for Recommendation (google)

KDD 19': Sampling-bias-corrected neural modeling for large corpus item recommendations

KDD 18': Notification Volume Control and Optimization System at Pinterest

CVPR19' Complete the Look: Scene-based Complementary Product Recommendation

NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence

NIPS’2017: Attention Is All You Need (Transformer)

KDD’19: Learning a Unified Embedding for Visual Search at Pinterest

BMVC19' Classification is a Strong Baseline for Deep Metric Learning

KDD’18: Graph Convolutional Neural Networks for Web-Scale Recommender Systems

🤩 Conference

ICCV: International Conference on Computer Vision

http://iccv2019.thecvf.com/submission/timeline

CVPR: Conference on Computer Vision and Pattern Recognition

http://cvpr2019.thecvf.com/

Top Conference Paper Challenge:
https://medium.com/@arthurlee_73761/top-conference-paper-challenge-2d7ca24115c6
My Website:
https://light0617.github.io/#/