(14) Pinterest

[Eng blog: Doordash-22] Using a Multi-Armed Bandit with Thompson Sampling to Identify Responsive Dashers

🤗 Machine learning Engineer blog challenge (2/100)

Using a Multi-Armed Bandit with Thompson Sampling to Identify Responsive Dashers

🤔 What problem do they solve?

🤔 What alternative do they consider but not use?

方法A: supervised classifier


  • non-stationary: 每個dasher偏好會一直隨時間改變
  • 太exploit: 總是對hot dasher去要求他運送

😎 Proposal solution: bandit algorithm with Thompson Sampling

Epsilon-Greedy algorithm

  • 一開始直接選最高的rank/接受率
  • ε機率random選其他人;有1-ε機率選最佳的人


  • 簡單易懂


  • ε 是固定的,無法隨時間變動

Upper Confidence Bound (UCB) bandit algorithm

  • 樣本p的機率會隨著樣本數N增加而接近真實的p
  • 樣本p與真實p的差距在於一個delta


  • 多選中一次,delta會慢慢變小,會慢慢收斂


  • 很快找到最好的 Dashers
  • 當有足夠data,很容易exploit instead of explore,既然已經收斂到p了,就完全沒有randomness


  • 黑盒子,不好解釋
  • 很愛選new dasher,因為delta很大

Thompson Sampling takes a Bayesian approach

一開始有個prior, 經由實驗每次都update posterior

prior (historical data, weight decay)+ update rule(gamma) + hyperparameters


  • easy implement
  • 很快找到最好的 Dashers
  • 好解釋
  • 根據prior, update rule, 未來彈性更大


  • 需要設定prioir


  • length of each observation: 太長計算太久;太短不夠data去計算
  • What prior for new dasher: historical data, weight decay
  • imbalance in data: how much weight should we give success vs. failure?


對於”non-stationary”, 可以用更頻繁的retrain model再加上一些time weight decay還是可以解決

對於exploit, 其實加一點randomness 還是可以解決

但我個人認為更重要的是cold-start problem, (limited data)


  • 每個dasher第一點,非常獨特性,很難用global feature去預測他們是否會接單,或是說沒有更仔細的features (總不能拿他們的出生到現在的所有data),所以用我們可以用到的feature去預測很困難
  • 獨特性也算惹,如果有大量data還是可以跑個”dasher embedding”,但可以想見的,每個dasher可能加入時間很短,也不是每天都接單,data非常limited

在預測的目標非常獨特性,還有data limited下, bandit algorithm 就是個很好的解決方案


  • 之前在做Ads要放哪個image時候,我們也用bandit algorithm, 主要是那時後NLP, CV還不夠發達,現在可能不需要惹,藉由圖片的資訊跟圖片內的文字資訊可以推斷機率
  • 我還有一個project做call rerouting是用supervised learning, 看到這文章發現問題非常相似,當使用者打電話打不通,會自動轉接成其他服務商,而如何rank這些服務商的問題,但後來想想,服務商其實有限,沒有dashers這麼多,又小眾,頻率更低,所以其實supervised learning可能就足夠了

🙃 Other related blogs:

KDD 21':Learning to Embed Categorical Features without Embedding Tables for Recommendation (google)

KDD 19': Sampling-bias-corrected neural modeling for large corpus item recommendations

KDD 18': Notification Volume Control and Optimization System at Pinterest

CVPR19' Complete the Look: Scene-based Complementary Product Recommendation

NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence

NIPS’2017: Attention Is All You Need (Transformer)

KDD’19: Learning a Unified Embedding for Visual Search at Pinterest

BMVC19' Classification is a Strong Baseline for Deep Metric Learning

KDD’18: Graph Convolutional Neural Networks for Web-Scale Recommender Systems

🤩 Conference


CVPR: Conference on Computer Vision and Pattern Recognition


Top Conference Paper Challenge:


My Website:




Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store