(14) Pinterest

[Eng blog: Doordash-22] Using a Multi-Armed Bandit with Thompson Sampling to Identify Responsive Dashers

🤗 Machine learning Engineer blog challenge (2/100)

Doordash — 2022

Using a Multi-Armed Bandit with Thompson Sampling to Identify Responsive Dashers

🤔 What problem do they solve?

他們想要找到一個model去決定要對哪些dasher(運送員)發訊息去送貨,特別在尖峰時刻,supply 不夠多時候,如下圖所示

🤔 What alternative do they consider but not use?

基本概念:我們有歷史data,我們就直接跑一個supervised classifier對每個dasher得到一個”接受dash機率”,rank機率從高到低

缺點

  • non-stationary: 每個dasher偏好會一直隨時間改變
  • 太exploit: 總是對hot dasher去要求他運送

😎 Proposal solution: bandit algorithm with Thompson Sampling

這邊作者又提到了三個方法

Epsilon-Greedy algorithm

  • 一開始直接選最高的rank/接受率
  • ε機率random選其他人;有1-ε機率選最佳的人

優點

  • 簡單易懂

缺點

  • ε 是固定的,無法隨時間變動

Upper Confidence Bound (UCB) bandit algorithm

  • 樣本p的機率會隨著樣本數N增加而接近真實的p
  • 樣本p與真實p的差距在於一個delta

delta

  • 多選中一次,delta會慢慢變小,會慢慢收斂

優點

  • 很快找到最好的 Dashers
  • 當有足夠data,很容易exploit instead of explore,既然已經收斂到p了,就完全沒有randomness

缺點

  • 黑盒子,不好解釋
  • 很愛選new dasher,因為delta很大

Thompson Sampling takes a Bayesian approach

一開始有個prior, 經由實驗每次都update posterior

prior (historical data, weight decay)+ update rule(gamma) + hyperparameters

優點

  • easy implement
  • 很快找到最好的 Dashers
  • 好解釋
  • 根據prior, update rule, 未來彈性更大

缺點

  • 需要設定prioir

如何挑參數?

  • length of each observation: 太長計算太久;太短不夠data去計算
  • What prior for new dasher: historical data, weight decay
  • imbalance in data: how much weight should we give success vs. failure?

🤔心得

本文提到的classifier的缺點,其實還是可以解決的

對於”non-stationary”, 可以用更頻繁的retrain model再加上一些time weight decay還是可以解決

對於exploit, 其實加一點randomness 還是可以解決

但我個人認為更重要的是cold-start problem, (limited data)

由於現在要針對每個dasher去預測他們接單的機率

  • 每個dasher第一點,非常獨特性,很難用global feature去預測他們是否會接單,或是說沒有更仔細的features (總不能拿他們的出生到現在的所有data),所以用我們可以用到的feature去預測很困難
  • 獨特性也算惹,如果有大量data還是可以跑個”dasher embedding”,但可以想見的,每個dasher可能加入時間很短,也不是每天都接單,data非常limited

在預測的目標非常獨特性,還有data limited下, bandit algorithm 就是個很好的解決方案

其他心得

  • 之前在做Ads要放哪個image時候,我們也用bandit algorithm, 主要是那時後NLP, CV還不夠發達,現在可能不需要惹,藉由圖片的資訊跟圖片內的文字資訊可以推斷機率
  • 我還有一個project做call rerouting是用supervised learning, 看到這文章發現問題非常相似,當使用者打電話打不通,會自動轉接成其他服務商,而如何rank這些服務商的問題,但後來想想,服務商其實有限,沒有dashers這麼多,又小眾,頻率更低,所以其實supervised learning可能就足夠了

🙃 Other related blogs:

[Eng blog: Doordash-21] Using Triplet Loss and Siamese Neural Networks to Train Catalog Item Embeddings

KDD 21':Learning to Embed Categorical Features without Embedding Tables for Recommendation (google)

KDD 19': Sampling-bias-corrected neural modeling for large corpus item recommendations

KDD 18': Notification Volume Control and Optimization System at Pinterest

CVPR19' Complete the Look: Scene-based Complementary Product Recommendation

NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence

NIPS’2017: Attention Is All You Need (Transformer)

KDD’19: Learning a Unified Embedding for Visual Search at Pinterest

BMVC19' Classification is a Strong Baseline for Deep Metric Learning

KDD’18: Graph Convolutional Neural Networks for Web-Scale Recommender Systems

🤩 Conference

ICCV: International Conference on Computer Vision

http://iccv2019.thecvf.com/submission/timeline

CVPR: Conference on Computer Vision and Pattern Recognition

http://cvpr2019.thecvf.com/

Top Conference Paper Challenge:

https://medium.com/@arthurlee_73761/top-conference-paper-challenge-2d7ca24115c6

My Website:

https://light0617.github.io/#/

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Arthur Lee

Arthur Lee

An machine learning engineer in Bay Area in the United States