
[Eng blog: Doordash-21] Using Triplet Loss and Siamese Neural Networks to Train Catalog Item Embeddings

Arthur Lee

--

🤗 Machine Learning Engineer blog challenge (1/100)

Doordash — 2021

Using Triplet Loss and Siamese Neural Networks to Train Catalog Item Embeddings (doordash.engineering)

🤔 What problem do they solve?

They want to build catalog (item) embeddings.

First, what is a catalog in Doordash? (I may be misunderstanding this.)

  • A single restaurant has many items, such as salmon sushi or boba tea, and different restaurants can share the same item: every bubble-tea shop has something called "boba tea".
  • The goal is to give boba tea its own representation, shared across stores. The blog may not say this explicitly (maybe I missed it), but since they later build store embeddings from these item embeddings, I believe that is the intent, and it also helps solve the cold-start issue (new stores).
  • For example, boba tea, tacos, burritos

🤔 What alternatives do they consider but not use?

Approach A: Word2vec embeddings on entity IDs

Basic idea: within a session we have a list of items (catalog entries), and we try to pull their embeddings closer to each other.

Detail: at Doordash, they use customer history such as views or purchases.

How to define the "same session" also deserves care. If the window is too loose, say a user's purchase history over the past year, then sushi bought yesterday and boba tea bought today could be treated as similar items; if it is too strict, there may not be enough data. Since items are global in nature, there is no need to filter by location within a session.
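To make this concrete, here is a minimal sketch of Approach A as I understand it, using gensim's Word2Vec on made-up session data; the item IDs and hyperparameters are my own, not DoorDash's:

```python
# Minimal sketch of Approach A: treat each session's item IDs as a
# "sentence" and train word2vec on them (hypothetical data).
from gensim.models import Word2Vec

# Each inner list is one user session; tokens are catalog item IDs.
sessions = [
    ["item_42", "item_17", "item_93"],  # e.g., a boba-tea browsing session
    ["item_17", "item_55"],
    ["item_93", "item_42", "item_08"],
]

model = Word2Vec(
    sentences=sessions,
    vector_size=64,  # embedding dimension
    window=5,        # co-occurrence window inside a session
    min_count=1,     # rare IDs are kept here; in practice they are the problem
    sg=1,            # skip-gram
)

# Embedding for one catalog ID; unseen IDs raise KeyError -> cold start.
vec = model.wv["item_42"]
print(model.wv.most_similar("item_42", topn=2))
```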

Drawback: the cold-start problem.

The authors mention two cases: brand-new catalog items, and catalog items that are rare in the training data.

My thoughts:

The premise of this approach is that "items in the same session are similar."

Because it assumes users browse similar items within a session, it relies heavily on user behavior, so it has trouble with cold items and new items.

To elaborate on the drawback: the embedding is trained on catalog IDs, so it throws away all textual information and relies solely on user behavior for inference (which is itself a weakness, since the same-session-similarity assumption is not independent of context).

One thing that could be done: before training, run each catalog name through BERT to get general textual information, and then train word2vec on top (transfer learning), so the embeddings carry textual signal as well. A sketch of this idea follows.
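Here is a rough sketch of that suggestion, assuming Hugging Face transformers and bert-base-uncased; the mean-pooling and the way of combining with the ID embedding are my own choices, not the blog's:

```python
# Sketch of my suggested fix: get a general-purpose text embedding for
# each catalog name from pre-trained BERT, to complement the
# behavior-only ID embeddings. My illustration, not the blog's method.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

@torch.no_grad()
def text_embedding(name: str) -> torch.Tensor:
    """Mean-pool BERT's last hidden states over the tokens of one name."""
    inputs = tokenizer(name, return_tensors="pt")
    hidden = bert(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)       # (768,)

# A brand-new item gets a sensible vector even with zero engagement data.
emb = text_embedding("boba tea")

# One possible way to combine with the word2vec ID embedding from the
# sketch above (hypothetical `id_vec`): simple concatenation.
# fused = torch.cat([emb, torch.tensor(id_vec)])
```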

Approach B: Embeddings from deep neural networks trained on a supervised task

The basic idea is to learn the embeddings via supervised learning.

Detail:

My thoughts:

Because it uses supervised learning, it relies heavily on label quality.

I am also curious how well an LSTM does on short phrases: the LSTM's main advance over the vanilla RNN is the forget gate, which helps on long text, but item names are usually very short.

And since there is no pre-training (no prior textual semantics), I wonder how this network performs on new or rare items. If the data is large enough it may infer the meaning of a new item well (depending on how similar the new item is to the existing training data), but otherwise it runs into the sparse-data issue the blog also mentions. A rough sketch of this approach is below.
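Since the blog gives no architecture details, here is a minimal sketch of what Approach B could look like: train a small network on a supervised task (a hypothetical item-name-to-category task) and read off an inner layer as the embedding. All names and sizes are my assumptions:

```python
# Sketch of Approach B: train on a supervised task (here, hypothetical
# item -> category labels) and reuse the penultimate layer as the item
# embedding. Architecture and sizes are my assumptions.
import torch
import torch.nn as nn

class SupervisedEmbedder(nn.Module):
    def __init__(self, vocab_size: int, num_categories: int, dim: int = 64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)   # mean over token IDs
        self.hidden = nn.Linear(dim, dim)               # the layer we keep
        self.classify = nn.Linear(dim, num_categories)  # supervised head

    def embedding(self, token_ids: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.hidden(self.embed(token_ids)))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.classify(self.embedding(token_ids))

model = SupervisedEmbedder(vocab_size=10_000, num_categories=50)
token_ids = torch.randint(0, 10_000, (4, 6))  # batch of 4 items, 6 tokens each
logits = model(token_ids)                     # train with nn.CrossEntropyLoss
item_vecs = model.embedding(token_ids)        # (4, 64) embeddings after training
```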

Approach C: Fine-tuning a pre-trained language model such as BERT

Similar to Approach B, but with BERT added in front.

Thanks to transfer learning, we at least have some information for sparse data or new items.

Detail:

Drawback:

The model is too large; training is slow and inference is slow.

My thoughts:

Using BERT is a very intuitive choice, but in practice we often have to weigh latency and offline productivity, and that is an important trade-off. A sketch for comparison is below.
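For comparison, a sketch of Approach C on the same hypothetical category-classification task, fine-tuning bert-base-uncased end to end; the roughly 110M parameters (versus the tiny model above) are exactly where the slow-training, slow-inference complaint comes from:

```python
# Sketch of Approach C: the same hypothetical classification task, but
# fine-tuning a pre-trained BERT end to end. Task and labels are my
# assumptions; only the architecture choice comes from the blog.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=50
)

batch = tokenizer(["boba tea", "salmon sushi"], padding=True, return_tensors="pt")
labels = torch.tensor([3, 7])              # hypothetical category labels
loss = model(**batch, labels=labels).loss  # fine-tune by minimizing this
loss.backward()
```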

😎 Proposed solution: using self-supervised learning to train embeddings

The basic idea is to learn embeddings from <query, item> pairs.

From the blog: "we use a Siamese Neural Network (also called a Twin network) architecture with triplet loss."

Detail:

First, build the dataset (triplets: anchor, positive, negative).

Apply trigram preprocessing to each piece of raw text.

Feed the trigram sequence into a bidirectional LSTM.

Train the whole network with triplet loss, as sketched below.
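Putting the steps together, here is a compressed sketch of the pipeline; the hashing trick, pooling, and hyperparameters are my assumptions, not DoorDash's exact setup. The triplet loss is max(d(a, p) - d(a, n) + margin, 0), and the "Siamese" part simply means one shared encoder embeds the anchor, the positive, and the negative:

```python
# Compressed sketch of the proposed pipeline: char trigrams ->
# shared bi-LSTM encoder -> triplet loss. Details are my assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def trigrams(text: str, buckets: int = 50_000) -> torch.Tensor:
    """Split text into character trigrams and hash them into ID buckets.
    Note: Python's str hash is salted per process; a real pipeline
    should use a stable hash."""
    padded = f"#{text.lower()}#"
    grams = [padded[i:i + 3] for i in range(len(padded) - 2)]
    return torch.tensor([hash(g) % buckets for g in grams])

class Encoder(nn.Module):
    """Shared ("Siamese") encoder: one set of weights for all three inputs."""
    def __init__(self, buckets: int = 50_000, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(buckets, dim)
        self.lstm = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(self.embed(token_ids).unsqueeze(0))
        return out.mean(dim=1)  # mean-pool over time -> (1, 2 * dim)

encoder = Encoder()
# A hypothetical triplet: a query (anchor), an item the user engaged
# with for that query (positive), and an unrelated item (negative).
a = encoder(trigrams("boba tea"))
p = encoder(trigrams("bubble milk tea"))
n = encoder(trigrams("salmon sushi"))

# Triplet loss: max(d(a, p) - d(a, n) + margin, 0)
loss = F.triplet_margin_loss(a, p, n, margin=1.0)
loss.backward()
```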

My thoughts:

This approach is quite similar to Approach A in that both train on engagement data, so what is different?

First: character-level information

Approach A does no parsing at the word or character level; it only sees item IDs, so it has no character-level information at all, which is why its performance on cold items is so poor.

The proposed method does text preprocessing and runs a bi-LSTM on top, so it already captures some character-level information and should do much better.

Second: the vocabulary is bigger and less formalized

Queries and items are both text, so through user behavior the model learns not only the item vocabulary but also the query vocabulary.

It uses the query vocabulary, whereas Approach A only covers the item vocabulary. Item names tend to be more standardized and to refresh less often, while user queries can be anything: countless unnormalized phrases, including trending things that Doordash stores may not even carry yet. This helps the model learn a wider and fresher vocabulary, one that is also more domain-specific than BERT's.

🙃 Other related blogs:

KDD'21: Learning to Embed Categorical Features without Embedding Tables for Recommendation (Google)

KDD'19: Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations

KDD'18: Notification Volume Control and Optimization System at Pinterest

CVPR'19: Complete the Look: Scene-based Complementary Product Recommendation

NAACL'19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence

NIPS'17: Attention Is All You Need (Transformer)

KDD'19: Learning a Unified Embedding for Visual Search at Pinterest

BMVC'19: Classification is a Strong Baseline for Deep Metric Learning

KDD'18: Graph Convolutional Neural Networks for Web-Scale Recommender Systems

🤩 Conferences

ICCV: International Conference on Computer Vision

http://iccv2019.thecvf.com/submission/timeline

CVPR: Conference on Computer Vision and Pattern Recognition

http://cvpr2019.thecvf.com/

KDD 2020

Machine Learning — DoorDash Engineering Blog

Top Conference Paper Challenge:

https://medium.com/@arthurlee_73761/top-conference-paper-challenge-2d7ca24115c6

My Website:

https://light0617.github.io/#/

--


Arthur Lee

A machine learning engineer in the Bay Area in the United States