
Engineer Blog Note 3: embedding at Twitter

Arthur Lee
2 min read · Aug 2, 2020

blog link

My note:

  1. Embeddings are good to use -> easily extended
  2. Creating a pipeline and workflow is even more important

Benefit:

  1. Feature compression (compared with one-hot encoding)
  2. Easy to compute online (nearest-neighbor-search scenarios)
  3. Transfer learning (easy to apply to any deep learning model)
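A minimal sketch of the compression benefit, assuming a hypothetical vocabulary of 100k items and a 64-dimensional embedding table (both numbers are made up for illustration): the same item is a 100k-dim sparse one-hot vector but only a 64-dim dense embedding, and the dense form supports cheap online nearest-neighbor lookup.

```python
import numpy as np

vocab_size = 100_000   # hypothetical item vocabulary size
embed_dim = 64         # hypothetical embedding dimension

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, embed_dim)).astype(np.float32)

item_id = 42
one_hot = np.zeros(vocab_size, dtype=np.float32)
one_hot[item_id] = 1.0                 # 100k-dim sparse feature
embedded = embedding_table[item_id]    # 64-dim dense feature

# Online nearest-neighbor search: cosine similarity against all items
query = embedded / np.linalg.norm(embedded)
table_norm = embedding_table / np.linalg.norm(embedding_table, axis=1, keepdims=True)
scores = table_norm @ query
nearest = np.argsort(-scores)[:5]      # top-5 most similar item ids
```

The item is always its own nearest neighbor (cosine similarity 1.0), which is a handy sanity check for this kind of lookup.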

Engineering

  1. Quality and relevance: how do we measure quality?
  2. Creation and consumption with ease: do we need to build an internal tool to monitor the performance of a new embedding and easily surface insights?
  3. Sharing and discoverability

Embedding pipeline

Item selection.

What kinds of items should we consider when building an embedding?

There are follow-up questions:

Is this embedding for a general purpose or a specific purpose?

How will other teams use this embedding?

What are their use cases?

Data preprocessing.

  • skip-gram word embedding pipeline
  • user graph embeddings pipeline
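For the skip-gram pipeline, the preprocessing step turns raw sequences into (center, context) training pairs. A minimal sketch (the `skipgram_pairs` helper and the toy sentence are illustrative, not from the original post):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs for skip-gram."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["the", "quick", "brown", "fox"], window=1)
```

With `window=1`, each word pairs with its immediate neighbors, e.g. `("quick", "the")` and `("quick", "brown")`.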

Model fitting.

  • skip-gram word embedding pipeline: gradient-descent + negative-sampling
  • follow graph SVD pipeline: SVD -> embedding
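The follow-graph SVD step can be sketched as factorizing the adjacency matrix and keeping the top-k singular directions as embeddings. A toy example with NumPy (the 4-user graph and k=2 are made-up assumptions):

```python
import numpy as np

# Hypothetical tiny follow graph: A[i, j] = 1 if user i follows user j
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
], dtype=np.float32)

k = 2  # embedding dimension
U, s, Vt = np.linalg.svd(A, full_matrices=False)
user_emb = U[:, :k] * s[:k]   # "follower-side" embeddings, scaled by singular values
item_emb = Vt[:k, :].T        # "followee-side" embeddings

# The truncated factors give a low-rank approximation of the follow graph
A_hat = user_emb @ item_emb.T
```

Users with similar follow patterns end up close together in the embedding space, which is what the downstream similarity tasks rely on.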

Benchmarking.

  • User topic prediction: ROC-AUC for logistic regression
  • Metadata prediction: ROC-AUC for logistic regression
  • User follow Jaccard: verification
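For the Jaccard verification, the idea is that embedding similarity between two users should track the Jaccard overlap of their follow sets. A minimal sketch of the Jaccard side (the follow sets are hypothetical):

```python
def jaccard(a, b):
    """Jaccard similarity between two follow sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical follow sets for three users
follows = {
    "u1": {"a", "b", "c"},
    "u2": {"b", "c", "d"},
    "u3": {"x", "y"},
}

sim_12 = jaccard(follows["u1"], follows["u2"])  # 2 shared out of 4 total
sim_13 = jaccard(follows["u1"], follows["u3"])  # no overlap
```

If the embedding is good, pairs like (u1, u2) should also be closer in embedding space than pairs like (u1, u3).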

Feature store registration.

Publishing the embeddings to the “feature store”
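The post doesn't describe Twitter's actual feature-store API, so here is only a minimal in-memory stand-in to illustrate the registration step: embeddings get published under a (name, version) key along with metadata, and consumers look them up by that key. Every name below is hypothetical.

```python
from datetime import datetime, timezone

class FeatureStore:
    """Minimal in-memory stand-in for a feature store (illustrative only)."""

    def __init__(self):
        self._features = {}

    def register(self, name, version, embeddings, metadata=None):
        """Publish an embedding table under a (name, version) key."""
        key = (name, version)
        self._features[key] = {
            "embeddings": embeddings,
            "metadata": metadata or {},
            "registered_at": datetime.now(timezone.utc).isoformat(),
        }
        return key

    def lookup(self, name, version):
        """Fetch a published embedding table for downstream consumers."""
        return self._features[(name, version)]

store = FeatureStore()
store.register("user_follow_svd", "v1",
               embeddings={"u1": [0.1, 0.2]},
               metadata={"dim": 2, "pipeline": "follow-graph SVD"})
vec = store.lookup("user_follow_svd", "v1")["embeddings"]["u1"]
```

Versioning the key is what lets other teams pin to a known-good embedding while a new one is being benchmarked.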

Other engineering blog note:

Engineer Blog Note 2: a real world visual discovery system

Engineer Blog Note 1: Contextual relevance in ads ranking

My Website:

https://light0617.github.io/#/


Written by Arthur Lee

A machine learning engineer in the Bay Area in the United States
