Engineer Blog Note 3: embedding at Twitter
2 min read · Aug 2, 2020
My notes:
- embeddings are good to use -> easy to extend to new use cases
- creating a pipeline and workflow around them is even more important
Benefits:
- feature compression (compared with one-hot encoding)
- easy to compute online (e.g., nearest-neighbor search scenarios)
- transfer learning (easy to plug into any deep learning model)
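To make the online-compute benefit concrete, here is a minimal sketch (toy data, assumed 3-dimensional embeddings for 5 items) of cosine nearest-neighbor search over an embedding table, the kind of lookup that would be far more expensive over sparse one-hot features:

```python
import numpy as np

# Toy example: 5 items embedded in 3 dimensions instead of a
# 5-dimensional one-hot encoding (real vocabularies make the gap far larger).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 3))

def nearest_neighbor(query: np.ndarray, table: np.ndarray) -> int:
    """Return the index of the row in `table` most similar to `query`
    by cosine similarity -- a common online serving operation."""
    table_norm = table / np.linalg.norm(table, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    return int(np.argmax(table_norm @ query_norm))

# An item is always its own nearest neighbor.
assert nearest_neighbor(embeddings[2], embeddings) == 2
```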
Engineering
- Quality and relevance: how do we measure embedding quality?
- Creation and consumption with ease: do we need to build an internal tool to monitor the performance of new embeddings and surface their insights easily?
- Sharing and discoverability
Embedding pipeline
Item selection.
What kinds of items should we consider when building an embedding?
Follow-up questions:
Is this embedding for a general purpose or a specific purpose?
How will other teams use this embedding?
What are their use cases?
Data preprocessing.
- skip-gram word embedding pipeline
- user graph embeddings pipeline
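For the skip-gram pipeline, preprocessing boils down to turning token streams into (center, context) training pairs. A minimal sketch, with the window size assumed:

```python
# Slide a window over a token stream and emit (center, context) pairs,
# the training examples consumed by skip-gram word embedding models.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center token itself
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["a", "b", "c"], window=1)
assert pairs == [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]
```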
Model fitting.
- skip-gram word embedding pipeline: gradient descent + negative sampling
- follow-graph SVD pipeline: SVD -> embeddings
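The follow-graph SVD step can be sketched as follows: factor the (toy, assumed binary) follow adjacency matrix and keep the top-k singular directions as user embeddings. This is a minimal illustration, not Twitter's actual pipeline:

```python
import numpy as np

# Toy follow graph: A[i, j] = 1 if user i follows user j.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

k = 2  # embedding dimension (assumed)
U, S, Vt = np.linalg.svd(A, full_matrices=False)
# Scale the top-k left singular vectors by their singular values
# to get one k-dimensional embedding per user.
user_embeddings = U[:, :k] * S[:k]

assert user_embeddings.shape == (4, 2)
```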
Benchmarking.
- User topic prediction: ROC-AUC for logistic regression
- Metadata prediction: ROC-AUC for logistic regression
- User-follow Jaccard similarity: verification
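The first two benchmarks fit a logistic regression probe on the embeddings and report ROC-AUC. As a hedged sketch of the metric itself (ties ignored for simplicity), ROC-AUC can be computed from ranks via the Mann-Whitney formulation:

```python
import numpy as np

def roc_auc(labels: np.ndarray, scores: np.ndarray) -> float:
    """ROC-AUC via the rank (Mann-Whitney) formulation: the probability
    that a random positive example scores higher than a random negative.
    Sketch only -- tied scores are not rank-averaged here."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

labels = np.array([1, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.3, 0.1])  # perfectly separates the classes
assert roc_auc(labels, scores) == 1.0
```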
Feature store registration.
- publishing the embeddings to the "feature store" so other teams can discover and consume them
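Twitter's internal Feature Store API is not public, so the following is purely an illustrative in-memory stand-in showing the register/lookup contract such a store provides (all names here are hypothetical):

```python
import numpy as np

class FeatureStore:
    """Illustrative only: a dict-backed stand-in for an internal feature
    store, keyed by (feature name, version) for discoverability."""

    def __init__(self):
        self._features = {}

    def register(self, name: str, version: int, table: dict) -> None:
        # Publish a named, versioned embedding table.
        self._features[(name, version)] = table

    def lookup(self, name: str, version: int, key: str):
        # Fetch one entity's embedding at serving time.
        return self._features[(name, version)].get(key)

store = FeatureStore()
store.register("follow_graph_svd", 1, {"user_42": np.zeros(4)})
assert store.lookup("follow_graph_svd", 1, "user_42").shape == (4,)
```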
Other engineering blog note:
Engineer Blog Note 2: a real world visual discovery system
Engineer Blog Note 1: Contextual relevance in ads ranking
My Website:
https://light0617.github.io/#/