KDD '21: Learning to Embed Categorical Features without Embedding Tables for Recommendation (Google)

🤗 Recommendation system paper challenge (30/50)

🤔 What problem do they solve?

  • When there are too many item_ids, the embedding table becomes too large to fit in memory.
  • Whenever new items appear, the table has to keep growing.
  • The encoding step is split into hashing + transform.
  • hashing: apply a large number of hash functions (k different ones) to map one item_id into a k-dimensional vector (each hash produces one number, so k hashes give k numbers, which are concatenated into a k*1 vector).
  • transform: normalize the k*1 vector (to a uniform or Gaussian distribution); the goal is to push the entropy as high as possible (the more evenly the values in each dimension are distributed, the better; one-hot encoding has very low entropy, while a 1-d value vector has the highest). The output is still a k*1 vector.
  • decoding: feed the output of the transform into a DNN (an MLP) and let the network learn how to map the transformed vector back to the embedding we want (see the sketch after this list).
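To make the pipeline concrete, here is a minimal sketch of the hash → transform → decode idea in PyTorch. The universal-hash constants, layer sizes, and names (NUM_HASH, HASH_BUCKETS, Decoder, ...) are my own illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of "hash -> transform -> MLP decode" (illustrative only;
# the hash scheme and hyperparameters are assumptions, not the paper's settings).
import torch
import torch.nn as nn

NUM_HASH = 1024           # k: number of hash functions
HASH_BUCKETS = 1_000_000  # m: each hash maps an id into {0, ..., m-1}
EMBED_DIM = 32            # d: size of the final embedding
PRIME = 2_147_483_647     # a large prime for universal hashing

g = torch.Generator().manual_seed(0)
A = torch.randint(1, PRIME, (NUM_HASH,), generator=g)
B = torch.randint(0, PRIME, (NUM_HASH,), generator=g)

def encode(item_ids: torch.Tensor) -> torch.Tensor:
    """k hash functions h_i(x) = (a_i * x + b_i) mod p mod m -> (batch, k) codes."""
    x = item_ids.view(-1, 1).long() % PRIME   # keep products inside int64 range
    return (A * x + B) % PRIME % HASH_BUCKETS

def transform_uniform(codes: torch.Tensor) -> torch.Tensor:
    """Rescale integer codes to approximately Uniform[-1, 1]."""
    return codes.float() / (HASH_BUCKETS - 1) * 2.0 - 1.0

class Decoder(nn.Module):
    """MLP that maps the k-dim transformed vector to a d-dim embedding (no table)."""
    def __init__(self, k: int = NUM_HASH, hidden: int = 256, d: int = EMBED_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(k, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, d),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

decoder = Decoder()
ids = torch.tensor([42, 7_000_000_123, 3])     # huge / unseen ids need no table growth
emb = decoder(transform_uniform(encode(ids)))  # shape: (3, EMBED_DIM)
print(emb.shape)
```

Note that the memory cost here is just the MLP weights, which does not depend on the number of ids; a new item_id only changes the hash codes, never the size of any table.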

😎 Design highlights

Defining a good encoding

  • Uniqueness
  • Equal Similarity
  • High dimensionality
  • High Shannon Entropy
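As a rough numerical illustration of the "High Shannon Entropy" point (my own example, not from the paper): the per-dimension entropy of a one-hot encoding is close to zero, while a hashed dense code spreads its values almost uniformly across buckets.

```python
# Compare per-dimension Shannon entropy: one-hot vs hashed dense code (illustrative).
import numpy as np

n_ids, m = 10_000, 100   # number of ids, buckets per hashed dimension

def entropy(values) -> float:
    """Empirical Shannon entropy (in bits) of a discrete sample."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# One-hot encoding: pick one dimension j; it is 1 only for the single id equal to j.
one_hot_dim = (np.arange(n_ids) == 0).astype(int)

# Hashed dense code: one dimension holds h(id) = (a*id + b) mod p mod m.
a, b, p = 992_101, 17, 2_147_483_647
hashed_dim = (a * np.arange(n_ids) + b) % p % m

print("one-hot dimension entropy:", round(entropy(one_hot_dim), 4))  # close to 0 bits
print("hashed dimension entropy :", round(entropy(hashed_dim), 4))   # near log2(m) ~ 6.64 bits
```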

Designing the decoding

  • Uniform: [-1, 1]
  • Gaussian distribution
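A hedged sketch of the two distribution choices listed above: linearly rescale the hash codes onto [-1, 1] for the uniform case, or convert them to approximately standard Gaussian values with the Box-Muller transform. The exact formulas below are my reading of the idea, not necessarily the paper's implementation.

```python
# Two ways to turn integer hash codes into well-distributed real inputs (illustrative).
import numpy as np

m = 1_000_000                                                    # hash bucket count
codes = np.random.default_rng(0).integers(0, m, size=(4, 1024))  # stand-in for hashed ids

# Uniform: map integer codes in {0, ..., m-1} linearly onto [-1, 1].
uniform = codes / (m - 1) * 2.0 - 1.0

# Gaussian: map codes into (0, 1), then apply Box-Muller on pairs of dimensions.
u = (codes + 0.5) / m                        # strictly inside (0, 1), avoids log(0)
u1, u2 = u[:, 0::2], u[:, 1::2]
z1 = np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)
z2 = np.sqrt(-2.0 * np.log(u1)) * np.sin(2.0 * np.pi * u2)
gaussian = np.concatenate([z1, z2], axis=1)  # back to 1024 dims, roughly N(0, 1) each

print(uniform.shape, gaussian.shape)         # (4, 1024) (4, 1024)
```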

Takeaways

Conclusion

🙃 Other related blogs:

🤩 Conference
