Engineering Blog Note 2: A Real-World Visual Discovery System
Engineer blog link
Paper link
I learned many things from this blog post.
But I will not summarize it here. I just take notes on the points I want to remember.
Difference between relevance and engagement
By definition, the two words clearly mean different things.
But in most cases, people treat relevance as a proxy for engagement, and then use clicks / conversions as a proxy for both relevance and engagement.
At Pinterest, humans verify relevance instead of relying on clicks. Most companies skip this, either to save money or because it is unnecessary in their application: relevance is easy to verify, or the user request is very clear, as in a search engine. However, as far as I remember, even Microsoft hires human judges to verify relevance, to avoid evaluating blindly.
What is the difference between relevance and engagement? In what situations do they point in different directions rather than being 100% positively correlated?
Intuitively, high relevance leads to high engagement: people do not click on or act on irrelevant objects. However, in applications where the user request is unclear, like Pinterest or Facebook, the user has no very clear intention, or makes no request at all (Facebook). In those cases, the recommended items may not be the most relevant ones, but they still drive user engagement.
In my opinion, recommendation systems also convey this idea of discovery / exploration. If a recommendation system always recommends the highest-rated / most popular things, there is little benefit in using it. Exploration can give users more diverse ideas, inspire them, and lead to higher engagement.
At Pinterest, when a user queries with an image, instead of searching for the same image, the system explores on the user's behalf to inspire ideas (diversity, similar styling). So they use a blender to mix different kinds of items into the recommendations.
I think even in a traditional recommendation system, we can apply a blender to mix highly relevant, interesting, diverse, and trending items in what we recommend to users, which makes the recommendation system more interesting.
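To make the blending idea concrete for myself, here is a minimal sketch. It assumes each candidate source (relevant, diverse, trending) already returns a ranked list of item ids. The function name `blend`, the source names, and the per-round weights are my own assumptions for illustration; this is not how Pinterest's blender actually works.

```python
_EXHAUSTED = object()  # sentinel marking a source with no items left


def blend(sources, weights, k):
    """Interleave ranked lists from several sources into one list of k items.

    sources: dict of source name -> ranked list of item ids (best first)
    weights: dict of source name -> number of items to take per round
    """
    # Ignore sources with a non-positive weight so every round makes progress.
    iterators = {
        name: iter(items)
        for name, items in sources.items()
        if weights.get(name, 1) > 0
    }
    seen = set()
    result = []
    while len(result) < k and iterators:
        for name in list(iterators):
            taken = 0
            while taken < weights.get(name, 1):
                item = next(iterators[name], _EXHAUSTED)
                if item is _EXHAUSTED:
                    del iterators[name]  # this source has run dry
                    break
                if item in seen:
                    continue  # skip items another source already supplied
                seen.add(item)
                result.append(item)
                taken += 1
                if len(result) == k:
                    return result
    return result


sources = {
    "relevant": ["a", "b", "c", "d"],
    "diverse": ["x", "b", "y"],
    "trending": ["t", "a"],
}
weights = {"relevant": 2, "diverse": 1, "trending": 1}
print(blend(sources, weights, k=6))
```

With these weights, each round takes two relevant items for every one diverse and one trending item, so relevance still dominates while other kinds of items get a guaranteed share of the slots.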
Image Search is hard
An image is much larger than a short piece of text, which means higher storage cost and higher latency. Even if we use embeddings, the cost is still high, and with embeddings, how do we shard?
A naive idea is to partition the data by category. When a user sends a search request, first identify the category, then go to the corresponding DB to recall the top candidates with the most similar embeddings. Each top candidate comes with its index and embedding; we use that index to look up the original image index, then use the original image index to fetch the image and show it. But I still think this is not easy work and there are more challenges.
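The naive category-sharded lookup above can be sketched roughly as follows. This is only a toy, in-memory illustration under my own assumptions: the `ShardedIndex` class, the brute-force cosine similarity, and the example ids are hypothetical. A real system would more likely use an approximate nearest-neighbor index and a separate image store.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class ShardedIndex:
    """Toy index: one in-memory shard of (image_id, embedding) per category."""

    def __init__(self):
        self.shards = {}  # category -> list of (image_id, embedding)

    def add(self, category, image_id, embedding):
        self.shards.setdefault(category, []).append((image_id, embedding))

    def search(self, category, query, k):
        """Return the k image ids in one shard most similar to the query."""
        shard = self.shards.get(category, [])
        top = heapq.nlargest(k, shard, key=lambda entry: cosine(query, entry[1]))
        return [image_id for image_id, _ in top]


index = ShardedIndex()
index.add("shoes", "img_1", [1.0, 0.0])
index.add("shoes", "img_2", [0.8, 0.6])
index.add("shoes", "img_3", [0.0, 1.0])
index.add("sofas", "img_4", [1.0, 0.0])

# Step 1 (assumed): a classifier has already mapped the query to "shoes".
# Step 2: search only that shard. The returned ids would then be used to
# fetch the original images from an image store.
print(index.search("shoes", [0.9, 0.1], k=2))
```

Even this toy version shows one weakness of the naive idea: if the category step is wrong, the right shard is never searched, so the category classifier becomes a hard limit on recall.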
It makes me think that a good machine learning system (recommendation system) has to consider business insight, infrastructure (latency, storage), and modeling together.
Why do I read this blog?
I bought Pinterest stock, so it is very reasonable for me as an investor to watch their product and technology.
Besides that, as a machine learning engineer, I still need to learn more about how other companies think about and solve their problems, so that I can bring value to my team.
About me
I am new to the industry and still have many things to learn.
When I have free time, I read some papers. Recently, I feel that reading more tech blogs is more beneficial and practical for me.
https://www.linkedin.com/in/khl1147/
My Website:
If you’re interested in discussing and learning these things together, please let me know and add me on LinkedIn to connect!