RecSys’13: Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text
Recommendation system paper challenge (5/50)
Why this paper?
It was published at RecSys’13 and is one of the most popular recommendation system papers.
What problem do they solve?
Building a recommendation system that uses the information in review text.
How would I solve this problem?
I would first apply model A (perhaps simply picking the most helpful reviews) to select the top 10 reviews for each item.
Then I would apply model B (RNN-based) to encode each review and use max pooling to merge these 10 vectors into one, so that we get a single embedding for each item.
Finally, we can optimize an objective function that maximizes the similarity between the item vector and the user vector and minimizes the distance between the item vector and the vectors of other items within the same topic.
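As a rough illustration of the pooling step, the sketch below takes review vectors that are assumed to be already encoded and merges them with element-wise max pooling. The function name `max_pool` and the toy vectors are hypothetical stand-ins; the RNN encoder itself is not shown.

```python
from typing import List


def max_pool(review_vectors: List[List[float]]) -> List[float]:
    """Element-wise maximum over equally sized review vectors."""
    if not review_vectors:
        raise ValueError("need at least one review vector")
    # zip(*...) groups the values of each dimension across all reviews.
    return [max(dims) for dims in zip(*review_vectors)]


# Three toy "encoded reviews" of dimension 4 (stand-ins for RNN outputs).
reviews = [
    [0.1, 0.9, -0.3, 0.0],
    [0.4, 0.2, 0.5, -0.1],
    [-0.2, 0.3, 0.1, 0.7],
]
item_vector = max_pool(reviews)
print(item_vector)  # [0.4, 0.9, 0.5, 0.7]
```

Max pooling keeps the strongest signal per dimension, so one very informative review can dominate a dimension regardless of how many reviews are pooled.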
How do others solve this problem?
Some researched ‘aspects’, which explain the dimensions along which ratings and reviews vary. Yet aspects do not explain the variation present across entire review corpora, for example why, given the same flavor, some people like a product while others do not.
Other researchers annotate each restaurant with many aspects (service, price, …), try to extract this information from the reviews, and assign weights to the aspects. The annotation has to be done manually.
Still other researchers extract the dimensions of reviews but do not combine them with the rating problem. Some do this for feature discovery, others for summarization. The most similar work is on sentiment analysis.
What are the baseline models?
Offset only: It considers only the overall average rating, a constant value.
Latent factor recommender system: the standard latent factor model.
Product topics learned using LDA: a standard latent factor model, but during training the item vectors are fixed to the topic vectors generated by LDA.
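The first two baselines can be sketched in a few lines. This assumes the usual latent factor predictor: an offset alpha, plus user and item biases, plus the inner product of the user and item factors. All names and parameter values below are illustrative, not taken from the paper's code.

```python
from typing import Sequence


def predict_offset(alpha: float) -> float:
    """Offset-only baseline: every (user, item) pair gets the global average."""
    return alpha


def predict_latent_factor(
    alpha: float,
    beta_u: float,
    beta_i: float,
    gamma_u: Sequence[float],
    gamma_i: Sequence[float],
) -> float:
    """Latent factor baseline: offset + biases + inner product of factors."""
    dot = sum(a * b for a, b in zip(gamma_u, gamma_i))
    return alpha + beta_u + beta_i + dot


# Made-up parameters for one user and one item, with K = 2 factors.
offset_prediction = predict_offset(alpha=3.5)
factor_prediction = predict_latent_factor(
    alpha=3.5, beta_u=0.25, beta_i=-0.5, gamma_u=[1.0, 0.5], gamma_i=[0.5, -1.0]
)
print(offset_prediction, factor_prediction)
```

For the LDA baseline, the same predictor would be used with `gamma_i` held fixed to the item's topic vector instead of being learned from ratings.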
What model do they propose?
They propose HFT (Hidden Factors as Topics), which combines ratings with review text for product recommendation. Not only does it improve rating prediction, it can also discover genres and suggest informative reviews.
- It predicts ratings more accurately than a standard recommender system, an improvement of 6%.
- It allows us to address the cold-start problem.
- It can automatically discover product categories or genres.
- It can automatically identify representative reviews, which correspond to the reviews that humans marked as most helpful.
HFT, user topics: Topics in review text are associated with user parameters
HFT, item topics: Topics in review text are associated with item parameters
- For each item i, they gather all of its reviews into a single document d_i and define its topic distribution as theta_i.
- They want to link the item vector (gamma_i) and the item topic vector (theta_i) together.
- For simplicity, they apply a monotonic transform that maps the item vector to the item topic vector, and they use kappa to control the entropy of the resulting topic distribution.
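A minimal sketch of such a transform, assuming it takes the softmax-like form theta_{i,k} = exp(kappa * gamma_{i,k}) / sum_k' exp(kappa * gamma_{i,k'}). The function name `gamma_to_theta` and the toy values are hypothetical.

```python
import math
from typing import List, Sequence


def gamma_to_theta(gamma: Sequence[float], kappa: float) -> List[float]:
    """Map an item factor vector to a topic distribution (softmax scaled by kappa)."""
    scaled = [kappa * g for g in gamma]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


gamma_i = [1.0, 0.0, -1.0]
theta_flat = gamma_to_theta(gamma_i, kappa=0.5)    # small kappa: close to uniform
theta_peaked = gamma_to_theta(gamma_i, kappa=5.0)  # large kappa: mass on one topic
print(theta_flat)
print(theta_peaked)
```

The transform is monotonic, so the largest entry of gamma_i always maps to the most probable topic. kappa only changes how concentrated the distribution is: larger values give lower entropy.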
Equation 7 is similar to standard gradient-based methods, with the topic assignments held fixed.
Equation 8 updates the topic assignments for every word in every document. It is similar to LDA, but it does not sample from a Dirichlet distribution.
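The topic-assignment step can be sketched as follows. This is an illustration under an assumption, not the authors' code: each word is assigned a topic drawn with probability proportional to theta_{d,k} * phi_{k,w}, where phi_k is the word distribution of topic k. The function name, the toy vocabulary, and the probabilities are all made up.

```python
import random
from typing import Dict, List, Sequence


def sample_topic_assignments(
    words: Sequence[str],
    theta_d: Sequence[float],
    phi: Sequence[Dict[str, float]],
    rng: random.Random,
) -> List[int]:
    """Draw one topic index per word, with weights theta_d[k] * phi[k][word]."""
    topics = range(len(theta_d))
    assignments = []
    for word in words:
        weights = [theta_d[k] * phi[k][word] for k in topics]
        assignments.append(rng.choices(topics, weights=weights, k=1)[0])
    return assignments


# Two toy topics over a two-word vocabulary.
phi = [
    {"fit": 0.7, "price": 0.3},
    {"fit": 0.2, "price": 0.8},
]
review_words = ["fit", "price", "fit"]
rng = random.Random(0)

z = sample_topic_assignments(review_words, [0.6, 0.4], phi, rng)
print(z)

# If the document puts all of its mass on topic 0, every word gets topic 0.
z_single = sample_topic_assignments(review_words, [1.0, 0.0], phi, rng)
print(z_single)
```

In a full training loop, this sampling step would alternate with the gradient-based update of the remaining parameters described for Equation 7.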
They improved by over 20% on ‘subjective’ categories, such as clothing and shoes.
K = 5 is entirely sufficient, implying that most of the time only a few topics are discussed in a typical review.
Users do not review all products with the same likelihood; rather, they have a preference towards certain categories.
Review text provides significant additional information and our hope is that by including review text the relationship between products and ratings can be more accurately modeled.
From a modeling perspective, there is a simple explanation as to why HFT discovers ‘genre-like’ topics.
- Users are likely to rate products of the same genre similarly.
- Genres also explain much of the variation in review data.
HFT can also correctly identify the right category for a product.