RecSys ’18: Impact of item consumption on assessment of recommendations in user studies
Recommendation system paper challenge (15/50)
What problem do they solve?
An analysis of how questionnaires should be run in user studies of recommendation systems
What did they find?
It is not easy to adequately measure user experience without allowing users to consume items.
For example, when a user sees a recommended song, all they can see is the artist and title, so it is hard to assess whether the recommended item is good or not before listening to it.
How do they do experiments?
They ran experiments on two topics: music and movies.
For each topic, they set up two groups (S1, S2).
Music
Group1 S1
Do pre-consume questionnaire
First, participants have to select 3 out of 110 Spotify genres.
Next, 5 songs are recommended to each participant, displayed with song title, artist, album title and cover.
Later, participants were asked to rate their satisfaction with each recommendation and fill in the questionnaire.
Do post-consume questionnaire
After filling in the questionnaire, participants were asked to listen to each recommended song for at least 30 sec, with the possibility to stop, pause and forward.
In the end, they had to rate the recommendations and fill in the questionnaire again.
So in group S1, we will have 2 questionnaires for each user.
Group2 S2
Only do post-consume questionnaire
First, participants have to select 3 out of 110 Spotify genres.
Next, 5 songs are recommended to each participant.
Participants were asked to listen to each recommended song for at least 30 sec, with the possibility to stop, pause and forward.
In the end, they had to rate the recommendations and fill in the questionnaire.
So in group S2, we will have 1 questionnaire for each user.
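To make the two-group design concrete, here is a minimal sketch of how the expected ratings could be laid out. The names `Group` and `expected_responses` and the participant IDs are hypothetical and only for illustration; the 5 items per participant come from the music study described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    name: str
    phases: tuple  # questionnaire phases in the order they are administered


# S1 answers before and after listening; S2 answers only after listening.
S1 = Group("S1", ("pre", "post"))
S2 = Group("S2", ("post",))


def expected_responses(group, participant_ids, n_items=5):
    """List every (group, participant, phase, item) rating the design should yield."""
    return [
        (group.name, pid, phase, item)
        for pid in participant_ids
        for phase in group.phases
        for item in range(1, n_items + 1)
    ]


# Hypothetical participant IDs, invented for this example.
rows = expected_responses(S1, ["p1", "p2"]) + expected_responses(S2, ["p3"])
```

Each S1 participant contributes twice as many rows as an S2 participant, which is what makes the before/after comparison inside S1 possible.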
Result
Within-subject effects
Did participants give higher ratings to recommendations after listening to the recommended songs? Yes!
Before consumption, participants had difficulty forming a strong opinion.
S1-Pre and S1-Post both follow a normal distribution, but S1-Pre has higher variance.
Between-subject effects
Do S1-Pre and S2-Post show differences? No!
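The difference between the two kinds of comparison can be sketched in a few lines. This is only an illustration: the exact statistical procedure of the paper is not restated here, t statistics are just one common choice and an assumption on my side, and the ratings below are invented, not the paper's data. A within-subject comparison pairs each participant with themselves, while a between-subject comparison treats the two groups as independent samples.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(before, after):
    """t statistic for a within-subject (paired) comparison."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def welch_t(group_a, group_b):
    """t statistic for a between-subject comparison (Welch, unequal variances)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical 1-5 satisfaction ratings, invented purely for illustration.
s1_pre = [3, 3, 4, 2, 3, 3]
s1_post = [4, 3, 5, 3, 4, 4]
s2_post = [4, 4, 3, 5, 3, 4]

within = paired_t(s1_pre, s1_post)   # same participants, before vs. after
between = welch_t(s1_pre, s2_post)   # different participants in each group
```

The paired version works on per-participant differences, so it removes the variation between people; the Welch version cannot do that because the two groups contain different people.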
Conclusion for music
It shows that the typical design of recommendation system studies may give an inaccurate picture compared to when users can experience the items.
However, recommending items the user already knows not only decreases user satisfaction but also hurts the goal of recommendation.
How about Movie?
Group1 S1
Do pre-consume questionnaire
First, participants have to provide demographics and select 1 category out of “Horror, Mystery & Thriller”, “Comedy & Romance” or “Drama”.
Next, 3 movies are recommended to each participant, displayed with movie titles, genres, posters, metadata on director and cast, and (subjective) description texts by the article’s author.
Later, participants were asked to rate their satisfaction with each recommendation and fill in the questionnaire.
Do post-consume questionnaire
After filling in the questionnaire, participants were asked to choose 1 movie to watch, then a second one, then the last one, while the watching time was recorded.
In the end, they had to rate the recommendations and fill in the questionnaire again.
So in group S1, we will have 2 questionnaires for each user.
Group2 S2
Only do post-consume questionnaire
First, participants have to provide demographics and select 1 category out of “Horror, Mystery & Thriller”, “Comedy & Romance” or “Drama”.
Next, 3 movies are recommended to each participant.
Participants were asked to choose 1 movie to watch, then a second one, then the last one, while the watching time was recorded.
In the end, they had to rate the recommendations and fill in the questionnaire.
So in group S2, we will have 1 questionnaire for each user.
Conclusion for movie
For movies, there is not much difference between S1-Pre and S1-Post, because before watching a movie, users already get enough information from the description texts by the article’s author.
Conclusion
Overall, the results indicate that whether the actual experience can be sufficiently substituted depends heavily on the domain as well as on the type and amount of information provided alongside the recommendations.
Participants’ responses may be reliable only when adequate information is presented.
Other related blogs:
Beyond Clicks: Dwell Time for Personalization
RecSys’15: Context-Aware Event Recommendation in Event-based Social Networks
RecSys16: Adaptive, Personalized Diversity for Visual Discovery
RecSys ’16: Local Item-Item Models for Top-N Recommendation
NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence
RecSys ’17: Translation-based Recommendation
RecSys ’18: HOP-Rec: High-Order Proximity for Implicit Recommendation
Best paper in RecSys:
https://recsys.acm.org/best-papers/