RecSys ’18: Impact of item consumption on assessment of recommendations in user studies

Arthur Lee
4 min read · Jun 5, 2020


Recommendation system paper challenge (15/50)

paper link

What problem do they solve?

They analyze how questionnaires in user studies of recommender systems should be conducted.

What did they find?

It is not easy to adequately measure user experience without allowing users to consume items.

For example, when a user sees a recommended song, all they can see is the artist and title. It is hard for the user to assess whether the recommended item is good before listening to it.

How did they run the experiments?

They ran the study in two domains: music and movies.

For each domain, they set up two groups (S1, S2). The music study is described first.


Group 1 (S1)

Pre-consumption questionnaire

First, participants selected 3 out of 110 Spotify genres.

Next, 5 songs were recommended to each participant; song titles, artists, album titles and covers were displayed.

Then, participants were asked to rate their satisfaction with each recommendation and fill in the questionnaire.

Post-consumption questionnaire

After filling in the questionnaire, participants were asked to listen to each recommended song for at least 30 seconds, with the option to stop, pause and skip forward.

Finally, they rated the recommendations again and filled in the questionnaire.

So group S1 yields 2 questionnaires per participant (S1-Pre and S1-Post).

Group 2 (S2)

Post-consumption questionnaire only

First, participants selected 3 out of 110 Spotify genres.

Next, 5 songs were recommended to each participant.

Participants were asked to listen to each recommended song for at least 30 seconds, with the option to stop, pause and skip forward.

Finally, they rated the recommendations and filled in the questionnaire.

So group S2 yields 1 questionnaire per participant (S2-Post).
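To make the difference between the two groups concrete, here is a minimal sketch that writes each group's procedure as an ordered list of steps. The step names and the helper function are hypothetical labels that paraphrase the description above; they are not taken from the paper.

```python
# Step names are hypothetical labels paraphrasing the procedure described above.
PROCEDURE = {
    "S1": [
        "select_genres",
        "show_recommendations",
        "pre_questionnaire",   # rated before listening
        "listen_to_songs",
        "post_questionnaire",  # rated after listening
    ],
    "S2": [
        "select_genres",
        "show_recommendations",
        "listen_to_songs",
        "post_questionnaire",  # rated after listening only
    ],
}


def questionnaires_per_participant(group):
    """Count the questionnaire steps a participant in the given group completes."""
    return sum(1 for step in PROCEDURE[group] if step.endswith("questionnaire"))


for group in ("S1", "S2"):
    print(group, questionnaires_per_participant(group))
```

The only difference between the two lists is the pre-consumption questionnaire, which is what allows S1 to be compared with itself (before and after listening) and S2 to serve as a group that never rated before listening.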


Within-subject effects

Did participants give higher ratings to recommendations after listening to the recommended songs? Yes!

Before consumption, participants had difficulty forming a strong opinion.

S1-Pre and S1-Post both follow a normal distribution, but S1-Pre has higher variance.
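A within-subject comparison like this one pairs each participant's rating before and after listening. The sketch below shows one common way to summarize such paired data: a paired t statistic plus the variance of each set of ratings. This post does not say which test the authors used, so treat the choice of test as an assumption; the ratings are made up for illustration and are not the paper's data.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t_statistic(pre, post):
    """Return the paired t statistic for the (post - pre) differences."""
    if len(pre) != len(post):
        raise ValueError("pre and post need one rating per participant")
    diffs = [after - before for before, after in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Made-up ratings on a 1-5 scale, one pair per participant (not the paper's data).
s1_pre = [3, 2, 4, 3, 1, 5, 3, 2]
s1_post = [4, 3, 4, 4, 3, 5, 4, 3]

t_stat = paired_t_statistic(s1_pre, s1_post)
print("mean pre:", mean(s1_pre), "mean post:", mean(s1_post))
print("variance pre:", variance(s1_pre), "variance post:", variance(s1_post))
print("paired t statistic:", round(t_stat, 2))
```

A positive t statistic means ratings went up after listening, and comparing the two variances shows whether ratings were more spread out before consumption.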

Between-subject effects

Do S1-Pre and S2-Post show differences? No!
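A between-subject comparison involves two different sets of participants, so the ratings cannot be paired. One common option for such data is Welch's t statistic, sketched below. Whether the authors used this test is not stated in this post, so it is an assumption, and the ratings are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(a, b):
    """Return Welch's t statistic for two independent samples."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Made-up ratings on a 1-5 scale from two different groups (not the paper's data).
s1_pre = [3, 4, 2, 4, 3, 5, 3, 4]
s2_post = [4, 3, 3, 4, 3, 5, 4, 3]

t_stat = welch_t_statistic(s1_pre, s2_post)
print("mean S1-Pre:", mean(s1_pre), "mean S2-Post:", mean(s2_post))
print("Welch t statistic:", round(t_stat, 2))
```

A t statistic close to zero, as in this toy data, is what "no difference between groups" looks like; a real analysis would also compute the degrees of freedom and a p-value.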

Conclusion for music

It shows that the typical design of recommender system studies may give an inaccurate picture compared to when users can actually experience the items.

However, recommending items that users already know not only decreases user satisfaction but also hurts the goal of recommendation.

How about movies?

Group 1 (S1)

Pre-consumption questionnaire

First, participants provided demographics and selected 1 category out of “Horror, Mystery & Thriller”, “Comedy & Romance” or “Drama”.

Next, 3 movies were recommended to each participant, shown with movie titles, genres, posters, metadata on director and cast, and (subjective) description texts by the article’s author.

Then, participants were asked to rate their satisfaction with each recommendation and fill in the questionnaire.

Post-consumption questionnaire

After filling in the questionnaire, participants were asked to choose 1 movie to watch, then a second one, and then the last one; the watching time was recorded.

Finally, they rated the recommendations again and filled in the questionnaire.

So group S1 yields 2 questionnaires per participant (S1-Pre and S1-Post).

Group 2 (S2)

Post-consumption questionnaire only

First, participants provided demographics and selected 1 category out of “Horror, Mystery & Thriller”, “Comedy & Romance” or “Drama”.

Next, 3 movies were recommended to each participant.

Participants were asked to choose 1 movie to watch, then a second one, and then the last one; the watching time was recorded.

Finally, they rated the recommendations and filled in the questionnaire.

So group S2 yields 1 questionnaire per participant (S2-Post).

Conclusion for movies

For movies, there is not much difference between S1-Pre and S1-Post, because users already get enough information from the description texts by the article’s author before watching the movie.


Overall, the results indicate that whether the actual experience can be sufficiently substituted depends heavily on the domain as well as on the type and amount of information provided alongside the recommendations.

Participants’ responses may be reliable only when adequate information is presented.

Other related blogs:

Beyond Clicks: Dwell Time for Personalization

RecSys’15: Context-Aware Event Recommendation in Event-based Social Networks

RecSys16: Adaptive, Personalized Diversity for Visual Discovery

RecSys ’16: Local Item-Item Models for Top-N Recommendation

NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence

RecSys ’17: Translation-based Recommendation

RecSys ’17: Modeling the Assimilation-Contrast Effects in Online Product Rating Systems: Debiasing and Recommendations

RecSys ’18: HOP-Rec: High-Order Proximity for Implicit Recommendation

Written by Arthur Lee

A machine learning engineer in the Bay Area in the United States