RecSys’10: Performance of recommender algorithms on top-n recommendation tasks

Arthur Lee
Apr 23, 2020 · 3 min read


Recommendation system paper challenge (2/50)

Paper Link

Why did I write this blog?

The main reason is that I want to take notes so that in the future I can quickly recall the paper without piling up so many bookmarked favorites. Besides that, this paper is really great and popular!

Why this paper?

It was published at RecSys’10 and is highly cited.

Furthermore, it made a huge contribution to later research. At that time, most researchers optimized RMSE when recommending items to users.

However, the authors found that improvements in RMSE often do not translate into accuracy improvements on top-N tasks. Moreover, they provide a better evaluation methodology and better models.

The data?

Movielens, Netflix

The metric?

Recall and precision

What problem do they want to solve? The top-N recommendation problem.

They use recall and precision to measure performance, which connects directly to real online applications: what matters is whether the few items actually shown to a user are relevant.
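As I understand the paper's protocol, each held-out relevant item is ranked against a pool of random unrated items, and it counts as a hit if it lands in the top N. A minimal sketch, assuming the ranks have already been computed (`ranks` and the function name are my own, not from the paper's code):

```python
import numpy as np

def recall_precision_at_n(ranks, n):
    """ranks[t] = rank of the t-th held-out relevant item among
    itself plus a pool of random unrated items; a hit if rank <= n."""
    ranks = np.asarray(ranks)
    hits = np.sum(ranks <= n)
    recall = hits / len(ranks)          # fraction of trials that hit
    precision = recall / n              # each trial recommends n items,
                                        # at most one of which is relevant
    return recall, precision
```

Under this protocol precision@N is just recall@N divided by N, which is why the two metrics move together in the paper's plots.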

What model do they propose?

1. Non-normalized Cosine Neighborhood

The original neighborhood method (CorNgbr) predicts, roughly (using the paper's notation, with d(i,j) an item-item similarity and b(u,i) a baseline estimate):

r̂(u,i) = b(u,i) + Σ_j d(i,j)·(r(u,j) − b(u,j)) / Σ_j d(i,j)

where j runs over the k rated items most similar to i.

They propose NNCosNgbr, which drops the denominator:

r̂(u,i) = b(u,i) + Σ_j d(i,j)·(r(u,j) − b(u,j))

They remove the denominator, which in CorNgbr normalizes predictions into the rating range [1,5].

The first reason: the absolute rating value does not matter for ranking.

The second reason: d(i,j) now also controls confidence, giving more weight when more users rated the pair of items, so the estimated score is more reliable.

Besides, they don’t use Pearson similarity (which is computed only over co-rated items). They use cosine similarity instead, treating missing values as 0.
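A minimal NumPy sketch of this scoring scheme; the function name, the zero-filled rating matrix `R`, and the precomputed baseline matrix `b` are my own assumptions, not the paper's code:

```python
import numpy as np

def nn_cos_ngbr_scores(R, b, u, k=50):
    """Non-normalized cosine neighborhood scores for user u.

    R: (users, items) ratings, with 0 for missing entries.
    b: (users, items) baseline estimates.
    """
    # Cosine similarity over full item columns, missing ratings as 0
    norms = np.linalg.norm(R, axis=0) + 1e-9
    D = (R.T @ R) / np.outer(norms, norms)

    resid = R[u] - b[u]                 # residuals vs. the baseline
    rated = R[u] != 0
    scores = np.empty(R.shape[1])
    for i in range(R.shape[1]):
        sims = np.where(rated, D[i], -np.inf)
        sims[i] = -np.inf               # an item is not its own neighbor
        top = np.argsort(sims)[-k:]     # k most similar rated items
        top = top[np.isfinite(sims[top])]
        # No denominator: a larger similarity sum means more confidence
        scores[i] = b[u, i] + np.sum(D[i, top] * resid[top])
    return scores
```

Because the weighted sum is never normalized, items with many similar rated neighbors get pushed further from the baseline, exactly the confidence effect described above.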

2. PureSVD

Traditional SVD:

How to deal with missing rating?

The traditional approach is to impute them with baseline estimates, but the resulting dense matrix is too huge to be feasible.

Later, some researchers instead ignored the missing entries and added regularization to avoid overfitting.

As a result, there are two baseline models:

AsySVD: It represents users as a combination of item features.

SVD++: It does not represent users as a combination of item features, yet it achieves the highest quality among RMSE-optimized factorization methods.

Their model: (PureSVD)

Missing ratings are simply treated as zeros, so a conventional SVD of the whole matrix is easy to compute. The score is

r̂(u,i) = r_u · Q · q_i^T

Here, r_u is user u’s observed rating row, Q is the item-factor matrix, and q_i is the target item’s factor vector. These values are easy to compute and give the score directly.

Note that r̂(u,i) is a ranking score rather than a real rating.
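PureSVD can be sketched in a few lines of NumPy (zero imputation plus a truncated SVD; the function name and the latent-factor count `f` are my own choices):

```python
import numpy as np

def pure_svd_scores(R, f):
    """R: (users, items) ratings with 0 for missing entries.
    Returns ranking scores r_hat = R @ Q @ Q.T for every (user, item)."""
    # Conventional SVD works here because missing entries are plain zeros
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    Q = Vt[:f].T                        # item factors: (items, f)
    return R @ Q @ Q.T                  # r_hat(u,i) = r_u . Q q_i^T
```

Ranking each user’s unrated items by these scores yields the top-N list; unlike AsySVD or SVD++, no iterative training is needed.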

Result

On both MovieLens and Netflix, the simple PureSVD beats the RMSE-optimized models (including SVD++) on top-N recall and precision.


Written by Arthur Lee

A machine learning engineer in the Bay Area, United States.
