RecSys’10: Multiverse Recommendation: N-dimensional Tensor Factorization for Context-aware Collaborative Filtering

Recommendation system paper challenge (3/50)

Paper Link

Why this paper?

Published at RecSys’10, and one of the most popular recommendation system papers.

Instead of the popular Matrix Factorization, they apply Tensor Factorization, which easily incorporates context information.

The data? Semi-synthetic data based on the Yahoo! ratings dataset (details below).

The metric? MAE (mean absolute error).


What problem do they want to solve? A rating model with context-dependent (e.g., time-varying) user preferences.

They apply MAE (mean absolute error) to measure rating prediction performance.
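As a quick illustration (my own example, not from the paper), MAE over a set of predicted ratings can be computed as:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error over a set of observed ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.abs(y_true - y_pred).mean()

print(mae([4, 3, 5], [3.5, 3, 4]))  # 0.5
```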

How others solve this problem?

There is more and more contextual information we can retrieve, such as mobile location and time.

Traditional approaches fall into three groups:

  • contextual pre-filtering: context drives data selection
  • contextual post-filtering: filter items after modeling is finished
  • contextual modeling: context is integrated into the model itself

Factorization models have recently become one of the preferred approaches to Collaborative Filtering; for example, timeSVD++ outperforms the non-temporal SVD model.

Other authors have also introduced time as a specific factor in their models.

A Bayesian Probabilistic Tensor Factorization model has been used to capture the temporal evolution of online shopping preferences.

Collaborative tag recommendation: a user-item-tag model in which the target variables are tags encoded as binary vectors.

HOSVD: Higher-Order Singular Value Decomposition
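For reference, here is a minimal numpy sketch of the standard (truncated) HOSVD algorithm, not the paper's code: each factor matrix comes from the SVD of the corresponding mode-n unfolding, and the core is the tensor projected onto those factors.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated HOSVD of a 3-way tensor T.

    Factor matrices are the leading left singular vectors of each
    unfolding; the core is T contracted with each factor matrix.
    """
    factors = []
    for mode, r in enumerate(ranks):
        Umat, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(Umat[:, :r])
    core = T
    for Umat in factors:
        # Contract the current leading mode; the result mode moves to the end,
        # so after all three contractions the mode order is restored.
        core = np.tensordot(core, Umat, axes=(0, 0))
    return core, factors
```

With full ranks the decomposition is exact: contracting the core back with the factor matrices reconstructs the original tensor.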


  • Generalize efficient MF approaches to the N-dimensional case in a compact way
  • Include any number of contextual dimensions in the model itself
  • Benefit from several loss functions designed to fine-tune the optimization criteria
  • Train the model with a fast and straightforward algorithm
  • Take advantage of the sparsity of the data while still exploiting the interactions between users, items, and context.
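Under the Tucker-style decomposition Multiverse uses, scoring a (user, item, context) triple contracts the core tensor with one row of each factor matrix. A minimal sketch (variable names are my own, not the paper's):

```python
import numpy as np

def predict(S, U, M, C, i, j, k):
    """Predicted rating for user i, item j, context k under a
    Tucker-style model: contract core S with one row per factor."""
    return np.einsum('abc,a,b,c->', S, U[i], M[j], C[k])

# Tiny example with rank-1 factors: prediction is 2 * 3 * 4 = 24.
S = np.ones((1, 1, 1))
U = np.array([[2.0]]); M = np.array([[3.0]]); C = np.array([[4.0]])
print(predict(S, U, M, C, 0, 0, 0))  # 24.0
```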

What model do they propose? MULTIVERSE RECOMMENDATION

Differences from existing HOSVD methods:

  1. Missing values are NOT treated as 0, avoiding bias against unobserved entries (existing HOSVD methods require a dense tensor).
  2. Regularization is applied.
  3. No orthogonality constraints on the factors are needed.


  1. No need for pre- or post-filtering: splitting, pre-, and post-filtering usually lose interaction information
  2. Computational simplicity
  3. Ability to handle N dimensions

Loss Function:


To avoid overfitting, they also apply regularization.

Therefore, their objective function is:
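Reconstructed from the paper's description (a single regularization weight λ is shown here for simplicity; the paper allows separate weights, e.g. for the core tensor S):

```latex
% Tucker-style model: the predicted rating tensor
F = S \times_U U \times_M M \times_C C
% Loss over observed entries only (D_{ijk} = 1 iff rating Y_{ijk} is observed)
L(F, Y) = \frac{1}{\|D\|_1} \sum_{i,j,k} D_{ijk}\, l(F_{ijk}, Y_{ijk})
% Regularized objective
\min_{U, M, C, S}\; L(F, Y) + \lambda \left( \|U\|_F^2 + \|M\|_F^2 + \|C\|_F^2 + \|S\|_F^2 \right)
```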

They apply subspace descent (optimize one factor while keeping the others fixed) and SGD (stochastic gradient descent).

How to handle missing information?

They apply an all-or-zero strategy.

  1. zero: update U and M, skipping all context factors
  2. all: update U and M along with all context factors, using the step size to control the update rate
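A hypothetical sketch of one SGD step on a single observed rating under squared-error loss (my own simplification of the paper's updates; under the "zero" strategy above, the C update would simply be skipped when the context is unknown):

```python
import numpy as np

def sgd_step(S, U, M, C, i, j, k, y, lr=0.01, lam=0.01):
    """One SGD step on the squared error of one observed rating y.

    Gradients are computed against the current values of all factors
    before any of them is updated. Returns the signed prediction error.
    """
    f = np.einsum('abc,a,b,c->', S, U[i], M[j], C[k])  # predicted rating
    e = f - y
    # Partial derivatives of f w.r.t. each factor row and the core.
    gU = np.einsum('abc,b,c->a', S, M[j], C[k])
    gM = np.einsum('abc,a,c->b', S, U[i], C[k])
    gC = np.einsum('abc,a,b->c', S, U[i], M[j])
    gS = np.einsum('a,b,c->abc', U[i], M[j], C[k])
    # Regularized gradient-descent updates.
    U[i] -= lr * (e * gU + lam * U[i])
    M[j] -= lr * (e * gM + lam * M[j])
    C[k] -= lr * (e * gC + lam * C[k])
    S -= lr * (e * gS + lam * S)
    return e
```

Repeatedly applying this step to an observed entry drives the prediction toward its rating, with the regularizer shrinking all factors slightly at each step.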

Baseline model

Reduction: reduces context-aware CF to classical Collaborative Filtering by selecting only the ratings that match the target context, based on an OLAP-style representation of users and items.

Splitting: splits items by context; it overcomes the computational issues of the reduction-based approaches and provides a more dynamic solution.


They also evaluate on semi-synthetic data based on the Yahoo! dataset.


Other related blogs:

Trust-aware recommender systems

Performance of recommender algorithms on top-n recommendation tasks

Best paper in RecSys:

My Website:



Arthur Lee

A machine learning engineer in the Bay Area, United States