RecSys’10: Multiverse Recommendation: N-dimensional Tensor Factorization for Context-aware Collaborative Filtering
Recommendation system paper challenge (3/50)
Why this paper?
Published at RecSys’10; one of the most popular recommendation system papers.
Instead of the popular Matrix Factorization, the authors apply Tensor Factorization, which easily incorporates context information.
What problem do they want to solve? A rating model with time-varying (context-dependent) user preferences.
They apply MAE (Mean Absolute Error) to measure rating prediction performance.
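MAE is just the average absolute gap between predicted and true ratings; a minimal sketch (the ratings below are hypothetical):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error over observed ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical ratings on a 1-5 scale
print(mae([4, 3, 5, 2], [3.5, 3, 4, 2.5]))  # 0.5
```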
How others solve this problem?
There is more and more context information we can retrieve, such as mobile location and time.
Traditional approaches fall into three categories:
contextual pre-filtering: context drives data selection before modeling
contextual post-filtering: items are filtered after modeling is finished
contextual modeling: context is integrated into the model itself
Factorization models have recently become one of the preferred approaches to Collaborative Filtering; for example, timeSVD++ outperforms the non-temporal SVD model.
Other authors have also introduced time as a specific factor in their models:
A Bayesian Probabilistic Tensor Factorization model captures the temporal evolution of online shopping preferences.
Collaborative tag recommendation: a (user, item, tag) model in which the target variables, in their case tags, are coded as binary vectors.
HOSVD: Higher-Order Singular Value Decomposition. Their approach:
- Generalize efficient MF approaches to the N-dimensional case in a compact way
- Include any number of contextual dimensions into the model itself
- Benefit from several loss functions designed to fine-tune the optimization criteria
- Train the model with a fast and straightforward algorithm
- Take advantage of the sparsity of the data while still exploiting the interactions between users, items, and context.
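The N-dimensional model above can be sketched as a Tucker-style decomposition, where a core tensor S mediates the interaction between user, item, and context factor matrices. All sizes and values below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: users, items, context values, and small latent dims
n_users, n_items, n_ctx = 5, 7, 3
d_u, d_m, d_c = 2, 2, 2

U = rng.normal(size=(n_users, d_u))   # user factors
M = rng.normal(size=(n_items, d_m))   # item factors
C = rng.normal(size=(n_ctx, d_c))     # context factors
S = rng.normal(size=(d_u, d_m, d_c))  # core tensor tying the dimensions together

def predict(u, i, c):
    """Predicted rating F_uic = S x_U U_u x_M M_i x_C C_c (mode products)."""
    return float(np.einsum('abc,a,b,c->', S, U[u], M[i], C[c]))

print(predict(0, 1, 2))
```

Adding another contextual dimension just means one more factor matrix and one more axis on the core tensor, which is what makes the formulation N-dimensional.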
What model do they propose? MULTIVERSE RECOMMENDATION
Difference from Existing HOSVD method:
- Does NOT treat missing entries as 0, avoiding bias against unobserved data. Existing HOSVD methods need a dense tensor.
- Applies regularization
- No orthogonality constraints on the factors are needed
- No need for pre- or post-filtering: splitting, pre-, and post-filtering usually lose interaction information
- Computational simplicity
- Ability to handle N dimensions
To avoid overfitting, they also apply regularization.
Therefore, their objective function combines a pointwise loss over the observed entries with regularization on the factor matrices and the core tensor.
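In rough terms (a sketch of the paper's setup: D is the binary tensor marking observed entries, ℓ a pointwise loss such as squared error, and the λ weights illustrative), the regularized objective reads:

```latex
\min_{U, M, C, S} \;
\frac{1}{\|D\|_1} \sum_{u,i,c} D_{uic}\, \ell\!\left(F_{uic},\, Y_{uic}\right)
+ \lambda \left( \|U\|_F^2 + \|M\|_F^2 + \|C\|_F^2 \right)
+ \lambda_S \|S\|_F^2,
\quad \text{where } F = S \times_U U \times_M M \times_C C .
```

Because the loss is summed only over observed entries (via D), missing ratings never contribute gradient, which is how the model avoids treating them as zeros.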
They apply subspace descent (optimize one set of factors while keeping the others fixed) and SGD (stochastic gradient descent).
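A single SGD step on one observed rating can be sketched as below. This assumes squared loss and simultaneous updates for readability; it is an illustration, not the paper's exact algorithm, and all sizes, learning rate, and regularization values are made up:

```python
import numpy as np

def sgd_step(U, M, C, S, u, i, c, y, lr=0.01, lam=1e-4):
    """One SGD update on observed rating y for (user u, item i, context c)."""
    pred = np.einsum('abc,a,b,c->', S, U[u], M[i], C[c])
    err = pred - y  # gradient of 0.5 * (pred - y)^2 w.r.t. pred

    # Partial derivatives of the prediction w.r.t. each factor
    gU = np.einsum('abc,b,c->a', S, M[i], C[c])
    gM = np.einsum('abc,a,c->b', S, U[u], C[c])
    gC = np.einsum('abc,a,b->c', S, U[u], M[i])
    gS = np.einsum('a,b,c->abc', U[u], M[i], C[c])

    # Regularized gradient steps (only the touched rows and the core move)
    U[u] -= lr * (err * gU + lam * U[u])
    M[i] -= lr * (err * gM + lam * M[i])
    C[c] -= lr * (err * gC + lam * C[c])
    S    -= lr * (err * gS + lam * S)
    return float(err)

# Tiny demo: repeatedly fit a single observed rating (illustrative only)
rng = np.random.default_rng(0)
U = rng.normal(size=(3, 2)); M = rng.normal(size=(4, 2))
C = rng.normal(size=(2, 2)); S = rng.normal(size=(2, 2, 2))
for _ in range(300):
    sgd_step(U, M, C, S, u=0, i=1, c=0, y=4.0)
```

Subspace descent would instead freeze three of the four factor sets per pass and cycle through them, which is what keeps each sub-problem simple.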
How do they handle missing context information?
They apply an all-or-zero strategy:
- zero: update U and M while skipping all context updates
- all: update U and M while updating all context factors, using the step size to control the update rate
Baselines they compare against:
Reduction: reduces context-aware recommendation to classical Collaborative Filtering over users and items, based on OLAP-style operations.
Splitting: overcomes the computational issues of the reduction-based approaches and provides a more dynamic solution.
They also run experiments on semi-synthetic data built from a Yahoo! dataset.
Other related blogs:
Best paper in RecSys: