Background

Let's start by pointing out that the method usually referred to as "SVD" in the context of recommendations is not, strictly speaking, the mathematical Singular Value Decomposition of a matrix, but rather an approximate way to compute a low-rank approximation of the matrix by minimizing the squared-error loss. A more accurate, albeit more generic, name for this would be Matrix Factorization.

SVD

The basic idea is that we want to decompose our original and very sparse matrix into two low-rank matrices that represent user factors and item factors. This is done by using an iterative approach to minimize the loss function. The most common way to do this is Stochastic Gradient Descent, but others such as ALS are also possible. The actual loss function to minimize includes a global bias term plus separate bias terms for the user and the item.

Data

We start w/ a user/item (utility) matrix $R$.

In order to make recommendations on products a user hasn't seen yet, we need to fill in the blanks of this sparse matrix.
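For illustration, a tiny utility matrix might look like this (a made-up example; `np.nan` marks unobserved user/item pairs):

```python
import numpy as np

# Hypothetical 4-user x 5-item utility matrix R; np.nan = unobserved rating
R = np.array([
    [5.0, 3.0, np.nan, 1.0, np.nan],
    [4.0, np.nan, np.nan, 1.0, np.nan],
    [np.nan, 1.0, np.nan, 5.0, 4.0],
    [1.0, np.nan, 5.0, 4.0, np.nan],
])

observed = ~np.isnan(R)   # mask of (user, item) pairs w/ known ratings
print(observed.sum(), "of", R.size, "entries are observed")
```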

Decomposition

I want to find, for each user $u$ and item $i$, factor vectors $p_u$ and $q_i$ such that their inner product $q_i^Tp_u$ reconstructs the corresponding entry of the $R$ matrix.

I can calculate the decomposition using SVD or using matrix factorization w/ SGD. Typically, in machine learning applications, we will use the SGD matrix factorization technique.

An alternative method to WRMF is BPR, which still uses SGD but w/ a more sophisticated, ranking-based cost function.
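As a minimal sketch of the "true SVD" route, continuing w/ the `R` defined above: SVD needs a fully observed matrix, so missing entries are naively filled w/ the global mean here (a simplifying assumption, not part of the SGD approach described below).

```python
import numpy as np

# Fill unobserved entries with the global mean so np.linalg.svd can run
R_filled = np.where(np.isnan(R), np.nanmean(R), R)

U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
k = 2                                            # number of latent factors
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # rank-k reconstruction of R
```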

Rating Function

$$ \hat{r}(user,item)=rating $$

Cost Function

$$ MSE=\frac{1}{|T|}\sum_{(u,i)\in T}{[\hat{r}(u,i) - r(u,i)]^2} $$

where $T$ is the set of observed (training) ratings.

So we want to minimize this cost function: $$ argmin_\theta \sum_{(u,i)\in{train}}{[\hat{r}(u,i) - r(u,i)]^2} $$
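As a small sketch, this cost over the observed entries could be computed like so (assuming the `R` and a reconstruction `R_hat` from the snippets above):

```python
import numpy as np

def mse(R, R_hat):
    """Mean squared error over the observed (non-NaN) entries of R only."""
    mask = ~np.isnan(R)
    err = R[mask] - R_hat[mask]
    return np.mean(err ** 2)
```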

Vanilla SVD or The Latent Factor Model:

$$ \hat{r}(u,i)=p_u^Tq_i $$

where $q_i$ is a vector associated w/ each item $i$ and $p_u$ is a vector associated w/ each user $u$. $p$ and $q$ are the parameters we want to optimize, so we can update our cost function w/ a regularization term:

$$ argmin_{p,q} \sum_{(u,i)\in{train}}{[\hat{r}(u,i) - r(u,i)]^2} + \lambda(||p_u||^2 +||q_i||^2) $$

Train the model w/ SGD to get the parameters. With the prediction error $e_{ui} = r(u,i) - \hat{r}(u,i)$, the updates for each observed rating are:

$$ p_u \leftarrow p_u + \eta(e_{ui}\, q_i - \lambda p_u) $$

$$ q_i \leftarrow q_i + \eta(e_{ui}\, p_u - \lambda q_i) $$

It is also common to add $\beta_u$ and $\beta_i$ bias terms to the model to offset average ratings.

Now that we have the user and item factors computed, we can complete the matrix: $$ \hat{r}(u,i) = p_u^T q_i $$
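A minimal SGD sketch of the above, continuing w/ the `R` defined earlier (no bias terms; `n_factors`, `lr`, `reg` are hypothetical hyper-parameter names, not from any particular library):

```python
import numpy as np

def train_mf(R, n_factors=2, lr=0.01, reg=0.02, n_epochs=200, seed=0):
    """Factorize R (np.nan = missing) into user factors P and item factors Q via SGD."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, n_factors))   # p_u rows
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))   # q_i rows
    users, items = np.where(~np.isnan(R))                  # observed (u, i) pairs

    for _ in range(n_epochs):
        for u, i in zip(users, items):
            err = R[u, i] - P[u] @ Q[i]              # prediction error e_ui
            pu = P[u].copy()                         # cache p_u before updating
            P[u] += lr * (err * Q[i] - reg * P[u])   # p_u <- p_u + eta(e_ui q_i - lambda p_u)
            Q[i] += lr * (err * pu - reg * Q[i])     # q_i <- q_i + eta(e_ui p_u - lambda q_i)
    return P, Q

P, Q = train_mf(R)
R_completed = P @ Q.T   # fill in the blanks: predicted ratings for every (u, i)
```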

SVD++ (Implicit Feedback)

In the conventional collaborative filtering algorithm, we use either a user-based or an item-based neighborhood model, i.e. the predicted rating of an item by a user is determined either by the ratings that user gave to similar items (found via Pearson correlation) or by the ratings the item received from users similar to the current user. In either case the rating behavior is concentrated on a very small subset of potential similar items or similar users and discards the signal contained in the other ratings.

In SVD++ both methods are combined into one model to improve the precision and recall of the recommender system. Check this paper out for more details.
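For reference, the SVD++ prediction rule is commonly written w/ a global mean $\mu$, the bias terms $\beta_u$ and $\beta_i$, and the set $N(u)$ of items the user gave implicit feedback on:

$$ \hat{r}(u,i) = \mu + \beta_u + \beta_i + q_i^T\left(p_u + |N(u)|^{-\frac{1}{2}}\sum_{j\in N(u)}y_j\right) $$

where each $y_j$ is an extra item factor vector learned from the implicit feedback.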
