We will use the R package recommenderlab and its **MovieLense** dataset (the MovieLens 100k data). The data were collected through the MovieLens web site
(movielens.umn.edu) from September 1997 to April 1998. The data set contains ~100k ratings (1-5) from 943 users on 1664 movies, and each user has rated at least 19 movies. Note that the ratings matrix is stored with users as rows and movies as columns (unlike the convention we used in the lectures).

In [2]:

```
library(recommenderlab)
data(MovieLense)
MovieLense
nusers=dim(MovieLense)[1]
nmovies=dim(MovieLense)[2]
```

In [3]:

```
# check how many movies each user has rated
summary(rowCounts(MovieLense))
```

In [4]:

```
# metadata about the movies (feature vectors) is also available; we do not use it here
MovieLenseMeta[1:5,1:10]
```

We can visualise part of the ratings matrix, here for a random sample of 25 users and 25 movies. Most of the entries are missing!

In [5]:

```
image(MovieLense[sample(nusers,25),sample(nmovies,25)])
```

In [6]:

```
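## split the users 90/10 into training and test sets;
## for each test user, 12 ratings are given ("known") and the rest are held out ("unknown")
evlt <- evaluationScheme(MovieLense, method="split", train=0.9,
                         given=12)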
evlt
tr <- getData(evlt, "train"); tr
tst_known <- getData(evlt, "known"); tst_known
tst_unknown <- getData(evlt, "unknown"); tst_unknown
```
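To make the roles of the training, "known" and "unknown" parts concrete, the following base-R sketch mimics such a split on a toy matrix. It is only an illustration under simplifying assumptions, not recommenderlab's implementation; the toy matrix `R_toy` and the helper `split_given` are hypothetical names.

```r
# Toy illustration of a "split" scheme with a fixed number of given ratings.
# NOT recommenderlab's implementation; all names here are hypothetical.
set.seed(1)

n_users <- 20; n_items <- 30
# toy ratings matrix: users in rows, items in columns, NA = not rated;
# every user rates exactly 20 randomly chosen items
R_toy <- matrix(NA_real_, n_users, n_items)
for (u in seq_len(n_users)) {
  rated <- sample(n_items, 20)
  R_toy[u, rated] <- sample(1:5, 20, replace = TRUE)
}

# 1) split the USERS into training (90%) and test (10%) sets
train_idx <- sample(n_users, size = round(0.9 * n_users))
test_idx  <- setdiff(seq_len(n_users), train_idx)
train <- R_toy[train_idx, , drop = FALSE]
test  <- R_toy[test_idx, , drop = FALSE]

# 2) for each test user keep `given` ratings as "known" and hide the rest
split_given <- function(M, given) {
  known <- unknown <- matrix(NA_real_, nrow(M), ncol(M))
  for (u in seq_len(nrow(M))) {
    rated <- which(!is.na(M[u, ]))
    keep  <- rated[sample.int(length(rated), given)]
    hide  <- setdiff(rated, keep)
    known[u, keep]   <- M[u, keep]
    unknown[u, hide] <- M[u, hide]
  }
  list(known = known, unknown = unknown)
}

parts <- split_given(test, given = 12)
rowSums(!is.na(parts$known))    # 12 for each test user
rowSums(!is.na(parts$unknown))  # the remaining 8 ratings of each test user
```

The recommender only ever sees the "known" ratings of a test user; the "unknown" ratings serve as ground truth when measuring the prediction error.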

Create a user-based collaborative filtering (UBCF) recommender, using Pearson similarity and 50 nearest neighbours.

In [7]:

```
## create a user-based CF recommender using training data
rcmnd_ub <- Recommender(tr, "UBCF",
                        param=list(method="pearson", nn=50))
## create predictions for the test users using known ratings
pred_ub <- predict(rcmnd_ub, tst_known, type="ratings"); pred_ub
## evaluate recommendations on "unknown" ratings
acc_ub <- calcPredictionAccuracy(pred_ub, tst_unknown)
as(acc_ub, "matrix")
```
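The idea behind user-based CF can be sketched in a few lines of base R. This is a simplified illustration, not what `Recommender(..., "UBCF")` does internally: the helper `predict_ubcf` and the toy matrix `R_toy` are hypothetical, only neighbours with positive similarity are used, and the prediction is a similarity-weighted average of mean-centred neighbour ratings. recommenderlab's normalisation and weighting may differ.

```r
# Minimal user-based CF sketch (hypothetical helper, not recommenderlab code).
# R: ratings matrix with users in rows, items in columns, NA = not rated.
predict_ubcf <- function(R, user, item, nn = 2) {
  others <- setdiff(seq_len(nrow(R)), user)
  # Pearson similarity between the active user and every other user,
  # computed over co-rated items only
  sims <- sapply(others, function(v)
    suppressWarnings(cor(R[user, ], R[v, ], use = "pairwise.complete.obs")))
  # candidates: positive similarity and a rating for the target item
  ok <- !is.na(sims) & sims > 0 & !is.na(R[others, item])
  if (!any(ok)) return(NA_real_)
  cand <- others[ok]; s <- sims[ok]
  # keep the nn most similar candidates
  top <- order(s, decreasing = TRUE)[seq_len(min(nn, length(s)))]
  nb <- cand[top]; w <- s[top]
  # mean-centre the neighbours' ratings, take a weighted average,
  # then add back the active user's mean rating
  dev <- R[nb, item] - rowMeans(R[nb, , drop = FALSE], na.rm = TRUE)
  mean(R[user, ], na.rm = TRUE) + sum(w * dev) / sum(w)
}

R_toy <- rbind(
  u1 = c(5, 4, 1, NA),
  u2 = c(4, 5, 2, 5),
  u3 = c(1, 2, 5, 1),
  u4 = c(5, 5, 1, 4)
)
p_ub <- predict_ubcf(R_toy, user = 1, item = 4, nn = 2)
p_ub
```

Here users 2 and 4 have tastes similar to user 1 and both rated item 4 above their own average, so the prediction for user 1 lies above user 1's mean rating.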

In [8]:

```
#compare predictions with true "unknown" ratings
as(tst_unknown, "matrix")[1:8,1:5]
as(pred_ub, "matrix")[1:8,1:5]
```
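For rating predictions, `calcPredictionAccuracy` reports RMSE, MSE and MAE. As a rough sketch of what these numbers mean, the base-R helper below (hypothetical name `rating_errors`, toy matrices made up for illustration) computes them over the entries that have both a held-out rating and a prediction.

```r
# Error measures over the held-out entries only (hypothetical helper).
rating_errors <- function(pred, truth) {
  idx <- !is.na(pred) & !is.na(truth)  # entries with both a prediction and a true rating
  err <- pred[idx] - truth[idx]
  c(RMSE = sqrt(mean(err^2)), MSE = mean(err^2), MAE = mean(abs(err)))
}

# toy example: two users, two items (made-up numbers)
truth <- matrix(c(4, NA, 2, 5), nrow = 2)
pred  <- matrix(c(3.5, 4, 3, NA), nrow = 2)
errs <- rating_errors(pred, truth)
errs   # MSE = 0.625, MAE = 0.75
```

Only two entries contribute here (errors of -0.5 and 1); entries that are missing in either matrix are ignored.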

Now let us repeat the same procedure with item-based collaborative filtering (IBCF). On this dataset, it does not perform as well as UBCF.

In [9]:

```
## repeat with the item-based approach
rcmnd_ib <- Recommender(tr, "IBCF",
                        param=list(method="pearson", k=50))
pred_ib <- predict(rcmnd_ib, tst_known, type="ratings")
acc_ib <- calcPredictionAccuracy(pred_ib, tst_unknown)
acc <- rbind(UBCF = acc_ub, IBCF = acc_ib); acc
```
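The item-based approach swaps the roles of users and items: similarities are computed between item columns, and a user's rating for an item is predicted from that user's own ratings of the most similar items. The base-R sketch below illustrates the idea under simplifying assumptions (only positive similarities, plain weighted average, similarities recomputed on every call); `predict_ibcf` and `R_toy` are hypothetical names, and recommenderlab's IBCF differs in its details.

```r
# Minimal item-based CF sketch (hypothetical helper, not recommenderlab code).
# R: ratings matrix with users in rows, items in columns, NA = not rated.
predict_ibcf <- function(R, user, item, k = 2) {
  # Pearson similarity between item columns over co-rating users
  S <- suppressWarnings(cor(R, use = "pairwise.complete.obs"))
  s <- S[item, ]
  s[item] <- NA                      # exclude the item itself
  s[is.na(s) | s <= 0] <- NA         # keep positive similarities only
  # the k most similar items ...
  keep <- order(s, decreasing = TRUE, na.last = NA)
  keep <- keep[seq_len(min(k, length(keep)))]
  # ... of which we can use those the user has rated
  keep <- keep[!is.na(R[user, keep])]
  if (length(keep) == 0) return(NA_real_)
  # similarity-weighted average of the user's own ratings
  sum(s[keep] * R[user, keep]) / sum(s[keep])
}

R_toy <- rbind(
  u1 = c(5, 4, 1, NA),
  u2 = c(4, 5, 2, 5),
  u3 = c(1, 2, 5, 1),
  u4 = c(5, 5, 1, 4),
  u5 = c(2, 1, 4, 2)
)
p_ib <- predict_ibcf(R_toy, user = 1, item = 4, k = 2)
p_ib
```

In this toy example item 4 is most similar to items 1 and 2, which user 1 rated 5 and 4, so the prediction falls between those two values.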

Finally, we fit a latent factor model by alternating least squares (`ALS`). We will use latent attributes of dimension $k=20$.

In [10]:

```
rcmnd_als <- Recommender(tr, "ALS",
                         param=list(n_factors=20))
pred_als <- predict(rcmnd_als, tst_known, type="ratings")
acc_als <- calcPredictionAccuracy(pred_als, tst_unknown)
acc <- rbind(UBCF = acc_ub, IBCF = acc_ib, ALS = acc_als); acc
```
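To illustrate what alternating least squares does, here is a minimal base-R sketch. It assumes the usual regularised objective $\sum_{(u,i)\ \text{observed}} (r_{ui} - \mathbf{u}_u^\top \mathbf{v}_i)^2 + \lambda(\|U\|_F^2 + \|V\|_F^2)$ and alternates between ridge-regression updates of the user factors and of the item factors. The helper `als_factorise`, its arguments and the toy matrix are hypothetical; recommenderlab's `ALS` method may use a different regularisation and initialisation.

```r
# Minimal ALS matrix factorisation sketch (hypothetical helper, not recommenderlab code).
# R: ratings matrix with users in rows, items in columns, NA = not rated.
als_factorise <- function(R, k = 2, lambda = 0.1, n_iter = 20, seed = 1) {
  set.seed(seed)
  n <- nrow(R); m <- ncol(R)
  U <- matrix(rnorm(n * k, sd = 0.1), n, k)  # user factors (one row per user)
  V <- matrix(rnorm(m * k, sd = 0.1), m, k)  # item factors (one row per item)
  obs <- !is.na(R)
  for (it in seq_len(n_iter)) {
    # fix V and solve a small ridge regression for each user
    for (u in seq_len(n)) {
      j <- which(obs[u, ])
      if (length(j) == 0) next
      Vj <- V[j, , drop = FALSE]
      U[u, ] <- solve(crossprod(Vj) + lambda * diag(k), crossprod(Vj, R[u, j]))
    }
    # fix U and solve a small ridge regression for each item
    for (i in seq_len(m)) {
      j <- which(obs[, i])
      if (length(j) == 0) next
      Uj <- U[j, , drop = FALSE]
      V[i, ] <- solve(crossprod(Uj) + lambda * diag(k), crossprod(Uj, R[j, i]))
    }
  }
  list(U = U, V = V, pred = U %*% t(V))
}

R_toy <- rbind(
  c(5, 4, 1, NA),
  c(4, 5, 2, 5),
  c(1, 2, 5, 1),
  c(5, 5, 1, 4),
  c(2, 1, 4, 2)
)
fit <- als_factorise(R_toy, k = 2)
obs <- !is.na(R_toy)
train_rmse    <- sqrt(mean((fit$pred[obs] - R_toy[obs])^2))
baseline_rmse <- sqrt(mean((mean(R_toy[obs]) - R_toy[obs])^2))
c(ALS = train_rmse, baseline = baseline_rmse)
fit$pred[1, 4]  # predicted rating for the one missing entry
```

Each inner update is an ordinary least-squares problem with a ridge penalty, which is why the method only needs `solve()` on small $k \times k$ systems. The comparison against a global-mean baseline is on the training entries only and says nothing about held-out accuracy.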

In [12]:

```
# list the recommender methods available for real-valued rating matrices
recommenderRegistry$get_entries(dataType = "realRatingMatrix")
```