Neural Networks for Recommendation Systems¶
In this notebook we will look at a neural network approach to making recommendations
- The user/item pairings are the main source of data used to create recommendations
- The scalar product of the user and item embeddings will be our relevancy score
- User film interactions will be our positive feedback; negative samples will be created randomly
- The dataset is split into two: train will be used to train a model on historical user data, test will be used to provide user recommendations
- What we are asking the model to do is learn to differentiate the films a user actually watched from those they haven't (ideally)
- We have already looked at a related approach in a previous notebook; we'll be simplifying things a little here by not including user and item features
We have summarised all preprocessing steps before the definition of datasets and dataloaders, so the sections associated with preprocessing take up less space. The replay library will help us keep things more compact, by utilising existing methods for preprocessing
- MovieLensPrepare: Initialising this class will read the dataset
- preprocess: Calling this method will filter out users with a low number of interactions, synchronise the interactions with the user and item features, and convert the time based feature into a datetime we can work with
- split_data: Calling this method will create two subsets with a time based split (the last 20% goes to test) and reset the user and item identifiers
- filter_test: Calling this method will remove all the poorly rated events for each user from the test subset
# import paths assumed for a recent replay release
import pandas as pd
from rs_datasets import MovieLens
from sklearn.preprocessing import LabelEncoder
from replay.preprocessing.filters import MinCountFilter, LowRatingFilter
from replay.splitters import TimeSplitter

class MovieLensPrepare:

    def __init__(self):
        rs = MovieLens('100k')
        self.data = rs.ratings  # attribute name assumed; it was lost from the original listing
        self.u_features = rs.users
        self.i_features = rs.items

    def preprocess(self):
        data = self.data
        u_features = self.u_features
        i_features = self.i_features
        # keep only users with at least 20 interactions
        data = MinCountFilter(num_entries=20).transform(data)
        # interactions and user & item features must be synchronised
        data = data[data[config.USER_COL].isin(u_features[config.USER_COL].unique())]
        data = data[data[config.ITEM_COL].isin(i_features[config.ITEM_COL].unique())]
        print(f"Number of unique users {data['user_id'].nunique()}")
        print(f"Number of unique items {data['item_id'].nunique()}")
        # convert unix seconds into datetime
        data[config.TIMESTAMP] = pd.to_datetime(data['timestamp'], unit='s')
        self.data = data

    def split_data(self):
        data = self.data
        u_features = self.u_features
        i_features = self.i_features
        # arguments after time_threshold were truncated in the original; the rest are assumed
        # (cold users/items are dropped so that the encoders below can transform the test subset)
        splitter = TimeSplitter(time_threshold=0.2,  # 20% into test subset
                                query_column=config.USER_COL,
                                item_column=config.ITEM_COL,
                                timestamp_column=config.TIMESTAMP,
                                drop_cold_users=True,
                                drop_cold_items=True)
        train, test = splitter.split(data)
        print('train size', train.shape[0])
        print('test size', test.shape[0])
        # user features and item features must be present in interactions dataset and only
        u_features = u_features[u_features[config.USER_COL].isin(train[config.USER_COL].unique())]
        i_features = i_features[i_features[config.ITEM_COL].isin(train[config.ITEM_COL].unique())]
        # encoders for users (fit target assumed; the call was garbled in the original)
        encoder_user = LabelEncoder().fit(train[config.USER_COL])
        # encoders for items
        encoder_item = LabelEncoder().fit(train[config.ITEM_COL])
        train[config.USER_COL] = encoder_user.transform(train[config.USER_COL])
        train[config.ITEM_COL] = encoder_item.transform(train[config.ITEM_COL])
        test[config.USER_COL] = encoder_user.transform(test[config.USER_COL])
        test[config.ITEM_COL] = encoder_item.transform(test[config.ITEM_COL])
        u_features[config.USER_COL] = encoder_user.transform(u_features[config.USER_COL])
        i_features[config.ITEM_COL] = encoder_item.transform(i_features[config.ITEM_COL])
        self.train = train
        self.test = test
        self.u_features = u_features
        self.i_features = i_features

    def filter_test(self):
        # keep only test interactions rated 4 or higher
        filter_rating = LowRatingFilter(value=4)
        self.test = filter_rating.transform(self.test)
The parameters we will be using are as follows; note that we are using rating as our feedback column
from dataclasses import dataclass

@dataclass
class config:
    USER_COL : str = 'user_id'
    ITEM_COL : str = 'item_id'
    RATING_COL : str = 'rating'
    TIMESTAMP : str = 'timestamp'
    NUM_EPOCHS : int = 30
    K : int = 10
    SEED : int = 123

config = config()
1 | Load Dataset¶
We will be using a simplified dataset, MovieLens, with 100,000 interactions of users (user_id) with films (item_id)
Today, we will be focusing on recommendations using only interactions
2 | Preprocessing¶
replay contains a handy & quick way for preprocessing interactions
- MinCountFilter can be used for filtering out the interactions of users that have fewer than num_entries of them
- Let's use this method to remove users with fewer than 20 interactions
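To make the filtering step concrete, here is a minimal pandas sketch of the same idea on a toy table. It is not replay's implementation; the helper name `min_count_filter` and the toy data are made up for illustration, and it assumes the filter keeps users with at least `num_entries` rows.

```python
import pandas as pd

def min_count_filter(df, num_entries, groupby_column="user_id"):
    """Keep only rows whose user has at least `num_entries` rows (illustrative helper)."""
    # size of each user's group, broadcast back onto every row
    counts = df.groupby(groupby_column)[groupby_column].transform("size")
    return df[counts >= num_entries]

# toy interactions: user 1 has 3 rows, user 2 has 2, user 3 has 1
toy = pd.DataFrame({"user_id": [1, 1, 1, 2, 2, 3],
                    "item_id": [10, 11, 12, 10, 13, 11]})
filtered = min_count_filter(toy, num_entries=3)
print(filtered)
```

With a threshold of 3, only the rows of user 1 survive; the notebook applies the same logic with a threshold of 20.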
3 | Splitting Dataset in time¶
- The next step after preprocessing the dataset to our liking is to split it into subsets, so we can train the model on one subset and use the other (20%) for model validation
- replay has a class named TimeSplitter, which we will use to create our subsets
class TimeSplitter(replay.splitters.base_splitter.Splitter) | TimeSplitter(time_threshold: Union[datetime.datetime, str, float], query_column: str = 'query_id', drop_cold_users: bool = False, drop_cold_items: bool = False, item_column: str = 'item_id', timestamp_column: str = 'timestamp', session_id_column: Optional[str] = None, session_id_processing_strategy: str = 'test', time_column_format: str = '%Y-%m-%d %H:%M:%S')
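As a rough illustration of what a time based split with a 0.2 threshold does, the sketch below sorts a toy table by timestamp and sends the most recent 20% of rows to test. This is a simplification written for this post, not replay's code: the helper `time_split` is hypothetical, and it ignores details such as ties at the threshold and the dropping of cold users or items.

```python
import pandas as pd

def time_split(df, test_fraction=0.2, timestamp_column="timestamp"):
    """Send the most recent `test_fraction` of rows to test (illustrative helper)."""
    ordered = df.sort_values(timestamp_column, kind="stable")
    n_train = int(len(ordered) * (1 - test_fraction))
    return ordered.iloc[:n_train], ordered.iloc[n_train:]

# ten toy interactions, one per second
toy = pd.DataFrame({
    "user_id": [1, 2, 1, 3, 2, 1, 3, 2, 1, 3],
    "item_id": [5, 6, 7, 5, 8, 9, 6, 7, 8, 9],
    "timestamp": pd.to_datetime(list(range(10)), unit="s"),
})
train, test = time_split(toy)
print(len(train), len(test))
```

Every row in the toy train subset is older than every row in the toy test subset, which is the property that makes the split a fair simulation of predicting the future from the past.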
4 | Rating Filter¶
We want to recommend only items that have been rated highly, so for the test subset, we will be using LowRatingFilter to remove interactions with low ratings
So what we have going into the next part
- study.train (training subset)
- study.test (test subset)
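The effect of the rating filter on the test subset can be sketched in plain pandas. The helper below is hypothetical and only assumes that the filter keeps rows whose rating is at least the given value.

```python
import pandas as pd

def low_rating_filter(df, value, rating_column="rating"):
    """Keep only rows rated `value` or higher (illustrative helper)."""
    return df[df[rating_column] >= value]

toy_test = pd.DataFrame({"user_id": [1, 1, 2, 2],
                         "item_id": [3, 4, 3, 5],
                         "rating":  [5, 2, 4, 3]})
relevant = low_rating_filter(toy_test, value=4)
print(relevant)
```

Only the interactions rated 4 or 5 remain, so the metrics later on count a recommendation as a hit only if the user actually liked the film.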
Let's take a look at a sample from the training set
 | user_id | item_id | rating | timestamp |
1000138 | 5399 | 789 | 4 | 2000-04-25 23:05:32 |
1000153 | 5399 | 2162 | 4 | 2000-04-25 23:05:54 |
999873 | 5399 | 573 | 5 | 2000-04-25 23:05:54 |
1000007 | 5399 | 1756 | 4 | 2000-04-25 23:06:17 |
1000192 | 5399 | 1814 | 5 | 2000-04-25 23:06:17 |
5 | Create Torch Dataset¶
We need to create a torch dataset from our matrix of interactions data, which will be passing data to our model
- For each index, the dataset takes the user_id and item_id (which will be our positive feedback) from the interaction dataset: (positive_item_id)
- Additionally for this user, we generate a number of random item_id values which will be the negative samples; they are drawn at random, so they are only assumed to be films the user hasn't watched. We'll be adding 10 negatives to the 1 positive
- Both of these are concatenated into a single array vector (items)
- Lastly we also return the labels, marking each sample id as either positive (1) or negative (0)
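The sampling scheme in the list above can be sketched with NumPy alone. The function name `sample_row` and the example ids are illustrative, and drawing the negatives uniformly at random is an assumption about how they are generated.

```python
import numpy as np

def sample_row(user_id, pos_item_id, num_items, num_negatives=10, rng=None):
    """Build one training row: 1 positive item followed by random negatives."""
    rng = rng if rng is not None else np.random.default_rng()
    # uniform random item ids; a draw may coincide with a film the user has watched
    negatives = rng.integers(0, num_items, size=num_negatives)
    item_ids = np.hstack([pos_item_id, negatives])
    labels = np.zeros(num_negatives + 1, dtype=np.float32)
    labels[0] = 1.0  # the first position always holds the positive item
    return {"user_ids": np.array([user_id]), "item_ids": item_ids, "labels": labels}

row = sample_row(user_id=91, pos_item_id=565, num_items=1616,
                 rng=np.random.default_rng(123))
print(row["item_ids"].shape, row["labels"])
```

Because the negatives are not checked against the user's history, a small share of them will be false negatives; with a large catalogue this is usually tolerated for the sake of simplicity.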
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class TowerTrain(Dataset):

    # arguments after self were truncated in the original; defaults are assumed
    def __init__(self, data, num_negatives=10, i_features=None, u_features=None):
        # user, item
        self.data = data[[config.USER_COL, config.ITEM_COL]].to_numpy()
        self.num_negatives = num_negatives
        self.num_items = len(np.unique(self.data[:, 1]))
        self.i_features = i_features
        self.u_features = u_features

    def __len__(self):
        return len(self.data)

    # get item of row in data
    def __getitem__(self, idx):
        # index to -> user_id, item_id
        user_id, pos_item_id = self.data[idx, 0], self.data[idx, 1]
        # create positive, negative samples
        # torch tensor for each item_id (pos sample) create 10 neg samples
        # (the sampling call was truncated in the original; uniform random ids are assumed)
        items = torch.tensor(np.hstack([pos_item_id,
                                        np.random.randint(0, self.num_items, self.num_negatives)]),
                             dtype=torch.int32)
        # set all labels to 0
        labels = torch.zeros(self.num_negatives + 1, dtype=torch.float32)
        labels[0] = 1. # positive label
        return {'user_ids': torch.tensor([user_id], dtype=torch.int32),
                'item_ids': items,
                'labels': labels}
To demonstrate the output of the data class, let's create the dataset and subsequent dataloader, setting a batch size of 2
# create dataset
ds_train = TowerTrain(study.train)
# create data loader
dl_train = DataLoader(ds_train, batch_size=2)
batch = next(iter(dl_train))
As we can see we get a batch of user identifiers, their array of items and corresponding labels to specify which item is a positive or negative sample
{'user_ids': tensor([[ 91],
[320]], dtype=torch.int32),
'item_ids': tensor([[ 565, 1534, 1389, 1406, 1346, 1122, 1041, 106, 1147, 1593, 1238],
[ 317, 96, 113, 638, 47, 73, 1568, 942, 224, 111, 1433]],
'labels': tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])}
6 | Model Definition¶
We will be creating a subclass SimpleTower, which only includes an embedding layer; we'll create one for user_id and one for item_id when we define them in the main class
- We can recall that for matrix factorisation approaches, we get the score matrix by using the scalar product of user and item embedding, similarly we will take the same approach to calculate the score for each user/item combination in the row
- The forward method of a tower, when called, simply returns the user/item rows of the corresponding embedding matrix
- The main class then calculates the dot product between the two embedding matrices, returning the array for all user/item combinations (batch,11)
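The shape bookkeeping behind that dot product is easy to get wrong, so here is a small NumPy sketch of it. The random arrays stand in for embeddings; the sizes (batch of 2, 11 items per row, 16 dimensions) match the numbers used in this post.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_items, emb_dim = 2, 11, 16

# one user vector per row, and 11 item vectors (1 positive + 10 negatives) per row
user_emb = rng.normal(size=(batch, 1, emb_dim))
item_emb = rng.normal(size=(batch, n_items, emb_dim))

# broadcasting repeats the user vector across the 11 items,
# then summing over the last axis gives one score per user/item pair
scores = (user_emb * item_emb).sum(axis=-1)

# the same computation written as an explicit batched dot product
check = np.einsum("bd,bnd->bn", user_emb[:, 0, :], item_emb)
print(scores.shape)
```

The multiply-and-sum form is what the model's forward pass uses; the einsum line is only there to confirm that it is an ordinary dot product per pair.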
import torch.nn as nn

# subclass contains only embedding layer but we can
# expand on this by importing user, item features
class SimpleTower(nn.Module):

    def __init__(self, num_embeddings, emb_dim):
        super().__init__()
        self.emb = nn.Embedding(num_embeddings, emb_dim)

    def forward(self, ids, features=None):
        return self.emb(ids)

class BaseTwoHead(nn.Module):

    # arguments after self were truncated in the original; reconstructed from how the class is called
    def __init__(self, emb_dim, user_config=None, item_config=None):
        super().__init__()
        self.emb_dim = emb_dim
        self.user_tower = SimpleTower(emb_dim=emb_dim, **user_config) # (n_users,emb_dim)
        self.item_tower = SimpleTower(emb_dim=emb_dim, **item_config) # (n_items,emb_dim)

    # forward method defines two 'towers'
    # and the scalar product of the two
    # which gives us the scores
    def forward(self, batch):
        item_emb = self.item_tower(batch["item_ids"]) # (batch,11,16)
        user_emb = self.user_tower(batch["user_ids"]) # (batch,1,16)
        dot_product = (user_emb * item_emb).sum(dim=-1) # (batch,11)
        return dot_product

    # methods for extracting embeddings
    def infer_users(self, batch):
        return self.user_tower(batch["user_ids"])

    def infer_items(self, batch):
        return self.item_tower(batch["item_ids"])
We'll be defining several dictionaries, which will store the common settings and the settings for users and items
- embed_config: stores the common embedding dimension size emb_dim
- user_config: stores data about the user
- item_config: stores data about the item
# model parameters
embed_config = {'emb_dim' : 16} # embedding dimension
user_config = {'num_embeddings' : study.train[config.USER_COL].max() + 1,} # number of users
item_config = {'num_embeddings' : study.train[config.ITEM_COL].max() + 1,} # number of items
# import the embedding dimension
model = BaseTwoHead(**embed_config, user_config=user_config, item_config=item_config)
BaseTwoHead(
  (user_tower): SimpleTower(
    (emb): Embedding(751, 16)
  )
  (item_tower): SimpleTower(
    (emb): Embedding(1616, 16)
  )
)
Model forward
- The output of the model will give us the logits for each of the 11 items, for each user row
tensor([[ 1.6632, 5.8888, 0.0997, 7.6885, 8.2156, 4.0495, 3.0272,
1.9775, -1.8750, 4.3952, 0.2714],
[ 5.3873, -10.4797, -4.2230, -0.4488, 0.9215, -5.0823, -0.5018,
4.9579, 0.8251, -6.3608, -4.5723]], grad_fn=<SumBackward1>)
7 | Preparing Loaders¶
We have already created a sample dataloader; now let's create one for each of the two subsets
# create train dataset
ds_train = TowerTrain(study.train)
# create train data loader (arguments were truncated in the original; the batch size is assumed)
dl_train = DataLoader(ds_train, batch_size=1024, shuffle=True)
# create test dataset
ds_test = TowerTrain(study.test)
# create test data loader
dl_test = DataLoader(ds_test, batch_size=1024, shuffle=False)
8 | Modeling Iteration¶
Let's define the optimiser and loss function, which are pretty much standard across binary classification problems
The training loop is standard as well: we loop through a fixed number of epochs, pass the batches through the model, calculate the loss, do a step of backpropagation (calculating the gradients) and update the model weights via the optimiser
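The post does not name the loss, so as an assumption the sketch below uses binary cross-entropy computed on logits, which is the usual choice when a model outputs raw scores and the labels are 0 or 1. It is written in NumPy so the arithmetic is visible; the function name is illustrative.

```python
import numpy as np

def bce_with_logits(logits, labels):
    """Mean binary cross-entropy on raw scores, in a numerically stable form."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    # equivalent to -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
    per_item = np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    return per_item.mean()

# one row of the dataset: 1 positive followed by 10 negatives
labels = np.array([1.0] + [0.0] * 10)

# all-zero scores mean the model has no opinion: loss equals log(2)
untrained = bce_with_logits(np.zeros(11), labels)

# high score for the positive, low scores for the negatives: loss close to 0
confident = bce_with_logits(np.array([6.0] + [-6.0] * 10), labels)
print(untrained, confident)
```

Training pushes the score of the watched film up and the scores of the random films down, which is exactly the move from the first case towards the second.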
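from tqdm import tqdm

# binary cross-entropy on logits and Adam
# (both assumed; their definitions are not shown in the original listing)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters())

train_loss_per_epoch = []
test_loss_per_epoch = []

# loop through all epochs
for epoch in tqdm(range(config.NUM_EPOCHS)):

    # training loop for all batches
    model.train()
    train_loss = 0.0
    for iteration, batch in enumerate(dl_train):
        optimizer.zero_grad()
        preds = model(batch)
        loss = loss_fn(preds, batch['labels'])
        loss.backward()   # backpropagation: calculate the gradients
        optimizer.step()  # update the model weights
        train_loss += loss.item()
    train_loss /= len(dl_train)
    train_loss_per_epoch.append(train_loss)

    # evaluation loop for all batches
    model.eval()
    test_loss = 0.0
    with torch.no_grad():
        for iteration, batch in enumerate(dl_test):
            preds = model(batch)
            loss = loss_fn(preds, batch['labels'])
            test_loss += loss.item()
    # evaluation of loss
    test_loss /= len(dl_test)
    test_loss_per_epoch.append(test_loss)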
- So in turn, our model is learning to classify between positive and negative samples for each row of data
- Once the model has finished learning, we can utilise the model methods and extract the embeddings from the two towers.
- And save the model as well for future use!
9 | Generating user recommendations¶
Time has come to use our trained model
- We will be making recommendations by using the model that we trained on the train dataset and using the test users to make predictions
- To make predictions, we will extract the embedding matrix weights for user and items, calculate the scores, get the top k results for each user based on the largest score values
9.1. Load Weights¶
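First things first, we need to load the model weights and put the model in inference mode
model = BaseTwoHead(**embed_config, user_config=user_config, item_config=item_config)
model.load_state_dict(torch.load(weights_path)) # weights_path: wherever the state dict was saved (not shown in the original)
model.eval()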
9.2. Get test users¶
Get the user identifiers that are in the test set; the test set was saved in study.test
 | user_id |
0 | 2 |
1 | 233 |
2 | 736 |
3 | 49 |
4 | 600 |
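The code that builds this table is not shown, so the following is only a guess at its shape: a one-column frame of unique user ids taken from the test interactions. The toy frame stands in for study.test, and the variable name `test_users` matches the one used in the later steps.

```python
import pandas as pd

# toy stand-in for study.test
toy_test = pd.DataFrame({"user_id": [2, 233, 2, 736, 233],
                         "item_id": [10, 11, 12, 13, 14],
                         "rating":  [5, 4, 4, 5, 4]})

# one row per user, keeping the order of first appearance
test_users = toy_test[["user_id"]].drop_duplicates().reset_index(drop=True)
print(test_users)
```

Keeping the result as a DataFrame rather than a bare array is convenient later, because the recommended items and scores can be attached to it as new columns.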
9.3. Extract Weights¶
Extract the embedding weights for all users and items, which are stored in the model
# extract the user / item embedding weights
user_embed = model.user_tower.emb.weight.detach().cpu().numpy()
item_embed = model.item_tower.emb.weight.detach().cpu().numpy()
user_embed.shape, item_embed.shape
9.4. Scalar product¶
Calculate the scores for each user & item combination by taking the scalar product of their embeddings
# calculate the scores, shape (n_test_users, n_items)
scores = user_embed[test_users[config.USER_COL].values] @ item_embed.T
[[-2.219962 -2.8183699 -1.2701275 ... -1.7878596 -2.3029149
-5.1351438 ]
[-0.2002018 -3.269224 -3.5974343 ... -5.4825845 -4.0557184
-4.9202886 ]
[-0.24603942 -1.9250925 -1.2330636 ... -4.066546 -3.6852539
-6.3292623 ]
[ 1.3434778 -2.2150192 -1.8992031 ... -4.7611713 -4.1526904
-5.917045 ]
[ 0.067677 -2.6156569 -2.6362207 ... -3.8871505 -3.1315584
-3.5736673 ]
[-1.3127992 -1.5567051 -1.1855109 ... -2.6913378 -3.2935755
-5.5215263 ]]
9.5. Get highest scores¶
Get the indices of the highest values (ids) & their corresponding values (scores). The indices are the item identifiers as encoded by encoder_item, which was fitted in study.split_data
# get top 10 idx by value & get its value
ids = np.argpartition(scores, -config.K)[:, -config.K:]
scores = np.take_along_axis(scores, ids, axis=1)
array([[ 1.3017656 , 1.4262905 , 1.4305891 , 1.5401053 , 1.5945268 ,
1.9945638 , 1.9178314 , 2.8111196 , 1.5959901 , 2.221249 ],
[-0.02534078, 1.0504715 , 0.6823742 , 0.6663627 , -0.00748574,
0.5298525 , 0.49601346, 0.32487705, 0.04160966, 0.02862556],
[ 1.7142106 , 1.8349895 , 2.43454 , 2.896079 , 3.0631516 ,
2.1554096 , 1.8832399 , 2.087269 , 3.876807 , 2.2215443 ],
[ 0.2731401 , 0.30537376, 0.3488819 , 0.53589934, 1.0000901 ,
0.77159363, 0.6785181 , 0.7471067 , 0.55528575, 1.0426229 ],
[ 0.89288795, 0.92402935, 0.97583646, 0.98947227, 1.0060023 ,
1.1556187 , 1.4170016 , 1.4296795 , 1.7379148 , 1.2944818 ]],
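Note that `np.argpartition` only guarantees that the k largest values end up in the last k positions; it does not order them, which is why the rows above are not sorted. If a ranked list is wanted, one extra argsort over the k selected values is enough. A minimal sketch on a toy score matrix (the helper name `top_k` is illustrative):

```python
import numpy as np

def top_k(scores, k):
    """Return the indices and values of the k largest scores per row, best first."""
    idx = np.argpartition(scores, -k, axis=1)[:, -k:]   # k largest, arbitrary order
    vals = np.take_along_axis(scores, idx, axis=1)
    order = np.argsort(-vals, axis=1)                   # sort only the k candidates
    return np.take_along_axis(idx, order, axis=1), np.take_along_axis(vals, order, axis=1)

toy_scores = np.array([[0.1, 0.9, 0.3, 0.7],
                       [0.5, 0.2, 0.8, 0.4]])
top_ids, top_vals = top_k(toy_scores, k=2)
print(top_ids)
print(top_vals)
```

Sorting only k values per row keeps the cost low compared with a full argsort over the whole catalogue.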
9.6. Recommendations Matrix¶
Prepare the usual format of user_id, item_id and rating, which will enable us to quickly evaluate the metrics using Experiment from replay. We need to add both lists to each user & expand them together
# prepare recommendations matrix
def prepare_recs(test_users, rec_item_ids, rec_relevances):
    predict = test_users.copy()
    predict[config.ITEM_COL] = rec_item_ids.tolist() # add list of indices for each user
    predict['rating'] = rec_relevances.tolist() # add rating list of scores for each user
    predict = predict.explode(column=[config.ITEM_COL, 'rating']).reset_index(drop=True) # expand both lists
    predict[config.ITEM_COL] = predict[config.ITEM_COL].astype(int)
    predict['rating'] = predict['rating'].astype("double")
    return predict

model_recommendations = prepare_recs(test_users, # user columns
                                     rec_item_ids=ids, # indices of top 10 in scores
                                     rec_relevances=scores) # scores of top 10
 | user_id | item_id | rating |
0 | 2 | 302 | 1.30177 |
1 | 2 | 218 | 1.42629 |
2 | 2 | 233 | 1.43059 |
3 | 2 | 139 | 1.54011 |
4 | 2 | 6 | 1.59453 |
We'll evaluate the overlap between the predicted and test items using three metrics: HitRate, to measure how often the model makes at least one relevant recommendation for a user; NDCG, to evaluate how well the model orders the relevant items; and Coverage, to measure how wide a range of the available items the model recommends
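metrics = Experiment(
    [NDCG(config.K), HitRate(config.K), Coverage(config.K)],
    study.test, study.train, query_column=config.USER_COL,  # arguments after the metric list were truncated; these are assumed
)
metrics.add_result("dssm_model", model_recommendations)
metrics.results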
 | NDCG@10 | HitRate@10 | Coverage@10 |
dssm_model | 0.0345152 | 0.221053 | 0.191213 |
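To make the three numbers easier to interpret, here is a plain Python sketch of how such metrics are commonly defined with binary relevance. These are textbook-style definitions written for this post, not replay's code, so details (for example, which catalogue Coverage is measured against) may differ; all names and toy values are illustrative.

```python
import math

def hit_rate_at_k(recommended, relevant, k):
    """1.0 if at least one of the top-k recommendations is relevant, else 0.0."""
    return float(any(item in relevant for item in recommended[:k]))

def ndcg_at_k(recommended, relevant, k):
    """DCG of the hits divided by the DCG of an ideal ordering."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / idcg if idcg > 0 else 0.0

def coverage_at_k(all_recommendations, catalogue_size, k):
    """Share of the catalogue that appears in at least one top-k list."""
    recommended = {item for recs in all_recommendations for item in recs[:k]}
    return len(recommended) / catalogue_size

# toy ranked recommendations and relevant (highly rated) test items per user
recs = {"u1": [3, 7, 1], "u2": [2, 7, 9]}
truth = {"u1": {7, 5}, "u2": {4}}

hit_rates = [hit_rate_at_k(recs[u], truth[u], k=3) for u in recs]
mean_hit = sum(hit_rates) / len(hit_rates)
ndcg_u1 = ndcg_at_k(recs["u1"], truth["u1"], k=3)
coverage = coverage_at_k(recs.values(), catalogue_size=10, k=3)
print(mean_hit, ndcg_u1, coverage)
```

Read this way, a HitRate@10 of about 0.22 means roughly one test user in five received at least one film they went on to rate highly.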
Thank you for reading!
Any questions or comments about the above post can be addressed on the mldsai-info channel, to me directly (shtrauss2, shtrausslearning), or simply below!