Neural Networks for Recommendation Systems

In this notebook we will look at how to use a neural network approach to make recommendations.

  • The user/item pairings are the main source of data used to create recommendations
  • The scalar product of the user_id and item_id embeddings will be our relevance score
  • User-film interactions are the positive feedback; the negative samples are created randomly
  • The dataset is split into two: train is used to train a model on historical user data, and test is used to provide user recommendations
  • We are telling the model to learn to differentiate between the films a user actually watched and those they haven’t (ideally)
  • We have already looked at DSSM in a previous notebook; we'll simplify things a little here by not including user and item features.
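
As a rough illustration of the points above, here is a minimal numpy sketch of scoring user/item pairs with a scalar product and drawing random negative samples. The toy sizes, the `positives` list and the helper names `sample_negative` and `relevance` are assumptions made up for this example; the embeddings are random rather than trained, and this is not the notebook's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items, dim = 4, 6, 3  # toy sizes, chosen only for illustration

# Randomly initialised embedding tables (a trained model would learn these)
user_emb = rng.normal(size=(n_users, dim))
item_emb = rng.normal(size=(n_items, dim))

# Observed (user_id, item_id) interactions: the positive feedback
positives = [(0, 1), (0, 3), (1, 2), (2, 5), (3, 0)]
seen = set(positives)


def sample_negative(user_id):
    """Draw a random item the user has not interacted with."""
    while True:
        item_id = int(rng.integers(n_items))
        if (user_id, item_id) not in seen:
            return item_id


# One random negative per positive; positives get label 1, negatives label 0
pairs = [(u, i, 1) for u, i in positives]
pairs += [(u, sample_negative(u), 0) for u, _ in positives]


def relevance(user_id, item_id):
    """Relevance score: scalar product of the user and item embeddings."""
    return float(user_emb[user_id] @ item_emb[item_id])


scores = [relevance(u, i) for u, i, _ in pairs]
```

During training, the labels in `pairs` would be the targets the model learns to separate; here they only show how the positive and negative examples are assembled.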

Neural Collaborative Filtering

In this post we'll cover some of the basics of recommendation system approaches utilising neural networks.

  • Collaborative filtering (CF) is a recommendation method that uses historical user-item interactions to predict what a user might like.

  • We covered collaborative filtering using matrix factorisation in a previous notebook.

  • Specifically, we looked at methods such as SVD to generate the user and item matrices. These two matrices are then multiplied together to get the score for each user-item pairing, which makes this a model-based approach.

  • Neural Collaborative Filtering (NCF) bears some similarity to CF whilst leveraging the benefits of deep learning techniques to enhance recommendation performance.
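
The matrix factorisation step mentioned above can be sketched with numpy's SVD. The interaction matrix `R` and the number of latent factors `k` are made-up values for illustration only, and a plain truncated SVD is just one simple way of obtaining the two factor matrices.

```python
import numpy as np

# Toy user-item interaction matrix (rows: users, columns: items)
R = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0, 4.0],
])

k = 2  # number of latent factors kept (an arbitrary choice here)

# Thin SVD: R = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(R, full_matrices=False)

user_factors = U[:, :k] * s[:k]  # shape (n_users, k)
item_factors = Vt[:k, :].T       # shape (n_items, k)

# Multiplying the two matrices gives a score for every user-item pairing
scores = user_factors @ item_factors.T
```

Each row of `scores` can then be sorted to rank items for the corresponding user.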

Uplift Modeling Basics

Uplift modeling is a predictive modeling technique that aims to identify the individuals whose behaviour is most likely to change in response to a specific treatment or intervention. This technique is particularly useful in marketing and customer relationship management, where the goal is to target customers who are likely to be influenced by a marketing campaign or offer. By distinguishing between those who are positively influenced by the treatment and those who are not, uplift modeling helps organizations optimize their targeting strategies and maximize the return on investment of their marketing efforts.
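
To make the idea concrete, here is a small sketch that estimates uplift per customer segment as the difference between the response rate of treated customers and that of a control group. The arrays and the `uplift` helper are hypothetical toy data invented for this example; real uplift models typically estimate this difference per individual with a trained model rather than per segment.

```python
import numpy as np

# Hypothetical toy data: one entry per customer
segment = np.array([0, 0, 0, 0, 1, 1, 1, 1])    # e.g. 0 = new, 1 = returning
treated = np.array([1, 1, 0, 0, 1, 1, 0, 0])    # 1 = received the offer
responded = np.array([1, 1, 0, 1, 1, 0, 1, 0])  # 1 = made a purchase


def uplift(seg):
    """Treated response rate minus control response rate within a segment."""
    in_seg = segment == seg
    p_treated = responded[in_seg & (treated == 1)].mean()
    p_control = responded[in_seg & (treated == 0)].mean()
    return float(p_treated - p_control)


uplift_by_segment = {seg: uplift(seg) for seg in (0, 1)}
# Segment 0: 1.0 - 0.5 = 0.5, segment 1: 0.5 - 0.5 = 0.0
```

In this toy data only segment 0 shows a positive uplift, so it would be the segment worth targeting with the offer.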

SQL Analytics Problem

An SQL interview question from a bank, which I found interesting and decided to share

  • The first part consists of standard SQL knowledge questions
  • The second part consists of a problem in which we need to write some code for monitoring the number of hours an employee has worked, which we will do with Python and Postgres

PySpark Daily Summary II

Continuing where we left off in the last post, I'll be exploring pyspark on a daily basis, just to get more used to it. Here I will be posting summaries that cover roughly 10 days' worth of posts that I make on Kaggle, which equates to three posts a month.

PySpark Daily Summary I

Something I decided would be fun to do on a daily basis: write pyspark code every day and post about it. This is mainly because I don't use it as often as I would like, so this is my motivation. If you too want to join in, just fork the notebook (on Kaggle) and practice writing pyspark code every day! Visit my telegram channel if you have any questions or just post them here!

Here I will be posting summaries that cover roughly 10 days' worth of posts that I make on Kaggle, which equates to three posts a month.

Coding Linear Regression

Let's review the main points that will allow us to implement linear models in python and numpy. We'll see how linear regression differs from logistic regression, and how regularisation can be added to these models to control their generalisation ability. This section focuses on linear regression.
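
As a minimal sketch of the idea, linear regression with an optional L2 (ridge) penalty can be solved in closed form with numpy. The synthetic data, the `fit_ridge` helper and the `alpha` values are assumptions made for this example and are not taken from the post itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2*x1 - 1*x2 + 0.5 + small noise
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 + 0.1 * rng.normal(size=200)


def fit_ridge(X, y, alpha=0.0):
    """Closed-form linear regression with an optional L2 penalty."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0                            # do not penalise the bias
    # Solve (Xb^T Xb + penalty) w = Xb^T y
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


w_ols = fit_ridge(X, y)                # alpha = 0: ordinary least squares
w_ridge = fit_ridge(X, y, alpha=50.0)  # L2 penalty shrinks the slopes
```

Increasing `alpha` pulls the slope coefficients towards zero, which is the lever for trading training fit against generalisation.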

Coding Logistic Regression

Let's review the main points that will allow us to implement linear models in python and numpy. We'll see how linear regression differs from logistic regression, and how regularisation can be added to these models to control their generalisation ability. This section focuses on logistic regression.
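
A rough sketch of logistic regression trained by batch gradient descent in numpy, with an optional L2 penalty, might look as follows. The synthetic data, the helper names `fit_logistic` and `predict`, and the hyperparameters are assumptions chosen for illustration rather than the post's own implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: the label is 1 when x1 + x2 > 0
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(X):
    """Prepend a column of ones so the first weight acts as the bias."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_logistic(X, y, alpha=0.0, lr=0.1, n_iter=500):
    """Batch gradient descent on the mean log loss with an L2 penalty."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)              # predicted probabilities
        grad = Xb.T @ (p - y) / len(y)   # gradient of the mean log loss
        grad[1:] += alpha * w[1:]        # L2 term, bias left unpenalised
        w -= lr * grad
    return w


def predict(X, w):
    """Class prediction: 1 when the predicted probability is at least 0.5."""
    return (sigmoid(add_bias(X) @ w) >= 0.5).astype(float)


w_plain = fit_logistic(X, y)
w_reg = fit_logistic(X, y, alpha=1.0)
accuracy = float((predict(X, w_plain) == y).mean())
```

The only differences from the linear case are the sigmoid applied to the linear output and the log loss it is trained with; the regularisation term enters the gradient in the same way.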

Prediction of Product Stock Levels

In this project, we work with a client, Gala Groceries, who has contacted Cognizant for logistics advice about product storage.

  • Specifically, they want to know how to better stock the items that they sell.
  • Our role is to take on this project as a data scientist and understand what the client actually needs. This will result in the formulation/confirmation of a new project statement, in which we will be focusing on predicting stock levels of products.
  • Such a model would enable the client to estimate their product stock levels at a given time and make subsequent business decisions more effectively, reducing understocking and overstocking losses.

Prediction of customer stable funds volume

Your task today, as an intern in our department, is to learn to forecast the volume of stable customer funds without maturity dates; in this particular case, these are customers' current accounts.

  • Why is this important? Nominally, customers can withdraw all the funds in their current accounts from the Bank at any moment, and in anticipation of this the Bank cannot use them over the long or medium term (for example, to issue loans)
  • As a result, in this situation the Bank earns nothing yet pays customers interest on the funds in their accounts; the rates are low, but at the scale of the Bank's business these losses can be significant