PySpark Pivoting

Today's post covers the following:

  • Basic pivot operation
  • Pivot with multiple aggregations
  • Conditional pivoting
  • Pivoting with specified column values

PySpark Data Filtration

Today's post covers the following:

  • Filtration by column value (one or multiple conditions)
  • String related filtration using like / contains
  • Missing data filtration
  • List based filtration using isin
  • General data cleaning operations

PySpark Pipelines

Today's post covers the following:

  • Missing data treatment classification pipeline
  • Feature scaling using StandardScaler classification pipeline
  • TF-IDF corpus classification pipeline
  • PCA dimensionality reduction classification pipeline

Neural Networks for Recommendation Systems

In this notebook we will look at how to use a neural network approach to make recommendations.

  • The user/item pairings are the main source of data used to create recommendations
  • The scalar product of the user_id and item_id embeddings will be our relevancy score
  • User film interactions will be our positive feedback; items sampled at random will be our negative samples
  • The dataset is split into two: train will be used to train a model on historical user data, test will be used to provide user recommendations
  • What we will be asking the model to do is learn to differentiate between the films a user actually watched and those they haven't (ideally)
  • We have already looked at DSSM in a previous notebook; we'll be simplifying things a little here by not including user and item features.
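
To make the scoring and sampling ideas above concrete, here is a minimal sketch in plain Python. It is not the notebook's model: the item ids, the embedding values and the helper names `sample_negatives` and `relevancy` are all hypothetical, and the embeddings are untrained random numbers, used only to show how positives, random negatives and dot-product scores fit together.

```python
import random

def sample_negatives(watched, all_items, n, rng):
    """Draw up to n items the user has NOT interacted with (random negatives)."""
    candidates = sorted(all_items - watched)
    return rng.sample(candidates, min(n, len(candidates)))

def relevancy(user_vec, item_vec):
    """Scalar (dot) product of a user embedding and an item embedding."""
    return sum(u * v for u, v in zip(user_vec, item_vec))

rng = random.Random(0)
all_items = {0, 1, 2, 3, 4}   # hypothetical item catalogue
watched = {1, 3}              # hypothetical positive feedback for one user

negatives = sample_negatives(watched, all_items, n=2, rng=rng)

# (item_id, label) training pairs: 1 = watched, 0 = sampled negative
pairs = [(i, 1) for i in sorted(watched)] + [(i, 0) for i in negatives]

# Hypothetical, untrained embeddings of dimension 2
user_vec = [0.5, -1.0]
item_vecs = {i: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for i in all_items}

# Relevancy score for every training pair
scores = {i: relevancy(user_vec, item_vecs[i]) for i, _ in pairs}
```

In a trained model the embeddings would be learned so that pairs labelled 1 score higher than pairs labelled 0.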

Neural Collaborative Filtering

In this post we'll cover some of the basics of recommendation system approaches utilising neural networks.

  • Collaborative filtering (CF) is a recommendation method that predicts what a user might like from the historical data of user-item interactions.

  • We covered collaborative filtering using matrix factorisation in a previous notebook.

  • Specifically, we looked at methods such as SVD to generate the user and item matrices. These two matrices are then multiplied together to get the corresponding score for each user-item pairing; this is a model-based approach.

  • Neural Collaborative Filtering (NCF) bears some similarity to CF whilst leveraging the benefits of deep learning techniques to enhance recommendation performance.
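
As a reminder of the matrix factorisation step mentioned above, the following is a small sketch using NumPy's `np.linalg.svd`. The rating matrix, the number of latent factors `k` and the choice to treat missing interactions as zeros are assumptions made for illustration only, not the setup of the earlier notebook.

```python
import numpy as np

# Hypothetical user x item rating matrix (0 = no interaction, a simplification)
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
])

k = 2  # number of latent factors kept (assumed value)

# Thin SVD: ratings = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

user_factors = U[:, :k] * s[:k]   # shape (n_users, k), columns scaled by singular values
item_factors = Vt[:k, :]          # shape (k, n_items)

# Multiplying the two matrices gives a score for every user-item pairing
scores = user_factors @ item_factors

# Recommend the highest-scoring item the user has not interacted with
user = 0
unseen = np.where(ratings[user] == 0)[0]
best = unseen[np.argmax(scores[user, unseen])]
```

NCF replaces the fixed dot product between the two factor matrices with a learned neural network over the user and item embeddings.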

Uplift Modeling Basics

Uplift modeling is a predictive modeling technique that estimates the incremental effect of a treatment or intervention, with the aim of identifying the individuals who are most likely to respond positively because of it. This technique is particularly useful in marketing and customer relationship management, where the goal is to target customers who are likely to be influenced by a marketing campaign or offer. By distinguishing between those who are positively influenced by the treatment and those who are not, uplift modeling helps organizations optimize their targeting strategies and maximize the return on investment of their marketing efforts.
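
As a rough illustration of the idea, uplift is commonly measured as the difference in response rate between a treated group and a control group. The sketch below computes that difference per customer segment in plain Python. The campaign log, the segment names and the helper `uplift_by_segment` are hypothetical, and the code assumes every segment contains both treated and control customers.

```python
from collections import defaultdict

# Hypothetical campaign log: (segment, treated, responded)
records = [
    ("new", 1, 1), ("new", 1, 1), ("new", 1, 0),
    ("new", 0, 0), ("new", 0, 1), ("new", 0, 0),
    ("loyal", 1, 1), ("loyal", 1, 1),
    ("loyal", 0, 1), ("loyal", 0, 1),
]

def uplift_by_segment(rows):
    """Return response_rate(treated) - response_rate(control) for each segment."""
    # counts[segment][treated] = [number of responses, number of customers]
    counts = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})
    for segment, treated, responded in rows:
        counts[segment][treated][0] += responded
        counts[segment][treated][1] += 1

    result = {}
    for segment, c in counts.items():
        rate_treated = c[1][0] / c[1][1]
        rate_control = c[0][0] / c[0][1]
        result[segment] = rate_treated - rate_control
    return result

uplift = uplift_by_segment(records)
```

In this toy data the "loyal" segment responds equally often with or without the offer, so its uplift is zero, while the "new" segment responds more often when treated and is the better target.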

SQL Analytics Problem

An interview question on SQL knowledge from a bank, which I found interesting and decided to share.

  • The first part consists of standard SQL knowledge questions
  • The second part consists of a problem in which we need to write some code for monitoring the number of hours an employee has worked, which we will do with Python and Postgres