Training ML Models with PySpark
In this post, we will continue familiarizing ourselves with PySpark
- We are continuing from the previous post, PySpark Titanic Preprocessing, where we did some basic data preprocessing; here we will move on to the modeling stage of our project
- We will be using `spark.ml.classification` to train binary classification models
- There are quite a few differences from pandas; for example, we need a `VectorAssembler` to create a single column that combines all feature columns into one vector