Training ML Models with PySpark
In this post, we will get acquainted with training machine learning models in PySpark
- This post continues from the previous one, PySpark Titanic Preprocessing, where we did some basic data preprocessing; here we move on to the modeling stage of our project
- We will be using `spark.ml.classification` to train binary classification models
- There are quite a number of differences from `pandas`, for example the use of a `VectorAssembler`, which combines all feature columns into a single vector column