Hyperparameter Tuning with Pipelines
This is the last of three posts on the Titanic classification problem in pyspark.
- In the last post, we started with a cleaned dataset, which we prepared for machine learning using `StringIndexer` & `VectorAssembler`, followed by the model training stage itself.
- These steps are a series of stages in the construction of a model, which we can group into a single `pipeline`. `pyspark`, like `sklearn`, has pipeline classes that help us keep things organised.