
July, 2025

PySpark Pivoting

Today's post covers the following:

  • Basic pivot operation
  • Pivot with multiple aggregations
  • Conditional pivoting
  • Pivoting with specified column values

PySpark Data Filtration

Today's post covers the following:

  • Filtration by column value (one or multiple conditions)
  • String related filtration using like / contains
  • Missing data filtration
  • List based filtration using isin
  • General data cleaning operations

PySpark Pipelines

Today's post covers the following:

  • Missing data treatment classification pipeline
  • Feature scaling using StandardScaler classification pipeline
  • TF-IDF corpus classification pipeline
  • PCA dimensionality reduction classification pipeline