By Anthony Vilarim Caliani
This is an experiment using Spark array functions.
In this example I'm using a Fraudulent Transactions Data dataset, so thanks to Chitwan Manchanda for sharing his dataset.
If you want to execute it follow the next steps.
Download the dataset from Kaggle, then add the file into the folder ./data/raw/
Now, install python dependencies.
You must have Java installed in order to make it work correctly.
# 👇 Setting PyEnv version
pyenv local 3.10.6
# 👇 Virtual Environment
python -m venv .venv \
&& source .venv/bin/activate \
&& python -m pip install --upgrade pip
# 👇 Dependencies
pip install pyspark==3.3.1
spark-submit main.py