Bulat Yaminov byaminov
byaminov / strings-spark.py
Created April 11, 2019 08:45
Running Spark benchmarks to compare its string operations with Vaex and Pandas
"""
Benchmark run on my laptop with:
spark-submit --master local[*] benchmarks/strings-spark.py
To run it:
* Download and install Spark 2.4.0 (https://spark.apache.org/downloads.html)
* Run the Vaex & Pandas benchmark (https://github.com/vaexio/vaex/blob/master/benchmarks/strings.py),
which creates the test.parquet file
* Set the `args_n` constant in this script to the same value you passed as `-n` to that benchmark,
e.g. 8 if you ran `python strings.py -n8`.