Bulat Yaminov byaminov

byaminov / strings-spark.py
Created Apr 11, 2019
Running Spark benchmarks to compare its string operations with Vaex and Pandas
"""
Benchmark was run on my laptop with:
spark-submit --master local[*] benchmarks/strings-spark.py
To run it:
* Download and install Spark 2.4.0 (https://spark.apache.org/downloads.html)
* Run the Vaex & Pandas benchmark (https://github.com/vaexio/vaex/blob/master/benchmarks/strings.py),
which creates the test.parquet file
* Set the `args_n` constant in this script to the same value you passed as `n` to that benchmark,
e.g. 8 for `python strings.py -n8`.