@vsmelov
Last active March 2, 2018 07:35
Decorator that caches a function's result to Parquet
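import functools
import os


def cache2parquet(func):
    """Decorator that caches a function's result to Parquet.

    Example:

        # define:
        @cache2parquet
        def smart_and_slow_calculations(spark, **other_kwargs):
            # some smart and slow code
            return df

        # use:
        spark = SparkSession.builder.getOrCreate()
        df = smart_and_slow_calculations(spark=spark, param1=42)

    Note: the SparkSession must be passed as the keyword argument 'spark',
    because the wrapper reads the cached data through kwargs['spark'].
    The dump path depends only on the function name, so calls with
    different arguments share one cache.
    """
    # One dump per decorated function, relative to the working directory
    dump_path = '_{}.dump'.format(func.__name__)

    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        if os.path.exists(dump_path):
            # Cache hit: load the stored result instead of recomputing
            return kwargs['spark'].read.parquet(dump_path)
        # Cache miss: compute, store, and return the result
        df = func(*args, **kwargs)
        df.write.parquet(dump_path)
        return df

    return wrapped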
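
Because the decorator derives its dump path from func.__name__ alone, a call with param1=42 and a later call with param1=43 would read the same dump. One possible way around this, shown only as a sketch, is to fold the keyword arguments into the path. The helper name dump_path_for and its skip parameter are hypothetical and not part of the gist; the sketch assumes the arguments are JSON-serializable, or at least have a stable repr.

```python
import hashlib
import json


def dump_path_for(func_name, kwargs, skip=('spark',)):
    """Build a dump path that depends on the call's keyword arguments.

    Hypothetical helper, not part of the original gist.
    """
    # Drop the session object: it is not a parameter of the calculation
    params = {k: v for k, v in kwargs.items() if k not in skip}
    # sort_keys makes the digest independent of keyword order;
    # default=repr is a fallback for values json cannot encode
    # (only safe when that repr is stable between runs)
    payload = json.dumps(params, sort_keys=True, default=repr)
    digest = hashlib.sha256(payload.encode('utf-8')).hexdigest()[:8]
    return '_{}_{}.dump'.format(func_name, digest)
```

Inside wrapped, the path could then be computed per call, for example dump_path = dump_path_for(func.__name__, kwargs), instead of once at decoration time. Positional arguments are ignored by this sketch, which matches the gist's keyword-only convention for 'spark'.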