
@kmader
Created May 8, 2017 15:36
pyspark change environment and serializer

Change Configuration / Environment

Often it is important to change a core configuration setting in pyspark before running a job, such as the serializer or, for Python 3 users, the PYTHONHASHSEED executor environment variable. Since only one SparkContext can be active at a time, the running context must be stopped and recreated with the new configuration:

from pyspark import SparkContext
from pyspark.serializers import PickleSerializer

# copy the configuration of the running context and add the executor env variable
new_conf = sc._conf.setExecutorEnv('PYTHONHASHSEED', '1234')
sc.stop()  # only one SparkContext may be active at a time
sc = SparkContext(conf=new_conf, serializer=PickleSerializer())
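An alternative to rebuilding the context in-process is to set the variable before the driver ever starts. A sketch of two common options, assuming a `spark-submit` launch of a hypothetical `my_job.py` (the `spark.executorEnv.*` property pattern is standard Spark configuration):

```shell
# pass the executor environment variable at submit time
spark-submit --conf spark.executorEnv.PYTHONHASHSEED=1234 my_job.py

# or export it before launching, so the driver's own Python inherits it too
export PYTHONHASHSEED=1234
spark-submit my_job.py
```

Setting it at submit time avoids the `_conf` access above, which touches a private attribute and may change between pyspark versions.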
kmader commented May 8, 2017

A quick solution to a few serialization problems, and to jobs failing with the exception:

Exception: Randomness of hash of string should be disabled via PYTHONHASHSEED
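The root cause of this exception is that Python 3 randomizes string hashing per interpreter process, so two executors can disagree on which partition a key belongs to unless PYTHONHASHSEED is pinned. A small pure-Python demonstration of the underlying behavior, with no Spark required (the helper name is made up for illustration):

```python
import os
import subprocess
import sys

def hash_in_fresh_interpreter(s, seed):
    # launch a brand-new Python process with a fixed PYTHONHASHSEED
    # and report what hash() returns for the string there
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.check_output(
        [sys.executable, "-c", f"print(hash({s!r}))"],
        env=env, text=True,
    )
    return int(out)

# with the same seed, two separate processes agree on the hash,
# which is what Spark's shuffle partitioning relies on
assert hash_in_fresh_interpreter("spark", "1234") == \
       hash_in_fresh_interpreter("spark", "1234")
```

With `PYTHONHASHSEED=random` (the Python 3 default), separate runs will generally disagree, which is exactly the situation Spark refuses to shuffle under.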
