### This ordering may raise:
### Py4JJavaError: An error occurred while calling o1019.fit.: java.lang.StackOverflowError
train_df = train_df.select(cols)
train_df.cache()
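# Note: checkpoint() returns a new DataFrame; the result is discarded here,
# so train_df itself still carries its full lineage.
train_df.checkpoint()
train_df.show(n=3, truncate=False, vertical=True)
#... many cache() and .checkpoint() calls on other DataFrames in between, none of them involving train_df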
model_pred = pipeline_pred.fit(train_df)
### However, the problem above was resolved just by moving
### cache() and show() to right before .fit(), like this:
train_df = train_df.select(cols)
#... many cache() and .checkpoint() calls on other DataFrames in between, none of them involving train_df
train_df.cache()
# Note that .checkpoint() is not even used here:
train_df.show(n=3, truncate=False, vertical=True)
model_pred = pipeline_pred.fit(train_df)
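### A minimal sketch of the same idea as a reusable helper. The name
### fit_on_materialized and the use_checkpoint flag are hypothetical and not
### part of the original snippet. The helper only assumes that the DataFrame
### exposes cache(), count(), and checkpoint(), and that the pipeline exposes
### fit(). Whether this avoids the StackOverflowError in a given job is an
### assumption based on the observation above, not a guarantee.

```python
def fit_on_materialized(pipeline, df, use_checkpoint=False):
    """Cache (or checkpoint) df immediately before fitting the pipeline.

    Hypothetical helper: it mirrors the reordering shown above, where the
    DataFrame is materialized right before .fit() is called.
    """
    if use_checkpoint:
        # checkpoint() returns a NEW DataFrame, so the result must be
        # reassigned. It also requires a checkpoint directory to have been
        # set beforehand via SparkContext.setCheckpointDir().
        df = df.checkpoint(eager=True)
    else:
        # cache() is lazy; count() is an action that forces a full pass
        # over the data so the cache is populated before fit() runs.
        df = df.cache()
        df.count()
    return pipeline.fit(df)
```

### Usage, with the names from the snippet above:
###   model_pred = fit_on_materialized(pipeline_pred, train_df.select(cols))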