@BryanCutler
Last active September 16, 2020 02:30
How to create a Spark DataFrame from Pandas or NumPy with Arrow
@aschmu

aschmu commented Jul 9, 2019

Hi! There seem to be some missing imports, e.g. where does the `spark` object come from?
Otherwise, nice gist!

@BryanCutler
Author

Thanks @aschmu! The variable `spark` is a default SparkSession. I forgot to mention that you should be running Jupyter with a PySpark kernel. I put a sample script showing how I do this here, in case it helps: https://gist.github.com/BryanCutler/b7f10167c4face19e03330a07b24ce21. Thanks for the feedback!!
