@farooqarahim
Created January 7, 2021 19:04
PySpark Read CSV
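# findspark puts a locally installed Spark (found via SPARK_HOME) on sys.path so
# pyspark can be imported; these two lines are unnecessary if pyspark was installed with pip.
import findspark
findspark.init()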
from pyspark.sql import SparkSession
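`findspark.init()` is what lets a plain Python process import `pyspark` from a Spark installation that was not installed with pip: it locates `SPARK_HOME` and prepends Spark's bundled Python sources and its Py4J zip to `sys.path`. The standard-library sketch below approximates only that path setup so the mechanism is visible. `spark_python_paths` and `init_spark_paths` are hypothetical helpers, not findspark's API, and the real library handles more cases.

```python
from __future__ import annotations

import glob
import os
import sys


def spark_python_paths(spark_home: str) -> list[str]:
    """Return the sys.path entries a findspark-style setup would prepend.

    Hypothetical helper approximating findspark.init(); not findspark's API.
    """
    spark_python = os.path.join(spark_home, "python")
    # Spark ships Py4J as a versioned zip, e.g. python/lib/py4j-<version>-src.zip
    py4j_zips = sorted(glob.glob(os.path.join(spark_python, "lib", "py4j-*.zip")))
    if not py4j_zips:
        raise FileNotFoundError(f"no py4j zip under {spark_python}/lib")
    return [spark_python, py4j_zips[0]]


def init_spark_paths(spark_home: str | None = None) -> None:
    """Prepend Spark's Python paths to sys.path, falling back to $SPARK_HOME."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise ValueError("set SPARK_HOME or pass spark_home explicitly")
    # Export SPARK_HOME so later Spark launches see the same installation.
    os.environ["SPARK_HOME"] = spark_home
    sys.path[:0] = spark_python_paths(spark_home)
```

Once the paths are in place, `from pyspark.sql import SparkSession` resolves against that installation.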
# To connect to a remote standalone Spark cluster instead, set its master URL:
# spark = SparkSession \
#     .builder.master('spark://master-node:7077') \
#     .appName("read-csv") \
#     .getOrCreate()
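# Local session: no .master() is set, so Spark uses its configured default,
# which is local mode out of the box.
spark = SparkSession \
    .builder \
    .appName("read-csv") \
    .getOrCreate()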
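The two session variants in this cell differ only in the master URL: the commented-out one targets a standalone cluster at `spark://HOST:PORT` (7077 is the standalone master's default port), while the active one sets none and falls back to the default. As a hedged illustration of the common master URL shapes (`local`, `local[N]`, `local[*]`, `spark://HOST:PORT`, `yarn`, `k8s://...`), the hypothetical `classify_master` helper below labels each one. Spark parses these URLs itself; this sketch covers only a subset and is not Spark code.

```python
import re


def classify_master(url: str) -> str:
    """Describe where a Spark master URL sends work (hypothetical, partial)."""
    if url == "local":
        return "local, 1 thread"
    m = re.fullmatch(r"local\[(\*|\d+)\]", url)
    if m:
        n = m.group(1)
        return "local, all cores" if n == "*" else f"local, {n} threads"
    m = re.fullmatch(r"spark://([^:/]+):(\d+)", url)
    if m:
        return f"standalone cluster at {m.group(1)}:{m.group(2)}"
    if url == "yarn":
        return "YARN cluster"
    if url.startswith("k8s://"):
        return "Kubernetes cluster"
    raise ValueError(f"unrecognised master URL: {url!r}")


print(classify_master("spark://master-node:7077"))
```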
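# header=True: take column names from the first line of the file.
# Without inferSchema, every column is read as a string.
df = spark.read.option("header", True).csv('./csv-file.csv')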
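With `header` set to `True`, Spark takes column names from the file's first line; without it, that line is read as data and the columns are named `_c0`, `_c1`, and so on. The standard-library sketch below mimics only that naming rule using Python's `csv` module. `read_rows` is a hypothetical helper, and every value stays a string, matching Spark's behaviour when `inferSchema` is not enabled.

```python
from __future__ import annotations

import csv
import io


def read_rows(text: str, header: bool = False) -> list[dict[str, str]]:
    """Parse CSV text into dicts, naming columns the way Spark's CSV reader does."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    if header:
        # First line supplies the column names and is not part of the data.
        names, data = rows[0], rows[1:]
    else:
        # No header: keep the first line as data and use positional names.
        names, data = [f"_c{i}" for i in range(len(rows[0]))], rows
    return [dict(zip(names, row)) for row in data]


sample = "name,age\nalice,34\nbob,29\n"
print(read_rows(sample, header=True))
print(read_rows(sample))
```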
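print(type(df))       # a pyspark.sql DataFrame
df.printSchema()      # column names and types, printed as a tree
# df.show(10, False)  # first 10 rows, without truncating long values
print(df.dtypes)      # list of (column name, type name) pairs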
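Because this read does not enable `inferSchema`, `printSchema()` and `dtypes` report every column as a string. Passing `.option("inferSchema", True)` makes Spark scan the data to pick narrower types, at the cost of an extra pass over the file. The sketch below is a deliberately simplified, hypothetical version of that idea for a single column (try integer, then floating point, else string); Spark's real inference covers more types, such as timestamps, and follows its own widening rules.

```python
from __future__ import annotations


def infer_column_type(values: list[str]) -> str:
    """Return the narrowest of 'int', 'double', 'string' that fits every value.

    Hypothetical, simplified stand-in for inferSchema: empty strings count
    as nulls, and a column with no non-empty values falls back to 'string'.
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"

    def fits(cast) -> bool:
        # True if every non-empty value can be converted with `cast`.
        try:
            for v in non_empty:
                cast(v)
        except ValueError:
            return False
        return True

    if fits(int):
        return "int"
    if fits(float):
        return "double"
    return "string"


columns = {"name": ["alice", "bob"], "age": ["34", "29"], "score": ["7.5", ""]}
print([(col, infer_column_type(vals)) for col, vals in columns.items()])
# → [('name', 'string'), ('age', 'int'), ('score', 'double')]
```

The result has the same `(name, type)` shape as `df.dtypes`.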