# Gist by @bh1995 (last active January 11, 2021 21:40)
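from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType, DateType, StringType

# getOrCreate() reuses the active session if one exists (e.g. in a notebook).
spark = SparkSession.builder.getOrCreate()

# Replace 'your_path' with the directory that holds the CSV files.
listings_path = 'your_path/listings.csv'
reviews_path = 'your_path/reviews.csv'

# multiLine: let quoted fields (e.g. free-text descriptions) span several lines.
# escape='"': read a doubled quote ("") inside a quoted field as a literal quote.
# DROPMALFORMED: silently skip rows that cannot be parsed.
listings_df = spark.read \
    .option('multiLine', True) \
    .option('escape', '"') \
    .option('mode', 'DROPMALFORMED') \
    .csv(listings_path, header=True)

# Explicit schema for reviews.csv. The ID columns use LongType rather than IntegerType:
# under DROPMALFORMED, a value too large for a 32-bit integer would drop the whole row.
reviews_schema = StructType([
    StructField('listing_id', LongType(), True),
    StructField('id', LongType(), True),
    StructField('date', DateType(), True),
    StructField('reviewer_id', LongType(), True),
    StructField('reviewer_name', StringType(), True),
    StructField('comments', StringType(), True),
])

reviews_df = spark.read \
    .option('multiLine', True) \
    .option('escape', '"') \
    .option('mode', 'DROPMALFORMED') \
    .csv(reviews_path, header=True, schema=reviews_schema)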