@1ambda
Created December 25, 2021 07:36
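from pyspark.sql import SparkSession
from pyspark.sql.types import *
from pyspark.sql.functions import *
from pyspark.sql.window import Window

# Assumption: outside a notebook or the pyspark shell, `spark` is not predefined.
# getOrCreate() reuses an existing session if one is already running.
spark = SparkSession.builder.getOrCreate()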
# Load both CSV files. multiLine=True allows quoted fields to contain newlines.
dfCalendar = spark.read.load("./airbnb_calendar.csv",
                             format="csv", inferSchema=True, header=True,
                             quote='"', escape='"', sep=',', multiLine=True)
dfListing = spark.read.load("./airbnb_listings.csv",
                            format="csv", inferSchema=True, header=True,
                            quote='"', escape='"', sep=',', multiLine=True)
dfCalendar = dfCalendar.cache()
dfListing = dfListing.cache()
dfCalendar.printSchema()
dfCalendar.show()
dfListing.printSchema()
(dfListing
    .select("id", "listing_url", "name", "description", "property_type",
            "city", "review_scores_rating", "price")
    .show())