@1ambda
Created December 23, 2021 23:41
# listing_name has been removed in this version (V2) of the schema.
schemaAvroListingV2 = """
{
"type": "record",
"name": "AirbnbListing",
"namespace": "com.airbnb",
"fields": [
{"name": "listing_id", "type": "int"},
{"name": "listing_url", "type": "string"},
{"name": "listing_summary", "type": ["string", "null"]},
{"name": "listing_desc", "type": ["string", "null"]}
]
}
"""
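# Read data that was written with the old schema (V1) using the new schema (V2).
# Columns that exist only in the V1 files, such as listing_name, are not returned.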
dfListingAvroV2 = spark.read.format("avro")\
.option("avroSchema", schemaAvroListingV2)\
.load("/FileStore/raw/airbnb_listings_avro")
dfListingAvroV2.printSchema()
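Reading V1 files with the V2 schema works because of Avro's record schema resolution. A field in the writer's (V1) data that the reader (V2) schema does not declare is ignored. A reader field that is missing from the writer data takes the reader's `default`, and if it has no default, resolution fails. The pure-Python sketch below imitates that rule on plain dicts. It is a simplified illustration and not how spark-avro is implemented. The `resolve_record` helper and the V1 record, including its `listing_name` value, are made up for the example.

```python
import json

# V2 reader schema from above: listing_name is no longer declared.
READER_SCHEMA_V2 = json.loads("""
{
  "type": "record",
  "name": "AirbnbListing",
  "namespace": "com.airbnb",
  "fields": [
    {"name": "listing_id", "type": "int"},
    {"name": "listing_url", "type": "string"},
    {"name": "listing_summary", "type": ["string", "null"]},
    {"name": "listing_desc", "type": ["string", "null"]}
  ]
}
""")


def resolve_record(writer_record, reader_schema):
    """Hypothetical helper: project a record written with an older schema onto
    the reader schema, following Avro's record resolution rules.
    Simplified: no type promotion, no nested records, no aliases."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_record:
            resolved[name] = writer_record[name]   # in both schemas: keep the writer's value
        elif "default" in field:
            resolved[name] = field["default"]      # reader-only field: use its default
        else:
            raise ValueError(f"{name!r} is missing from the writer data and has no default")
    # Writer-only fields (e.g. listing_name) are never copied, so they are ignored.
    return resolved


# Made-up V1 record that still carries listing_name.
v1_record = {
    "listing_id": 1,
    "listing_name": "example name",
    "listing_url": "https://example.com/listing/1",
    "listing_summary": None,
    "listing_desc": "example description",
}

print(resolve_record(v1_record, READER_SCHEMA_V2))
# {'listing_id': 1, 'listing_url': 'https://example.com/listing/1', 'listing_summary': None, 'listing_desc': 'example description'}
```

Under these rules, dropping a field from the reader schema is always safe. Adding a new field without a `default` would instead make the existing V1 files unreadable with the new schema.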