@1ambda
Created December 23, 2021 23:40
# In practice, the schema is usually read from a Schema Registry or an .avsc file.
# For this example, we use a schema defined as a string.
schemaAvroListingV1 = """
{
  "type": "record",
  "name": "AirbnbListing",
  "namespace": "com.airbnb",
  "fields": [
    {"name": "listing_id", "type": "int"},
    {"name": "listing_url", "type": "string"},
    {"name": "listing_name", "type": "string"},
    {"name": "listing_summary", "type": ["string", "null"]},
    {"name": "listing_desc", "type": ["string", "null"]}
  ]
}
"""
# When saving in the Avro file format, specify the schema through the "avroSchema" option.
dfListingSelected\
    .repartition(2)\
    .write\
    .mode("overwrite")\
    .format("avro")\
    .option("avroSchema", schemaAvroListingV1)\
    .save("./airbnb_listings_avro")
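
Since an Avro schema is plain JSON, it can be sanity-checked before it is handed to the writer. The following is a minimal sketch using only the standard library; `summarize_avro_schema` and `example_schema` are hypothetical names introduced for illustration and are not part of the original gist. It only checks the JSON structure and does not replace full Avro schema validation.

```python
import json


def summarize_avro_schema(schema_json: str) -> dict:
    """Hypothetical helper: map each record field name to whether it admits null."""
    # json.loads raises json.JSONDecodeError if the schema text is malformed.
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError("expected a top-level Avro record schema")

    summary = {}
    for field in schema["fields"]:
        field_type = field["type"]
        # An Avro union is written as a JSON array; the field is nullable
        # when "null" is one of the union's branches.
        summary[field["name"]] = isinstance(field_type, list) and "null" in field_type
    return summary


# Same schema as in the gist, repeated so this sketch is self-contained.
# With an .avsc file, the text could instead come from reading that file.
example_schema = """
{
  "type": "record",
  "name": "AirbnbListing",
  "namespace": "com.airbnb",
  "fields": [
    {"name": "listing_id", "type": "int"},
    {"name": "listing_url", "type": "string"},
    {"name": "listing_name", "type": "string"},
    {"name": "listing_summary", "type": ["string", "null"]},
    {"name": "listing_desc", "type": ["string", "null"]}
  ]
}
"""

nullable_by_field = summarize_avro_schema(example_schema)
for name, nullable in nullable_by_field.items():
    print(f"{name}: nullable={nullable}")
```

A check like this catches a malformed schema string early, before the Spark job starts writing files.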