How to process Event Hub Archive's files using Spark
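from pyspark.sql import SparkSession

# Don't skip input files that lack the .avro extension
spark = SparkSession \
    .builder \
    .appName("spark-avro-json-sample") \
    .config('spark.hadoop.avro.mapred.ignore.inputs.without.extension', 'false') \
    .getOrCreate()

# storage -> avro (in_path must be set to the location of the archive's Avro files)
avroDf = spark.read.format("com.databricks.spark.avro").load(in_path)

# avro -> json: the event payload is stored as bytes in the Body column
jsonRdd = avroDf.select(avroDf.Body.cast("string")).rdd.map(lambda x: x[0])
data = spark.read.json(jsonRdd)  # in the real world it's better to specify a schema for the JSON

# do whatever you want with `data`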