from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("spark-avro-json-sample") \
    .config('spark.hadoop.avro.mapred.ignore.inputs.without.extension', 'false') \

# Avro files written by Event Hubs Capture; the wildcards cover the
# date/time folder hierarchy under partition 0
in_path = '/mnt/iotsmarthousedatalake/rawdata/sandbox/eventhubiotsmarthouse/eventhubiotsmarthouse/eventhubiotsmarthouse/0/*/*/*/*/*/*.avro'

# Read the Avro files; the event payload is in the binary `Body` column
avroDf = spark.read.format("com.databricks.spark.avro").load(in_path)
jsonRdd ="string")) x: x[0])
data =  # in the real world it's better to specify a schema for the JSON
# do whatever you want with `data`
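The comment about specifying a schema matters because letting `` infer one forces an extra pass over the data and can mis-type fields. As a minimal stdlib-only sketch of what an explicit schema buys (no Spark required; the field names `deviceId`, `temperature`, and `humidity` are hypothetical examples, and in real Spark code this role is played by a `StructType` passed to ``):

```python
import json

# Hypothetical schema for the telemetry events: field name -> expected type.
SCHEMA = {"deviceId": str, "temperature": float, "humidity": float}

def parse_event(raw: str) -> dict:
    """Parse one JSON event string and check it against SCHEMA."""
    event = json.loads(raw)
    for field, expected_type in SCHEMA.items():
        if field not in event:
            raise ValueError(f"missing field: {field}")
        if not isinstance(event[field], expected_type):
            raise TypeError(f"{field} should be {expected_type.__name__}")
    return event

sample = '{"deviceId": "sensor-1", "temperature": 21.5, "humidity": 40.0}'
event = parse_event(sample)
print(event["deviceId"])  # sensor-1
```

With a declared schema, malformed events fail loudly at parse time instead of silently producing null or mis-typed columns downstream.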

nadinebenharrath commented Jun 4, 2021

Can you please explain the in_path? Is it the location of the file I want to load for reading?
Thank you


raj4j2ee commented Jul 27, 2021

Where can I get in_path from? If I take it as in the screenshot, it contains only storage/container//.
