This shows how to create a streaming DataFrame over atomically written CSV files, reading each line of a file as a single string record.
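from pyspark.sql.types import StructField, StringType, StructType
# One string column, so each line of a file is read as a single record.
file_schema = StructType([StructField("record_str", StringType())])
# `spark` is the active SparkSession (predefined in Databricks notebooks).
file_stream_df = spark.readStream.option("sep", "\n")\
    .option("header", "false").schema(file_schema)\
    .csv("dbfs:/dbfs/mnt/test_data/streaming_db/streaming_data/*")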
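The file source expects files to appear in the watched directory atomically, which is usually done by writing elsewhere and then moving the finished file in. Below is a minimal sketch of the producer side, assuming a local POSIX filesystem where a rename within one filesystem is atomic (object stores and DBFS mounts may behave differently). The function name `write_atomically`, the `staging_dir` and `watched_dir` parameters, and the file name `batch_0001.csv` are hypothetical and not part of the original snippet.

```python
import os
import tempfile


def write_atomically(staging_dir, watched_dir, file_name, lines):
    """Write lines to a staging file, then move it into watched_dir in one step."""
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(watched_dir, exist_ok=True)

    # Write the full content outside the watched directory first, so a
    # reader never sees a half-written file.
    fd, staging_path = tempfile.mkstemp(suffix=".tmp", dir=staging_dir)
    with os.fdopen(fd, "w", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")
        handle.flush()
        os.fsync(handle.fileno())

    # Assumption: both directories are on the same filesystem, so this
    # rename makes the complete file appear at once.
    final_path = os.path.join(watched_dir, file_name)
    os.replace(staging_path, final_path)
    return final_path


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    written = write_atomically(
        os.path.join(base, "staging"),
        os.path.join(base, "streaming_data"),
        "batch_0001.csv",
        ["a,b,c", "1,2,3"],
    )
    print(written)
```

Keeping the staging directory outside the path given to `.csv(...)` means the stream only ever lists completed files.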