This shows how to create a streaming DataFrame over CSV files that are written atomically.
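from pyspark.sql.types import StructField, StringType, StructType

# One string column: each line of a file becomes one record.
file_schema = StructType([StructField("record_str", StringType())])
# `spark` is the active SparkSession (predefined in Databricks notebooks).
file_stream_df = (
    spark.readStream.option("sep", "\n")  # newline separator, so a line is not split into columns
    .option("header", "false").schema(file_schema)
    .csv("dbfs:/dbfs/mnt/test_data/streaming_db/streaming_data/*")
)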
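
The snippet relies on the files arriving atomically, which a file-based streaming source needs so that it never picks up a half-written file. A minimal sketch of one way a producer might do this on a local POSIX filesystem is below: write the file in a staging directory first, then move it into the watched directory in one step. The function name `write_file_atomically`, the staging directory, and the file names are hypothetical and are not part of the original snippet. The sketch also assumes both directories live on the same filesystem; on object stores or mounted cloud storage a rename may not be atomic.

```python
import os
import uuid


def write_file_atomically(staging_dir, target_dir, file_name, lines):
    """Write `lines` to a staged file, then move it into `target_dir` in one step.

    Hypothetical helper for illustration only. Assumes `staging_dir` and
    `target_dir` are on the same filesystem, so the final move is a rename.
    """
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(target_dir, exist_ok=True)

    # Stage under a unique name outside the directory the stream is watching.
    staged_path = os.path.join(staging_dir, f"{uuid.uuid4().hex}.tmp")
    with open(staged_path, "w", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")
        # Push the data to disk before the file becomes visible to readers.
        handle.flush()
        os.fsync(handle.fileno())

    # The rename makes the complete file appear in the watched directory at once.
    final_path = os.path.join(target_dir, file_name)
    os.replace(staged_path, final_path)
    return final_path
```

A producer would call, for example, `write_file_atomically("/tmp/staging", "/tmp/incoming", "batch_0001.csv", ["a,b,c"])`, with the stream reading from the second directory. Both paths here are placeholders.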