Oskar Rynkiewicz oskarryn

View sstreaming-spark-final.py
'''
spark/bin/spark-submit \
--master local --driver-memory 4g \
--num-executors 2 --executor-memory 4g \
--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 \
sstreaming-spark-final.py
'''
from pyspark.sql import SparkSession
from pyspark.sql.types import *
from pyspark.sql.functions import expr
View sstreaming-spark-out.py
'''
spark/bin/spark-submit \
--master local --driver-memory 4g \
--num-executors 2 --executor-memory 4g \
--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 \
sstreaming-spark-out.py
'''
from pyspark.sql import SparkSession
from pyspark.sql.types import *
oskarryn / spark_sstreaming_util.py
Created Mar 4, 2019
Utility for Spark Structured Streaming: extract data from a Kafka message's value and apply a schema.
View spark_sstreaming_util.py
def parse_data_from_kafka_message(sdf, schema):
    """Extract data from a Kafka message's value and apply a schema.

    Parameters
    ----------
    sdf : pyspark.sql.DataFrame
        DataFrame obtained from a Kafka stream, for which sdf.isStreaming is True.
    schema : OrderedDict or pyspark.sql.types.StructType
        Order-preserving dictionary with column names as keys and data types as values.
        Alternatively, a pyspark.sql.types.StructType can define the schema.
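
The preview above cuts off before the function body, so what follows is only a minimal sketch of how a function with this docstring might work, not the gist's actual implementation. It assumes the Kafka message value is a comma-separated string, which the preview does not state. The names `schema_pairs` and `parse_csv_value` and the `sep` parameter are hypothetical, introduced here for illustration.

```python
from collections import OrderedDict


def schema_pairs(schema):
    """Return [(column_name, data_type), ...] in schema order.

    Hypothetical helper: accepts an order-preserving mapping, or any object
    exposing StructType-style `.fields` whose items have `.name` and `.dataType`.
    """
    if hasattr(schema, "fields"):  # e.g. pyspark.sql.types.StructType
        return [(field.name, field.dataType) for field in schema.fields]
    return list(schema.items())  # e.g. OrderedDict: name -> data type


def parse_csv_value(sdf, schema, sep=","):
    """Sketch: split the Kafka `value` column on `sep` and cast each piece.

    ASSUMPTION: the message value is a delimited string; the source does not
    say how the messages are encoded.
    """
    # Imported lazily so schema_pairs stays usable without Spark installed.
    from pyspark.sql.functions import split

    if not sdf.isStreaming:
        raise ValueError("expected a streaming DataFrame")

    # Kafka delivers `value` as binary, so cast to string before splitting.
    # Note that `split` treats its second argument as a regular expression.
    parts = split(sdf["value"].cast("string"), sep)

    # One top-level column per schema entry, taken by position and cast.
    columns = [
        parts.getItem(idx).cast(dtype).alias(name)
        for idx, (name, dtype) in enumerate(schema_pairs(schema))
    ]
    return sdf.select(*columns)


# Schema given as an order-preserving dictionary (type names as strings).
example_schema = OrderedDict([("id", "integer"), ("name", "string")])
print(schema_pairs(example_schema))
```

Normalizing both schema forms into (name, type) pairs first keeps the Spark-specific part to a single list comprehension; `Column.cast` accepts either a type-name string or a `DataType` instance, so both schema styles flow through the same path.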