from pyspark.sql import Row
my_df_schema = my_df.schema
def replace_content(a_row):
    a_row_dict = a_row.asDict()
    # Modify the contents of the dict
    a_row_dict["key"] = "new value"
jose-goncabel / volume-ec2-backup.sh
Created June 21, 2018 16:47
Backup Script for EC2 EBS Volumes - The script takes snapshots of volumes marked with certain tags and deletes old snapshots once they are older than the specified number of days. Perfect for Jenkins.
#!/bin/bash
# The script generates new snapshots
# on every execution. It does not look at the number
# of snapshots, only at their age.
# ---------- CONSTANTS ----------
retention_period_days=30
last_date_to_back=$(date --date "$retention_period_days days ago")
last_date_to_back_seconds=$(date +%s --date "$retention_period_days days ago")
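The preview stops before the deletion step, so the age check itself is not shown. As a minimal sketch of that comparison in Python (the function name `is_expired` and the sample dates are hypothetical and not part of the gist), a snapshot counts as expired when its start time falls before "now minus the retention period":

```python
from datetime import datetime, timedelta, timezone

# Mirrors retention_period_days in the shell script above.
RETENTION_PERIOD_DAYS = 30


def is_expired(snapshot_start, now, retention_days=RETENTION_PERIOD_DAYS):
    """Return True when the snapshot started before the retention cutoff.

    Hypothetical helper: only the age matters, not how many snapshots exist.
    """
    cutoff = now - timedelta(days=retention_days)
    return snapshot_start < cutoff


# Illustrative timestamps only (assumed, not taken from real snapshots).
now = datetime(2018, 6, 21, 16, 47, tzinfo=timezone.utc)
old_snapshot = datetime(2018, 5, 1, tzinfo=timezone.utc)
recent_snapshot = datetime(2018, 6, 10, tzinfo=timezone.utc)

print(is_expired(old_snapshot, now))
print(is_expired(recent_snapshot, now))
```

Comparing timezone-aware datetimes plays the same role as comparing the epoch seconds stored in `last_date_to_back_seconds`.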
jose-goncabel / multiple-gpu-example.py
Created May 21, 2018 16:46
Keras + TensorFlow + Spark - A PySpark script showing how to use multiple GPUs for prediction within a Spark environment by loading a pre-trained Keras model on each worker.
#####
# IMPORTS
#####
from pyspark import TaskContext
import os
#####
# PATHS
#####
path_model = "/path/to/pretrained/model.h5"
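The preview ends before any GPU selection logic, so the following is only a sketch of one common approach, not necessarily what the gist does. It assumes each Spark partition is pinned to a GPU by taking the partition id modulo the number of GPUs and exporting `CUDA_VISIBLE_DEVICES`. The helper names `gpu_for_partition` and `pin_gpu` are hypothetical. Inside a Spark task, the partition id would come from `TaskContext.get().partitionId()`.

```python
import os


def gpu_for_partition(partition_id, num_gpus):
    """Map a partition id to a GPU index in round-robin fashion (assumed scheme)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return partition_id % num_gpus


def pin_gpu(partition_id, num_gpus, environ=os.environ):
    """Restrict the current process to one GPU via CUDA_VISIBLE_DEVICES.

    Hypothetical helper: it must run before TensorFlow initializes its
    GPU devices, otherwise the setting has no effect.
    """
    gpu_index = gpu_for_partition(partition_id, num_gpus)
    environ["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return gpu_index


# Stand-in environment dict so the example does not alter the real one.
fake_env = {}
chosen = pin_gpu(partition_id=5, num_gpus=4, environ=fake_env)
print(chosen, fake_env["CUDA_VISIBLE_DEVICES"])
```

Passing the environment as a parameter keeps the helper easy to test outside a cluster. In a real job it would default to `os.environ` and be called at the start of the function given to `mapPartitions`, before the model at `path_model` is loaded.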