Skip to content

Instantly share code, notes, and snippets.

@justinnaldzin
justinnaldzin / spark_dataframe_size_estimator.py
Created July 18, 2018 19:29
Estimate size of Spark DataFrame in bytes
# Function to convert python object to Java objects
def _to_java_object_rdd(rdd):
""" Return a JavaRDD of Object by unpickling
It will convert each Python object into Java object by Pyrolite, whenever the
RDD is serialized in batch or not.
"""
rdd = rdd._reserialize(AutoBatchedSerializer(PickleSerializer()))
return rdd.ctx._jvm.org.apache.spark.mllib.api.python.SerDe.pythonToJava(rdd._jrdd, True)
# Convert DataFrame to an RDD
@elvismdev
elvismdev / jenkins_wpsvn_deploy.sh
Last active August 30, 2022 18:35 — forked from BFTrick/deploy.sh
Deploy from Jenkins to WordPress.org SVN
# In your Jenkins job configuration, select "Add build step > Execute shell", and paste this script contents.
# Replace `______your-plugin-name______`, `______your-wp-username______` and `______your-wp-password______` as needed.
# main config
WP_ORG_USER="______your-wp-username______" # your WordPress.org username
WP_ORG_PASS="______your-wp-password______" # your WordPress.org password
PLUGINSLUG="______your-plugin-name______"
CURRENTDIR=`pwd`
MAINFILE="______your-plugin-name______.php" # this should be the name of your main php file in the wordpress plugin
@nunomorgadinho
nunomorgadinho / gist:b2d8e5b8f5fec5b0ed946b24fa288a91
Created February 10, 2017 13:30
PHPCS + WordPress Coding Standards
# Install brew
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
# Install composer
brew install homebrew/php/composer
### PHPCS
composer global require "squizlabs/php_codesniffer=*"
# Add to your .bash_profile
@inchoate
inchoate / including_external_package_in_dataflow.md
Last active February 2, 2024 11:40
Adding an extra package to a Python Dataflow project to run on GCP

The Problem

The documentation for how to deploy a pipeline with extra, non-PyPi, pure Python packages on GCP is missing some detail. This gist shows how to package and deploy an external pure-Python, non-PyPi dependency to a managed dataflow pipeline on GCP.

TL;DR: You external package needs to be a python (source/binary) distro properly packaged and shipped alongside your pipeline. It is not enough to only specify a tar file with a setup.py.

Preparing the External Package

Your external package must have a proper setup.py. What follow is an example setup.py for our ETL package. This is used to package version 1.1.1 of the etl library. The library requires 3 native PyPi packages to run. These are specified in the install_requires field. This package also ships with custom external JSON data, declared in the package_data section. Last, the setuptools.find_packages function searches for all available packages and returns that

@tim-tang
tim-tang / run-tests.sh
Created October 28, 2016 11:30 — forked from criccomini/run-tests.sh
run-tests.sh
#!/bin/bash
# Runs airflow-dags tests.
# Set Nose defaults if no arguments are passed from CLI.
CWD="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
NOSE_ARGS=$@
if [ -z "$NOSE_ARGS" ]; then
NOSE_ARGS=" \
--with-coverage \
@nathairtras
nathairtras / callback_retry_clear_subdag.py
Last active May 28, 2021 15:47
Callback to clear Airflow SubDag on retry
import logging
from airflow.models import DagBag
def callback_subdag_clear(context):
"""Clears a subdag's tasks on retry."""
dag_id = "{}.{}".format(
context['dag'].dag_id,
context['ti'].task_id,
)
execution_date = context['execution_date']
@dnordby
dnordby / shopify-loadmore.md
Last active December 29, 2021 18:03
Shopify <li> loadmore (products)

Use Shopify pagination to trigger loadmore event

Markup

{% paginate collection.products by 16 %}
<div class="products__collection">
  <ul class="product collection__grid"">
    {% for product in collection.products %}
      {% assign prod_id = forloop.index | plus:paginate.current_offset %}
      {% include 'product-grid-item' with prod_id %}
@criccomini
criccomini / airflow-supervisord.conf
Created June 22, 2016 14:54
airflow-supervisord.conf
; Configuration for Airflow webserver and scheduler in Supervisor
[program:airflow]
command=/bin/airflow webserver
stopsignal=QUIT
stopasgroup=true
user=airflow
stdout_logfile=/var/log/airflow/airflow-stdout.log
stderr_logfile=/var/log/airflow/airflow-stderr.log
environment=HOME="/home/airflow",AIRFLOW_HOME="/etc/airflow",TMPDIR="/storage/airflow_tmp"

Setting up a SSL Cert from Comodo

I use Namecheap.com as a registrar, and they resale SSL Certs from a number of other companies, including Comodo.

These are the steps I went through to set up an SSL cert.

Purchase the cert

@JesterXL
JesterXL / Streams in JavaScript Sample Code.md
Last active March 29, 2018 13:49
Streams in JavaScript Sample Code

Streams in JavaScript

The slides for this code.

If you're trying to learn the basics of the Array methods, this thorough tutorial with answers is a great place to start.

Found some WONDERFUL documentation if you're curious what type of observable to use or method to use on your data. It shows it in a big table of user stories.

Lee Campbell has some examples as well.