Matthew Pick matthewpick

worker_processes 1;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;
matthewpick / extra_paycheck_calculation.rb
Last active September 8, 2017 03:10
Based on a bi-weekly pay cycle, determine which months you will receive an extra paycheck.
require 'active_support/time'

def paycheck_count(begin_date, years)
  month_count = {}
  end_date = begin_date + years.years
  time_counter = begin_date
  while time_counter < end_date do
    key = "#{time_counter.year}.#{time_counter.month}"
    # Count one paycheck every two weeks; months that reach three are the extra-paycheck months.
    month_count[key] = month_count.fetch(key, 0) + 1
    time_counter += 2.weeks
  end
  month_count.select { |_month, count| count == 3 }
end
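A hedged usage sketch, assuming the completed loop above (the start date is a placeholder): pass the date of a first paycheck and a number of years to look ahead.

puts paycheck_count(Date.new(2017, 1, 6), 2)
# => hash of "year.month" keys for the months that received three bi-weekly paychecks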

Running MySQL in a Docker container

Docker

Step 1

Clone the MySQL Dockerfile repo:

git clone https://github.com/docker-library/mysql.git
cd mysql/5.7
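Step 2

Not part of the preview above; a plausible continuation, assuming the Dockerfile in mysql/5.7 builds cleanly. Build the image and start a container (the image tag, container name, and root password are placeholders):

docker build -t mysql:5.7-local .
docker run --name local-mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -p 3306:3306 -d mysql:5.7-local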
matthewpick / .htaccess
Created July 29, 2019 02:30
Docker Compose WordPress + MySQL + large file uploads (.htaccess)
# BEGIN WordPress
php_value upload_max_filesize 20280M
php_value post_max_size 20280M
php_value memory_limit 256M
php_value max_execution_time 300
php_value max_input_time 300
# END WordPress
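The gist title also mentions a Docker Compose stack. A minimal sketch of such a stack, assuming the official wordpress and mysql:5.7 images and mounting this .htaccess into the web root (service names and credentials are placeholders, not the gist's actual compose file):

version: '3.1'
services:
  wordpress:
    image: wordpress
    ports:
      - "8080:80"
    environment:
      WORDPRESS_DB_HOST: db
      WORDPRESS_DB_USER: wordpress
      WORDPRESS_DB_PASSWORD: example
      WORDPRESS_DB_NAME: wordpress
    volumes:
      - ./.htaccess:/var/www/html/.htaccess
  db:
    image: mysql:5.7
    environment:
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wordpress
      MYSQL_PASSWORD: example
      MYSQL_ROOT_PASSWORD: example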
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 25 in stage 79.0 failed 4 times, most recent failure: Lost task 25.3 in stage 79.0 (TID 8326, ip-10-4-40-120.ec2.internal, executor 1): java.io.FileNotFoundException: No such file or directory: s3a://mybucket/mypath/delta_table/part-00018-d3f8bcb6-f5de-4d7d-88d7-becd5d3d9874-c000.snappy.parquet
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:160)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:211)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:130)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedI
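As the error message itself suggests, the stale file listing can be invalidated before re-reading the data. A minimal sketch, assuming an active SparkSession named spark (the table name is a placeholder; the path is the one from the stack trace):

spark.sql("REFRESH TABLE my_delta_table")                          // if the data is registered as a table
spark.catalog.refreshByPath("s3a://mybucket/mypath/delta_table")   // or invalidate the cached listing by path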
import io.delta.tables.DeltaTable
import org.apache.spark.sql.{AnalysisException, DataFrame, SparkSession}

object DeltaWriter {

  def generateSymlinkManifest(deltaPath: String, sparkSession: SparkSession): Unit = {
    val deltaTable = DeltaTable.forPath(sparkSession, deltaPath)
    deltaTable.generate("symlink_format_manifest")
  }
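A hedged usage sketch for the helper above, assuming an existing SparkSession and reusing the Delta table path from the stack trace:

val spark = SparkSession.builder().appName("delta-symlink-manifest").getOrCreate()
DeltaWriter.generateSymlinkManifest("s3a://mybucket/mypath/delta_table", spark)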
matthewpick / emr_open_spark_web_ui.py
Last active June 17, 2021 20:01
EMR Cluster - Quickly open the Hadoop UI and Spark UI for all application IDs
import logging
import boto3
import webbrowser

logger = logging.getLogger(__name__)


def main():
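The preview cuts off at main(). A hypothetical sketch of the kind of logic the description implies, using boto3 to look up a cluster's master node and webbrowser to open the standard UI ports (the helper name, cluster lookup, and ports are assumptions, not the gist's actual code):

def open_cluster_uis(cluster_id):
    emr = boto3.client("emr")
    master_dns = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["MasterPublicDnsName"]
    webbrowser.open(f"http://{master_dns}:8088")   # Hadoop/YARN ResourceManager UI
    webbrowser.open(f"http://{master_dns}:18080")  # Spark History Server UI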
import logging
import sys
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from time import sleep

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(threadName)s:%(message)s")
logging.getLogger().setLevel(logging.INFO)
log = logging.getLogger(__name__)
matthewpick / aws_logging_util.py
Last active April 25, 2024 06:23
AWS Lambda universal logging formatter (retain aws_request_id in log output)
import logging
import os


def setup_logging_format(log_level=logging.INFO, override_existing_loggers=True):
    """
    Logger formatter that works locally and in AWS Lambda.

    :param log_level: Level of logging
    :param override_existing_loggers: Flag for overriding the formatting of all existing loggers
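The preview stops inside the docstring. A hedged usage sketch, assuming setup_logging_format applies its formatter to the root logger so that Lambda's aws_request_id stays in each log line:

setup_logging_format(log_level=logging.INFO)
logger = logging.getLogger(__name__)


def lambda_handler(event, context):
    logger.info("handling event")  # in Lambda, the formatted line retains the aws_request_id
    return {"status": "ok"}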