Matthew Pick (matthewpick)

@matthewpick
matthewpick / aws_logging_util.py
Last active January 11, 2024 23:14
AWS Lambda universal logging formatter (retain aws_request_id in log output)
import logging
import os


def setup_logging_format(log_level=logging.INFO, override_existing_loggers=True):
    """
    Logger formatter that works locally and in an AWS Lambda function.

    :param log_level: level of logging
    :param override_existing_loggers: flag for overriding the formatting of all existing loggers
    """
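
The preview above stops inside setup_logging_format, so what follows is a minimal, hedged usage sketch rather than the gist's own code: the handler name, the import path aws_logging_util, and calling the setup function at import time are all assumptions.

import logging

from aws_logging_util import setup_logging_format  # assumed module layout

# Configure formatting once, at import time, so every logger picks it up.
setup_logging_format(log_level=logging.INFO)
logger = logging.getLogger(__name__)


def handler(event, context):
    # In Lambda, the goal described by the gist is that records like this one
    # still carry the aws_request_id; locally they simply use the same format.
    logger.info("processing event")
    return {"statusCode": 200}
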
import logging
import sys
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from time import sleep
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(threadName)s:%(message)s")
logging.getLogger().setLevel(logging.INFO)
log = logging.getLogger(__name__)
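
The snippet above only configures thread-aware formatting; here is a short, hedged exercise of it that reuses the names defined above (log, sleep, deque, ThreadPoolExecutor). The worker function, pool size, and task count are illustrative, not from the gist.

def worker(task_id):
    # %(threadName)s in the format string above prefixes each record
    # with the pool thread that emitted it.
    log.info("starting task %s", task_id)
    sleep(0.1)
    log.info("finished task %s", task_id)
    return task_id


with ThreadPoolExecutor(max_workers=4) as pool:
    results = deque(pool.map(worker, range(8)))

log.info("completed %d tasks", len(results))
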
@matthewpick
matthewpick / emr_open_spark_web_ui.py
Last active June 17, 2021 20:01
EMR Cluster - Quickly open Hadoop UI and Spark UI for all application ids
import logging
import webbrowser

import boto3

logger = logging.getLogger(__name__)


def main():
    ...
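
The preview ends at main(), so the following is a hedged sketch of the general approach the description suggests, not the gist's implementation: look up a cluster's master node with boto3 and open the default EMR web UIs with webbrowser. The helper name, cluster id parameter, and ports are assumptions.

def open_cluster_uis(cluster_id):
    # Hypothetical helper: resolve the master node and open the default
    # EMR web UIs (YARN ResourceManager on 8088, Spark History Server on 18080).
    emr = boto3.client("emr")
    cluster = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]
    master_dns = cluster["MasterPublicDnsName"]
    webbrowser.open(f"http://{master_dns}:8088")
    webbrowser.open(f"http://{master_dns}:18080")

Individual Spark application UIs are reachable from those pages; the gist's per-application-id handling is not reproduced here.
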
import io.delta.tables.DeltaTable
import org.apache.spark.sql.{AnalysisException, DataFrame, SparkSession}

object DeltaWriter {

  def generateSymlinkManifest(deltaPath: String, sparkSession: SparkSession): Unit = {
    val deltaTable = DeltaTable.forPath(sparkSession, deltaPath)
    deltaTable.generate("symlink_format_manifest")
  }
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 25 in stage 79.0 failed 4 times, most recent failure: Lost task 25.3 in stage 79.0 (TID 8326, ip-10-4-40-120.ec2.internal, executor 1): java.io.FileNotFoundException: No such file or directory: s3a://mybucket/mypath/delta_table/part-00018-d3f8bcb6-f5de-4d7d-88d7-becd5d3d9874-c000.snappy.parquet
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:160)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:211)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:130)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedI
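
The stack trace above is the classic stale-file-listing failure after a Delta table's underlying Parquet files are rewritten, and the error text itself points at the fix. A minimal PySpark sketch of that cache invalidation follows; the table name is a placeholder and the path simply mirrors the one in the trace.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("refresh-example").getOrCreate()

# For a metastore-registered table, as the error message suggests:
spark.sql("REFRESH TABLE my_table")

# For a path-based table, invalidate the cached file listing directly:
spark.catalog.refreshByPath("s3a://mybucket/mypath/delta_table")
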
@matthewpick
matthewpick / .htaccess
Created July 29, 2019 02:30
Docker-Compose Wordpress + MySQL + Large File uploads (htaccess)
# BEGIN WordPress
php_value upload_max_filesize 20280M
php_value post_max_size 20280M
php_value memory_limit 256M
php_value max_execution_time 300
php_value max_input_time 300
# END WordPress

Running MySQL in a Docker container

Docker

Step 1

Clone the mysql dockerfile repo

git clone https://github.com/docker-library/mysql.git
cd mysql/5.7
@matthewpick
matthewpick / extra_paycheck_calculation.rb
Last active September 8, 2017 03:10
Based on a bi-weekly pay cycle, determine which months you will receive an extra paycheck.
require 'active_support/time'

def paycheck_count(begin_date, years)
  month_count = {}
  end_date = begin_date + years.years
  time_counter = begin_date
  while time_counter < end_date do
    key = "#{time_counter.year}.#{time_counter.month}"
    # Completion of the truncated preview: tally one bi-weekly paycheck, then advance two weeks.
    month_count[key] = month_count.fetch(key, 0) + 1
    time_counter += 2.weeks
  end
  month_count
end
worker_processes 1;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;