Zachary Jablons zmjjmz

zmjjmz / breakage
Created April 30, 2019 21:40
from_json nightmare
An error occurred while calling o4971.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 255.0 failed 4 times, most recent failure: Lost task 3.3 in stage 255.0 (TID 840, ip-172-32-98-36.ec2.internal, executor 1): java.lang.ClassCastException: java.lang.Boolean cannot be cast to java.lang.String
    at org.apache.spark.sql.catalyst.json.JSONOptions$$anonfun$27.apply(JSONOptions.scala:84)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.sql.catalyst.json.JSONOptions.<init>(JSONOptions.scala:84)
    at org.apache.spark.sql.catalyst.json.JSONOptions.<init>(JSONOptions.scala:43)
    at org.apache.spark.sql.catalyst.expressions.JsonToStructs.parser$lzycompute(jsonExpressions.scala:555)
    at org.apache.spark.sql.catalyst.expressions.JsonToStructs.parser(jsonExpressions.scala:552)
    at org.apache.spark.sql.catalyst.expressions.JsonToStructs.nullSafeEval(jsonExpressions.scala:585)
    at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:331)
zmjjmz / main log
Created April 15, 2019 20:35
batch transform logs
[2019-04-15 20:17:14 +0000] [22] [INFO] Starting gunicorn 19.9.0
[2019-04-15 20:17:14 +0000] [22] [INFO] Listening at: unix:/tmp/gunicorn.sock (22)
[2019-04-15 20:17:14 +0000] [22] [INFO] Using worker: gevent
[2019-04-15 20:17:14 +0000] [33] [INFO] Booting worker with pid: 33
[2019-04-15 20:17:14 +0000] [34] [INFO] Booting worker with pid: 34
[2019-04-15 20:17:14 +0000] [42] [INFO] Booting worker with pid: 42
[2019-04-15 20:17:14 +0000] [50] [INFO] Booting worker with pid: 50
[2019-04-15 20:17:15 +0000] [52] [INFO] Booting worker with pid: 52
[2019-04-15 20:17:15 +0000] [54] [INFO] Booting worker with pid: 54
[2019-04-15 20:17:15 +0000] [62] [INFO] Booting worker with pid: 62
zmjjmz / failure_stacktrace
Created April 5, 2019 21:03
new glue failure
An error was encountered:
Session 0 unexpectedly reached final status 'dead'. See logs:
stdout:
stderr:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/share/aws/glue/etl/jars/glue-assembly.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
zmjjmz / gluedevendpt.py
Last active October 15, 2020 14:34
glue_devendpt
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.sql import functions as sf
from pyspark.sql import types as st
from awsglue.dynamicframe import DynamicFrame
zmjjmz / repro.py
Last active April 4, 2019 19:52
PyArrow chunked array output thingie
import os
import pandas as pd
import pyarrow.parquet as pq
import pyarrow as pa
import numpy as np
from tqdm import tqdm
TEST_DIR = 'jaggedbug_testpath'
zmjjmz / tokenize_layer_tf2.py
Last active March 8, 2019 23:15
TF2 Lookup Table attempt
import itertools
import numpy
import tensorflow
class TokenizeLookupLayer(tensorflow.keras.layers.Layer):
    """
    Layer that encapsulates the following:
    - Tokenizing sentences by space (or given delimiter)
    - Looking up the words with a given vocabulary list / table
zmjjmz / tf2_upgrade_test.py
Created March 7, 2019 23:39
TF2 Upgrade Script testing - part 1
import itertools
import numpy
import tensorflow
class TokenizeLookupLayer(tensorflow.keras.layers.Layer):
    """
    Layer that encapsulates the following:
    - Tokenizing sentences by space (or given delimiter)
    - Looking up the words with a given vocabulary list / table
zmjjmz / sagemaker_multiin_repro.py
Created September 11, 2018 23:28
Sagemaker Multi-input repro
import os
import json
import numpy
import tensorflow
from tensorflow.python.estimator.export.export import build_raw_serving_input_receiver_fn
print("Tensorflow version: {0}".format(tensorflow.VERSION))
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
zmjjmz / keras_summary_repro.py
Created August 31, 2018 01:35
Demonstration of keras summary writing bug
import glob
import os
import shutil
import numpy
import tensorflow
import pandas as pd
from tensorflow.python.estimator.export.export_output import PredictOutput
from tensorflow.python.saved_model import signature_constants
from protobuf_to_dict import protobuf_to_dict
zmjjmz / keras_unconsumed_out_repro.py
Created August 22, 2018 21:50
Keras unconsumed output failure reproduction script.
import numpy
import tensorflow
print("Tensorflow version: {0}".format(tensorflow.VERSION))
DATA_SIZE = 1024
BATCH_SIZE = 32
N_EPOCHS = 1
EMBED_DIM = 100