Matthew Schmohl schmohlio

@schmohlio
schmohlio / Json2JavaMap.scala
Created July 18, 2016 23:34
parse Json to Generic Java Map in Scala, similar to Jackson.
import com.google.gson.Gson
import java.util.{Map => JMap, LinkedHashMap}
type GenericDecoder = String => JMap[String, Object]
val decoder: GenericDecoder = {
// Gson instances are thread-safe, so create one and close over it...
val gson: Gson = new Gson()
// LinkedHashMap preserves ordering. use HashMap if not required.
x => gson.fromJson(x, (new LinkedHashMap[String, Object]()).getClass)
@schmohlio
schmohlio / JsonFlatten.fs
Created April 29, 2016 03:38
flatten Json to arrays and primitive types
open Newtonsoft.Json
let rec private _flattenJson (delim:string) (pkey:string) (json:JsonValue): (string * JsonValue) seq =
json.Properties
|> Seq.collect (fun (k, v) ->
match v with
| JsonValue.Record _ -> _flattenJson delim (pkey + delim + k) v // carry the full parent path down
| _ -> [(pkey + delim + k, v)] |> Seq.ofList
)
/// Flatten nested JSON children, stopping at primitives or arrays. Specify the delimiter used to join nested key names.
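The flattening described above can be sketched in Python. This is a minimal illustration and not the author's F# code: `flatten_json` and its parameters are hypothetical names, and unlike the preview it omits the delimiter in front of top-level keys.

```python
import json
from typing import Any, Dict


def flatten_json(value: Dict[str, Any], delim: str = ".", parent_key: str = "") -> Dict[str, Any]:
    """Flatten nested objects; stop at primitives and arrays (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, child in value.items():
        # Build the full path so deeper levels keep their ancestors' names.
        full_key = f"{parent_key}{delim}{key}" if parent_key else key
        if isinstance(child, dict):
            # Nested object: recurse with the accumulated key.
            flat.update(flatten_json(child, delim, full_key))
        else:
            # Primitives and lists are kept as-is.
            flat[full_key] = child
    return flat


doc = json.loads('{"a": {"b": 1, "c": {"d": [1, 2]}}, "e": "x"}')
print(flatten_json(doc, delim="_"))
```

Carrying the accumulated key into each recursive call keeps grandparent names in the flattened key, which matters once the input nests more than two levels deep.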
@schmohlio
schmohlio / remove_empty_part_files.sh
Last active May 3, 2017 06:22
delete or warn on empty part files in hadoop, by extension.
#!/bin/bash -e
# USAGE: ./remove_empty_part_files.sh <qualified hdfs dir path>
HDFS=$1
echo "checking for empty files in $HDFS..."
IFS=$'\n'
for i in $(hadoop fs -ls $HDFS/* | grep -e "$HDFS/.*"); do
file=$(echo "$i" | awk '{print $8}')
size=$(echo "$i" | awk '{print $5}')
if [ "$size" -eq 0 ]; then
@schmohlio
schmohlio / build_diagnostic.log
Created January 27, 2016 14:52
vim-sharp logs
XBuild Engine Version 12.0
Mono, Version 4.2.0.0
Copyright (C) 2005-2013 Various Mono authors
Loading default tasks for ToolsVersion: 4.0 from /usr/local/Cellar/mono/4.2.0.179/lib/mono/4.5/Microsoft.Common.tasks
Build started 1/27/2016 9:48:30 AM.
__________________________________________________
Extension path '/Library/Frameworks/Mono.framework/External/xbuild' not found, ignoring.
Extension path '/Library/Frameworks/Mono.framework/External/xbuild' not found, ignoring.
Project "MYPROJECT.fsproj" (default target(s)):
@schmohlio
schmohlio / EventMachines.md
Created January 8, 2016 16:19 — forked from eulerfx/EventMachines.md
The relationship between state machines and event sourcing

A state machine is defined as follows:

  • Input - a set of inputs
  • Output - a set of outputs
  • State - a set of states
  • S0 ∈ State - an initial state
  • T : Input * State -> Output * State - a transition function

If you model your services (aggregates, projections, process managers, sagas, whatever) as state machines, one issue to address is management of State. There must be a mechanism to provide State to the state machine, and to persist resulting State for subsequent retrieval. One way to address this is by storing State in a key-value store. Another way is to use a SQL database. Yet another way is event sourcing. The benefit of event sourcing is that you never need to store State itself. Instead, you rely on the Output of a service to reconstitute State. In order to do that, the state machine transition function needs to be factored into two functions as follows:
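The preview ends before the two functions are given, so what follows is only one possible factoring, sketched in Python under assumptions. The names `decide`, `evolve`, and `replay` and the counter domain (`Add`, `Added`) are hypothetical and do not come from the original text.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Tuple


@dataclass(frozen=True)
class Add:
    """Input: a command (hypothetical example domain)."""
    amount: int


@dataclass(frozen=True)
class Added:
    """Output: an event recording what happened."""
    amount: int


State = int
S0: State = 0  # initial state


def decide(cmd: Add, state: State) -> List[Added]:
    """Input * State -> Output: choose the events a command produces."""
    if cmd.amount <= 0:
        return []  # reject non-positive amounts by emitting nothing
    return [Added(cmd.amount)]


def evolve(state: State, event: Added) -> State:
    """State * Output -> State: fold one event into the state."""
    return state + event.amount


def replay(events: List[Added]) -> State:
    """Rebuild State from the stored Output alone, starting at S0."""
    return reduce(evolve, events, S0)


def transition(cmd: Add, state: State) -> Tuple[List[Added], State]:
    """T : Input * State -> Output * State, recovered by composing the two."""
    events = decide(cmd, state)
    return events, reduce(evolve, events, state)


log: List[Added] = []
state = S0
for cmd in [Add(2), Add(-1), Add(5)]:
    events, state = transition(cmd, state)
    log.extend(events)  # only the events are persisted

print(state, replay(log))
```

Because `replay` folds the persisted events from `S0`, the in-memory `state` never has to be stored; it can always be recomputed from the log.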

@schmohlio
schmohlio / gist:f3d6866b9b3174f1fb1a
Created June 3, 2015 02:17
copy all missing servers based on instructions
#!/usr/bin/env python
'''
DataSync
Makes instructions to copy datasets to servers missing backups
based on input data.
- Ensure that each data center has a copy of every data set.
- Every dataset is included in at least 1 data center.
#!/usr/bin/env python
'''
JumbleSorter.py
sorts a list of strings and integers, but keeps types at nth element
in list constant in result.
only implement for stdin for now, i.e.,
@schmohlio
schmohlio / gist:fe200a77628e28355bb4
Last active August 29, 2015 14:19
finding max number of meeting rooms
meetings_sample = [(0, 30), (60, 90), (20, 65)]
[(0,True),(30,False),(60,True),(90,False),(20,True),(65,False)]
def count_rooms(meeting_events):
meeting_events.sort() # O(n log n); tuples compare by time first, then by flag,
# and False < True, so at equal times an end event (False)
# sorts before a start event (True)
ongoing_meetings = [] # pretend stack
max_num_meetings = 0 # initialize
stack_size = 0
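The preview stops before the counting loop. A minimal Python 3 sketch of the event-sweep idea it sets up might look like the following; `to_events` is a hypothetical helper, and a plain counter stands in for the "pretend stack".

```python
from typing import List, Tuple


def to_events(meetings: List[Tuple[int, int]]) -> List[Tuple[int, bool]]:
    """Turn (start, end) pairs into (time, is_start) events (hypothetical helper)."""
    events = []
    for start, end in meetings:
        events.append((start, True))
        events.append((end, False))
    return events


def count_rooms(meeting_events: List[Tuple[int, bool]]) -> int:
    # Sorting tuples orders by time, then by flag; False < True, so an end
    # at time t is processed before a start at time t and the room is reused.
    ordered = sorted(meeting_events)
    ongoing = 0
    max_rooms = 0
    for _, is_start in ordered:
        if is_start:
            ongoing += 1
            max_rooms = max(max_rooms, ongoing)
        else:
            ongoing -= 1
    return max_rooms


meetings_sample = [(0, 30), (60, 90), (20, 65)]
print(count_rooms(to_events(meetings_sample)))
```

The sort dominates the cost at O(n log n); the sweep itself is a single linear pass.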
@schmohlio
schmohlio / two_largest
Last active August 29, 2015 14:19
finding sum of two largest values in list (unordered).
val sample = List(1,2,3,3,5,4,6,7,4)
// a good default if sample is always positive integers.
val START = (-1, -1)
// this is what I meant by default foldLeft (or scanLeft) value
val two_largest = sample.foldLeft(START) { (acc: (Int, Int), n: Int) =>
val (smaller, larger) = acc
if (n > larger)
(larger, n)
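The fold is cut off after its first branch. As a hedged sketch of how the remaining cases could go, here is the same idea in Python using `functools.reduce`; the `step` function and its second and third branches are assumptions, not taken from the original.

```python
from functools import reduce
from typing import Tuple


def step(acc: Tuple[int, int], n: int) -> Tuple[int, int]:
    """Keep the two largest values seen so far as (smaller, larger)."""
    smaller, larger = acc
    if n > larger:
        return (larger, n)   # new maximum; old maximum becomes runner-up
    if n > smaller:
        return (n, larger)   # new runner-up only
    return acc               # n changes nothing


sample = [1, 2, 3, 3, 5, 4, 6, 7, 4]
START = (-1, -1)  # a safe default only if every value is a positive integer
two_largest = reduce(step, sample, START)
print(two_largest, sum(two_largest))
```

This makes one pass over the list and keeps constant extra state, so no sort is needed.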
@schmohlio
schmohlio / gist:44b36146a54800334e77
Last active August 29, 2015 14:19
flatten nested array, where each level of the tree represents a different category to search by, and preserve traversal metadata.
/**
* @keys n-length list of "labels", where @keys[i] is the category label of the ith level of @dat
* @dat a nested associative array, tree-like, with n+1 levels, where 1..n levels of the tree represent categories
* and the n+1 level is the actual request data.
* returns: list of associative arrays, each representing rows to be inserted into database.
*
* flattens nested array into a list of "rows", where the |rows| == |leaves in tree|.
* each "row" contains the data within the leaves, plus n additional key-value pairs represented by
* label => category (i.e. [...country => 'US', type => 'tablet'] when @keys = ['country', 'type'], and @dat has n+1==3 levels.)
**/
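The function body is not shown in the preview. The behaviour the docblock describes can be sketched in Python as follows; `flatten_tree` and the sample data are hypothetical, and nested dicts stand in for the associative arrays.

```python
from typing import Any, Dict, List


def flatten_tree(keys: List[str], dat: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten an (n+1)-level tree into rows, one per leaf (hypothetical helper)."""
    if not keys:
        # No labels left: dat is the leaf holding the actual request data.
        return [dict(dat)]
    label, rest = keys[0], keys[1:]
    rows: List[Dict[str, Any]] = []
    for category, subtree in dat.items():
        for row in flatten_tree(rest, subtree):
            # Record which category was traversed at this level.
            row[label] = category
            rows.append(row)
    return rows


dat = {
    "US": {"tablet": {"requests": 5}, "phone": {"requests": 7}},
    "CA": {"tablet": {"requests": 2}},
}
for row in flatten_tree(["country", "type"], dat):
    print(row)
```

Each row is a copy of its leaf, so the label entries are added without mutating the input tree.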