Michael Reynolds reynoldsm88

  • Two Six Labs
  • New York City
@reynoldsm88
reynoldsm88 / example.py
Created July 25, 2022 21:41
prefect 2.0 task analysis example
########################################################################
# Source interfaces
########################################################################
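from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Source(ABC):
    @abstractmethod
    def poll(self) -> List[Tuple[str, str]]:
        """Return a list of string pairs; concrete sources must override this."""
        raise NotImplementedError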
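A minimal sketch of how a concrete source might plug into this interface, assuming Python 3 and an abstract base built on `abc.ABC`. `StaticSource` and its `items` field are hypothetical names used only for illustration and are not part of the gist.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Source(ABC):
    """Abstract source with a single polling method."""

    @abstractmethod
    def poll(self) -> List[Tuple[str, str]]:
        raise NotImplementedError


@dataclass
class StaticSource(Source):
    """Hypothetical source that hands out a fixed list of pairs once."""

    items: List[Tuple[str, str]] = field(default_factory=list)

    def poll(self) -> List[Tuple[str, str]]:
        # Return everything queued so far and leave the queue empty.
        polled, self.items = self.items, []
        return polled
```

Because `poll` is abstract, instantiating `Source` directly raises `TypeError`, while `StaticSource(items=[...])` can be polled like any other source.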
@reynoldsm88
reynoldsm88 / build.sbt
Created April 20, 2022 14:58
SBT - enable runMain for Spark applications (with provided dependencies)
Compile / runMain := Defaults.runMainTask( Compile / fullClasspath, Compile / run / runner ).evaluated
@reynoldsm88
reynoldsm88 / build.sbt
Created February 24, 2022 21:59
SBT - use provided classpath in runMain
Compile / runMain := Defaults.runMainTask( Compile / fullClasspath, Compile / run / runner ).evaluated
@reynoldsm88
reynoldsm88 / shingleprints.py
Created December 2, 2021 20:15 — forked from dustinboswell/shingleprints.py
Computing shingleprints for a document
def min_max_hashes(text, window=60):
    # Hash every `window`-character shingle and keep only the two extremes.
    hashes = [murmurhash(text[i:i+window]) for i in range(len(text)-window+1)]
    return [min(hashes), max(hashes)]

def shingleprints(text):
    # Integer division keeps the slice index an int on Python 3.
    min1, max1 = min_max_hashes(text[0:len(text)//2])
    min2, max2 = min_max_hashes(text[len(text)//2:])
    # combine pairs, using your favorite hash-value combiner
    return [hash_combine(min1, min2),
            hash_combine(min1, max2),
@reynoldsm88
reynoldsm88 / minhash.py
Created December 2, 2021 19:55 — forked from dustinboswell/minhash.py
Rough code for comparing document similarity with MinHash
def minhash(text, window=25): # assume len(text) > 50
    hashes = [murmurhash(text[i:i+window]) for i in range(len(text)-window+1)]
    return set(sorted(hashes)[0:20])

def similarity(text1, text2):
    hashes1 = minhash(text1)
    hashes2 = minhash(text2)
    return len(hashes1 & hashes2) / len(hashes1)
A = "one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen"
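A self-contained sketch of the same MinHash comparison, assuming Python 3. The gist calls a `murmurhash` function that is not defined in the preview, so this version swaps in a hypothetical `stable_hash` helper built on `hashlib.blake2b`. The sketch size `k` is exposed as a parameter here, which is also an assumption.

```python
import hashlib


def stable_hash(shingle: str) -> int:
    """Stand-in for murmurhash: an 8-byte BLAKE2b digest read as an integer."""
    digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(text: str, window: int = 25, k: int = 20) -> set:
    """Return the k smallest hashes over all `window`-character shingles."""
    hashes = [stable_hash(text[i:i + window]) for i in range(len(text) - window + 1)]
    return set(sorted(hashes)[:k])


def similarity(text1: str, text2: str) -> float:
    """Fraction of text1's sketch that also appears in text2's sketch."""
    hashes1, hashes2 = minhash(text1), minhash(text2)
    # Texts shorter than the window produce an empty sketch; report 0.0 then.
    return len(hashes1 & hashes2) / len(hashes1) if hashes1 else 0.0
```

Comparing a text with itself gives 1.0. The score is not symmetric, because it is normalised by the size of the first sketch only.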
@reynoldsm88
reynoldsm88 / SparkAgglomerativeClustering.java
Created July 12, 2021 18:40
Agglomerative clustering in Spark
// sourced from http://users.eecs.northwestern.edu/~cji970/pub/cjinBigDataService2015.pdf
JavaRDD<String> subGraphIdRDD = sc.textFile(idFileLoc,numGraphs);
JavaPairRDD<Integer, Edge> subMSTs = subGraphIdRDD.flatMapToPair(new LocalMST(filesLoc, numSplits));
numGraphs = numSplits * numSplits / 2;
numGraphs = (numGraphs + (K - 1)) / K;
JavaPairRDD<Integer, Iterable<Edge>> mstToBeMerged = subMSTs.combineByKey( new CreateCombiner(), new Merger(),new KruskalReducer(numPoints),numGraphs);
while (numGraphs > 1) {
@reynoldsm88
reynoldsm88 / wait-til-available.sh
Created June 9, 2021 21:20
wait until a service is listening on a port to execute a script
until [ -n "$APP_UP" ]; do
    echo 'service not available yet'
    APP_UP=$(netstat -an | grep 1234)
    sleep 3
done
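The same wait loop can be sketched in Python. This version swaps the `netstat` check for a direct TCP connection attempt, which is a different technique from the script above, and it adds an overall timeout. The function name `wait_for_port` and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, interval: float = 3.0, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            print("service not available yet")
            time.sleep(interval)
    return False
```

A caller would run the dependent script only when `wait_for_port("localhost", 1234)` returns `True`.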
@reynoldsm88
reynoldsm88 / _aws_golang_examples.md
Created January 26, 2021 21:45 — forked from eferro/_aws_golang_examples.md
golang aws: examples

AWS Golang SDK examples

@reynoldsm88
reynoldsm88 / build.sbt
Created November 9, 2020 19:55
sbt collect jars?
// https://stackoverflow.com/questions/5564690/tell-sbt-to-collect-all-my-dependencies-together
val libraryJarPath = outputPath / "lib"
def collectJarsTask = {
    val jars = mainDependencies.libraries +++ mainDependencies.scalaJars
    FileUtilities.copyFlat(jars.get, libraryJarPath, log)
}
lazy val collectJars = task { collectJarsTask; None } dependsOn(compile)
@reynoldsm88
reynoldsm88 / AsyncOkhttpScalaFutures.scala
Created June 23, 2020 14:22
Use Scala futures for async requests in okhttp
import java.io.IOException
import okhttp3._
import scala.concurrent.{Future, Promise}

val client : OkHttpClient = new OkHttpClient.Builder().build()
val TEXT : MediaType = MediaType.get( "text/plain; charset=utf-8" )

def asyncRequest( text : String ) : Future[ String ] = {
    val body = RequestBody.create( text, TEXT )
    val request = new Request.Builder().url( "http://michael.com" ).post( body ).build()
    val promise : Promise[ String ] = Promise[ String ]()
    // enqueue runs the call asynchronously; the callback completes the promise
    client.newCall( request ).enqueue( new Callback {
        override def onFailure( call : Call, e : IOException ) : Unit = promise.failure( e )
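The callback-to-future bridge used here is not specific to Scala. As a rough illustration in Python, the sketch below adapts a callback-style API to a `concurrent.futures.Future`. `callback_post` is a hypothetical stand-in for an HTTP client and performs no network I/O. Creating a `Future` directly is normally left to executors, so treat this as a demonstration of the pattern only.

```python
import threading
from concurrent.futures import Future
from typing import Callable


def callback_post(text: str,
                  on_success: Callable[[str], None],
                  on_failure: Callable[[Exception], None]) -> None:
    """Hypothetical callback-style client: upper-cases text on a worker thread."""
    def work() -> None:
        if text:
            on_success(text.upper())
        else:
            on_failure(ValueError("empty request body"))
    threading.Thread(target=work).start()


def async_request(text: str) -> Future:
    """Adapt the callback API to a Future, much as a Promise does in Scala."""
    future: Future = Future()
    # Success resolves the future; failure stores the exception on it.
    callback_post(text, future.set_result, future.set_exception)
    return future
```

A caller can then block with `async_request("hello").result(timeout=5)` or attach follow-up work with `add_done_callback`.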