Theodore M mamonu

@mamonu
mamonu / pysparkfixtureexample.py
Created October 29, 2018 15:15
pyspark fixture example
import pytest
from pyspark import SparkConf, SparkContext


@pytest.fixture(scope="session")
def spark_context(request):
    """Fixture for creating a Spark context.

    Args:
        request: pytest.FixtureRequest object
    """
    conf = SparkConf().setMaster("local[2]").setAppName("pytest-pyspark-local-testing")
    sc = SparkContext(conf=conf)
    # Stop the context once the test session is over.
    request.addfinalizer(lambda: sc.stop())
    return sc
@mamonu
mamonu / monoids-and-reductions.md
Last active May 2, 2018 12:44 — forked from ludflu/monoids-and-reductions.md
Monoids and map-side reductions using Spark's aggregateByKey

In a classic Hadoop job, you've got mappers and reducers. The "things" being mapped and reduced are key-value pairs for some arbitrary pair of types. Most of your parallelism comes from the mappers, since they can (ideally) split the data and transform it without any coordination with other processes.

By contrast, the amount of parallelism in the reduction phase has an important limitation: although you may have many reducers, any given reducer is guaranteed to receive all the values for some particular key.

So if there are a HUGE number of values for some particular key, you're going to have a bottleneck because they're all going to be processed by a single reducer.

However, there is another way! Certain types of data fit into a pattern:

  • they can be combined with other values of the same type to form new values.
  • the combining operation is associative. For example, integer addition: ((1 + 2) + 3) == (1 + (2 + 3))
  • they have an identity value. (f
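
The pattern described above can be sketched in plain Python, without Spark, to show why associativity plus an identity value allows reduction on the map side. This is a minimal illustration under stated assumptions, not the original author's code: the helper names `combine_partition` and `merge_partials` and the sample data are hypothetical.

```python
def combine_partition(pairs, zero, op):
    """Map-side step: fold one partition's (key, value) pairs into per-key partials."""
    partial = {}
    for key, value in pairs:
        # Start from the identity value the first time a key is seen.
        partial[key] = op(partial.get(key, zero), value)
    return partial


def merge_partials(partials, zero, op):
    """Reduce-side step: merge the per-partition partials key by key."""
    merged = {}
    for partial in partials:
        for key, value in partial.items():
            merged[key] = op(merged.get(key, zero), value)
    return merged


def add(x, y):
    return x + y


# Hypothetical data: two partitions that both hold values for the hot key "a".
partitions = [
    [("a", 1), ("a", 2), ("b", 10)],
    [("a", 3), ("b", 20), ("a", 4)],
]

# Each partition is reduced independently, so this step can run in parallel.
partials = [combine_partition(p, 0, add) for p in partitions]

# The final merge only sees one small partial per key per partition.
totals = merge_partials(partials, 0, add)
print(totals)
```

Because addition is associative and 0 is its identity, the grouping of the work does not change the result. In PySpark, `rdd.aggregateByKey(zeroValue, seqFunc, combFunc)` follows roughly the same shape: `seqFunc` plays the role of the per-partition fold and `combFunc` merges the partials.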
@mamonu
mamonu / AliasSampling.ipynb
Created February 12, 2018 02:10 — forked from jph00/AliasSampling.ipynb
Fast weighted sampling using the alias method in numba
@mamonu
mamonu / keybase.md
Created November 30, 2017 17:14
keybase.md

Keybase proof

I hereby claim:

  • I am mamonu on github.
  • I am mamonu (https://keybase.io/mamonu) on keybase.
  • I have a public key ASAaMDZv4UsSo-R8axywMsNGasd119RopVvi6x-Ci_rDago

To claim this, I am signing this object:

@mamonu
mamonu / NDS_RPG_Scraping.ipynb
Created July 2, 2017 17:50
NDS RPG score scraping
@mamonu
mamonu / nord_modular_osx.md
Created April 26, 2017 17:28 — forked from arirusso/nord_modular_osx.md
Use the original Nord Modular Editor with OSX

Use the original Nord Modular Editor with OSX

Required

  • Homebrew

Compatibility

Confirmed working with

@mamonu
mamonu / Settings.for.Zoom.MS50G
Last active April 14, 2017 16:11
zoom MS50G presets :)
zvex lofi:
1 Line select
2 Rack Comp THRSH 50 Ratio 5 Level 30 ATTCK 10
3 COMP Sense 10 Tone 5 Level 40 ATTCK Fast
4 Graphic EQ 160 -6 400 2 800 -2 3.2 -4 6.4 -8 12 -12 Level 100
5 Vibrato Depth 100 Rate 10 Bal 90 Tone 3 Level 120
-----------------------------------------
@mamonu
mamonu / solragain.sh
Last active February 6, 2017 16:19
delete all and re-index
http://localhost:8080/solr/update?stream.body=<delete><query>*:*</query></delete>
http://localhost:8080/solr/update?stream.body=<commit/>
# how to add line numbers in bash
awk '{ print FNR "," $0 }' train_set.csv > clean.csv
# then edit clean.csv and change the leading 1 on the header row to a column name such as "line" (did it by hand; could automate this :)
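
One possible way to automate the manual header edit is sketched below in Python. It is an assumption-laden sketch rather than part of the original note: the function name `add_line_numbers`, the `header_label` parameter, and the sample rows are hypothetical. The numbering mirrors awk's `FNR`, so the header counts as line 1 and the first data row gets 2.

```python
import csv
import io


def add_line_numbers(src, dst, header_label="line"):
    """Prefix every CSV row with its line number; label the header row instead."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for line_no, row in enumerate(reader, start=1):
        if line_no == 1:
            # Header row: write a column name rather than the number 1.
            writer.writerow([header_label] + row)
        else:
            writer.writerow([line_no] + row)


# Hypothetical sample standing in for train_set.csv.
src = io.StringIO("id,text\n7,hello\n8,world\n")
dst = io.StringIO()
add_line_numbers(src, dst)
result = dst.getvalue()
print(result)
```

With real files, both would be opened with `newline=""` as the `csv` module documentation recommends. Unlike the awk one-liner, this parses each row, so quoted fields may be re-quoted on output.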
@mamonu
mamonu / SNER_prepare.sh
Created February 1, 2017 13:22
Stanford NER preparer script
#!/bin/bash
mkdir my_project
cd my_project
echo " . . . Downloading file stanford-ner-2014-08-27.zip"
# NOTE: need to update link for further versions
wget http://nlp.stanford.edu/software/stanford-ner-2014-08-27.zip
echo " . . . Unpacking stanford-ner-2014-08-27.zip"