David Q Mertz DavidMertz

  • KDM Training
  • Maine, USA
@DavidMertz
DavidMertz / gist:65f0aeed79bce905fd37
Last active August 29, 2015 14:23
Customizing git to checkin Jupyter Notebooks w/o output

Configure your system as shown below so that notebooks are always pushed to GitHub without their output. This makes for smaller saved files and, more importantly, files that are easier to diff (and that git will merge more cleanly).

% cat ~/.gitattributes
*.ipynb    filter=dropoutput_ipynb

% cat ~/bin/ipynb_output_filter.py
#!/usr/bin/env python
import sys
version = None
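The preview of ipynb_output_filter.py stops after its first few lines, so the rest of the script is not visible here. As a rough illustration of the idea only, and not the gist's actual code, a filter of this kind could look like the sketch below. It assumes the nbformat 4 layout (a top-level "cells" list whose code cells carry "outputs" and "execution_count"); the names strip_outputs and filter_stream are hypothetical.

```python
import json


def strip_outputs(notebook):
    """Remove outputs and execution counters from code cells, in place.

    Assumption: the notebook uses the nbformat 4 layout with a top-level
    "cells" list. Other cell types are left untouched.
    """
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def filter_stream(src, dst):
    """Read notebook JSON from src and write the stripped version to dst."""
    notebook = json.load(src)
    json.dump(strip_outputs(notebook), dst, indent=1, sort_keys=True)
    dst.write("\n")


# As a git clean filter the script would be called as:
#     filter_stream(sys.stdin, sys.stdout)
```

For git to run a filter named dropoutput_ipynb, the git configuration also needs a matching filter.dropoutput_ipynb.clean entry that points at the script, and a per-user attributes file such as ~/.gitattributes is only read when core.attributesfile names it.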
DavidMertz / dqm-bio.txt
Last active February 5, 2024 21:59
David Mertz bio
David is founder of KDM Training, a partnership dedicated to educating developers and data
scientists in machine learning and scientific computing. He created the data science training
program for Anaconda Inc. and was a senior trainer for them. With the advent of deep neural
networks he has turned to training our robot overlords as well.
He was honored to work for 8 years with D. E. Shaw Research, who have built the world's fastest,
highly specialized (down to the ASICs and network layer) supercomputer for performing molecular
dynamics.
David was a Director of the PSF for six years, and remains co-chair of its Trademarks
absolutise
absolutised
absolutises
absolutive
absolutize
absolutized
absolutizes
abstemious
abstemiously
abstentious
DavidMertz / MonotonicExpandingSequence.py
Created January 1, 2019 02:28
MonotonicExpandingSequence
# A bit of cleverness to create a new sequence for infinite iterators
from collections.abc import Sequence
class MonotonicExpandingSequence(Sequence):
    def __init__(self, iterator):
        self.iterator = iterator
        self._cache = []
    def __getitem__(self, index):
        while len(self._cache) <= index:
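The preview is cut off inside __getitem__, so the body of the loop is not shown. One plausible way such a class could work is sketched below under a hypothetical name, LazyCachedSequence. The cache-filling loop, the __len__ choice (items consumed so far) and the IndexError on exhaustion are all assumptions, not the gist's confirmed behavior.

```python
from collections.abc import Sequence
from itertools import count


class LazyCachedSequence(Sequence):
    """Index into an iterator by consuming and caching items on demand."""

    def __init__(self, iterator):
        self.iterator = iterator
        self._cache = []

    def __getitem__(self, index):
        # Pull items from the iterator until the cache reaches the index.
        # Negative indexes skip the loop and refer to the cache as it stands.
        while len(self._cache) <= index:
            try:
                self._cache.append(next(self.iterator))
            except StopIteration:
                # A finite iterator ran out before reaching the index
                raise IndexError("sequence index out of range") from None
        return self._cache[index]

    def __len__(self):
        # Assumption: report how many items have been consumed so far,
        # since an infinite iterator has no final length
        return len(self._cache)


squares = LazyCachedSequence(n * n for n in count())
tenth = squares[10]  # consumes the first eleven items, then returns 100
```

Because the length only ever grows as higher indexes are requested, an object like this expands monotonically while still supporting repeated random access to items already seen.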
DavidMertz / dtcvt.jl
Last active January 2, 2019 16:52
Command line date-conversion pipe utility
#!/usr/bin/env julia
####################################################
### Released to the Public Domain by David Mertz ###
####################################################
#= Usage examples:
$ cat schedule
| Course title | Time (EST) | Date |
|-----------------------------------|------------|----------|
DavidMertz / gist:1a4aac0e889097d7bf80d8d41a3a644d
Last active April 2, 2019 02:44
Most common suffixes in several languages
% head -30 suffix-frequency-en.txt
('es', 34752)
('ed', 20197)
('ng', 18910)
('ing', 18619)
('er', 10745)
('rs', 10744)
('ns', 8716)
('ts', 8534)
('ly', 8492)
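The listing shows (suffix, count) pairs but not how they were produced. A tally of this shape could be generated along the following lines. This is a sketch under assumptions: the function name suffix_counts, the suffix length range, and the rule of counting only suffixes shorter than the word are all hypothetical, and the gist's real method may differ.

```python
from collections import Counter


def suffix_counts(words, min_len=2, max_len=4):
    """Count how often each trailing substring occurs across the words."""
    counts = Counter()
    for word in words:
        for n in range(min_len, max_len + 1):
            # Count only proper suffixes, never the whole word
            if len(word) > n:
                counts[word[-n:]] += 1
    return counts


sample = ["walked", "talked", "walking", "talks"]
counts = suffix_counts(sample)
# counts.most_common() yields (suffix, count) tuples, the same shape as
# the lines in the listing
```

Counting every suffix length separately would explain why overlapping entries such as 'ng' and 'ing' both appear with similar totals.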
DavidMertz / thread_kill run
Created July 21, 2019 19:24
Python loops that might terminate
(base) 530-tmp % python thread_kill.py
[1, 2, 3, 4, 5, 1]
[1, 2, 3, 4, 5, 1, 2]
[1, 2, 3, 4, 5, 1, 2, 3]
[1, 2, 3, 4, 5, 1, 2, 3, 4]
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2]
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3]
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4]
DavidMertz / Final-Will.txt.asc
Last active January 9, 2022 03:42
Final Will of David Mertz
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
January 8, 2022
I, David Quintyn Mertz, residing at 45 Main Street, Dexter Maine 04930,
declare this to be my Will, and I revoke any and all wills and codicils
I previously made.
Proof of not-earlier-date: New York Times headline for today's date:
DavidMertz / Medical-Power-Of-Attorney.txt.asc
Created January 16, 2020 02:44
Medical Power of Attorney and Medical Directions for End-of-Life
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
DURABLE POWER OF ATTORNEY FOR HEALTH CARE
I, David Quintyn Mertz, of 45 Main Street, Dexter ME 04930 USA, being of sound
mind, voluntarily create this Durable Power of Attorney for Health Care.
PRIOR DESIGNATIONS
DavidMertz / Cleaning Data
Last active February 26, 2020 20:44
Cleaning Data
Cleaning Data for Effective Data Science
Doing the other 80% of the work
In order for something to become clean, something else must become dirty.
–Imbesi's Law of the Conservation of Filth
It is something of a truism in data science, data analysis, or machine learning
that most of the work needed to do your actual work lies in cleaning your data.
The subtitle of this work alludes to a commonly assigned percentage. A keynote
speaker I listened to at a data science conference a few years ago made a joke—