peterhurford / install_xelatex_on_mac.txt
Created Aug 21, 2020
How to install latex and xelatex on Mac so that Jupyter "Download as PDF" will work
brew install pandoc
# `brew cask install` no longer works in current Homebrew; casks are installed with `brew install --cask`
brew install --cask basictex
eval "$(/usr/libexec/path_helper)"
# Update $PATH to include `/usr/local/texlive/2020basic/bin/x86_64-darwin`
sudo tlmgr update --self
sudo tlmgr install texliveonfly
sudo tlmgr install xelatex
sudo tlmgr install adjustbox
sudo tlmgr install tcolorbox
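One quick way to confirm the install worked is to check that the executables are visible on PATH. This Python sketch assumes `pandoc` and `xelatex` are the two binaries that matter, based on the steps above; it only looks them up with `shutil.which` and does not run them.

```python
import shutil

# Executables installed by the steps above (assumed to be what "Download as PDF" needs)
REQUIRED_TOOLS = ("pandoc", "xelatex")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that shutil.which cannot find on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        # Usually means $PATH was not updated (see the path_helper step)
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found")
```

Jupyter inherits PATH from the shell it was launched from, so if this check passes in a new terminal, restart Jupyter from that terminal before retrying the export (or try `jupyter nbconvert --to pdf` on a notebook directly).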
peterhurford / pytest-fixture-modularization.md
Created Jul 28, 2016
How to modularize your py.test fixtures

Using py.test is great, and its support for test fixtures is pretty awesome. However, to share your fixtures across your entire test suite, py.test suggests you define all of them in a single conftest.py file. This is impractical if you have a large number of fixtures: for better organization and readability, you would much rather define your fixtures across multiple, well-named files. But how do you do that? ...No one on the internet seemed to know.

Turns out, however, you can define fixtures in individual files like this:

tests/fixtures/add.py

import pytest

@pytest.fixture
peterhurford / git-101-exercises.md
Last active Jul 2, 2021
Git 101, with Exercises

Git 101, with Exercises

Git is the key tool we use to allow multiple people to work on the same code base. Git takes care of merging everyone's contributions smoothly. Hence, learning how to use Git is critical to contributing to open source.

Exercises

Exercise 1: Go through the Try Git Guide

Exercise 2: Learn how to file a GitHub issue.

peterhurford / readable-code.md
Last active Jun 30, 2021
How do you write readable code?: 13 Principles

How do you write readable code?: 13 Principles

"Programs should be written for people to read, and only incidentally for machines to execute." -- Structure and Interpretation of Computer Programs

"How would you define good code? [...] After lots of interviews we started wondering if we could come out with a definition of good code following a pseudo-scientific method. [...] The population is defined by all the software developers. The sample consists of 65 developers chosen by convenience. [...] The questionnaire consists in a single question: “What do you feel makes code good? How would you define good code?”. [...] Of those, the most common answer by far was that the code has to be Readable (78.46%), almost 8 of each 10 developers believe that good code should be easy to read and understand." -- "What is Good Code: A Scientific Definition"

peterhurford / better-programming.md
Last active Jun 25, 2021
One Year Out: How I Became a Better Programmer

I've been coding professionally for a year now, having started work on the 30th of June. In that year I've programmed professionally in Ruby on Rails, R, and JavaScript/CoffeeScript, using both the Knockout and Angular frameworks.

I'd like to think I've become a much better programmer over the past year. Looking back at my old code, I can tell that I grew a lot.

Here's how I did it.

Spend Time Programming

peterhurford / parallelization.md
Created Oct 17, 2015
How does code get parallelized?

Computer code is a series of executed statements. Frequently, these statements are executed one at a time. If one part of your code takes a long time to run, the rest of your code won't run until that part is finished.

However, it doesn't have to be this way. We can often make the exact same code run much faster through parallelization, which simply means running different parts of the code simultaneously.
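A minimal Python sketch of the idea, using a thread pool as one possible parallelization mechanism (the original does not name one). `slow_task` is a hypothetical stand-in for slow, independent work; `time.sleep` simulates the waiting.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(n):
    """Hypothetical stand-in for slow, independent work such as waiting on a network call."""
    time.sleep(0.2)
    return n * n


def run_sequential(inputs):
    # Each call must finish before the next one starts: about 0.2 s per input
    return [slow_task(n) for n in inputs]


def run_parallel(inputs):
    # Each input gets its own worker thread, so the 0.2 s waits overlap
    with ThreadPoolExecutor(max_workers=max(1, len(inputs))) as pool:
        # map() returns results in input order, not completion order
        return list(pool.map(slow_task, inputs))


if __name__ == "__main__":
    inputs = [1, 2, 3, 4]
    for runner in (run_sequential, run_parallel):
        start = time.perf_counter()
        results = runner(inputs)
        print(runner.__name__, results, f"{time.perf_counter() - start:.2f}s")
```

With four inputs the sequential version takes roughly 0.8 s and the threaded one roughly 0.2 s. Threads help here because the work is mostly waiting; in CPython, CPU-bound Python code generally needs separate processes (for example `ProcessPoolExecutor`) to actually run in parallel.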

Asynchronous Code

The first example of this is asynchronous code. Programs often send a request to another computer, perhaps over the internet, using an API. Normally, the code then simply waits for the other computer to send back a response. Asynchronous code, by contrast, keeps running and handles the API response whenever it arrives.

This makes code harder to reason about and handle because you don't know when the API call will return or what your code will be like when it returns, but it makes your code faster because you don't have to wait arou
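A minimal Python sketch of this idea, assuming `asyncio` as the mechanism (the original does not name a language or library). `fake_api_call` is hypothetical, and `asyncio.sleep` stands in for network latency.

```python
import asyncio
import time


async def fake_api_call(name, delay):
    """Hypothetical stand-in for a request to another computer; asyncio.sleep simulates latency."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    start = time.perf_counter()
    # Send the (simulated) API call, but don't wait for it yet
    api_task = asyncio.create_task(fake_api_call("api", 0.3))
    # Keep going with other work while the call is in flight (also simulated with a 0.3 s wait)
    other_result = await fake_api_call("local work", 0.3)
    # Collect the API response only once it is actually needed
    api_result = await api_task
    return [api_result, other_result], time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, f"took {elapsed:.2f}s")
```

Run back to back, the two waits would take about 0.6 s; overlapped, they take roughly 0.3 s. The overlap only happens while a coroutine is awaiting, so CPU-heavy code in between would still block everything else.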

peterhurford / advanced-r-abridged.md

Advanced R, Abridged

"Advanced R" by Hadley Wickham is widely considered the best resource for improving your knowledge of R. However, reading it and answering every exercise takes a long time. This guide is designed to give you the most essential parts of Advanced R so that you can get going right away. It will still take a long time, but not as long.

--

1.) Quickly skim these chapters (without doing the exercises) to make sure you're familiar with the concepts:

peterhurford / meat_consumption_kg_meat_per_capita_per_country.csv
Created Apr 17, 2017
Meat consumption (kg meat consumed per capita) per country
COUNTRY BEEF PIG POULTRY SHEEP TOTAL
ARG 40.41400058 8.24187459 36.4689953 1.174247185 86.29911766
AUS 22.8010372 20.25072536 42.00750521 7.423454044 92.48272181
BGD 0.885267859 5.14E-04 1.223173534 1.163676301 3.272631248
BRA 24.15640871 11.20721696 39.36312514 0.393513724 75.12026453
BRICS 4.289081407 15.79587836 10.29847417 1.654767905 32.03820184
CAN 17.37132968 15.74658647 34.15846671 0.81704465 68.09342751
CHL 14.96778476 17.51448686 30.93243359 0.411750266 63.82645548
CHN 3.817396071 31.56769795 11.61724318 2.965417779 49.96775498
COL 12.10121226 5.084564383 26.43592901 0.202352547 43.8240582
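In the rows shown, TOTAL equals BEEF + PIG + POULTRY + SHEEP up to rounding. Assuming the file is comma-separated, as its .csv extension suggests, this Python sketch loads it with the standard-library `csv` module, checks that relationship, and ranks rows by TOTAL. The path passed to `load_file` is whatever you saved the gist as.

```python
import csv
import io

MEATS = ("BEEF", "PIG", "POULTRY", "SHEEP")


def load_rows(text):
    """Parse the CSV text into dicts, converting every numeric column to float."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"COUNTRY": record["COUNTRY"]}
        for column in MEATS + ("TOTAL",):
            # float() also handles scientific notation such as "5.14E-04"
            row[column] = float(record[column])
        rows.append(row)
    return rows


def load_file(path):
    """Read a local copy of the CSV (the path is up to you)."""
    with open(path, newline="") as f:
        return load_rows(f.read())


def totals_consistent(rows, tol=1e-5):
    """True if every TOTAL equals BEEF + PIG + POULTRY + SHEEP to within `tol` kg."""
    return all(abs(sum(row[m] for m in MEATS) - row["TOTAL"]) <= tol for row in rows)


def top_countries(rows, n=3):
    """COUNTRY codes of the n rows with the highest TOTAL."""
    ranked = sorted(rows, key=lambda row: row["TOTAL"], reverse=True)
    return [row["COUNTRY"] for row in ranked[:n]]
```

For example, `top_countries(load_file("meat_consumption_kg_meat_per_capita_per_country.csv"))` returns the three highest-consuming rows.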
peterhurford / num_rows_csv.R
Last active Feb 25, 2021
What's the fastest way to determine the number of rows of a CSV in R?
# What's the fastest way to determine the number of rows of a CSV in R?
# ...Reading the entire CSV to only get the dimensions is likely too slow. Is there a faster way?
# Benchmarks done on an EC2 r3.8xlarge
# Cowritten with Abel Castillo <github.com/abelcastilloavant>
m <- 1000000
d <- data.frame(id = seq(m), a = rnorm(m), b = runif(m))
dim(d)
# [1] 1000000 3
pryr::object_size(d)
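The benchmark code is cut off in this preview. For comparison, here is a Python sketch (not the gist's R code) of one cheap approach: count newline bytes in large binary chunks instead of parsing any fields. It assumes one record per line (no quoted fields containing embedded newlines) and, by default, a single header row.

```python
def count_csv_rows(path, has_header=True, chunk_size=1 << 20):
    """Count data rows by counting b"\\n" bytes, without parsing fields.

    Assumes no quoted field contains an embedded newline.
    """
    newlines = 0
    last_byte = b"\n"
    with open(path, "rb") as f:
        # Read fixed-size binary chunks so memory use stays flat for huge files
        while chunk := f.read(chunk_size):
            newlines += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as a line
    lines = newlines + (0 if last_byte == b"\n" else 1)
    if has_header and lines > 0:
        lines -= 1
    return lines
```

Usage: `count_csv_rows("data.csv")`, where `data.csv` is a hypothetical file name. Counting bytes avoids parsing every field, which is the same trade-off the R question is after.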
peterhurford / how_to_release_a_new_python_package.md
Last active Feb 23, 2021
How to release a new Python package

Nothing novel here, just want these instructions all in one place for my own use.

1.) Ensure everything is pushed to master and is working

2.) Ensure CHANGES.md is up to date with the latest changes

3.) Ensure version in setup.py is incremented

4.) Tag the repo - e.g., git tag 0.2 && git push origin 0.2
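Steps 3 and 4 can be partly scripted. This is a hypothetical helper, not part of the original instructions: it assumes setup.py passes the version as a literal `version="..."` argument, reads it, and then prints (or, with `dry_run=False`, runs) the tag-and-push commands from step 4.

```python
import re
import subprocess


def read_setup_version(setup_py_text):
    """Extract the version string from setup.py source (assumes a literal version="..." argument)."""
    match = re.search(r"version\s*=\s*['\"]([^'\"]+)['\"]", setup_py_text)
    if match is None:
        raise ValueError("no literal version= found in setup.py")
    return match.group(1)


def tag_commands(version):
    """Build the git commands for step 4: create the tag, then push it."""
    return [["git", "tag", version], ["git", "push", "origin", version]]


def tag_release(setup_py_path="setup.py", dry_run=True):
    """Tag the current commit with the setup.py version; only prints commands when dry_run is True."""
    with open(setup_py_path) as f:
        version = read_setup_version(f.read())
    for cmd in tag_commands(version):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return version
```

Running `tag_release()` first in dry-run mode shows exactly what will be tagged, which is a cheap check that step 3 (incrementing the version) actually happened.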