David Morris gm3dmo

  • GitHub Staff
  • York

DWP Payment Data

https://data.gov.uk/dataset/ccdc397a-3984-453b-a9d7-e285074bba4d/spend-over-25-000-in-the-department-for-work-and-pensions

What happens when somebody downloads a CSV from the above?

  1. Most likely open it with Excel/Google Sheets/OpenOffice/Numbers (more likely if an individual or small business).
  2. Upload it into some kind of database/BigQuery/AWS (more likely if a company/data scientist working with journalists).

When you upload it and scroll down, it looks like there is no data after row 190, which is weird and makes you think something is very wrong. That turns out to be because rows 192 - 484 are indeed blank.
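One way to confirm where the blank rows sit before loading the file anywhere is to scan it with Python's standard `csv` module. This is a minimal sketch, not the method used for the DWP file: the function name `blank_row_runs` and the sample data are made up for illustration, and a row counts as blank when it is empty or holds only empty cells.

```python
import csv
import io


def blank_row_runs(lines):
    """Return (first, last) 1-based row numbers for each run of blank rows."""
    runs = []
    start = None
    row_number = 0
    for row_number, row in enumerate(csv.reader(lines), start=1):
        # An empty line gives [], a line of only commas gives ['', '', ...].
        is_blank = all(cell.strip() == "" for cell in row)
        if is_blank and start is None:
            start = row_number
        elif not is_blank and start is not None:
            runs.append((start, row_number - 1))
            start = None
    if start is not None:
        # The file ended while still inside a run of blank rows.
        runs.append((start, row_number))
    return runs


# Made-up sample: row 3 is empty and row 4 contains only commas.
sample = "supplier,amount\nAcme,100\n\n,,\nBeta,200\n"
print(blank_row_runs(io.StringIO(sample)))
```

For a real download, pass an open file instead, for example `open(path, newline="")`. Row numbers here count CSV records, so they can differ from spreadsheet row numbers if a quoted field spans several lines.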

@gm3dmo
gm3dmo / gist:b89c911e00e97b3219d25790343b62c7
Last active February 18, 2019 07:56
use pandas to get min/max values of all columns in a csv
#!/usr/bin/env python3
import pandas as pd
def main():
    f = 'test-data/basic.csv'
    df = pd.read_csv(f)
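The preview above stops after the CSV is loaded. A minimal sketch of how the min/max step might look follows; it is not the gist's actual code. The helper name `column_min_max` and the inline sample data are assumptions, and the sketch restricts itself to numeric columns, which the gist may not do.

```python
import io

import pandas as pd


def column_min_max(df):
    """Return one row per numeric column with 'min' and 'max' columns."""
    numeric = df.select_dtypes(include="number")
    # agg() yields rows 'min' and 'max'; transpose so each column becomes a row.
    return numeric.agg(["min", "max"]).T


# Made-up data standing in for test-data/basic.csv.
sample = io.StringIO("id,amount,name\n1,10.5,a\n2,3.0,b\n3,7.25,c\n")
df = pd.read_csv(sample)
print(column_min_max(df))
```

Text columns are skipped here because min/max on mixed or missing string values can raise an error in pandas.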
import csv
from ldif3 import LDIFParser
from pprint import pprint
def main():
    """
    Purpose
    =======
    Used to read ldif dumped from an AWS Simple AD and report on disabled accounts.
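The preview does not show how disabled accounts are detected. One plausible approach, offered only as a sketch, is to test the `ACCOUNTDISABLE` bit (0x2) of the Active Directory `userAccountControl` attribute. That the dump carries this attribute is an assumption, and the names `is_disabled` and `disabled_accounts` are hypothetical. The code works on `(dn, entry)` pairs of the kind `LDIFParser.parse()` yields, with each entry mapping attribute names to lists of values, so it runs here without `ldif3` installed.

```python
ACCOUNTDISABLE = 0x0002  # Active Directory userAccountControl flag


def is_disabled(entry):
    """entry maps attribute names to lists of string values (assumed shape)."""
    values = entry.get("userAccountControl", [])
    if not values:
        # No flag attribute present: treat the entry as not disabled.
        return False
    return bool(int(values[0]) & ACCOUNTDISABLE)


def disabled_accounts(records):
    """Return the DNs of all records whose ACCOUNTDISABLE bit is set."""
    return [dn for dn, entry in records if is_disabled(entry)]


# Made-up records: 512 is a normal account, 514 is 512 plus ACCOUNTDISABLE.
records = [
    ("CN=alice,DC=example,DC=com", {"userAccountControl": ["512"]}),
    ("CN=bob,DC=example,DC=com", {"userAccountControl": ["514"]}),
]
print(disabled_accounts(records))
```

With `ldif3`, the `records` list would be replaced by `LDIFParser(open(path, "rb")).parse()`.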
#!/bin/bash
#set -x
# Name: build_centos_dvd.sh
# Purpose: Customize a Centos DVD
# Author: David Morris
shopt -s -o nounset
gm3dmo / clock-offset-from-ntp.py
Last active March 6, 2024 12:24
display clock offset from ntp using chrony
#!/usr/bin/env python
import sys
import shlex
import subprocess
def run_cmd():
    """chronyc tracking
    Reference ID : 50484330 (PHC0)
    Stratum : 1
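The preview ends inside the docstring, so the parsing step is not visible. As a rough sketch of one way to pull the offset out of `chronyc tracking` output, the code below reads the `System time` line. The function name `system_time_offset`, the sign convention (positive when the clock is fast), and every sample value except the `Reference ID` and `Stratum` lines are assumptions for illustration.

```python
import re

# Matches e.g. "System time     : 0.000000012 seconds slow of NTP time"
SYSTEM_TIME = re.compile(
    r"^System time\s*:\s*([0-9.]+) seconds (fast|slow) of NTP time",
    re.MULTILINE,
)


def system_time_offset(tracking_output):
    """Return the offset in seconds (positive if fast), or None if not found."""
    match = SYSTEM_TIME.search(tracking_output)
    if match is None:
        return None
    seconds = float(match.group(1))
    return seconds if match.group(2) == "fast" else -seconds


# Illustrative output; the System time value is invented.
sample = """Reference ID    : 50484330 (PHC0)
Stratum         : 1
System time     : 0.000000012 seconds slow of NTP time
"""
print(system_time_offset(sample))
```

In a script like the one above, the text would come from running `chronyc tracking` through `subprocess` rather than from a string literal.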
gm3dmo / gist:0ea14bd0762595a737ec470980a52a8f
Last active June 29, 2018 06:22
Handling Names and Addresses
Do you have a standard set of tests (organisation wide) that relate to handling names and addresses?
Smith
Smith-Jones
Smith Jones
Sir Ranulph Twisleton-Wykeham-Fiennes
https://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/
gm3dmo / watermarker.py
Created June 6, 2018 19:14
watermarker script to overlay a pdf onto another pdf with python
# from an article by Mike Driscoll:
# http://www.blog.pythonlibrary.org/2018/06/06/creating-and-manipulating-pdfs-with-pdfrw/
from pdfrw import PdfReader, PdfWriter, PageMerge
def watermarker(path, watermark, output):
    base_pdf = PdfReader(path)
    watermark_pdf = PdfReader(watermark)
    mark = watermark_pdf.pages[0]
    for page in range(len(base_pdf.pages)):

curl supports --write-out '%{json}', which is useful for timing tests and similar; see the article curl-write-out-json.

example:

curl_custom_flags="-kso /dev/null --write-out '%{json}'"
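The JSON that curl prints can then be read by any JSON parser. The sketch below picks a few timing fields out of it; the function name `timing_summary` and the sample values are made up, and the sample is a hand-trimmed stand-in for the much longer object curl actually emits.

```python
import json

# Timing variables that curl exposes through --write-out.
TIMING_KEYS = ("time_namelookup", "time_connect", "time_starttransfer", "time_total")


def timing_summary(write_out_json):
    """Return only the timing fields present in curl's %{json} output."""
    data = json.loads(write_out_json)
    return {key: data[key] for key in TIMING_KEYS if key in data}


# Invented numbers, trimmed to a handful of fields.
sample = '{"http_code": 200, "time_namelookup": 0.004, "time_connect": 0.021, "time_total": 0.153}'
print(timing_summary(sample))
```

Fields missing from the input are skipped rather than raising, which keeps the sketch tolerant of different curl versions.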
gm3dmo / kernel-benchmark.md
Last active January 8, 2018 11:42
Kernel version vs da Capo (batik) benchmark

# Kernel Benchmark

Use an off-the-shelf benchmark to test the impact of different kernels on benchmark performance. I chose http://www.dacapobench.org/

times=10
f=$(/usr/bin/uname -r)
jdk1.8.0_77/bin/java -jar dacapo-9.12-bach.jar batik -n ${times} > ${f}.txt 2>&1
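Once there is one `.txt` file per kernel, the iteration times can be extracted for comparison. The sketch below assumes DaCapo reports each iteration on a line such as `===== DaCapo 9.12 batik PASSED in 1700 msec =====`; check that format against the real output before relying on it. The function name `iteration_times` and the sample numbers are invented for illustration.

```python
import re
import statistics

# Assumed line format for warmup iterations and the final timed iteration.
RESULT = re.compile(
    r"===== DaCapo \S+ \S+ (?:completed warmup \d+|PASSED) in (\d+) msec ====="
)


def iteration_times(text):
    """Return the time of every reported iteration, in milliseconds."""
    return [int(msec) for msec in RESULT.findall(text)]


# Invented sample standing in for the contents of ${f}.txt.
sample = """===== DaCapo 9.12 batik starting warmup 1 =====
===== DaCapo 9.12 batik completed warmup 1 in 3042 msec =====
===== DaCapo 9.12 batik starting warmup 2 =====
===== DaCapo 9.12 batik completed warmup 2 in 1810 msec =====
===== DaCapo 9.12 batik starting =====
===== DaCapo 9.12 batik PASSED in 1700 msec =====
"""
times = iteration_times(sample)
print(times, statistics.median(times))
```

Comparing the median or the final `PASSED` time across kernels avoids giving too much weight to the slow first warmup run.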

now run it

# Quiz

Gordon, Queen,Flash Gordon,1980,https://www.youtube.com/watch?v=glFmRMj_Wbc&feature=youtu.be&t=16
It's Evil,Time Bandits,1981,https://youtu.be/F6X9KcrXHwg?t=11
Too many secrets,Sneakers,1992,https://youtu.be/GutJf9umD9c?t=168
He slimed me,Ghostbusters,1984,https://youtu.be/7_pR6mUYtOo?t=107
It's a unix system,Jurassic Park,1993,https://youtu.be/dxIPcbmo1_U?t=8
Hello Hello Anybody Home,Back to the future,1985,https://youtu.be/95_DB6GgLQs?t=39
'Ello Poppet,Pirates of the Caribbean,2003,https://www.youtube.com/watch?v=4kdjPhmQA0k&feature=youtu.be&t=2
We're all misfits here,The Island of Misfit Toys,2001,https://www.youtube.com/watch?v=Gr6GbKciNCY&feature=youtu.be&t=174
Spartans what is your profession,300,2006,https://www.youtube.com/watch?v=lIr8u0j08gU&feature=youtu.be&t=20