Skip to content

Instantly share code, notes, and snippets.

View drkane's full-sized avatar

David Kane drkane

View GitHub Profile
@drkane
drkane / to_excel_table.py
Last active January 9, 2023 15:18
Pandas to_excel with a table
# from https://xlsxwriter.readthedocs.io/example_pandas_table.html
df.index.name = "org_id"
criteria = df["Include"]=="y"
sheets = {
"Included - active": (criteria & df["active"]),
"Included - inactive": (criteria & ~df["active"]),
"Excluded": ~criteria,
}
column_widths = df.apply(lambda x: x.astype(str).str.len().max(), axis=0).apply(lambda x: min(x, 50)).to_dict()
We can't make this file beautiful and searchable because it's too large.
"reg_number","name","constitution","geographical_spread","address","postcode","phone","active","date_registered","date_removed","web","email","company_number","objects","source","last_updated","income","spending","fye","dual_registered"
SC004401,University Of Glasgow Court,Statutory corporation (Royal Charter etc),UK and overseas,"Directorate, Finance Office, Gilbert Scott Building, University Avenue, Glasgow",G12 8QQ,,true,1992-01-01,,www.gla.ac.uk,,,"The advancement of education. The advancement of health. The advancement of, arts, heritage, culture, or science.",Office of Scottish Charity Regulator,2021-01-25 18:48:11,690648000,758804000,2019-07-31,false
SC015263,University Of Strathclyde,Statutory corporation (Royal Charter etc),UK and overseas,"Finance, McCance Building, 16 Richmond Street, Glasgow",G1 1XQ,,true,1965-04-02,,www.strath.ac.uk,,,"The advancement of education. The advancement of arts, heritage, culture, or science.",Office of Scottish Charity Regulator,2021-01-25 18:48:20,354611000,404702000
@drkane
drkane / example_additional_data.json
Created February 16, 2021 16:36
Examples of the additional_data field from the 360Giving datastore
{
"id_and_name": "[\"Exeter City Council\", \"GB-LAE-EXE\"]",
"locationLookup": [
{
"ewcd": "K04000001",
"ewnm": "England and Wales",
"gbcd": "K03000001",
"gbnm": "Great Britain",
"ukcd": "K02000001",
"uknm": "United Kingdom",
ladcd ladnm
E09000002 Barking and Dagenham
E09000003 Barnet
E09000004 Bexley
E09000005 Brent
E09000006 Bromley
E09000008 Croydon
E09000009 Ealing
E09000010 Enfield
E09000013 Hammersmith and Fulham
@drkane
drkane / banding_bins.py
Created November 23, 2020 12:24
Bins used for age, income and award amount for charity/grants data
# Bins used for numeric fields
AMOUNT_BINS = [0, 500, 1000, 2000, 5000, 10000, 100000, 1000000, float("inf")]
AMOUNT_BIN_LABELS = ["Under £500", "£500 - £1k", "£1k - £2k", "£2k - £5k", "£5k - £10k",
"£10k - £100k", "£100k - £1m", "Over £1m"]
INCOME_BINS = [-1, 10000, 100000, 250000, 500000, 1000000, 10000000, float("inf")]
INCOME_BIN_LABELS = ["Under £10k", "£10k - £100k", "£100k - £250k",
"£250k - £500k", "£500k - £1m", "£1m - £10m", "Over £10m"]
AGE_BINS = pd.to_timedelta(
[x * 365 for x in [-1, 1, 2, 5, 10, 25, 200]], unit="D")
AGE_BIN_LABELS = ["Under 1 year", "1-2 years", "2-5 years",
@drkane
drkane / lsoa_full.csv
Created May 6, 2020 13:29
LSOAs matched to MSOA, TTWA, Local Authorities (2011), Local Authorities (2020), upper tier local authorities (2020) and countries
We can't make this file beautiful and searchable because it's too large.
LSOA11CD,LSOA11NM,MSOA11CD,MSOA11NM,TTWA11CD,TTWA11NM,LAD11CD,LAD11NM,LAD11NMW,LAD20CD,LAD20NM,LAD20NMW,CTY20CD,CTY20NM,CTRYCD,CTRYNM
95DD04W1,Ballee,95DD04W1,Ballee,N12000001,Ballymena,N09000008,Mid and East Antrim,,N09000008,Mid and East Antrim,,N09000008,Mid and East Antrim,N92000002,Northern Ireland
95DD05W1,Ballykeel,95DD05W1,Ballykeel,N12000001,Ballymena,N09000008,Mid and East Antrim,,N09000008,Mid and East Antrim,,N09000008,Mid and East Antrim,N92000002,Northern Ireland
95DD06W1,Ballyloughan,95DD06W1,Ballyloughan,N12000001,Ballymena,N09000008,Mid and East Antrim,,N09000008,Mid and East Antrim,,N09000008,Mid and East Antrim,N92000002,Northern Ireland
95DD07S1,Broughshane 1,95DD07S1,Broughshane 1,N12000001,Ballymena,N09000008,Mid and East Antrim,,N09000008,Mid and East Antrim,,N09000008,Mid and East Antrim,N92000002,Northern Ireland
95DD07S2,Broughshane 2,95DD07S2,Broughshane 2,N12000001,Ballymena,N09000008,Mid and East Antrim,,N09000008,Mid and East Antrim,,N09000008,Mid and East Antrim,N92000002,Northe
@drkane
drkane / testdata.py
Created April 13, 2020 22:11
Script for generating test Charity Commission zip files
import io
from zipfile import ZipFile, ZIP_DEFLATED
import random
import bcp
import click
import tqdm
@click.command()
@drkane
drkane / title_exceptions.py
Created April 1, 2020 15:31
Function for turning uppercase names to titlecase, particularly for UK charities and companies
# use with https://pypi.org/project/titlecase/
# eg: titlecase.titlecase(x, title_exceptions)
def title_exceptions(word, **kwargs):
word_test = word.strip("(){}<>.")
# lowercase words
if word_test.lower() in ['a', 'an', 'of', 'the', 'is', 'or']:
return word.lower()
@drkane
drkane / query.json
Created February 18, 2020 13:19
GrantNav search json query input change
{
"extra_context": {
"amountAwardedFixed_facet_size": 3,
"awardYear_facet_size": 3
},
"sort": {
"_score": {
"order": "desc"
}
},
@drkane
drkane / company_number_patterns.csv
Created February 18, 2020 10:28
Patterns of UK Company Numbers on the Companies House register of companies. All numeric digits in numbers have been replaced with "0".
CompanyNumberPattern CompanyCategory companies
00000000 Community Interest Company 18634
00000000 Converted/Closed 11
00000000 Old Public Company 35
00000000 Other Company Type 3
00000000 PRI/LBG/NSC (Private, Limited by guarantee, no share capital, use of 'Limited' exemption) 40062
00000000 PRI/LTD BY GUAR/NSC (Private, limited by guarantee, no share capital) 103622
00000000 PRIV LTD SECT. 30 (Private limited company, section 30 of the Companies Act) 18
00000000 Private Limited Company 4526934
00000000 Private Unlimited 250