
@quis
Last active November 21, 2017 09:20
Domains of public sector organisations not on gov.uk
acas.org.uk
ahdb.org.uk
ahrc.ac.uk
arb.org.uk
artscouncil.org.uk
bankofengland.co.uk
bbc.co.uk
bbsrc.ac.uk
bfi.org.uk
biglotteryfund.org.uk
bl.uk
boundarycommission.org.uk
british-business-bank.co.uk
britishcouncil.org
britishmuseum.org
caa.co.uk
careerswales.com
catribunal.org.uk
ccwater.org.uk
channel4.com
chevening.org
citb.co.uk
comisiynyddygymraeg.org
cqc.org.uk
dpecgb.co.uk
dsfc.ac.uk
dsma.uk
ebbsfleetdc.org.uk
ecitb.org.uk
eis2win.co.uk
electoralcommission.org.uk
epsrc.ac.uk
equalityhumanrights.com
esrc.ac.uk
fca.org.uk
finds.org.uk
fireservicecollege.ac.uk
fleetairarm.com
gbcc.org.uk
geffrye-museum.org.uk
gov.scot
greeninvestmentbank.com
hblb.org.uk
hefce.ac.uk
hesa.ac.uk
historicengland.org.uk
hlf.org.uk
horniman.ac.uk
housing-ombudsman.org.uk
hrp.org.uk
ico.org.uk
icrev.org.uk
imb.org.uk
intelligencecommissioner.com
iocco-uk.info
ipt-uk.com
iraqinquiry.org.uk
iwm.org.uk
kew.org
lcrhq.co.uk
lease-advice.org
legalombudsman.org.uk
legalservicesboard.org.uk
lgo.org.uk
liverpoolmuseums.org.uk
marshallscholarship.org
mrc.ac.uk
nam.ac.uk
nationalforest.org
nationalgallery.org.uk
nerc.ac.uk
nestpensions.org.uk
newcoventgardenmarket.com
nhm.ac.uk
nhmf.org.uk
nhsla.com
nic.org.uk
nice.org.uk
nihrc.org
nipolicingboard.org.uk
nlb.org.uk
nmrn.org.uk
northumberlandnationalpark.org.uk
northyorkmoors.org.uk
npg.org.uk
nsandi.com
ofcom.org.uk
offa.org.uk
ogauthority.co.uk
ombudsman.org.uk
onr.org.uk
ordnancesurvey.co.uk
paradescommission.org
pbni.org.uk
pensionprotectionfund.org.uk
pensions-ombudsman.org.uk
pensionsadvisoryservice.org.uk
pharmacopoeia.com
portonbiopharma.com
ppfo.org.uk
professionalstandards.org.uk
psr.org.uk
qeiicc.co.uk
rafmuseum.org.uk
registrarofconsultantlobbyists.org.uk
rmg.co.uk
royalarmouries.org
royalmarinesmuseum.co.uk
royalmint.com
royalparks.org.uk
rssb.co.uk
s4c.co.uk
safetyatsportsgrounds.org.uk
sciencemuseum.org.uk
seafish.org
sentencingcouncil.org.uk
servicecomplaintsombudsman.org.uk
slc.co.uk
soane.org
sportengland.org
stfc.ac.uk
submarine-museum.co.uk
supremecourt.uk
tate.org.uk
theatrestrust.org.uk
theccc.org.uk
thecrownestate.co.uk
theipsa.org.uk
transportfocus.org.uk
trinityhouse.co.uk
ukad.org.uk
ukri.org
vam.ac.uk
victimscommissioner.org.uk
visitbritain.org
visitengland.com
wallacecollection.org
wfd.org
wiltonpark.org.uk
yorkshiredales.org.uk
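The list above is plain text with one domain per line. The sketch below shows one way such a list might be used to test whether a hostname belongs to a listed organisation, counting subdomains as matches. It is a minimal sketch only: `load_domains` and `is_listed` are hypothetical helpers and are not part of the gist.

```python
from typing import Iterable, Set


def load_domains(lines: Iterable[str]) -> Set[str]:
    """Parse one domain per line, ignoring blank lines and surrounding whitespace."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_listed(hostname: str, domains: Set[str]) -> bool:
    """Return True if hostname equals a listed domain or is a subdomain of one.

    Matching against '.' + domain stops 'notbbc.co.uk' from matching 'bbc.co.uk'.
    """
    host = hostname.strip().lower().rstrip(".")
    return host in domains or any(host.endswith("." + d) for d in domains)


if __name__ == "__main__":
    # Hypothetical usage: in practice the lines would be read from the list file.
    domains = load_domains(["bbc.co.uk", "ofcom.org.uk", ""])
    print(is_listed("www.bbc.co.uk", domains))
    print(is_listed("notbbc.co.uk", domains))
```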
#!/usr/bin/env python3
# Adapted from https://github.com/openregister/government-organisation-data/blob/4623fb7c88135c8eeeafdb3fb1b911f424df3c67/lists/govuk/download.py
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

# Websites whose domain ends in any of these suffixes are skipped.
domains_to_exclude = {
    '.gov.uk',
    '.nhs.uk',
    '.police.uk',
    '.mod.uk',
}


def get_government_domains():
    """Yield each GOV.UK organisation's website domain, following the API's pagination."""
    url = "https://www.gov.uk/api/organisations?page=1"
    while url:
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        r = resp.json()
        for row in r['results']:
            # The organisation's own website is linked from its GOV.UK page.
            page = requests.get(row['web_url'], timeout=30)
            soup = BeautifulSoup(page.text, "html.parser")
            element = soup.select_one(".url-link")
            if element:
                link_href = urlparse(element['href']).netloc
                # Strip only a leading "www.", not every occurrence in the name.
                if link_href.startswith('www.'):
                    link_href = link_href[len('www.'):]
                if not any(
                    link_href.endswith(domain) for domain in domains_to_exclude
                ):
                    # Progress output while the pages are being fetched.
                    print(link_href)
                    yield link_href
        # The last page has no next_page_url, which ends the loop.
        url = r.get('next_page_url')


if __name__ == "__main__":
    domains = set(get_government_domains())
    print("=" * 80)
    # Deduplicated, sorted list of domains.
    for domain in sorted(domains):
        print(domain)
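The script walks the paginated GOV.UK organisations API by following `next_page_url` until it is absent. The sketch below isolates that loop and injects the HTTP call as a parameter, so the paging logic can be checked without network access. `paginate` and `fetch_page` are hypothetical names. The response shape (a `results` list plus an optional `next_page_url`) is assumed from the keys the script reads.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def paginate(first_url: str, fetch_page: Callable[[str], Page]) -> Iterator[Any]:
    """Yield every item in 'results', following 'next_page_url' until it is missing."""
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("results", [])
        # A missing key or a null value both end the loop.
        url = page.get("next_page_url")


if __name__ == "__main__":
    # Two fake pages standing in for real API responses.
    fake_pages = {
        "page1": {"results": ["a", "b"], "next_page_url": "page2"},
        "page2": {"results": ["c"]},
    }
    print(list(paginate("page1", fake_pages.__getitem__)))
```

Against the live API, `fetch_page` could be something like `lambda u: requests.get(u, timeout=30).json()`. Keeping the fetch separate from the loop makes the stop condition easy to test on its own.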
adjudicatorsoffice.gov.uk
bcomm-scotland.independent.gov.uk
bcomm-wales.gov.uk
broads-authority.gov.uk
btpa.police.uk
budgetresponsibility.independent.gov.uk
cafcass.gov.uk
ccrc.gov.uk
childrenscommissioner.gov.uk
civilservicecommission.independent.gov.uk
consultation.boundarycommissionforengland.independent.gov.uk
cpni.gov.uk
cps.gov.uk
da.mod.uk
dartmoor-npa.gov.uk
dcalni.gov.uk
dft.gov.uk
digital.nhs.uk
dwi.gov.uk
england.nhs.uk
estyn.gov.uk
exmoor-nationalpark.gov.uk
fcoservices.gov.uk
food.gov.uk
forestry.gov.uk
gamblingcommission.gov.uk
gchq.gov.uk
gla.gov.uk
hee.nhs.uk
hfea.gov.uk
hmgcc.gov.uk
hmic.gov.uk
hra.nhs.uk
hse.gov.uk
hta.gov.uk
iapdeathsincustody.independent.gov.uk
icai.independent.gov.uk
improvement.nhs.uk
ipcc.gov.uk
jac.judiciary.gov.uk
jncc.defra.gov.uk
judiciary.gov.uk
justice.gov.uk
justiceinspectorates.gov.uk
lakedistrict.gov.uk
lawcom.gov.uk
lordsappointments.independent.gov.uk
metoffice.gov.uk
mi5.gov.uk
nationalarchives.gov.uk
nationalcrimeagency.gov.uk
naturalresourceswales.gov.uk
ncsc.gov.uk
newforestnpa.gov.uk
nhsbsa.nhs.uk
nhsbt.nhs.uk
nihe.gov.uk
northernireland.gov.uk
ofgem.gov.uk
ofwat.gov.uk
ons.gov.uk
orr.gov.uk
peakdistrict.gov.uk
ppo.gov.uk
privycouncil.independent.gov.uk
publicappointmentscommissioner.independent.gov.uk
publichealthwales.wales.nhs.uk
sfo.gov.uk
sia.homeoffice.gov.uk
sis.gov.uk
southdowns.gov.uk
spa.independent.gov.uk
statisticsauthority.gov.uk
surveillancecommissioners.independent.gov.uk
terrorismlegislationreviewer.independent.gov.uk
thepensionsregulator.gov.uk
uksport.gov.uk
valuationtribunal.gov.uk
wales.gov.uk
wales.nhs.uk
wao.gov.uk
webarchive.nationalarchives.gov.uk
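Every domain in this second list ends in one of the suffixes in the script's `domains_to_exclude` set, so the script's filter would skip all of them. The small sketch below groups such a list by matching suffix. `group_by_suffix` is a hypothetical helper, and it reuses the suffixes from the script.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Same suffixes as domains_to_exclude in the script above.
SUFFIXES = (".gov.uk", ".nhs.uk", ".police.uk", ".mod.uk")


def group_by_suffix(domains: Iterable[str]) -> Dict[str, List[str]]:
    """Map each suffix to the sorted domains ending in it; anything else goes under 'other'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for domain in domains:
        suffix = next((s for s in SUFFIXES if domain.endswith(s)), "other")
        groups[suffix].append(domain)
    return {suffix: sorted(items) for suffix, items in groups.items()}


if __name__ == "__main__":
    sample = ["cps.gov.uk", "digital.nhs.uk", "btpa.police.uk", "da.mod.uk", "bbc.co.uk"]
    for suffix, items in sorted(group_by_suffix(sample).items()):
        print(suffix, items)
```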