Andy Jackson anjackson
anjackson / SwingFXWebView.java
Created January 19, 2012 15:41
Embedding a JavaFX WebView in a Swing panel.
import com.sun.javafx.application.PlatformImpl;
import java.awt.BorderLayout;
import java.awt.Dimension;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javafx.application.Platform;
import javafx.collections.ObservableList;
import javafx.embed.swing.JFXPanel;
import javafx.scene.Group;
import javafx.scene.Node;
anjackson / sha1b32.sh
Created November 18, 2015 22:03
Calculate the Base32-encoded SHA-1 digest of a file at the command line.
openssl dgst -sha1 -binary "$1" | python3 -c "import base64,sys; print(base64.b32encode(sys.stdin.buffer.read()).decode('ascii'))"
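The same calculation can be sketched in pure Python, which avoids depending on `openssl` being installed. This is a minimal sketch rather than part of the original gist: the function name `sha1_base32` and the `chunk_size` parameter are illustrative assumptions.

```python
import base64
import hashlib


def sha1_base32(path, chunk_size=1 << 20):
    """Return the Base32-encoded SHA-1 digest of the file at `path`.

    `sha1_base32` and `chunk_size` are hypothetical names chosen for
    this sketch; they do not come from the original gist.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # b32encode works on the raw 20-byte digest, not the hex string.
    return base64.b32encode(digest.digest()).decode("ascii")


if __name__ == "__main__":
    import sys

    print(sha1_base32(sys.argv[1]))
```

A SHA-1 digest is 20 bytes, so the Base32 form is always 32 characters with no `=` padding. Prefixing the result with `sha1:` gives the digest style that appears in crawl logs and WARC records.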
anjackson / watch_v_Hnrdfb6HiK0.html
Created August 3, 2022 12:02
Example file with UTF-8 that is not being detected by Tika
<!DOCTYPE html><html style="font-size: 10px;font-family: Roboto, Arial, sans-serif;" lang="en-GB" system-icons typography typography-spacing><head><meta http-equiv="X-UA-Compatible" content="IE=edge"/><meta http-equiv="origin-trial" content="At2B4ABoBE3kiyFJp5tVx/Zi81HAk2cn2zjA0NcVurqsrwLHavE/fe86HDn71lPLg+o1Rf7jkyRD7QdT4TS8+g0AAABteyJvcmlnaW4iOiJodHRwczovL3lvdXR1YmUuY29tOjQ0MyIsImZlYXR1cmUiOiJQcml2YWN5U2FuZGJveEFkc0FQSXMiLCJleHBpcnkiOjE2NjEyOTkxOTksImlzU3ViZG9tYWluIjp0cnVlfQ=="/><script nonce="zVi1d0BoBdL0_JsEzzYI1g">var ytcfg={d:function(){return window.yt&&yt.config_||ytcfg.data_||(ytcfg.data_={})},get:function(k,o){return k in ytcfg.d()?ytcfg.d()[k]:o},set:function(){var a=arguments;if(a.length>1)ytcfg.d()[a[0]]=a[1];else for(var k in a[0])ytcfg.d()[k]=a[0][k]}};
window.ytcfg.set('EMERGENCY_BASE_URL', '\/error_204?t\x3djserror\x26level\x3dERROR\x26client.name\x3d1\x26client.version\x3d2.20220801.00.00');</script><script nonce="zVi1d0BoBdL0_JsEzzYI1g">(function(){window.yterr=window.yterr||true;window.unha
2022-07-13T09:58:56.201Z 200 1039 https://www.vectisradio.com/wp-sitemap.xml I https://www.vectisradio.com/robots.txt application/xml #127 20220713095855289+777 sha1:VFT6NNA5DW5POM6PGF7ERSJYBV6LDNAJ tid:97213:https://www.vectisradio.com/schedule/ isSitemap,launchTimestamp:20220713095854,duplicate:digest,ip:185.151.30.133 {"contentSize":1483,"warcFilename":"BL-NPLD-20220713084656856-19144-80~npld-heritrix3-worker-1~8443.warc.gz","warcFileOffset":380231951,"scopeDecision":"ACCEPT by rule #1 WatchedFileSurtPrefixedDecideRule","warcFileRecordLength":1623}
2022-07-13T09:59:06.496Z -5002 - https://healthwatchwarrington.co.uk/get-involved/ - https://healthwatchwarrington.co.uk/get-involved/ unknown #061 20220713095152869+433395 - tid:1673:http://www.healthwatchwarrington.co.uk/ launchTimestamp:20220713090000,WebRenderStatus:200,resetQuotas,WebRenderCount:1 {"warcPrefix":"BL-NPLD-WEBRENDER-frequent-npld-20220606093552","scopeDecision":"ACCEPT by rule #1 WatchedFileSurtPrefixedDecideRule"}
2022-07-13T
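Log lines like the ones above can be split into named fields for analysis. The following is a rough sketch only: the column names in `FIELDS` are assumptions based on the usual Heritrix 3 `crawl.log` layout (twelve whitespace-separated columns followed by a JSON blob), and `parse_crawl_log_line` is a hypothetical helper, not something from the original gist.

```python
import json

# Assumed column names for the twelve whitespace-separated fields.
FIELDS = [
    "timestamp", "status", "size", "url", "hop_path", "via",
    "mime_type", "thread", "fetch_start_duration", "digest",
    "source", "annotations",
]


def parse_crawl_log_line(line):
    """Split one crawl log line into a dict of named fields.

    The trailing JSON may itself contain spaces, so the split is
    capped at len(FIELDS) and the remainder is parsed as JSON.
    """
    parts = line.strip().split(None, len(FIELDS))
    record = dict(zip(FIELDS, parts))
    extra = parts[len(FIELDS)] if len(parts) > len(FIELDS) else ""
    # Keep an empty dict when the extra-info column is missing.
    record["extra"] = json.loads(extra) if extra.startswith("{") else {}
    return record
```

Capping the split matters because values such as `scopeDecision` contain spaces. Truncated lines yield a partial record instead of raising an error.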
anjackson / aot-collection-example.json
Created June 8, 2022 15:15
An example of 'back end' collection data.
{
"ttype": "collections",
"id": 4028,
"url": "act-4028",
"created_at": "2021-10-13 09:25:25.153",
"name": "Public Health Discourse",
"description": "Writing and other materials reflecting on health, from the open web.\n\nPlease get in touch with the leads about this collection: Cui Cui (cui.cui@bodleian.ox.ac.uk); Alice Doyle (adoyle2@exseed.ed.ac.uk) and Leontien Talboom (lkt39@cam.ac.uk).",
"publish": false,
"parents_all": "",
"revision": "Archive of Tomorrow collection. This project runs from February 2022 until the beginning of 2023. Eilidh MacGlone 29/03/2022.",
anjackson / crawler-beans.cxml
Created September 25, 2015 14:55
Example H3 crawler beans from one of our domain crawler instances.
<?xml version="1.0" encoding="UTF-8"?>
<!--
HERITRIX 3 CRAWL JOB CONFIGURATION FILE
This is a relatively minimal configuration suitable for many crawls.
Commented-out beans and properties are provided as an example; values
shown in comments reflect the actual defaults which are in effect
if not otherwise specified. (To change from the default
behavior, uncomment AND alter the shown values.)
anjackson / rabbithole.md
Last active January 4, 2021 21:12
2020-01-03 US General Election Voting Data Problem

When digging too deep into Twitter, following some conspiracy tweets about Trump's election loss, I came to this odd site: https://hereistheevidence.com/

This site claims to provide tools and links to data to show alleged voting irregularities, and gives examples like this: https://twitter.com/indio007/status/1331828590552428544

Returned BEFORE ballot was mailed
23305 ballots pic.twitter.com/t0O5mUMWKh

— noone special (@indio007) November 26, 2020

Out of curiosity, I thought I'd see if I could reproduce the alleged irregularities.

The here-is-the-evidence site provides tools to download, but I'm not going to go anywhere near those. Installing software from a site like this would be very risky. And anyway, basic tools like grep would be enough.

anjackson / govuk-council-hosts.csv
Last active September 11, 2019 09:41
A list of unique GOV.UK domains with the word 'council' in the text, as discovered by UKWA crawlers since 2018-01-01
host,title
tendringdc.gov.uk,Contact Council Tax | Tendring District Council
thanet.gov.uk,Thanet District Council
northampton.gov.uk,Northampton Borough Council Homepage
kettering.gov.uk,Kettering Borough Council Homepage
sholland.gov.uk,South Holland District Council - South Holland District Council
ryedale.gov.uk,Ryedale District Council - working with you to make a difference
sevenoaks.gov.uk,Sevenoaks District Council homepage
barmouthtowncouncil.gov.uk,Barmouth Town Council - Barmouth Town Council
testvalley.gov.uk,Home | Test Valley Borough Council
year elements_used count
0 2016 a 2787170
1 2016 html 2758395
2 2016 head 2754630
3 2016 title 2753436
4 2016 meta 2717990
5 2016 script 2715418
6 2016 link 2701135
7 2016 div 2695886
8 2016 img 2670980
year content_type_tika count
0 2016 text/html; charset=UTF-8 278514582
1 2016 application/xhtml+xml; charset=UTF-8 117731771
2 2016 image/jpeg 48557044
3 2016 application/rss+xml 16497156
4 2016 text/html; charset=ISO-8859-1 13856782
5 2016 application/xhtml+xml; charset=ISO-8859-1 12076356
6 2016 image/gif 7153120
7 2016 text/html; charset=windows-1252 6334265
8 2016 application/xhtml+xml; charset=windows-1252 5859147