Ari Hershowitz (aih)

aih / htmldiff-Copilot
Created January 31, 2023 01:43
Copilot-generated algorithm to diff content of two XML files
from lxml import etree
from lxml.etree import XMLParser
from lxml import html
from lxml.html.diff import htmldiff

def diffXMLContent(file1, file2):
    parser = XMLParser(remove_blank_text=True)
    tree1 = etree.parse(file1, parser)
    tree2 = etree.parse(file2, parser)
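The preview above stops once both files are parsed. As a hedged, stdlib-only sketch of the same idea (comparing the text content of two XML documents), the version below swaps lxml's `htmldiff` for Python's `difflib.unified_diff`. The names `xml_text_lines` and `diff_xml_content` are hypothetical, and the functions take XML strings rather than file paths.

```python
import difflib
import xml.etree.ElementTree as ET


def xml_text_lines(xml_source: str) -> list[str]:
    """Return the non-empty text chunks of an XML document, whitespace-normalized."""
    root = ET.fromstring(xml_source)
    # itertext() yields every text and tail string in document order
    chunks = (" ".join(text.split()) for text in root.itertext())
    return [chunk for chunk in chunks if chunk]


def diff_xml_content(xml1: str, xml2: str) -> list[str]:
    """Unified diff of the text content of two XML strings; markup is ignored."""
    return list(
        difflib.unified_diff(
            xml_text_lines(xml1),
            xml_text_lines(xml2),
            fromfile="file1",
            tofile="file2",
            lineterm="",
        )
    )


old = "<bill><section>Sec. 1. Short title.</section><section>Sec. 2. $5 authorized.</section></bill>"
new = "<bill><section>Sec. 1. Short title.</section><section>Sec. 2. $10 authorized.</section></bill>"
for line in diff_xml_content(old, new):
    print(line)
```

To compare files on disk, use `ET.parse(path).getroot()` in place of `ET.fromstring`. Diffing text chunks ignores attribute and tag changes; lxml's `htmldiff` instead returns HTML with `<ins>`/`<del>` markup around the changed words.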
aih / missing_file_ending.sh
Last active January 30, 2023 14:44
Finds files that have an -extracted.json ending but no matching file with a -fullresults.json ending
#!/bin/bash
# List every *-extracted.json file that has no matching *-fullresults.json file
# nullglob: if nothing matches, the loop body does not run at all
shopt -s nullglob
# Iterate over files ending with -extracted.json (globbing is safer than parsing `ls`)
for file in *-extracted.json; do
  # Check whether the corresponding -fullresults.json file exists
  if [ ! -f "${file%-extracted.json}-fullresults.json" ]; then
    # Print the filename if its -fullresults.json counterpart is missing
    echo "$file"
  fi
done
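Where bash is not available, the same check can be sketched in Python with `pathlib`. This is a hedged port, not the author's script: `missing_fullresults` is a hypothetical name, and like the shell version it only looks in one directory (no recursion).

```python
from pathlib import Path

EXTRACTED = "-extracted.json"
FULLRESULTS = "-fullresults.json"


def missing_fullresults(directory: str = ".") -> list[str]:
    """Names of *-extracted.json files in `directory` lacking a *-fullresults.json twin."""
    missing = []
    for extracted in sorted(Path(directory).glob("*" + EXTRACTED)):
        # Strip the -extracted.json suffix, like ${file%-extracted.json} in bash
        stem = extracted.name[: -len(EXTRACTED)]
        if not (extracted.parent / (stem + FULLRESULTS)).is_file():
            missing.append(extracted.name)
    return missing


if __name__ == "__main__":
    for name in missing_fullresults():
        print(name)
```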
aih / options.js mur
Created March 5, 2022 06:21
Options.js with tables
{
  // Selectors for mapping sections in document:
  // `key` is a selector to match a section
  // `value` is an array of element selectors to exclude
  // from matched section content. Pass an empty array or `null`
  // to include the full content
  match: ['part', 'title', 'subtitle', 'chapter', 'section', 'toc', 'tocItem', 'preface', 'level', 'table', 'recital', 'resolutions', 'conclusions', 'preamble', 'regulation', 'schedule', 'article', 'docTitle', 'docNum'],
  // Selectors to ignore sections matched by `match` option
  ignore: ['paragraph//paragraph', 'level//paragraph', 'level//level'],
aih / options xcdiff
Created December 14, 2021 22:14
Base options to use in xcdiff
{
  // Selectors for mapping sections in document:
  // `key` is a selector to match a section
  // `value` is an array of element selectors to exclude
  // from matched section content. Pass an empty array or `null`
  // to include the full content
  match: ['xml-meta', 'division', 'part', 'title', 'subtitle', 'chapter', 'section', 'intermediatelevel', 'majorlevel'],
  //match: ['part', 'title', 'subtitle', 'chapter', 'section', 'toc', 'tocItem', 'docTitle', 'preamble', 'level', 'recital', 'resolutions', 'conclusions', 'regulation', 'table'],
  // Selectors to ignore sections matched by `match` option
  ignore: ['section//section', 'section//division', 'section//part', 'section//title', 'section//subtitle', 'section//chapter'],
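The `ignore` entries in both option sets (e.g. `'section//section'`, `'level//paragraph'`) read like XPath-style descendant selectors. A hedged sketch of that reading, assuming `'A//B'` means "skip a matched `B` element that has an `A` ancestor"; xcdiff's actual matching logic may differ, and `select_sections` is a hypothetical helper built on the standard library only.

```python
import xml.etree.ElementTree as ET


def select_sections(root: ET.Element, match: list[str], ignore: list[str]) -> list[ET.Element]:
    """Elements whose tag is in `match`, minus those excluded by an 'A//B' rule.

    Assumption: 'A//B' excludes a B element that has an A ancestor. Only the
    two-part form is handled; this is not xcdiff's actual implementation.
    """
    # ElementTree has no parent pointers, so build a child -> parent map
    parent = {child: node for node in root.iter() for child in node}
    rules = [tuple(sel.split("//", 1)) for sel in ignore if "//" in sel]

    def has_ancestor(elem: ET.Element, tag: str) -> bool:
        node = parent.get(elem)
        while node is not None:
            if node.tag == tag:
                return True
            node = parent.get(node)
        return False

    selected = []
    for elem in root.iter():  # document order
        if elem.tag not in match:
            continue
        if any(elem.tag == inner and has_ancestor(elem, outer) for outer, inner in rules):
            continue  # nested inside an ancestor named by an ignore rule
        selected.append(elem)
    return selected


doc = ET.fromstring(
    '<bill><section id="s1"><section id="s1a"/></section><section id="s2"/></bill>'
)
print([e.get("id") for e in select_sections(doc, ["section"], ["section//section"])])
# prints ['s1', 's2']
```

Under this reading, `'section//section'` keeps only top-level sections, so a subsection's content is compared as part of its parent section rather than as a separate unit.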
aih / docker-volume-inspect.txt
Created November 24, 2021 01:00
Find license path for docker/podman volumes
podman volume inspect <volumename>
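`volume inspect` prints a JSON array with one object per volume, and the volume's host path is in its `Mountpoint` field (the same holds for `docker volume inspect`). A hedged Python sketch that runs the command and extracts that field; `inspect_volume` and `volume_mountpoints` are hypothetical names, and `inspect_volume` requires `podman` (or `docker`) on `PATH`.

```python
import json
import subprocess


def volume_mountpoints(inspect_json: str) -> dict[str, str]:
    """Map volume name -> host Mountpoint from `volume inspect` JSON output."""
    return {vol["Name"]: vol["Mountpoint"] for vol in json.loads(inspect_json)}


def inspect_volume(volume_name: str, engine: str = "podman") -> dict[str, str]:
    """Run `<engine> volume inspect <volume_name>` and parse its output."""
    result = subprocess.run(
        [engine, "volume", "inspect", volume_name],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return volume_mountpoints(result.stdout)
```

For example, `inspect_volume("myvolume", engine="docker")` would run the Docker variant of the same command.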
aih / billsumaries-idea.txt
Created November 18, 2021 23:15
Bill summaries ML idea
One long-term idea that would be very interesting and valuable is to train a model to produce bill summaries.
The bill summaries are in XML in bulk at this site:
https://www.govinfo.gov/bulkdata/BILLSUM/117/hr
For example: https://www.govinfo.gov/bulkdata/BILLSUM/117/hr/BILLSUM-117hr1177.xml
The summary text is in the `summary-text` element, wrapped in <![CDATA[ ]]>.
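A first data-preparation step for such a model could be pulling the summary text out of a BILLSUM file with the standard library. This is a hedged sketch: `summary_texts` is a hypothetical helper that relies only on the `summary-text` element named above, and it assumes the CDATA payload is HTML markup (e.g. `<p>` tags) to be stripped down to plain text. ElementTree returns CDATA content as ordinary element text.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Keeps character data and drops HTML tags such as <p> or <strong>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def summary_texts(billsum_xml: str) -> list[str]:
    """Plain-text content of every <summary-text> element in a BILLSUM document."""
    root = ET.fromstring(billsum_xml)
    texts = []
    for node in root.iter("summary-text"):
        collector = _TextCollector()
        collector.feed(node.text or "")  # CDATA arrives as ordinary text
        collector.close()
        # Join fragments with spaces, then collapse runs of whitespace
        texts.append(" ".join(" ".join(collector.parts).split()))
    return texts
```

The resulting strings could serve as target summaries, paired with the corresponding bill text as model input.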
aih / catalog.xml
Created November 18, 2021 17:16
Forms of schema and dtd paths for bills, amendments and resolutions
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<!-- DTD -->
<system systemId="res.dtd" uri="schemas/res.dtd"/>
<system systemId="C:\Program Files (x86)\JustSystems\rules\bill.dtd" uri="schemas/bill.dtd"/>
<system systemId="http://xml.house.gov/bill.dtd" uri="schemas/bill.dtd"/>
<system systemId="http://xml.house.gov/amend.dtd" uri="schemas/amend.dtd"/>
<public publicId="-//US Congress//DTDs/bill.dtd//EN" uri="schemas/bill.dtd"/>
<public publicId="-//US Congress//DTDs/res.dtd//EN" uri="schemas/res.dtd"/>
<public publicId="-//US Congress//DTDs/bill v2.8 20020720//EN" uri="schemas/bill.dtd"/>
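A catalog like this lets an XML parser swap the DTD reference written in a bill (a `systemId` such as `http://xml.house.gov/bill.dtd`, or a `publicId`) for a local copy under `schemas/`. The hedged sketch below reads the `system` and `public` entries with the standard library and does a simplified exact-match lookup, system identifier first and public identifier second. `load_catalog` and `resolve` are hypothetical helpers; a full OASIS catalog resolver also handles `prefer`, rewrite, and delegate entries.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace declared on the <catalog> root element
CATALOG_NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"


def load_catalog(catalog_xml: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return (systemId -> uri, publicId -> uri) maps from an OASIS XML catalog."""
    root = ET.fromstring(catalog_xml)
    system = {e.get("systemId"): e.get("uri") for e in root.iter(CATALOG_NS + "system")}
    public = {e.get("publicId"): e.get("uri") for e in root.iter(CATALOG_NS + "public")}
    return system, public


def resolve(
    system_map: dict[str, str],
    public_map: dict[str, str],
    system_id: Optional[str] = None,
    public_id: Optional[str] = None,
) -> Optional[str]:
    """Simplified lookup: exact systemId match first, then exact publicId match."""
    if system_id is not None and system_id in system_map:
        return system_map[system_id]
    if public_id is not None:
        return public_map.get(public_id)
    return None
```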
aih / files_bigger_than
Created October 21, 2021 18:45
Shell function (written by GitHub Copilot) for listing files larger than a given size: it prints a file's path if the file exceeds the size in bytes
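files_bigger_than() {
  # Usage: files_bigger_than <file_path> <size_in_bytes>
  local file_path="$1"
  local size="$2"
  local file_size
  # GNU stat; on macOS/BSD use: stat -f%z "$file_path"
  file_size=$(stat -c%s "$file_path") || return 1
  # Print the path only if the file is strictly larger than the threshold
  if [ "$file_size" -gt "$size" ]; then
    echo "$file_path"
  fi
}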
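The function checks one file per call, so listing every large file means calling it in a loop (for example over a glob). A hedged Python sketch of a directory-wide version using `os.walk`; the name mirrors the shell function, and sizes are in bytes, as with `stat -c%s`.

```python
import os


def files_bigger_than(root: str, size: int) -> list[str]:
    """Paths of regular files under `root` strictly larger than `size` bytes."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # isfile() skips broken symlinks, for which getsize() would raise
            if os.path.isfile(path) and os.path.getsize(path) > size:
                matches.append(path)
    return sorted(matches)
```

From the shell, `find . -type f -size +1000c` gives a similar recursive listing of files larger than 1000 bytes.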
aih / mac-docker.adoc
Last active October 24, 2021 17:55
Restart docker on mac
$ docker-machine restart default
Restarting "default"...
(default) Check network to re-create if needed...
(default) Waiting for an IP...
Waiting for SSH to be available...
Detecting the provisioner...
Restarted machines may have new IP addresses. You may need to re-run the `docker-machine env` command.

$ docker-machine env default
aih / createdirs.adoc
Created July 9, 2021 07:44
Create directory with bash expansion
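This gist's `mkdir -p` command relies on brace expansion: bash expands `{111..117}` to the seven numbers 111 through 117 and `{pdf,dtd,uslm,compare}` to four names, and the two expansions combine as a Cartesian product, so `mkdir -p` receives 28 paths. The same idea as a hedged Python sketch (`make_bill_dirs` is a hypothetical name):

```python
import itertools
import os


def make_bill_dirs(
    base: str,
    congresses=range(111, 118),  # like {111..117}; range's end is exclusive
    kinds=("pdf", "dtd", "uslm", "compare"),
) -> list[str]:
    """Python analogue of: mkdir -p bills/{111..117}/{pdf,dtd,uslm,compare}"""
    created = []
    # itertools.product yields every (congress, kind) pair, like the two brace expansions
    for congress, kind in itertools.product(congresses, kinds):
        path = os.path.join(base, str(congress), kind)
        os.makedirs(path, exist_ok=True)  # exist_ok mirrors mkdir -p
        created.append(path)
    return created
```

Calling `make_bill_dirs("bills")` creates the same tree as the command.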
$ mkdir -p bills/{111..117}/{pdf,dtd,uslm,compare}

$ tree bills

bills
├── 111
│   ├── compare
│   ├── dtd
│   ├── pdf