
Sherwood Callaway shcallaway

shcallaway / README.md
Last active Jul 11, 2021
List CMS datasets

List names of CMS datasets in Socrata:

curl -s 'http://api.us.socrata.com/api/catalog/v1?domains=data.cms.gov&search_context=data.cms.gov&limit=2000' | jq ".results | .[] | .resource.name"

Put CMS dataset names, IDs, descriptions, and timestamps into CSV (jq's -r flag prints each row as raw CSV instead of a JSON-quoted string):

curl -s 'http://api.us.socrata.com/api/catalog/v1?domains=data.cms.gov&search_context=data.cms.gov&limit=2000' | jq -r ".results | .[] | .resource | [.name, .id, .description, .createdAt, .updatedAt, .data_updated_at] | @csv"
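Where jq isn't available, the same CSV can be built with Python's standard library. This is a minimal sketch, assuming the catalog response has the results[].resource shape that the jq filters above rely on; fetch_catalog and catalog_to_csv are hypothetical helper names.

```python
import csv
import io
import json
import urllib.request

# Same query string as the curl commands above.
CATALOG_URL = (
    "http://api.us.socrata.com/api/catalog/v1"
    "?domains=data.cms.gov&search_context=data.cms.gov&limit=2000"
)
# Same columns, in the same order, as the jq @csv filter.
FIELDS = ["name", "id", "description", "createdAt", "updatedAt", "data_updated_at"]


def fetch_catalog(url=CATALOG_URL):
    """Download the catalog and decode it as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def catalog_to_csv(catalog):
    """Return one CSV line per entry in catalog["results"]."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for result in catalog.get("results", []):
        resource = result.get("resource", {})
        # Missing or null fields are written as empty cells.
        writer.writerow([resource.get(field) for field in FIELDS])
    return out.getvalue()
```

Calling catalog_to_csv(fetch_catalog()) makes the same request as the curl command and returns the CSV text. One difference: csv.writer only quotes fields that need it, while jq's @csv quotes every string.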
shcallaway / db-write.py
Created Jul 5, 2021
Writes CPTs and cases from CSV to a MySQL database
#!/usr/local/bin/python3
# pip3 install mysql-connector-python
import mysql.connector
import argparse
import csv
DB_USER = 'root'
DB_HOST = 'localhost'
DB_NAME = 'opkit'
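The preview stops at the connection settings. Here is a minimal sketch of a CSV-to-table insert, assuming a header row whose names match the table's columns. The helper, the cpts table, and its columns are hypothetical, not taken from the gist. The sketch uses only the DB-API 2.0 cursor interface, so it should also work with a mysql.connector connection, which uses %s placeholders. The demo swaps in sqlite3, which uses ?, so it runs without a MySQL server.

```python
import csv
import io
import sqlite3


def insert_csv(conn, table, csv_file, placeholder="%s"):
    """Insert every row of a CSV with a header into `table`; return rows written.

    Table and column names are interpolated into the SQL, so they must come
    from a trusted source; row values are passed as bound parameters.
    """
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), ", ".join([placeholder] * len(columns))
    )
    rows = [tuple(row[c] for c in columns) for row in reader]
    cur = conn.cursor()
    cur.executemany(sql, rows)
    conn.commit()
    cur.close()
    return len(rows)


if __name__ == "__main__":
    # Hypothetical stand-in data. With MySQL you would instead pass
    # mysql.connector.connect(user=DB_USER, host=DB_HOST, database=DB_NAME).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cpts (case_id TEXT, code TEXT)")
    data = io.StringIO("case_id,code\n1,12345\n1,67890\n")
    print(insert_csv(conn, "cpts", data, placeholder="?"))
```

executemany sends all rows in one call, which is usually much faster than one execute per row.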
shcallaway / sis-parser.py
Created Jul 4, 2021
Parse CPT codes and case info from a SIS Complete CSV export
#!/usr/local/bin/python3
import csv
import re
import argparse
CASES_OUT = 'sis-parser-out-cases.csv'
CPTS_OUT = 'sis-parser-out-cpts.csv'
parser = argparse.ArgumentParser(prog="sis-parser.py", description='Parse cases and CPT codes from a SIS Complete CSV export.')
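The preview ends at the argument parser. This is a minimal sketch of the parsing idea, assuming each row of the export has a case identifier column and a free-text procedure column. It assumes CPT codes appear as five-character tokens: five digits, or four digits followed by F (Category II) or T (Category III). The column names case_id and procedures are hypothetical, not SIS Complete's actual headers.

```python
import csv
import io
import re

# Five digits, or four digits plus F (Category II) or T (Category III).
CPT_PATTERN = re.compile(r"\b\d{4}[0-9FT]\b")


def parse_export(csv_file, case_col="case_id", text_col="procedures"):
    """Return (case_id, cpt_code) pairs found in each row's free-text column.

    case_col and text_col are hypothetical column names.
    """
    pairs = []
    for row in csv.DictReader(csv_file):
        for code in CPT_PATTERN.findall(row[text_col]):
            pairs.append((row[case_col], code))
    return pairs


def write_pairs(pairs, out_file):
    """Write the pairs to a two-column CSV with a header row."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["case_id", "cpt"])
    writer.writerows(pairs)
```

To run it on a file, open the export with newline="" (as the csv module recommends) and pass the handle to parse_export.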
shcallaway / linkedin-copy.js
Created Jun 7, 2021
Create CSV of LinkedIn people search results
// Here's a quick script you can copy-paste into your browser console to copy a CSV of people search results from LinkedIn
// Example: https://www.linkedin.com/search/results/people/?currentCompany=%5B%223282%22%2C%2236494%22%2C%222142019%22%5D&geoUrn=%5B%22103644278%22%5D&origin=FACETED_SEARCH&profileLanguage=%5B%22en%22%5D&title=sales%20rep
// Collect the visible text of every search-result card on the page
var contentStrings = Array.from(document.querySelectorAll('.entity-result__content')).map(node => node.innerText)
// For each card, keep its first line plus its last two lines
var tuples = contentStrings.map(function(contentString) {
  var split = contentString.split("\n")
  return [split[0], split[split.length-2], split[split.length-1]]
})
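The preview ends before the tuples become CSV. Names and job titles often contain commas, so the fields need CSV quoting. Here is a minimal Python sketch of the same first-line/last-two-lines selection followed by proper quoting. It assumes you have the cards' innerText strings outside the browser, and the sample card in the test is made up.

```python
import csv
import io


def card_to_tuple(content_string):
    """Mirror the console script: first line plus the last two lines.

    Assumes each card has at least two lines of text.
    """
    split = content_string.split("\n")
    return [split[0], split[-2], split[-1]]


def cards_to_csv(content_strings):
    """Build CSV text, quoting any field that contains commas or quotes."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(card_to_tuple(s) for s in content_strings)
    return out.getvalue()
```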
shcallaway / README.md
Last active May 25, 2021
How to use XSLT to transform XML files

I had some XML files containing structured data that I wanted to insert into a SQL database, so I needed to figure out how to transform the XML into SQL statements. Turns out, there is something called XSLT that can be used to programmatically transform XML files into... well... whatever you want. So here's how I used XSLT to transform XML into SQL statements.

  1. Download Saxon-HE from here: https://saxonica.com/download/java.xml
  2. Create your XML file (the XML declaration must come first, before any comment):
<?xml version="1.0" encoding="UTF-8"?>
<!-- /Users/sherwood/Code/cdcatalog.xml -->
<?xml-stylesheet type="text/xsl" href="cdcatalog.xsl"?>
<catalog>
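Python's standard library has no XSLT processor. For comparison, here is a minimal sketch of the same XML-to-SQL idea using xml.etree.ElementTree instead of a stylesheet. It assumes the catalog holds cd elements with title and artist children, which is hypothetical because the preview stops at the catalog tag. The target table name cds is also hypothetical. Embedded single quotes are doubled, which is the standard way to escape them inside SQL string literals.

```python
import xml.etree.ElementTree as ET


def sql_literal(text):
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + (text or "").replace("'", "''") + "'"


def catalog_to_sql(xml_text, table="cds"):
    """Emit one INSERT per <cd> element (cd/title/artist and cds are assumed names)."""
    root = ET.fromstring(xml_text)
    statements = []
    for cd in root.iter("cd"):
        title = sql_literal(cd.findtext("title"))
        artist = sql_literal(cd.findtext("artist"))
        statements.append(f"INSERT INTO {table} (title, artist) VALUES ({title}, {artist});")
    return "\n".join(statements)
```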
shcallaway / README.md
Created Feb 28, 2020
Use curl to see Logstash metrics

You can get really detailed stats about each Logstash pipeline (event counts, per-plugin stats, reloads, and queue info) via the node stats HTTP API. See the Logstash docs for more!

$ curl -XGET 'localhost:9600/_node/stats/pipelines?pretty'
{
  "host" : "logstash-msk-679b6b8dd9-4pd2h",
  "version" : "7.4.0",
  "http_address" : "0.0.0.0:9600",
  "id" : "ba41dded-1a12-41e9-988f-03bd4eae9d4a",
  "name" : "logstash-msk-679b6b8dd9-4pd2h",
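To pull the event counters out of that response programmatically, here is a minimal sketch using Python's standard library. It assumes the response contains a pipelines object keyed by pipeline ID, and that each pipeline has an events object with in, filtered, and out counters, as in 7.x output. Check this against your version's response.

```python
import json
import urllib.request

# Same endpoint as the curl example above.
STATS_URL = "http://localhost:9600/_node/stats/pipelines"


def fetch_stats(url=STATS_URL):
    """GET the pipeline stats and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize(stats):
    """Return {pipeline_id: (in, filtered, out)} from the assumed 7.x shape."""
    summary = {}
    for pipeline_id, pipeline in stats.get("pipelines", {}).items():
        events = pipeline.get("events") or {}
        summary[pipeline_id] = (
            events.get("in", 0),
            events.get("filtered", 0),
            events.get("out", 0),
        )
    return summary
```

For example, summarize(fetch_stats()) might return something like {"main": (1200, 1200, 1195)} on a running node (the numbers here are illustrative).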
shcallaway / multiprocessing.py
Created Feb 27, 2020
Demo script to illustrate how multiprocessing.Pool.map is blocking
import time
import multiprocessing

def delay_print(message):
    time.sleep(1)
    print(message)

for i in range(10):
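The preview is cut off inside the loop. Here is a minimal sketch of the behavior the gist demonstrates. Pool.map does not return until every task has finished. Pool.map_async returns an AsyncResult immediately, and only its get() call blocks. The slow_square task and the 0.2-second sleep are hypothetical. compare() takes the pool class as a parameter so that multiprocessing.pool.ThreadPool, which has the same map and map_async methods, can be used where starting processes is inconvenient.

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def slow_square(n):
    """Hypothetical slow task: sleep briefly, then return n squared."""
    time.sleep(0.2)
    return n * n


def compare(pool_cls=Pool, n_items=4):
    """Measure how long map and map_async take to return (not to finish)."""
    with pool_cls(n_items) as pool:
        start = time.perf_counter()
        blocking = pool.map(slow_square, range(n_items))  # waits for all tasks
        map_returned = time.perf_counter() - start

        start = time.perf_counter()
        pending = pool.map_async(slow_square, range(n_items))  # returns at once
        map_async_returned = time.perf_counter() - start
        non_blocking = pending.get()  # this call is what blocks
    return blocking, non_blocking, map_returned, map_async_returned
```

Expect map_returned to be at least roughly 0.2 seconds and map_async_returned to be close to zero. To use real processes, call compare() from under if __name__ == "__main__": so that start methods like spawn can safely re-import the module.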
shcallaway / README.md
Last active Dec 18, 2019
Use jq to parse JSON logs into something more readable

Structured logs are way easier than plain-text logs to search and filter by field, but they can sometimes be a pain to read in the shell. Take this logline for example:

{"erlang_pid":"#PID<0.1584.0>","level":"error","message":"Got error when retry: :econnrefused, will retry after 1535ms. Have retried 2 times, :infinity times left.","module":"","release":"c2ef629cb357c136f529abec997426d6d58de485","timestamp":"2019-12-17T19:22:11.164Z"}

This format is hard for a human to parse. How about this format instead?

error | 2019-12-17T19:22:11.164Z | Got error when retry: :econnrefused, will retry after 1535ms. Have retried 2 times, :infinity times left.
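The jq command itself is not in the preview. As a stand-in, here is a minimal Python filter that renders each JSON logline as level | timestamp | message and passes non-JSON lines through unchanged. It assumes the level, timestamp, and message keys from the example logline above.

```python
import json
import sys


def format_line(line):
    """Render a JSON logline as 'level | timestamp | message'."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return line.rstrip("\n")  # not JSON: pass through untouched
    if not isinstance(record, dict):
        return line.rstrip("\n")
    return " | ".join(str(record.get(key, "")) for key in ("level", "timestamp", "message"))


def main(lines=sys.stdin, out=sys.stdout):
    """Format every line read from `lines` (stdin by default)."""
    for raw in lines:
        print(format_line(raw), file=out)
```

Call main() at the bottom of a script to pipe logs through it. With jq alone, jq -r '"\(.level) | \(.timestamp) | \(.message)"' produces the same layout as long as every line is valid JSON.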
shcallaway / README.md
Last active Nov 7, 2019
Datadog reserved log attributes

Datadog's reserved log attributes are confusing as heck. It's not clear what each attribute does, so you can’t predict or understand what will happen when you create a mapping. Allow me to demonstrate.

I got this logline from my Datadog S3 archive bucket. It gives you a sense of what logs look like after going through Datadog's opaque transformations.

{
    "_id": "AW5Hc8y8FxIBf2udiA1a", // Log ID generated by Datadog
    "attributes": { // Key-values from the original JSON logline are moved under "attributes"
        "@timestamp": "2019-11-07T19:59:59.804Z",
        "@version": "1",
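One way to see where Datadog put each field is to flatten an archived logline into dotted key paths. You can then compare what sits under "attributes" with what sits at the top level. This is a minimal sketch. It expects real JSON, so the explanatory // comments in the annotated example above would need to be removed first.

```python
def flatten(value, prefix=""):
    """Yield (dotted.path, leaf_value) pairs for every leaf in decoded JSON."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, value
```

For example, loading one archived logline with json.load and printing each pair from flatten() lists paths such as attributes.@timestamp next to _id.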
shcallaway / README.md
Created Oct 7, 2019
View paginated, colorized JSON in your terminal with jq and less

You can view paginated, colorized JSON in your terminal by piping JSON to jq -C and less -R:

curl -s https://jsonplaceholder.typicode.com/posts | jq -C . | less -R
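If jq isn't installed, here is a rough Python stand-in. It pretty-prints decoded JSON with a few ANSI colors and hands the result to less -R. This is a minimal sketch: the colors are arbitrary and do not match jq's palette.

```python
import json
import re
import subprocess

# Arbitrary ANSI colors (not jq's exact palette).
KEY, STRING, NUMBER, LITERAL, RESET = "\x1b[34;1m", "\x1b[32m", "\x1b[36m", "\x1b[35m", "\x1b[0m"


def colorize(value, indent=0):
    """Pretty-print decoded JSON with 2-space indentation and ANSI colors."""
    pad, inner = "  " * indent, "  " * (indent + 1)
    if isinstance(value, dict):
        if not value:
            return "{}"
        items = [f"{inner}{KEY}{json.dumps(k)}{RESET}: {colorize(v, indent + 1)}" for k, v in value.items()]
        return "{\n" + ",\n".join(items) + "\n" + pad + "}"
    if isinstance(value, list):
        if not value:
            return "[]"
        items = [inner + colorize(v, indent + 1) for v in value]
        return "[\n" + ",\n".join(items) + "\n" + pad + "]"
    if isinstance(value, str):
        return STRING + json.dumps(value) + RESET
    if isinstance(value, bool) or value is None:  # bool is checked before the numeric fallback
        return LITERAL + json.dumps(value) + RESET
    return NUMBER + json.dumps(value) + RESET


def strip_ansi(text):
    """Remove SGR color codes, e.g. to compare with uncolored output."""
    return re.sub(r"\x1b\[[0-9;]*m", "", text)


def page(text):
    """Show text in less; -R makes less render the color codes."""
    subprocess.run(["less", "-R"], input=text, text=True)
```

In a script that reads curl's output from stdin, page(colorize(json.load(sys.stdin))) plays the same role as jq -C . | less -R.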