Brian Tingle tingletech

tingletech / README
Last active November 16, 2021 23:37
curl https://glossapsycholinguistics.journalpub-dev.escholarship.org/article/id/19/ --silent | xmllint --html --xmlout - 2>/dev/null | xsltproc ./glosstract.xsl - > 19.html
files:
  # Let's Encrypt's expired root
  "/etc/pki/ca-trust/source/blacklist/x3.pem":
    mode: "000775"
    owner: root
    group: users
    content: |
      DST Root CA X3
      ==============
      -----BEGIN CERTIFICATE-----
tingletech / README.md
Last active October 21, 2020 23:17
move_preprints manage command for janeway
usage: manage.py move_preprints [-h] [--version] [-v {0,1,2,3}] [--settings SETTINGS] [--pythonpath PYTHONPATH] [--traceback] [--no-color] active_user proxy_user

move preprints from a proxy account to a new account

positional arguments:
  active_user           `email` of new active account
  proxy_user            `email` of old proxy account

optional arguments:
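The usage text above can be read as an argument parser with two positional email arguments. A hypothetical sketch of that parsing with plain `argparse` (the real Janeway command would subclass Django's `BaseCommand`, which builds an equivalent parser under the hood):

```python
import argparse

# Hypothetical stand-in for the parser behind `manage.py move_preprints`;
# the positional arguments mirror the usage text above.
parser = argparse.ArgumentParser(
    prog="manage.py move_preprints",
    description="move preprints from a proxy account to a new account",
)
parser.add_argument("active_user", help="`email` of new active account")
parser.add_argument("proxy_user", help="`email` of old proxy account")

# example invocation with made-up addresses
args = parser.parse_args(["new@example.edu", "old@example.edu"])
print(args.active_user, args.proxy_user)
```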
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<responseDate>2020-07-27T22:54:52.731437+00:00</responseDate>
<request/>
<ListRecords>
<record>
<header>
<identifier>oai:calipshere:https://registry.cdlib.org/api/v1/collection/5376/:ark:/13030/k6xp75mg</identifier>
<datestamp>2018-01-22T12:53:11.041Z</datestamp>
</header>
<metadata>
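A record header like the one above can be pulled apart with the standard library's `ElementTree`, provided the OAI-PMH namespace is handled explicitly. A minimal sketch over an inlined, truncated copy of the response:

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Truncated copy of the OAI-PMH response above, just enough for the sketch.
xml = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:calipshere:https://registry.cdlib.org/api/v1/collection/5376/:ark:/13030/k6xp75mg</identifier>
        <datestamp>2018-01-22T12:53:11.041Z</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

root = ET.fromstring(xml)
# collect (identifier, datestamp) pairs from every record header
headers = [
    (h.findtext(f"{OAI}identifier"), h.findtext(f"{OAI}datestamp"))
    for h in root.iter(f"{OAI}header")
]
print(headers)
```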
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import argparse
import os
import sys
import json
from urllib.parse import urlparse
import boto3
# https://stackoverflow.com/q/48914324/1763984
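Given the `urlparse` and `boto3` imports together, the script presumably splits S3 URLs into bucket and key before calling the API. A standard-library sketch of that split (the boto3 call that would consume the pair is omitted, and the function name is an assumption):

```python
from urllib.parse import urlparse

def split_s3_url(url):
    """Split an s3://bucket/key URL into (bucket, key).

    Hypothetical helper: urlparse puts the bucket in netloc and the
    key in path (with a leading slash to strip).
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3 URL: {url}")
    return parsed.netloc, parsed.path.lstrip("/")

print(split_s3_url("s3://my-bucket/some/key.json"))  # ('my-bucket', 'some/key.json')
```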
tingletech / Z.txt
Last active October 18, 2019 20:27
scala> var r = spark.sql("select substring(first(repository_name)[0],0,50) as repository_name, first(campus_name)[0] as campus, count(title[0]) as total, count(distinct(title[0])) as uniq, count(distinct(title[0]))/count(title[0]) as uniq_ratio from calisphere group by repository_url[0] order by uniq_ratio desc, uniq desc")
r: org.apache.spark.sql.DataFrame = [repository_name: string, campus: string ... 3 more fields]
scala> r.show(300,false)
+--------------------------------------------------+----------------+------+------+-------------------+
|repository_name |campus |total |uniq |uniq_ratio |
+--------------------------------------------------+----------------+------+------+-------------------+
|Television Academy Foundation |null |904 |904 |1.0 |
|California State University, Stanislaus. Library |null |260 |260 |1.0 |
|Architecture and Design Collection, Art, Design an|UC S
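The Spark query above computes, per repository, the total title count, the distinct title count, and their ratio (a uniqueness score). The same aggregation can be restated in plain Python, which makes the arithmetic explicit; the sample records below are made up:

```python
from collections import defaultdict

# toy (repository, title) pairs standing in for the calisphere table
records = [
    ("Repo A", "t1"), ("Repo A", "t1"), ("Repo A", "t2"),
    ("Repo B", "t1"),
]

titles = defaultdict(list)
for repo, title in records:
    titles[repo].append(title)

# per repository: total titles, distinct titles, uniqueness ratio
stats = {}
for repo, ts in titles.items():
    total, uniq = len(ts), len(set(ts))
    stats[repo] = (total, uniq, uniq / total)
print(stats)
```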
# solr schema fields that have a `_ss` variant for facets
UCLDC_SCHEMA_FACETS = [
"title",
"alternative_title",
"contributor",
"coverage",
"creator",
"date",
"extent",
"format",
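Per the comment above, each listed field has a `_ss` Solr variant used for faceting, so the facet field names can be derived mechanically (the list here is truncated to match the preview):

```python
# truncated copy of the facet field list above
UCLDC_SCHEMA_FACETS = [
    "title",
    "alternative_title",
    "contributor",
    "creator",
]

# derive the `_ss` facet field names used in the Solr schema
facet_fields = [f"{name}_ss" for name in UCLDC_SCHEMA_FACETS]
print(facet_fields)
```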
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import argparse
import os
import sys
import json
from urllib.parse import urlparse
import boto3
from pyspark.sql import SparkSession
from icecream import ic
# -*- coding: UTF-8 -*-
import unicodedata
import re
RE_ALPHANUMSPACE = re.compile(r'[^0-9A-Za-z\s]*')  # explicit class: \w would match "_", and A-z spans punctuation between Z and a
def normalize_sort_field(sort_field,
default_missing='~title unknown',
missing_equivalents=['title unknown']):
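The gist preview cuts off at the signature. A hypothetical completion, inferred from the imports and defaults above: fold to ASCII via NFKD, strip punctuation with the regex, lowercase, and map empty or known "missing" values to a sentinel that sorts last:

```python
import re
import unicodedata

RE_ALPHANUMSPACE = re.compile(r'[^0-9A-Za-z\s]*')

def normalize_sort_field(sort_field,
                         default_missing='~title unknown',
                         missing_equivalents=['title unknown']):
    """Hypothetical completion of the truncated gist function."""
    # decompose accents, then drop anything outside ASCII
    normalized = unicodedata.normalize('NFKD', sort_field)
    normalized = normalized.encode('ascii', 'ignore').decode('ascii')
    # remove punctuation, trim, and case-fold
    normalized = RE_ALPHANUMSPACE.sub('', normalized).strip().lower()
    if not normalized or normalized in missing_equivalents:
        return default_missing
    return normalized

print(normalize_sort_field('“The Título!”'))  # 'the titulo'
```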