Kevin McElwee kmcelwee

@kmcelwee
kmcelwee / derrida-puppeteer.js
Created October 28, 2021 20:19
Puppeteer script for all elements we want to view in browsertrix
const puppeteer = require('puppeteer');

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  // ...
})();
kmcelwee / derrida-290.py
Created October 27, 2021 16:22
Generate IIIF proxy mapfile
# goal is to map from
# /library/levi-strauss-anthropologie-structurale-1958/gallery/images/front-cover/iiif/
# ..to..
# https://iiif-cloud.princeton.edu/iiif/2/ee%2F09ce512bce02%2Fintermediate_file
import json
from django.urls import reverse
from derrida.books.models import Instance
from derrida.common.utils import absolutize_url
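The target URL in the comments above percent-encodes the slashes inside the IIIF identifier (ee%2F09ce512bce02%2Fintermediate_file). Below is a minimal sketch of that mapping step. It assumes the mapfile is a JSON object from local path to IIIF URL. The helper names iiif_url and build_mapfile are hypothetical, and the sketch does not touch the Derrida models.

```python
import json
from urllib.parse import quote

# Base URL taken from the example target above; assumed to be the same for every image.
IIIF_BASE = "https://iiif-cloud.princeton.edu/iiif/2/"


def iiif_url(identifier):
    """Hypothetical helper: build the IIIF URL, encoding '/' as %2F.

    safe="" tells quote() to percent-encode every reserved character, including '/'.
    """
    return IIIF_BASE + quote(identifier, safe="")


def build_mapfile(entries):
    """Hypothetical helper: turn {local_path: identifier} into a JSON mapfile string."""
    mapping = {path: iiif_url(ident) for path, ident in entries.items()}
    return json.dumps(mapping, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(build_mapfile({
        "/library/levi-strauss-anthropologie-structurale-1958/gallery/images/front-cover/iiif/":
            "ee/09ce512bce02/intermediate_file",
    }))
```

Without safe="", quote() leaves '/' alone by default, so the identifier would be split into extra path segments instead of matching the %2F form above.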
kmcelwee / pre-commit
Last active June 15, 2021 15:55 — forked from gmodarelli/pre-commit
RuboCop with git pre-commit
#!/bin/sh
#
# Check for ruby style errors and autocorrect.
red='\033[0;31m'
green='\033[0;32m'
yellow='\033[0;33m'
NC='\033[0m'
# Get only the staged files
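The hook's next step collects only the staged files. One way to do that is `git diff --cached --name-only --diff-filter=ACM`, which lists added, copied, and modified paths in the index and skips deleted ones. Below is a rough Python sketch of that step, not the hook's actual code. The helper names are hypothetical, and filtering on ".rb" alone is a simplification, because RuboCop also inspects files such as Gemfile.

```python
import subprocess


def staged_files():
    """Hypothetical helper: paths staged for commit (added, copied, or modified)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def ruby_files(paths):
    """Keep only Ruby source files (simplified: extension check only)."""
    return [p for p in paths if p.endswith(".rb")]


if __name__ == "__main__":
    print(ruby_files(staged_files()))
```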
kmcelwee / generate-thesis-tickets.sh
Created June 10, 2021 16:51
Create the annual thesis tickets
gh issue create --title "Import African American Studies theses to dataspace prod" --label "senior theses" --body "When completed, don't forget to update [Lynn's spreadsheet](https://docs.google.com/spreadsheets/d/1NUUN0B1pc6iWxQoTYUW_WE0AwGCMt_qqC-e3CVlG-P0/edit#gid=0)";
sleep 3;
gh issue create --title "Import Anthropology theses to dataspace prod" --label "senior theses" --body "When completed, don't forget to update [Lynn's spreadsheet](https://docs.google.com/spreadsheets/d/1NUUN0B1pc6iWxQoTYUW_WE0AwGCMt_qqC-e3CVlG-P0/edit#gid=0)";
sleep 3;
gh issue create --title "Import Architecture School theses to dataspace prod" --label "senior theses" --body "When completed, don't forget to update [Lynn's spreadsheet](https://docs.google.com/spreadsheets/d/1NUUN0B1pc6iWxQoTYUW_WE0AwGCMt_qqC-e3CVlG-P0/edit#gid=0)";
sleep 3;
gh issue create --title "Import Art and Archaeology theses to dataspace prod" --label "senior theses" --body "When completed, don't forget to update [Lynn's spreadsheet](https://docs.google.com/s
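The script above repeats one `gh issue create` call per department and sleeps 3 seconds between calls. A loop can generate the same commands. The sketch below does that, with some assumptions. It lists only the four departments visible in the preview, because the full list is truncated. The function names are hypothetical. It defaults to a dry run that prints each command instead of running it.

```python
import shlex
import subprocess
import time

SPREADSHEET_URL = "https://docs.google.com/spreadsheets/d/1NUUN0B1pc6iWxQoTYUW_WE0AwGCMt_qqC-e3CVlG-P0/edit#gid=0"
BODY = f"When completed, don't forget to update [Lynn's spreadsheet]({SPREADSHEET_URL})"

# Only the departments visible above; the original list continues past the preview.
DEPARTMENTS = [
    "African American Studies",
    "Anthropology",
    "Architecture School",
    "Art and Archaeology",
]


def issue_command(department):
    """Build the argument list for one thesis-import issue."""
    return [
        "gh", "issue", "create",
        "--title", f"Import {department} theses to dataspace prod",
        "--label", "senior theses",
        "--body", BODY,
    ]


def create_all(departments, delay=3, dry_run=True):
    """Create (or, in a dry run, just print) one issue per department."""
    for department in departments:
        cmd = issue_command(department)
        if dry_run:
            print(shlex.join(cmd))  # shell-quoted, so it can be pasted into a terminal
        else:
            subprocess.run(cmd, check=True)
            time.sleep(delay)  # same pause as the `sleep 3` calls above


if __name__ == "__main__":
    create_all(DEPARTMENTS, dry_run=True)
```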
kmcelwee / twitter-reply-exception.json
Created March 22, 2021 19:12
A tweet that is a reply, but has `in_reply_to_status_id` null because the original user deleted their tweet. (Twitter Dev conversation: https://twittercommunity.com/t/the-commonly-described-ways-of-determining-whether-a-tweet-is-a-reply-seem-wrong/151579)
{
"created_at": "Wed Nov 18 19:02:24 +0000 2020",
"id": 1329137902199005184,
"id_str": "1329137902199005184",
"full_text": "@2legit2dunk https://t.co/4lx6Z4wqAp",
"truncated": false,
"display_text_range": [
13,
36
],
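In this sample, the visible text starts at index 13, just after "@2legit2dunk ". That suggests the leading mention is the hidden reply prefix, even though `in_reply_to_status_id` is null. Below is a heuristic sketch built on that observation. It is an assumption, not a rule documented by Twitter: treat a tweet as a reply if it has a parent ID, or if its display range starts after a leading @mention. The function name is hypothetical.

```python
def looks_like_reply(tweet):
    """Heuristic reply check for an extended-mode v1.1 tweet dict (assumed rule)."""
    if tweet.get("in_reply_to_status_id") is not None:
        return True
    # Assumption: a nonzero display_text_range start with a leading '@' means the
    # reply prefix was hidden, even if the parent tweet has since been deleted.
    start = tweet.get("display_text_range", [0, 0])[0]
    return start > 0 and tweet.get("full_text", "").startswith("@")


sample = {
    "full_text": "@2legit2dunk https://t.co/4lx6Z4wqAp",
    "display_text_range": [13, 36],
    "in_reply_to_status_id": None,
}
print(looks_like_reply(sample))
```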

PostgreSQL & other queries in Dataspace

To enter the Postgres command line, switch to the dspace user (sudo su - dspace) and run psql. Here is a link to the database diagram for DSpace 5.

It may sometimes be quicker to use the REST API than to write a complicated query, and the JRuby DSpace wrapper (documentation) may be simpler still.

Useful commands:

  • \dt: list all tables
  • \d {TABLE}: describe the given table
  • \copy ({query}) to '{filename}' with CSV HEADER: saves the query results to a CSV file with a header row
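psql's copy meta-command runs COPY ... TO STDOUT on the server and writes the rows to a file on the client side. The same export from a script could look like the sketch below. It assumes a psycopg2-style cursor that provides copy_expert(sql, file). The names copy_sql and export_query_to_csv are hypothetical helpers.

```python
def copy_sql(query):
    """Wrap a SELECT in the COPY statement equivalent to the copy command above."""
    # COPY (query) rejects a trailing semicolon inside the parentheses, so strip it.
    return f"COPY ({query.strip().rstrip(';')}) TO STDOUT WITH CSV HEADER"


def export_query_to_csv(cursor, query, path):
    """Write query results to a CSV file (assumes cursor.copy_expert exists, as in psycopg2)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        cursor.copy_expert(copy_sql(query), f)
```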
import pandas as pd
df = pd.read_csv('pgp.csv')
# Rows whose Type holds several semicolon-separated values; na=False skips rows with no Type
df_multi_type = df[df['Type'].str.contains(';', na=False)]
df_multi_type['Type'].count() # 148 multi-type PGPIDs
df_multi_type[df_multi_type['Library'] == 'CUL']['Type'].count() # 75 multi-type PGPIDs from CUL
# list of the 148 PGPIDs
31166
32188
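The CUL-only count above can be extended to every library with a groupby. Below is a sketch on a tiny made-up frame. The rows and the second library code "JRL" are hypothetical; only the Type and Library column names come from the snippet.

```python
import pandas as pd

# Made-up rows standing in for pgp.csv; only the column names match the snippet above.
df = pd.DataFrame({
    "Type": ["Letter;Legal", "Letter", None, "List;Letter", "Legal;List"],
    "Library": ["CUL", "CUL", "JRL", "JRL", "CUL"],
})

# na=False: rows with no Type are not multi-type; regex=False: match ';' literally.
multi = df[df["Type"].str.contains(";", na=False, regex=False)]

# Multi-type row count per library (the same idea as the CUL-only count).
per_library = multi.groupby("Library")["Type"].count()
print(per_library)
```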