Daniel Pett (portableant)
👹 Ducking and diving
portableant / findCreate.py
Created September 20, 2018 11:37
Restore InnoDB tables from .frm and .ibd files only
import argparse
import os
import re

# Parse the command-line arguments: -f/--file names the .frm file to work from.
parser = argparse.ArgumentParser(description='Process file.')
parser.add_argument('-f', '--file', help='The filename', required=True)
args = parser.parse_args()

def create_file(file):
    print(file)
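The preview stops here, so what create_file goes on to do is not shown. As a hedged sketch only, not the author's code: a common way to restore InnoDB tables from .frm/.ibd pairs is to recreate a matching table definition, discard the empty tablespace, copy the old .ibd into place, then import it, and the os/re imports suggest the script scans the .frm for clues to the column layout. One possible fleshed-out version of create_file along those lines, where every helper and name below is an assumption:

def guess_columns(frm_path):
    """Hypothetical helper: pull printable ASCII runs out of the binary .frm
    as a rough hint at the original column names."""
    with open(frm_path, 'rb') as fh:
        blob = fh.read()
    return re.findall(rb'[A-Za-z_][A-Za-z0-9_]{2,}', blob)

def create_file(file):
    table = os.path.splitext(os.path.basename(file))[0]
    print('-- possible column-name fragments found in the .frm:')
    for name in guess_columns(file):
        print('--   ' + name.decode('ascii'))
    # Standard transportable-tablespace recovery steps (as documented for MySQL/InnoDB):
    print('CREATE TABLE `%s` (...) ENGINE=InnoDB;  -- recreate a matching definition' % table)
    print('ALTER TABLE `%s` DISCARD TABLESPACE;' % table)
    print('-- copy the old %s.ibd into the new data directory, then:' % table)
    print('ALTER TABLE `%s` IMPORT TABLESPACE;' % table)

create_file(args.file)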
portableant / wandnimagescrape.r
Created August 31, 2018 08:50
A gist for downloading images from W&N
setwd("{SET YOUR DIRECTORY HERE}")
library(jsonlite)
library(RCurl)
base <- 'http://webapps.fitzmuseum.cam.ac.uk/wndev/assets/'
data <- read.csv('imageList.csv')
log_con <- file("test.log")
# Build the remote JPEG URL for one CSV row (folder in column 1, file name in column 2)
download <- function(data){
  folder <- data[1]
  file <- data[2]
  image <- paste0(base, folder, '/', substr(file, 1, nchar(file) - 3), '.jpg')
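  # --- Hedged completion sketch: the preview ends above, so everything from here on
  # --- is an assumption (destination folder, logging format, error handling).
  dest <- file.path("images", basename(image))
  result <- tryCatch({
    download.file(image, destfile = dest, mode = "wb")   # binary mode for JPEGs
    paste("OK", image)
  }, error = function(e) paste("FAILED", image))
  writeLines(result, log_con)                            # record the outcome in test.log
}

# Run the helper over every row of imageList.csv, keeping the log connection open.
dir.create("images", showWarnings = FALSE)
open(log_con, open = "a")
apply(data, 1, download)
close(log_con)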
Create a core on Ubuntu
sudo su - solr -c "/opt/solr/bin/solr create -c {core} -n data_driven_schema_configs"
Import a CSV on Ubuntu
/opt/solr/bin/post -c {core} {path}/{filename}
Delete all documents and commit the change
curl {scheme}://{hostname}:{port}/solr/{core}/update?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
portableant / brokenandmissing.txt
Last active August 30, 2018 20:49
List of resources no longer available or functioning properly since I left the BM
404 or redirect
search.britishmuseum.org - Google Search Appliance knowledge portal
data.finds.org.uk - SPARQL endpoint and triplestore for the Portable Antiquities Scheme
overpass.finds.org.uk - OSM Overpass instance running on Portable Antiquities Scheme infrastructure
pastexplorers.org.uk - Anglo-Saxon Flash village and website
vr.britishmuseum.org - developer warnings for breach of T&Cs
GitHub
Removed from britishmuseumdh team
portableant / pasImageScrape.R
Last active June 18, 2018 23:16
Obtain PAS images in folders by object type
#' A script for getting images from PAS
#' Please do improve
#' My R skills are poor.
# Set your working directory
setwd("/Users/danielpett/Documents/research/electricarchaeo")
# Use the following libraries
library(jsonlite)
library(RCurl)
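The preview stops after the library calls. A rough sketch of how the rest of the script might fetch a page of PAS search results and file each image under its object type; the example query, the imagedir/filename/objecttype field names, the image URL pattern and the folder layout are all my assumptions, not the author's code:

# Hedged sketch only: query the PAS JSON API and save images into per-object-type folders.
url  <- 'https://finds.org.uk/database/search/results/q/gold/format/json'   # assumed example query
json <- fromJSON(url)
records <- json$results

for (i in seq_len(nrow(records))) {
  rec <- records[i, ]
  if (is.na(rec$filename) || is.na(rec$objecttype)) next          # skip records without an image
  folder <- gsub("[^A-Za-z0-9]+", "_", rec$objecttype)            # safe folder name per object type
  dir.create(folder, showWarnings = FALSE)
  img <- paste0('https://finds.org.uk/', rec$imagedir, rec$filename)   # assumed image URL pattern
  try(download.file(img, destfile = file.path(folder, rec$filename), mode = "wb"))
}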
portableant / scrapeCyprusImages.R
Created May 22, 2018 09:50
Download images from Fitzwilliam COL
setwd("/Users/danielpett/Documents/research/fitzwilliam/")
csv <- "cyprus.csv"
data <- read.csv(csv, header=T, na.strings=c("","NA"))
images <- data[!is.na(data$multimedia.0.processed.original.location),]
uris <- images$admin.uri
urlList <- as.character(uris)
print(urlList)
for(a in urlList){
] page <- a %>% read_html()
files <- page %>% html_nodes("img") %>% html_attr("src")
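  # --- Hedged completion sketch: the preview ends above, so the destination folder,
  # --- the filter to absolute URLs, and the error handling are assumptions.
  files <- files[grepl("^https?://", files)]
  dir.create("cyprus_images", showWarnings = FALSE)
  for (f in files) {
    try(download.file(f, destfile = file.path("cyprus_images", basename(f)), mode = "wb"))
  }
}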
portableant / extractExif.R
Created May 21, 2018 10:19
Extract EXIF tags using R and write them to CSV
setwd("{directory}")
install.packages("devtools")
devtools::install_github("paleolimbot/exifr")
library(exifr)
image_files <- list.files(pattern = "*.jpg")
data <- as.data.frame(read_exif(image_files, tags = c("filename", "headline", "Description", "Keywords", "Title", "Copyright Notice")))
data$Keywords <- sapply(data$Keywords, paste, collapse=",")
write.csv(data, file='metadata.csv',row.names=FALSE, na="")
portableant / esExtract.txt
Last active May 24, 2018 12:09
Extract data from CIIM
The CIIM uses an out-of-date version of Elasticsearch (2.3.5).
Install elasticdump on Windows (PowerShell)
$ npm install elasticdump@2.4.2 -g
Run the dump for the mappings
$ elasticdump --input=http://{IP or URL}/es/ --output=mapping.json --type=mapping
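Elasticdump exports one type per run, so pulling the documents themselves needs a second pass. This follow-up command is not in the preview; the {index} placeholder and output filename are mine:
$ elasticdump --input=http://{IP or URL}/es/{index} --output=data.json --type=data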
portableant / pybossa
Created April 10, 2018 23:18
Config files for pybossa EC2 instance
server {
    listen 80;
    server_name crowdsourced.micropasts.org;
    large_client_header_buffers 4 32k;
    real_ip_header X-Forwarded-For;

    # change that to your pybossa directory
    root /var/www/pybossa;
    client_max_body_size 20M;
# Query the PAS database for seal matrices with inscriptions and work out how many
# pages of JSON results there are.
library(jsonlite)
url <- 'https://finds.org.uk/database/search/results/q/objectType%3A%22SEAL+MATRIX%22+inscription%3A%2A/format/json'
json <- fromJSON(url)
total <- json$meta$totalResults
results <- json$meta$resultsPerPage
pagination <- ceiling(total/results)
# Columns to keep from each page of results
keeps <- c("id","old_findID","broadperiod", "inscription", "institution", "creator", "fourFigureLat", "fourFigureLon")
data <- json$results
data <- data[,(names(data) %in% keeps)]
# Loop over the remaining pages (page 1 is already in `data`)
for (i in seq(from=2, to=pagination, by=1)){
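  # --- Hedged completion sketch: the preview stops at the loop above, so the
  # --- /page/{i} URL segment and the output filename below are assumptions.
  pageUrl  <- paste0('https://finds.org.uk/database/search/results/q/',
                     'objectType%3A%22SEAL+MATRIX%22+inscription%3A%2A/page/', i, '/format/json')
  pageJson <- fromJSON(pageUrl)
  pageData <- pageJson$results
  pageData <- pageData[, (names(pageData) %in% keeps)]
  data <- rbind(data, pageData)
}
write.csv(data, file = 'sealMatrices.csv', row.names = FALSE)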