@dimus
dimus / Hash.from_xml using Nokogiri
Created March 17, 2010 14:29
Adding Hash.from_xml method using Nokogiri
# USAGE: Hash.from_xml(YOUR_XML_STRING)
require 'nokogiri'
# modified from http://stackoverflow.com/questions/1230741/convert-a-nokogiri-document-to-a-ruby-hash/1231297#1231297
class Hash
class << self
def from_xml(xml_io)
begin
result = Nokogiri::XML(xml_io)
return { result.root.name.to_sym => xml_node_to_hash(result.root)}
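The preview above cuts off before the `xml_node_to_hash` helper is shown, so its exact behavior is unknown. As a rough illustration of the same idea (wrap the converted root node in a one-key mapping named after the root element), here is a minimal Python sketch. It uses the standard library's `xml.etree.ElementTree` rather than Nokogiri, and `node_to_dict` is a hypothetical stand-in for the unseen helper: it ignores attributes and collects repeated child tags into lists, which may differ from what the gist does.

```python
import xml.etree.ElementTree as ET


def node_to_dict(node):
    """Hypothetical helper: convert an element into nested dicts and strings.

    Assumption: attributes are ignored and repeated child tags become lists.
    The gist's own xml_node_to_hash may behave differently.
    """
    children = list(node)
    if not children:
        # Leaf element: return its stripped text content ("" when empty).
        return (node.text or "").strip()

    result = {}
    for child in children:
        value = node_to_dict(child)
        if child.tag in result:
            # Second or later occurrence of this tag: promote to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def from_xml(xml_string):
    """Parse an XML string and key the result by the root element's name."""
    root = ET.fromstring(xml_string)
    return {root.tag: node_to_dict(root)}
```

For example, `from_xml("<a><b>1</b><b>2</b><c>x</c></a>")` yields a mapping with a single key `"a"` whose value holds a list for `b` and a string for `c`.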
@atomotic
atomotic / h3-new-job
Last active September 23, 2015 18:58 — forked from anonymous/h3-new-job
#!/bin/bash
. heritrix.conf
if [ -z "$1" ] || [ -z "$2" ]; then
echo "usage: $0 jobname seedsfile"
exit 1
fi
JOB=$1
#!/usr/bin/env python
import grp
import mimetypes
from optparse import OptionParser
import os
from pprint import pprint
import pwd
from stat import *
import sys
<?xml version="1.0" encoding="UTF-8" ?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
<?php
// The gChart PHP library is required in order to make this work. You can download it from http://code.google.com/p/gchartphp/
// Make sure you put it in the same directory as this script
ini_set('display_errors','1');
$server_address = 'http://142.132.138.20:8983';
require ('gChart.php');
if ( isset($_GET['query'])) {
$query = 'mimetype:' . strtolower($_GET['query']) . '*';
@acdha
acdha / ocr-file.py
Created March 17, 2014 22:49
Fragment of code used to process images with Tesseract OCR
def ocr_file(filename, languages, output_base, temp_dir):
log.info("Launching tesseract on %s", filename)
output = subprocess.check_output(['tesseract', filename, output_base,
'-l', '+'.join(languages), TESSERACT_CONFIG],
cwd=temp_dir,
stderr=subprocess.STDOUT)
with OCR_STORAGE.open('%s/%s/%s.log' % (item_id, group, index), 'w') as log_f:
log_f.write(output)
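The fragment above passes several OCR languages to Tesseract by joining them with `+` after the `-l` flag. The sketch below isolates just that argument-building step so it can be inspected without running Tesseract. The function name `build_tesseract_command` and the optional `config` parameter are assumptions made for illustration; the original always appends `TESSERACT_CONFIG`, whose value is not shown.

```python
def build_tesseract_command(filename, languages, output_base, config=None):
    """Build the argument list for a tesseract invocation.

    Assumption: `config` plays the role of TESSERACT_CONFIG in the fragment
    above; it is optional here only so the sketch stays self-contained.
    """
    # Tesseract accepts multiple languages as a single "+"-joined value.
    command = ["tesseract", filename, output_base, "-l", "+".join(languages)]
    if config is not None:
        command.append(config)
    return command


# Illustrative file names only; nothing is executed here.
args = build_tesseract_command("page-001.tif", ["eng", "fra"], "page-001")
```

The resulting list can be handed to `subprocess.check_output` as in the fragment, which avoids shell quoting issues because no shell is involved.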
package itforarchivists
import (
"encoding/json"
"fmt"
"net/http"
"github.com/richardlehane/siegfried/pkg/core"
"github.com/richardlehane/siegfried/pkg/pronom"
)
@mschoch
mschoch / test-bleve-search.sh
Last active June 18, 2023 03:53
bleve - create index, index JSON, query index
#!/bin/sh
# create a custom mapping
cat > /tmp/mapping.json << MAPPING
{
"types": {
"_default": {
"properties": {
"location": {
"properties": {
#!/bin/bash
PREFIX=$(basename "$1" .pdf)
if [ -n "$TESSERACT_FLAGS" ]; then
echo "Picked up TESSERACT_FLAGS: $TESSERACT_FLAGS"
fi
echo "Prefix is: $PREFIX"
echo "Converting to TIFF..."
if command -v parallel >/dev/null 2>&1; then
LAST_PAGE=$(($(pdfinfo "$1"|grep '^Pages:'|awk '{print $2}') - 1))
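The `LAST_PAGE` line above extracts the page count from `pdfinfo` output and subtracts one to get a zero-based index of the final page. A minimal Python sketch of that parsing step follows. It works on text already captured from `pdfinfo` instead of invoking the tool, and the function name `last_page_index` and the sample text are illustrative assumptions.

```python
def last_page_index(pdfinfo_output):
    """Return the zero-based index of the last page.

    Mirrors the shell pipeline: find the line starting with "Pages:",
    take its second whitespace-separated field, and subtract one.
    """
    for line in pdfinfo_output.splitlines():
        if line.startswith("Pages:"):
            return int(line.split()[1]) - 1
    raise ValueError("no 'Pages:' line found in pdfinfo output")


# Illustrative sample only, shaped like typical pdfinfo output.
sample = "Title:          Example\nPages:          12\nEncrypted:      no\n"
index = last_page_index(sample)
```

Raising an error when no `Pages:` line is present is a design choice of this sketch; the shell version would instead fail inside the arithmetic expansion.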
@olberger
olberger / shibb-cas-get.sh
Created December 4, 2014 09:59
Connection to a web app protected via Shibboleth with curl
#!/bin/sh
#set -x
# Usage: shibb-cas-get.sh {username} {password}
# If you have any errors, try removing the redirects to get more information.
# The service to be called, and a URL-encoded version (the URL encoding isn't perfect; if you're encoding complex values, you may wish to use a different method)
DEST=https://myapp.example.com/
SP=https://myapp.example.com/index.php
IDP="https://myidp.example.com/idp/shibboleth&btn_sso=SSOok"