Joe Wicentowski joewiz

  • Arlington, Virginia
xquery version "3.0";
(:
Goal: Take a TEI document containing <ref> elements that need to be fixed, and fix these with XQuery Update.
Specifically, we find the page number references from the text node immediately following the <ref> element,
and move the page number inside the <ref> element. (I've simplified my data and the query to illustrate.)
Problem: The XQuery Update statement corrupts the sample.xml file. The resulting file has 0 bytes. When I
comment out the XQuery Update statement and uncomment the $test variable in the return expression, I get
expected results, so I think the logic is sound. Also, when I comment out line 25, the corruption doesn't
@joewiz
joewiz / frus-status-and-formats.xq
Created May 13, 2016 17:29
Generate report about FRUS publication status & output formats
xquery version "3.1";
(: Generate report about FRUS publication status & output formats
: Assumes full installation of https://github.com/HistoryAtState/hsg-project and cached s3 resources (not public)
:)
import module namespace config="http://history.state.gov/ns/site/hsg/config" at "xmldb:exist:///db/apps/hsg-shell/modules/config.xqm";
import module namespace fh="http://history.state.gov/ns/site/hsg/frus-html" at "xmldb:exist:///db/apps/hsg-shell/modules/frus-html.xqm";
declare namespace tei="http://www.tei-c.org/ns/1.0";
joewiz / show-app-links.xq
Last active April 20, 2016 14:18
Check HTML files for href patterns, with XQuery
joewiz / test-page-responses.xq
Created April 20, 2016 14:13
Check pages for HTTP response status codes, with XQuery
xquery version "3.0";
declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "html5";
declare option output:media-type "text/html";
let $pages :=
(
'/',
joewiz / chiefs-of-mission-tsv.xq
Created April 5, 2016 20:33
Chiefs of Mission since 2001 as TSV, with XQuery
xquery version "3.1";
(:
Chiefs of Mission since 2001 as TSV
Assumes eXist-db 3.0RC1 or newer (relies on XQuery 3.1 array support)
@see https://github.com/eXistSolutions/hsg-shell
@see https://github.com/joewiz/gsh
:)
import module namespace app="http://history.state.gov/ns/site/hsg/templates" at "/db/apps/hsg-shell/modules/app.xqm";
joewiz / facet-case-1.xq
Last active December 27, 2015 21:09
EXPath Facet Spec workbook
xquery version "3.0";
(:~ An implementation of facet:count as described in "Case 1: Simple facet based on existing attribute"
of the EXPath Facet Spec. Depends on eXist's util:eval() function to handle dynamic path expressions.
@see http://expath.org/spec/facet/20151225#case-1-simple-facet-based-on-existing-attribute
:)
import module namespace util="http://exist-db.org/xquery/util";
joewiz / deep-dedupe.xq
Last active March 3, 2016 04:46
Deduplicate a sequence of mixed-content items, with XQuery
xquery version "3.0";
declare namespace functx = "http://www.functx.com";
(: http://www.xqueryfunctions.com/xq/functx_index-of-deep-equal-node.html :)
declare function functx:index-of-deep-equal-node
( $nodes as node()* ,
$nodeToFind as node() ) as xs:integer* {
for $seq in (1 to count($nodes))
joewiz / post-mortem.md
Last active September 3, 2023 11:57
Recovery from nginx "Too many open files" error on Amazon AWS Linux

On Tue Oct 27, 2015, history.state.gov began buckling under load, intermittently issuing 500 errors. Nginx's error log was sprinkled with the following errors:

2015/10/27 21:48:36 [crit] 2475#0: accept4() failed (24: Too many open files)

2015/10/27 21:48:36 [alert] 2475#0: *7163915 socket() failed (24: Too many open files) while connecting to upstream...

An article at http://www.cyberciti.biz/faq/linux-unix-nginx-too-many-open-files/ provided directions that mostly worked. Below are the steps we followed. The steps that diverged from the article's directions are marked with an *.

  1. * Instead of using su to run ulimit as the nginx account, use ps aux | grep nginx to locate nginx's process IDs. Then query each process's file handle limits using cat /proc/pid/limits (where pid is a process ID retrieved from ps). (Note: depending on your system, the cat command may require sudo.)
  2. Added fs.file-max = 70000 to /etc/sysctl.conf
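The check in step 1 can also be scripted. The following is a minimal sketch, not part of the original post-mortem, assuming a Linux system where each process exposes /proc/pid/comm and /proc/pid/limits. The function names parse_open_file_limits and nginx_open_file_limits are hypothetical helpers introduced here for illustration; the limits are returned as strings because a value may be "unlimited".

```python
from pathlib import Path


def parse_open_file_limits(limits_text):
    """Return the (soft, hard) "Max open files" values from the text of a
    /proc/<pid>/limits file, or None if the row is missing."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns are: soft limit, hard limit, units
            soft, hard = line[len(label):].split()[:2]
            return soft, hard
    return None


def nginx_open_file_limits(proc_root="/proc"):
    """Map the PID of every process named "nginx" to its open-file limits.

    Stands in for `ps aux | grep nginx` followed by `cat /proc/<pid>/limits`.
    Processes that cannot be read (exited, or root required) are skipped.
    """
    root = Path(proc_root)
    results = {}
    if not root.is_dir():
        return results  # not a Linux-style /proc; nothing to report
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        try:
            if (entry / "comm").read_text().strip() == "nginx":
                limits_text = (entry / "limits").read_text()
                results[int(entry.name)] = parse_open_file_limits(limits_text)
        except OSError:
            continue
    return results
```

Calling nginx_open_file_limits() on the server (with root privileges if the limits files are not readable otherwise) returns one entry per nginx master or worker process, which makes it easy to confirm that a raised limit took effect after a restart.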
joewiz / exist-xpath-functions.xq
Last active April 1, 2020 15:18
Compare XPath functions in W3C spec vs. eXist 3.4.0
xquery version "3.1";
element modules {
util:registered-modules()[starts-with(., 'http://www.w3')] !
element module {
element namespace-uri {.},
util:registered-functions(.) !
element function {.}
}
}
joewiz / strip-diacritics.xq
Last active October 22, 2022 14:42
Strip diacritics, with XQuery
xquery version "3.1";
declare function local:strip-diacritics($string as xs:string) as xs:string {
$string
=> normalize-unicode("NFD")
=> replace("\p{IsCombiningDiacriticalMarks}", "")
};
declare function local:inspect-diacritics($string as xs:string) as element() {
let $normalized := normalize-unicode($string, "NFD")