Skip to content

Instantly share code, notes, and snippets.

View joewiz's full-sized avatar

Joe Wicentowski joewiz

  • Arlington, Virginia
View GitHub Profile
@joewiz
joewiz / maps-to-csv.xq
Last active February 5, 2024 14:35
Convert XQuery maps to CSV (or TSV)
xquery version "3.1";
(: Convert a sequence of XQuery maps into CSV or TSV.
: Each map becomes one row.
: The entries' keys become column headers.
:)
declare variable $local:default-options :=
map {
(: Character to separate cells with :)
@joewiz
joewiz / chmod-recursive.xq
Created June 3, 2021 20:54
Batch change permissions on resources and collections in eXist-db
xquery version "3.1";
import module namespace dbutil="http://exist-db.org/xquery/dbutil";
dbutil:scan(
xs:anyURI("/db/apps/airlock-data"),
function($col, $res) {
if ($res) then
(: Set permissions on resources here :)
(
@joewiz
joewiz / an-introduction-to-recursion-in-xquery.md
Last active January 3, 2024 15:30
An introduction to recursion in XQuery

An introduction to recursion in XQuery

  • Created: Nov 28, 2017
  • Updated: Nov 29, 2017: Now covers transformation of XML documents

Recursion is a powerful programming technique, but the idea is simple: instead of performing a single operation, a function calls itself repeatedly to whittle through a larger task. In XQuery, recursion can be used to accomplish complex tasks on data that a plain FLWOR expression (which iterates through a sequence) cannot, such as transforming an entire XML document from one format into another, like TEI or DocBook into HTML, EPUB, LaTeX, or XSL-FO. Transforming a document is well-suited to recursion because each of the document's nodes may need to be examined and manipulated based on the node's type, name, and location in the document; and once a node has been processed, the transformation must continue processing the nodes' children and descendants until the deepest leaf node has been processed. But learning the technique of recursion is often hard for a beginning program

@joewiz
joewiz / post-mortem.md
Last active September 3, 2023 11:57
Recovery from nginx "Too many open files" error on Amazon AWS Linux

On Tue Oct 27, 2015, history.state.gov began buckling under load, intermittently issuing 500 errors. Nginx's error log was sprinkled with the following errors:

2015/10/27 21:48:36 [crit] 2475#0: accept4() failed (24: Too many open files)

2015/10/27 21:48:36 [alert] 2475#0: *7163915 socket() failed (24: Too many open files) while connecting to upstream...

An article at http://www.cyberciti.biz/faq/linux-unix-nginx-too-many-open-files/ provided directions that mostly worked. Below are the steps we followed. The steps that diverged from the article's directions are marked with an *.

  1. * Instead of using su to run ulimit on the nginx account, use ps aux | grep nginx to locate nginx's process IDs. Then query each process's file handle limits using cat /proc/pid/limits (where pid is the process id retrieved from ps). (Note: sudo may be necessary on your system for the cat command here, depending on your system.)
  2. Added fs.file-max = 70000 to /etc/sysctl.conf
@joewiz
joewiz / csv-to-xml.xq
Last active August 30, 2023 23:19
Convert CSV to XML, with XQuery
xquery version "3.1";
(: XQuery adaptation of https://github.com/digital-preservation/csv-tools/blob/master/csv-to-xml_v3.xsl.
See also the thread on basex-talk https://mailman.uni-konstanz.de/pipermail/basex-talk/2016-September/011272.html.
:)
declare function local:get-cells($row as xs:string) {
(: workaround for lack of lookahead support: append comma to end of row :)
let $string-to-analyze := $row || ","
let $analyze := fn:analyze-string($string-to-analyze, '(("[^"]*")+|[^,]*),')
@joewiz
joewiz / fix-straight-quotes.xq
Last active August 8, 2023 15:45
Convert quotes from straight to curly, with XQuery
xquery version "3.1";
(: This uses the eXist cache module to mimic xsl:accumulator approach described
: by Norm Walsh at https://so.nwalsh.com/2023/08/08-accumulators :)
declare function local:initiate-cache() {
cache:destroy("quotes"),
cache:create("quotes", map{}),
cache:put("quotes", "counter", 1)
};
@joewiz
joewiz / web-scraping-with-xquery.md
Last active March 9, 2023 07:18
Web Scraping with XQuery

Web Scraping with XQuery

Overview

Learning how to web scrape empowers you to apply your XQuery skills to any data residing on the web. You can fetch data from remote sites and services—for example, entire web pages or just the pieces of a page that matter to you. Once fetched, you can perform further analysis on the data, clean it up, mash it up with other data, transform it into different formats, etc.

Built-in functions for making HTTP requests

XPath-based languages like XQuery offer an standard function for accessing remote documents, the fn:doc() function. However, a limitation of this function is that it only works if the URI returns a well-formed XML document.

@joewiz
joewiz / CompareTools.plist
Last active February 28, 2023 05:25
Add oXygen as Diff & Merge Tool for Git Tower
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
<dict>
<key>ApplicationIdentifier</key>
<string>ro.sync.exml.DiffDirs</string>
<key>ApplicationName</key>
<string>Diff Directories</string>
@joewiz
joewiz / json-ignore-whitespace-text-nodes-param.xq
Created January 25, 2023 18:43
Sample code using eXist-db json-ignore-whitespace-text-nodes parameter
xquery version "3.1";
(: @see https://github.com/eXist-db/exist/commit/53bdb54c664a8063e43392e0b0fb9eac57baf67d#diff-fc5225a3545e6b6807e7810bc9e40219500842a68dbb49a7eb83670016b160ab :)
declare boundary-space preserve;
let $json-ignore-whitespace-text-nodes-param-xml :=
<output:serialization-parameters xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization">
<output:method value="json"/>
<exist:json-ignore-whitespace-text-nodes value="yes"/>
@joewiz
joewiz / restxq-template.xqm
Created November 23, 2014 04:04
A starter template for a RestXQ module
xquery version "3.0";
module namespace my-app = "http://my/app";
(: A starter template for a RestXQ module.
Save this file into eXist anywhere (e.g., /db/restxq-template.xqm or /db/apps/my-app/modules/restxq-template.xqm),
And access at http://localhost:8080/exist/restxq/index.html.
(Why? Because /restxq is the URL space for RestXQ, and the function's %rest:path annotation is for requests to "/index.html".)
Note that the collection configuration file where you store the module must invoke the RestXQ trigger:
<collection xmlns="http://exist-db.org/collection-config/1.0">