
@joewiz
joewiz / post-mortem.md
Last active May 12, 2021
Recovery from nginx "Too many open files" error on Amazon AWS Linux

On Tue Oct 27, 2015, history.state.gov began buckling under load, intermittently issuing 500 errors. Nginx's error log was sprinkled with the following errors:

2015/10/27 21:48:36 [crit] 2475#0: accept4() failed (24: Too many open files)

2015/10/27 21:48:36 [alert] 2475#0: *7163915 socket() failed (24: Too many open files) while connecting to upstream...

An article at http://www.cyberciti.biz/faq/linux-unix-nginx-too-many-open-files/ provided directions that mostly worked. Below are the steps we followed. The steps that diverged from the article's directions are marked with an *.

  1. * Instead of using su to run ulimit on the nginx account, use ps aux | grep nginx to locate nginx's process IDs. Then query each process's file handle limits using cat /proc/pid/limits (where pid is a process ID retrieved from ps). (Note: depending on your system, sudo may be necessary for the cat command.)
  2. Add fs.file-max = 70000 to /etc/sysctl.conf.
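A minimal sketch of step 1, assuming a Linux host with pgrep available. pgrep is swapped in here for the ps aux | grep nginx pipeline the steps describe, and the helper name open_file_limit is hypothetical, not part of any tool:

```shell
#!/bin/sh
# Sketch of step 1: report the open-file limit of each running nginx process.
# open_file_limit is a hypothetical helper name chosen for this example.

# Print the "Max open files" row of /proc/<pid>/limits for a single PID.
# Depending on the system, reading another user's /proc entry may need sudo.
open_file_limit() {
  printf 'PID %s: ' "$1"
  grep 'Max open files' "/proc/$1/limits"
}

# pgrep -x matches the process name exactly; unlike `ps aux | grep nginx`,
# it does not also report the grep command itself.
for pid in $(pgrep -x nginx); do
  open_file_limit "$pid"
done
```

The setting from step 2 is only applied once kernel parameters are reloaded, for example with sudo sysctl -p, which rereads /etc/sysctl.conf by default. Note that fs.file-max is a system-wide ceiling, so the per-process value reported above may not change on its own.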
joewiz / yaml-to-xml.xq
Created Aug 22, 2016
Convert YAML to XML, with XQuery
xquery version "3.0";
(: doesn't support YAML indentation yet - just a start :)
declare function local:process-yaml-value($value) {
    let $single-quote := "^'(.+)'$"
    let $double-quote := '^"(.+)"$'
    return
        if (matches($value, $single-quote) or matches($value, $double-quote)) then
            let $pattern := "^['""](.+)['""]$"
joewiz / check-text-for-ocr-typo-patterns.xq
Last active Apr 9, 2021
Check a text for OCR typo patterns, using XQuery
xquery version "3.1";
(:~
: Find possible OCR errors in a text by checking for patterns that an OCR
: process is known to misread, e.g., "day" misread as "clay", or "France"
: misread as "Prance." If the OCR engine just misread some instances of these
: words but got other instances correct, then this query will highlight
: candidates for correction.
:
: The query lets you configure a source text and define pattern sets to be used.
joewiz / enrich-dates-in-mixed-content.xq
Created Nov 21, 2017
Enrich dates in mixed content, with XQuery
xquery version "3.1";
(: Turning "December 7, 1941" into <date>December 7, 1941</date> isn't too hard, with XPath 3.0's
fn:analyze-string() function, but if the date string occurs in mixed text, such as:
<p>Pearl Harbor was attacked on <em>December</em> 7, 1941.</p>
and you want to preserve the existing element structure to return:
<p>Pearl Harbor was attacked on <date><em>December</em> 7, 1941</date>.</p>
it's quite a bit more challenging.
This query uses string processing to align the results of fn:analyze-string() with the input's
joewiz / date-parser.xqm
Created Aug 26, 2018
Parse various formats of date strings, in XQuery
xquery version "3.1";
(:
Various Date String Parser
- Parses various flavors of date strings and returns them as xs:dateTime or xs:date
- Key functions: dates:parseDateTime() and dates:parseDate()
- Adapted by Joe Wicentowski from
https://github.com/marklogic-community/commons/blob/master/dates/date-parser.xqy
- Adapted to standard XQuery (instead of the MarkLogic 0.9-ml flavor)
- TODO: test against https://github.com/marklogic-community/commons/blob/master/dates/date-parser-tests.xqy
joewiz / adaptive-serialization.xq
Created Sep 15, 2018
Boilerplate for declaring Adaptive serialization in XQuery
xquery version "3.1";
declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "adaptive";
declare option output:indent "yes";
map { "reference": xs:anyURI("https://www.w3.org/TR/xslt-xquery-serialization-31/#adaptive-output") }
joewiz / group-by.xq
Last active Apr 9, 2021
How variables in XQuery FLWOR expressions change when using the "group by" clause
xquery version "3.1";
(:
## How variables in XQuery FLWOR expressions change when using the `group by` clause
Sometimes, when working with a `group by` clause, an XQuery FLWOR expression
might suddenly seem to act strangely, or at least unintuitively. In particular,
variables defined before the `group by` clause might suddenly seem to go haywire.
joewiz / zip-barebones.xq
Created Jan 17, 2020
Construct a zip file and stream it to a browser, with XQuery & eXist
xquery version "3.1";
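(: Build a zip archive in memory and stream it to the browser.
   compression:zip() and response:stream-binary() are eXist-specific functions. :)
let $node := <root><x/></root>
(: Each <entry> element becomes one file in the archive :)
let $entry := <entry name="test.xml" type="xml">{$node}</entry>
(: true() = use the collection hierarchy for entry paths :)
let $zip := compression:zip($entry, true())
let $name := "test.zip"
return
    (: Second argument is the Content-Type header value; third is the download filename :)
    response:stream-binary($zip, "application/zip", $name)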
joewiz / tokenize-sentences.xq
Last active Apr 9, 2021
Split (or "tokenize") a string into "sentences", with XQuery. See http://joewiz.org/2013/06/29/one-paragraph-many-sentences/.
xquery version "1.0";
(: A naive approach to sentence tokenization inspired by http://stackoverflow.com/a/2103653/659732
:
: Works well with edited text like newspapers. Parameters like punctuation can/should be edited;
: see the section below called "criteria".
:
: For a more sophisticated approach, see Tibor Kiss and Jan Strunk, "Unsupervised Multilingual
: Sentence Boundary Detection", Computational Linguistics, Volume 32, Issue 4, December 2006,
: pp. 485-525. Also, see these discussions of sentence tokenization:
joewiz / json-xml.xqm
Last active Jan 19, 2021
An implementation of XQuery 3.1's fn:json-to-xml and fn:xml-to-json functions for eXist
xquery version "3.1";
(:~
: An implementation of XQuery 3.1's fn:json-to-xml and fn:xml-to-json functions for eXist, which does not support them natively as of 4.3.0.
:
: @author Joe Wicentowski
: @version 0.4
: @see http://www.w3.org/TR/xpath-functions-31/#json
:)
module namespace jx = "http://joewiz.org/ns/xquery/json-xml";