hoehrmann / README.md
Created May 26, 2024 22:54
Using LLMs to convert old RFC .txt files to modern xml2rfc XML files (and turn them into modern .html files)

Frameworks like llama.cpp support context-free grammars to restrict the output of a large language model to a specific format.

The specification of the xml2rfc format includes a RELAX NG schema that describes the format.

The RELAX NG specification defines its semantics in terms of a simpler form called the simple syntax. In this sense, some of the more advanced constructs are just syntactic sugar.

There are tools that convert the full syntax into the simple syntax.

The simple syntax is very easy to work with for all kinds of purposes.
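To show why the simple syntax is convenient for this, here is a minimal Python sketch that turns a few simple-syntax patterns into a llama.cpp-style GBNF grammar. It is an illustration under stated assumptions, not the approach used in this gist: the tuple encoding of patterns, the function names `to_gbnf` and `grammar`, and treating character data as `[^<&]*` are all hypothetical choices. Attributes, name classes, datatypes, `interleave` and whitespace between elements are not handled.

```python
# Hypothetical tuple encoding of RELAX NG simple-syntax patterns:
#   ("element", name, child)  ("text",)  ("empty",)  ("ref", rule_name)
#   ("group", a, b)  ("choice", a, b)  ("oneOrMore", p)
# After simplification, <optional> and <zeroOrMore> are already expressed
# with choice/oneOrMore and <empty>, so only these cases remain.

def to_gbnf(p):
    """Translate one simple-syntax pattern into a GBNF expression string."""
    kind = p[0]
    if kind == "empty":
        return ""
    if kind == "text":
        return "text"  # refers to the `text` rule emitted by grammar()
    if kind == "ref":
        return p[1]
    if kind == "element":
        name = p[1]
        parts = [f'"<{name}>"', to_gbnf(p[2]), f'"</{name}>"']
        return " ".join(x for x in parts if x)
    if kind == "group":
        return " ".join(x for x in (to_gbnf(p[1]), to_gbnf(p[2])) if x)
    if kind == "choice":
        a, b = to_gbnf(p[1]), to_gbnf(p[2])
        if a and b:
            return f"( {a} | {b} )"
        # A choice with <empty> on one side is how the simple syntax
        # spells <optional>, so it maps onto GBNF's `?` operator.
        rest = a or b
        return f"( {rest} )?" if rest else ""
    if kind == "oneOrMore":
        inner = to_gbnf(p[1])
        return f"( {inner} )+" if inner else ""
    raise ValueError(f"unsupported pattern: {kind}")


def grammar(start, defines):
    """Emit a GBNF grammar: a root rule, one rule per <define>, and `text`."""
    lines = [f"root ::= {to_gbnf(start)}"]
    for name, pattern in defines.items():
        lines.append(f"{name} ::= {to_gbnf(pattern)}")
    # Assumption: character data is anything except '<' and '&'.
    lines.append("text ::= [^<&]*")
    return "\n".join(lines)


# Usage with a tiny, made-up fragment (not the real xml2rfc schema).
start = ("ref", "rfc")
defines = {
    "rfc": ("element", "rfc",
            ("group", ("ref", "front"),
             ("choice", ("element", "back", ("empty",)), ("empty",)))),
    "front": ("element", "front",
              ("oneOrMore", ("element", "title", ("text",)))),
}
print(grammar(start, defines))
```

Because every simple-syntax pattern is one of a handful of binary or unary forms, the translation is a single recursive function; the full syntax would need many more cases.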

hoehrmann / rag.py
Created May 26, 2024 22:07
LlamaIndex RAG CLI with local models
#!/usr/bin/env python
from llama_index.core.ingestion import IngestionPipeline, IngestionCache
from llama_index.core.query_pipeline import QueryPipeline
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.cli.rag import RagCLI
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index_client import CloudChromaVectorStore
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.llms.openai import OpenAI
hoehrmann / README.md
Last active May 3, 2024 22:21
SDMX-ML 2.0, SDMX-ML 2.1, SDMX-ML 3.0.0 structure messages to SDMX-JSON conversion rules

Missing:

  • A few elements allow infinite nesting; this is not yet represented.
  • Some TODOs are noted directly in the YAML files.
  • Where XML paths have more variables than the JSON paths, the variables have to be reviewed; usually that means the first [n] in the JSON path would become [n,o] (SDMX-ML 3.0.0 distributes artefacts in a way that requires aggregation in SDMX-JSON 2.0).

Note:

  • SDMX 2.0 and SDMX 2.1 are converted to SDMX-JSON 1.0
  • SDMX 3.0.0 is converted to SDMX-JSON 2.0.0.
hoehrmann / xml-to-json-rules.yaml
Created April 21, 2024 23:14
XML to JSON mapping rule definitions
#
# In a JSON document, every value has a unique normalized JSON Path,
# so you can write out a JSON document as a combination of paths and values:
#
# $.meta.id = "ID123"
#
# Given some JSON schema document, you can write out all possible paths. To
# handle infinite nesting, part of the path could be described using a regular
# expression.
#
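The idea of describing the unbounded part of a path with a regular expression can be sketched in Python. The rule below, the `section`/`title` member names and the helper `matches_rule` are hypothetical and not taken from the YAML file; the sketch assumes paths are already in RFC 9535 normalized form (`$['name']` for members, `$[0]` for array indices without leading zeros).

```python
import re

# Hypothetical rule: a "title" member at any depth of nested "section"
# arrays, e.g. $['section'][0]['title'] or
# $['section'][0]['section'][2]['title']. The repeated (...)+ group stands
# in for infinitely many concrete paths.
SECTION_TITLE = re.compile(
    r"\$(?:\['section'\]\[(?:0|[1-9]\d*)\])+\['title'\]"
)


def matches_rule(path: str, rule: re.Pattern) -> bool:
    """Return True if the whole normalized path matches the rule."""
    return rule.fullmatch(path) is not None


for p in ["$['section'][0]['title']",
          "$['section'][0]['section'][2]['title']",
          "$['meta']['id']"]:
    print(p, matches_rule(p, SECTION_TITLE))
```

Using `fullmatch` rather than `search` keeps a rule from accidentally matching a prefix or suffix of a longer path.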
hoehrmann / json-to-json-path-lines.py
Last active April 16, 2024 22:07
Convert a JSON document into ndjson arrays with URI, RFC 9535 JSON Path (normalized and unique) and JSON value
import click
import ijson
import ijson.common
import json
import pathlib
import functools
def escape(s: str):
    esc = {
        "\u0000": "\\u0000",
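The preview of json-to-json-path-lines.py is truncated. As a hedged sketch of the same idea, not the gist's actual code, the following uses only the standard library, loads the whole document with `json` instead of streaming it with ijson, and omits the URI column; the names `escape` and `flatten` are illustrative. Member-name escaping follows the normalized-path grammar in RFC 9535 section 2.7: apostrophe and backslash are backslash-escaped, five control characters use short escapes, and the remaining control characters use `\u00xx` with lowercase hex.

```python
import json

# Escapes permitted in RFC 9535 normalized paths: the five short
# control-character escapes plus apostrophe and backslash.
_SHORT = {"\b": "\\b", "\f": "\\f", "\n": "\\n", "\r": "\\r", "\t": "\\t",
          "'": "\\'", "\\": "\\\\"}


def escape(s: str) -> str:
    """Escape a member name for a single-quoted normalized-path selector."""
    out = []
    for ch in s:
        if ch in _SHORT:
            out.append(_SHORT[ch])
        elif ord(ch) < 0x20:
            # Other control characters use \u00xx with lowercase hex digits.
            out.append(f"\\u{ord(ch):04x}")
        else:
            out.append(ch)
    return "".join(out)


def flatten(value, path="$"):
    """Yield (normalized_path, value) pairs for every leaf, in document order.

    Empty objects and arrays are yielded as leaves so they are not lost.
    """
    if isinstance(value, dict) and value:
        for key, child in value.items():
            yield from flatten(child, f"{path}['{escape(key)}']")
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        yield path, value


doc = json.loads('{"meta": {"id": "ID123"}, "it\'s": [1, null]}')
for p, v in flatten(doc):
    print(json.dumps([p, v]))  # one ndjson line per leaf
```

Because normalized paths are unique, each printed line identifies exactly one location in the input, which is what makes the line format usable as input for path-based mapping rules.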
hoehrmann / sqlite3_profile.c
Last active March 15, 2023 23:02
Quick and dirty LD_PRELOAD SQLite query logger/profiler
// This is now https://github.com/federlieb/federprof
hoehrmann / force-prefix-init-script.bash
Created August 16, 2021 21:22
Bash: prefix output of all commands with timestamp
function process_command() {
  if [ "$$" -eq "$BASHPID" ]; then
    if [[ "$HANDLER_INSTALLED" -ne "1" ]]; then
      exec > >(
        trap "" INT TERM;
        awk '{ print strftime("%Y-%m-%d %H:%M:%S ", systime()) $0; fflush(stdout) }'
      )
      exec 2> >(
        trap "" INT TERM;
        awk '{ print strftime("%Y-%m-%d %H:%M:%S ", systime()) $0; fflush(stdout) }' >&2
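The awk filters above prefix every line with a local timestamp and flush after each line, so output is not held back by buffering. A rough Python equivalent of just that per-line filter (not of the truncated bash `exec`/`trap` plumbing) might look like this; the names `prefix_lines` and `main` and the injectable `clock` parameter are assumptions added to keep the sketch testable.

```python
import sys
import time
from typing import Callable, Iterable, Iterator


def prefix_lines(lines: Iterable[str],
                 clock: Callable[[], float] = time.time) -> Iterator[str]:
    """Prefix each line with 'YYYY-mm-dd HH:MM:SS ' in local time, like
    awk's strftime("%Y-%m-%d %H:%M:%S ", systime())."""
    for line in lines:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S ", time.localtime(clock()))
        yield stamp + line


def main() -> None:
    """Filter stdin to stdout, e.g. `some-command 2>&1 | python prefix.py`."""
    for out in prefix_lines(sys.stdin):
        sys.stdout.write(out)
        sys.stdout.flush()  # mirror awk's fflush so lines appear immediately
```

Calling `main()` from a small script gives a pipe filter; the timestamp is taken when each line is read, which, as with the awk version, can lag slightly behind when the line was produced.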
hoehrmann / sqlite.abnf
Created June 9, 2019 18:57
ABNF for SQLite 3.28 SQL
; FIXME: The grammar has been transformed so that `w` appears after a
; token, but there is no way in ABNF to define it as a token separator
; that can optionally contain a mix of comments and white-space. Take
; `;;` as an example: for that to match `sql-stmt-list`, `w` would
; have to match the empty string. But if `w` matches the empty string,
; then `ISNOT` is the same as `IS NOT`.
sql-stmt-list = [ sql-stmt ] *( ";" w [ sql-stmt ] )
sql-stmt = [ "EXPLAIN" w [ "QUERY" w "PLAN" w ] ] ( alter-table-stmt / analyze-stmt / attach-stmt / begin-stmt / commit-stmt / create-index-stmt / create-table-stmt / create-trigger-stmt / create-view-stmt / create-virtual-table-stmt / delete-stmt / delete-stmt-limited / detach-stmt / drop-index-stmt / drop-table-stmt / drop-trigger-stmt / drop-view-stmt / insert-stmt / pragma-stmt / reindex-stmt / release-stmt / rollback-stmt / savepoint-stmt / select-stmt / update-stmt / update-stmt-limited / vacuum-stmt )
alter-table-stmt = "ALTER" w "TABLE" w [ schema-name w "." w ] table-na
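The FIXME describes a problem that is commonly avoided by separating lexing from parsing: a tokenizer consumes white-space and comments, and a keyword token only ends at a character that cannot continue an identifier, so `ISNOT` is one token while `IS NOT` is two. A minimal Python sketch of such a lexer follows; its token set (words, integers, single-character punctuation, `--` and `/* */` comments) is a simplifying assumption and much smaller than SQLite's real tokenizer.

```python
import re

# Simplified token classes (assumption); SQLite's real tokenizer also
# handles string literals, blobs, quoted identifiers, multi-char operators...
TOKEN = re.compile(r"""
    (?P<skip>\s+|--[^\n]*|/\*.*?\*/)   # white-space and comments
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)   # keywords and identifiers
  | (?P<number>\d+)
  | (?P<punct>[;,().=*])
""", re.VERBOSE | re.DOTALL)


def tokenize(sql: str) -> list[str]:
    """Return the significant tokens of `sql`, dropping white-space/comments."""
    tokens, pos = [], 0
    while pos < len(sql):
        m = TOKEN.match(sql, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {sql[pos]!r} at {pos}")
        if m.lastgroup != "skip":
            tokens.append(m.group())
        pos = m.end()
    return tokens


print(tokenize("a ISNOT b"))     # one keyword-like token
print(tokenize("a IS NOT b"))    # two keyword tokens
print(tokenize(";; /* c */ ;"))  # separators are optional around ';'
```

With a lexer like this in front of the grammar, `w` no longer needs to appear in the grammar rules at all: `;;` is simply two `;` tokens, and `IS NOT` is two adjacent keyword tokens.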
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use JSON;
use YAML::XS;
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:');
our $Arg;
WITH
bytes AS (
SELECT 0x00 AS byte
UNION ALL
SELECT byte+1 FROM bytes WHERE byte < 0xFF
),
base AS (
SELECT 0x0000 AS cp
UNION ALL
SELECT cp+1 FROM base WHERE cp < 0x10FFFF
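The recursive CTEs in the last preview are plain SQLite SQL; the hexadecimal integer literals require SQLite 3.8.6 or later. To check what the `bytes` CTE produces, it can be run through Python's built-in `sqlite3` module. This sketch only exercises that first CTE, because the rest of the query is truncated above; the final `SELECT count(*), min(byte), max(byte)` is added here purely for inspection.

```python
import sqlite3

# Only the first CTE from the preview; the summary SELECT is an addition.
BYTES_SQL = """
WITH
  bytes AS (
    SELECT 0x00 AS byte
    UNION ALL
    SELECT byte+1 FROM bytes WHERE byte < 0xFF
  )
SELECT count(*), min(byte), max(byte) FROM bytes
"""

conn = sqlite3.connect(":memory:")
try:
    count, lo, hi = conn.execute(BYTES_SQL).fetchone()
finally:
    conn.close()

print(count, lo, hi)
```

The recursion stops once `byte` reaches 0xFF, so the CTE yields every byte value exactly once; the `base` CTE follows the same pattern for all code points up to U+10FFFF.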