Saul Pwanson saulpw

saulpw / gist:8f86f7cbca9f2d93f67c58072bd26e28
Created October 11, 2023 22:53
VisiData pipe-separated-values
@VisiData.api
def open_psv(vd, p):
    return PsvSheet(p.name, source=p)

class PsvSheet(CsvSheet):
    pass

PsvSheet.options.csv_delimiter = '|'
saulpw / infer-schema.py
Created June 28, 2022 03:22
infer pyarrow schema from JSON
#!/usr/bin/env python3
import sys
import json
import gzip
import pyarrow as pa
def parse_jsonl(fp):
saulpw / timesync-test.py
Created April 30, 2022 18:59
Time alignment problem
from collections import defaultdict
def get_time_offsets(events):
    '''events: list of time pairs (dev1, t1, dev2, t2) such that the event at t1 is guaranteed to happen before the event at t2.
    Return dict of dev:(offset, error).
    Adding a device's offset to timestamps from that device translates them into a common timespace.
    '''
    devs = defaultdict(lambda: ([], []))  # [dev] -> (events with dev time before, events with dev time after)
    for e in events:
saulpw / gist:2c293d8963b66a87d443c07af2b4314a
Created October 13, 2021 19:04
visidata command to slide current row to the right
TableSheet.addCommand('zL', 'slide-cells-right', '''
for oldcol, newcol in reversed(list(zip(visibleCols[cursorVisibleColIndex:], visibleCols[cursorVisibleColIndex+1:]))):
    newcol.setValue(cursorRow, oldcol.getValue(cursorRow))
visibleCols[cursorVisibleColIndex].setValue(cursorRow, None)
''', 'slide cells in current row one column to the right')
saulpw / HA-CAA-I-ANY-saulpw.md
Last active June 15, 2021 04:41
Saul Pwanson Contributor Agreement

Based on Harmony (HA-CAA-I-ANY) Version 1.0

Individual Contributor Assignment Agreement

Thank you for your interest in contributing to a project by Saul Pwanson ("We" or "Us").

This contributor agreement ("Agreement") documents the rights granted by contributors to Us.

This is a legally binding document, so please read it carefully before agreeing to it. The Agreement may cover more than one software project managed by Us.

A minimalistic internal options framework in Python

Pros:

  • implementation in very few lines of code
  • declaration and usage are very convenient
  • options are automatically typed (an exception is raised on set if conversion fails)
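
The points above could look something like the following. This is only a rough sketch and not the framework from the gist: the `Options` class, its `declare` method, and the option names are all made up for illustration. The one idea it tries to show is that an option's type can be taken from its default value, so a bad value fails at assignment time.

```python
class Options:
    '''Hypothetical minimal options registry (illustration only).
    Each option's type is inferred from its declared default.'''

    def __init__(self):
        # bypass our own __setattr__ while creating the internal dicts
        object.__setattr__(self, '_defaults', {})
        object.__setattr__(self, '_values', {})

    def declare(self, name, default, helpstr=''):
        self._defaults[name] = (default, helpstr)
        self._values[name] = default

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError('unknown option: ' + name)

    def __setattr__(self, name, value):
        if name not in self._defaults:
            raise AttributeError('unknown option: ' + name)
        default, _ = self._defaults[name]
        # convert to the type of the default; e.g. int('lots') raises ValueError
        # (caveat: bool('False') is True, so bool options would need special handling)
        self._values[name] = type(default)(value)


options = Options()
options.declare('delimiter', '|', 'field separator')
options.declare('max_rows', 100, 'maximum rows to load')

options.max_rows = '250'   # converted and stored as the int 250

try:
    options.max_rows = 'lots'
    conversion_failed = False
except ValueError:
    conversion_failed = True   # the old value (250) is kept
```

Declaration is one line per option and usage is plain attribute access, which is roughly what "convenient" means here. A real implementation would likely also want per-option help lookup and bool parsing.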

Through my frustration as a data engineer, I have developed a taste for how I want my data to be organized, to make its consumption as convenient and efficient as possible.

tl;dr:

- one .hdf5 file for data from 1MB-10GB
- all raw source data in embedded .zip
- automated construction process also in .zip
- elections-us-1776-2016.hdf5, 1GB file with all election data

--

#!/usr/bin/python3
import curses
def ctrl(ch):
    return ord(ch) & 31  # convert from 'a' to ^A keycode
ENTER = ctrl('j')
ESC = 27
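
As a quick check of the `ctrl` helper above: masking with 31 keeps only the low five bits of the character code, which is how a terminal derives a control code from a letter. The snippet below repeats the function so it runs on its own (without curses) and spells out a few values; the `codes` dict is just for illustration.

```python
def ctrl(ch):
    return ord(ch) & 31  # keep the low 5 bits: 'a' is 97, 97 & 31 == 1, i.e. ^A

codes = {
    'a': ctrl('a'),   # 1, ^A
    'A': ctrl('A'),   # also 1: upper and lower case differ only in bit 5
    'j': ctrl('j'),   # 10, the same as ord('\n'), hence ENTER
    '[': ctrl('['),   # 27, which is why ESC is also written ^[
}
```

So `ENTER = ctrl('j')` is the line feed character, and the literal `ESC = 27` could equally be written `ctrl('[')`.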