
This is a fast, but incomplete, parser for JSON-based wikibase dump files that I wrote to extract large amounts of info without having to load them into a database. I don't think I'll have enough time to turn this into a full-fledged Rust library for the wikibase format, so I'm posting it as a Gist in case it's useful to someone in its current state.

It can be used to filter entries based on a claim and to extract only the desired properties. By default, it reads from stdin, expecting one page per line.

For example, to select the dates of birth and death (properties P569 and P570) for all entities that are instances of (property P31) human (Q5):

$ zcat latest-all.json.gz | head -6 | cargo run --release -- --filter-property P31 --filter-claim Q5 --select-properties P569 P570
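
For reference, a roughly equivalent jq query, purely as a sketch and assuming the standard wikibase entity JSON layout (the dump is a JSON array with one entity per line, so the leading "[" line is dropped and the trailing comma stripped):

$ zcat latest-all.json.gz | head -6 | tail -n +2 | sed 's/,$//' | \
    jq -c 'select(.claims.P31[]?.mainsnak.datavalue.value.id == "Q5")
           | {id: .id, P569: .claims.P569, P570: .claims.P570}'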
@atomic77
atomic77 / GHCN_to_Pandas.ipynb
Last active November 9, 2020 03:06
Convert GHCN weather data files to pandas dataframes

Setting up an Armbian build server with Multipass

In order to build an Armbian image from scratch, whether for development purposes or to apply user customizations on top of a base image, a build environment is required. Per the Armbian documentation, Ubuntu 20.04 is the officially supported build platform.

There is some support for Docker, though in my tests it has been a suboptimal experience. Even if you run Ubuntu 20.04 on bare metal, the build process makes liberal use of sudo throughout, so it's probably not a bad idea to isolate it in a VM in any case.

Since the build environment is designed for Ubuntu, the flexibility (and complication) of using Vagrant to provision a VM seems a bit much when Multipass is available, a tool designed specifically for spinning up Ubuntu VMs quickly.
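
As a minimal sketch (the VM name and sizing here are arbitrary examples, and newer Multipass releases use --memory instead of --mem):

$ multipass launch 20.04 --name armbian-builder --cpus 4 --mem 8G --disk 50G
$ multipass shell armbian-builder

Then, inside the VM, the usual Armbian build steps apply:

$ git clone --depth=1 https://github.com/armbian/build
$ cd build && ./compile.sh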

Keybase proof

I hereby claim:

  • I am atomic77 on github.
  • I am atomic77 (https://keybase.io/atomic77) on keybase.
  • I have a public key ASA9S2fRpL4G-gqywe3U0PCO6UOG_DuOaCpBZgEZKLA-oAo

To claim this, I am signing this object:

@atomic77
atomic77 / README.md
Created March 18, 2017 20:16
Loading wikimedia dumps into Elasticsearch

Wikipedia uses Elasticsearch in production for full-text search, after moving away from a homegrown Lucene-based tool. Snapshots intended for easy bulk import are available for all the various datasets - much easier to work with than the SQL and XML dumps!

Tested with Italian Wikinews - everything gets loaded into a page document type. I'm not entirely sure what the timestamp field represents, but it appears to be the last time the page was changed.
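
A rough sketch of the load, using the CirrusSearch content dump for Italian Wikinews (the file name pattern and index name are examples; the dump is already in Elasticsearch _bulk format, so it mainly needs to be split into manageable chunks):

$ curl -XPUT 'localhost:9200/itwikinews'
$ zcat itwikinews-*-cirrussearch-content.json.gz | split -l 500 - chunk_
$ for f in chunk_*; do
>   curl -s -XPOST 'localhost:9200/itwikinews/_bulk' \
>        -H 'Content-Type: application/x-ndjson' --data-binary @$f > /dev/null
> done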