Jaeho Shin netj

#!/bin/sh
python -c 'import sys,os.path; sys.stdout.writelines([os.path.realpath(p)+"\n" for p in sys.argv[1:]])' a b c ~/Dropbox/해원이♥사진들/ ~/bin/deepdive
#include <iostream>
#include <string>
using namespace std;
int main() {
    // See: https://www.quora.com/Is-cin-cout-slower-than-scanf-printf/answer/Anders-Kaseorg
    ios::sync_with_stdio(false);
    string line;
    // Test getline's result directly so no spurious extra line is written after EOF
    while (getline(cin, line)) {
        cout << line << "\n"; // endl is worse as it flushes, but makes not much difference
    }
    return 0;
}
title: Mindtagger task for labeling whole documents
items: {
file: sampled_documents.csv
key_columns: [doc_id]
grouping_columns: [doc_id]
}
template: template.html
@netj
netj / dd
Last active March 22, 2016 00:30
Test kit for mkmimo on OS X
#!/usr/bin/env bats
# an attempt to test https://github.com/netj/mkmimo against cases where read(2)/write(2) returns error
load test_helpers
@test "exits with non-zero status with bad files" {
rm -fv non-existent-input non-existent-output
! mkmimo non-existent-input \> >non-existent-output || false
! mkmimo <non-existent-input \> non-existent-output || false
}
{"id":"01b906d8-0589-4c50-b6e1-46f173756dde","content":"Table 1. Key IMF Financial Statistics \r as of September 24, 2015 \r (In billions of SDRs, and end of period, unless indicated otherwise) 1\/ \r (For definitions, see below) \r 2015 \r

OS X's dyld embarrassingly drops some of the environment variables it depends on, as the dyld-env-drop-test.sh script next to this note shows. This means you cannot wrap your binaries with shims that fix up environment variables and expect them to inherit any DYLD_* variables set by another layer of wrapper script. You must invoke the binary executable directly after setting up the dyld environment.

$ ./dyld-env-drop-test.sh
does     work: some_random_env_var='foo'
does     work: another_random_name='foo'
does     work: PATH='foo'
does     work: USER='foo'
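
A check of this shape can be sketched in Python. This is not the dyld-env-drop-test.sh script itself, whose contents are not shown here. `survives_wrapper` and its parameters are hypothetical names, and the intermediate layer is assumed to be `/bin/sh`. The sketch sets one variable, runs a probe process through the wrapper, and reports whether the value is still visible on the other side.

```python
import os
import subprocess
import sys


def survives_wrapper(name, value="foo", wrapper=("/bin/sh", "-c")):
    """Return True if env var `name` still equals `value` after passing
    through one wrapper process (assumed here to be /bin/sh)."""
    env = dict(os.environ)
    env[name] = value
    # The innermost process only prints what it sees for `name`.
    probe = "import os; print(os.environ.get(%r, ''))" % name
    # sh -c 'exec "$0" -c "$1"' <python> <probe>: the shell is the extra
    # layer, and it execs the Python interpreter with the probe program.
    cmd = list(wrapper) + ['exec "$0" -c "$1"', sys.executable, probe]
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip() == value


if __name__ == "__main__":
    # DYLD_LIBRARY_PATH is included only as an example of a DYLD_* name.
    for name in ["some_random_env_var", "another_random_name", "USER", "DYLD_LIBRARY_PATH"]:
        status = "survives" if survives_wrapper(name) else "dropped"
        print("%s: %s" % (name, status))
```

On a system that does not strip anything, every name should be reported as surviving. The interesting case is running it on OS X with `DYLD_*` names.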
netj / README.md
Created February 25, 2016 13:43 — forked from deepdiveDeployBot/README.md
DeepDive GitHub release assets metadata

This Gist archives metadata for replaced DeepDive release assets, from which we can sum download counts, etc.

netj / README.md
Last active January 7, 2016 00:46
Some overhead numbers on compressing factor graph binaries

Here are some preliminary tests to see whether compressing the grounded factor graph binaries makes sense.

  • Tested 1/6/2016 on raiders6 with 112 threads on an ext4 filesystem backed by an SSD RAID.
  • Uses a 2MB factors binary from the spouse example, repeated 1000 times to create ~2GB of data.
  • Confirmed that reading/writing without compression adds no extra overhead, since that IO has to happen even with compression.

The pbzip2 option seems promising: it retains 60-70% of the read performance while using 10x less space, and in turn 10x less IO. It cuts the output throughput down to 15% (33MB/s), but that may be less of a problem if the grounding dump is the bottleneck.
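
A measurement of this shape can be sketched in Python. This is only an illustration under stated assumptions: it uses the single-threaded `bz2` and `gzip` modules from the standard library rather than pbzip2/lbzip2/pigz, the payload is synthetic rather than a real factors binary, and `make_payload` and `measure` are hypothetical helper names. It reports the compression ratio and the compress/decompress throughput relative to the uncompressed size.

```python
import bz2
import gzip
import time


def make_payload(chunk, repeats):
    """Repeat a small chunk to build a larger test payload."""
    return chunk * repeats


def measure(codec, payload):
    """Return (ratio, compress MB/s, decompress MB/s) for a module that
    exposes compress() and decompress(), such as bz2 or gzip."""
    t0 = time.perf_counter()
    packed = codec.compress(payload)
    t1 = time.perf_counter()
    unpacked = codec.decompress(packed)
    t2 = time.perf_counter()
    if unpacked != payload:
        raise ValueError("round trip did not reproduce the payload")
    megabytes = len(payload) / 1e6
    # Guard against a zero interval on very small payloads.
    compress_rate = megabytes / max(t1 - t0, 1e-9)
    decompress_rate = megabytes / max(t2 - t1, 1e-9)
    return len(payload) / len(packed), compress_rate, decompress_rate


if __name__ == "__main__":
    # Synthetic stand-in for the repeated factors binary (an assumption).
    payload = make_payload(bytes(range(256)) * 64, 100)
    for codec in (bz2, gzip):
        ratio, c_rate, d_rate = measure(codec, payload)
        print("%s: %.1fx smaller, compress %.0f MB/s, decompress %.0f MB/s"
              % (codec.__name__, ratio, c_rate, d_rate))
```

A repeated payload compresses far better than real data would, so the ratios from a sketch like this say little about actual factor graph binaries. Only the method carries over.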

  • The compression numbers here may not extrapolate well to multiple grounding/dump processes that have to unload data from the database, run format_converter, and then run pbzip2/lbzip2/pigz/gzip/bzip2 at the end, although they all seem IO bound.
  • However, the decompression side will probably extrapo