@isoboroff
isoboroff / gist:5775326
Created June 13, 2013 16:46
This is a Python script to draw random lines from text files. The key application is where those files are much bigger than RAM and when you really don't want to read the entire file. It works by randomly seeking around in the files, then outputting the next full line. I am concerned that random.randrange(), random.randint(), file.seek(), and fi…
#!/usr/bin/env python2.7
import os
import random
import sys
import argparse
parser = argparse.ArgumentParser(description = 'Print random lines from a file')
parser.add_argument('-n', dest='sample_size', type=int, help='number of lines to sample', default=100)
parser.add_argument('files', nargs=argparse.REMAINDER, help='files to read from')
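The seek-then-read-the-next-full-line idea described above can be sketched roughly as follows. This is a minimal Python 3 illustration, not the gist's own code (the preview here is truncated); the function name `sample_lines` and its parameters are hypothetical. One caveat worth stating: the scheme is not uniform, because a line is chosen with probability proportional to the byte length of the line before it.

```python
import os
import random


def sample_lines(path, sample_size, rng=random):
    """Return sample_size lines picked by seeking to random byte offsets.

    Hypothetical helper for illustration; samples with replacement.
    """
    size = os.path.getsize(path)
    if size == 0:
        return []
    lines = []
    with open(path, "rb") as f:  # binary mode so byte offsets are exact
        while len(lines) < sample_size:
            f.seek(rng.randrange(size))
            f.readline()         # discard the (probably partial) line we landed in
            line = f.readline()  # the next full line
            if not line:         # we landed in the last line: wrap to the first
                f.seek(0)
                line = f.readline()
            lines.append(line.decode("utf-8", errors="replace").rstrip("\r\n"))
    return lines
```

Only the bytes around each random offset are read, so memory use stays small no matter how large the file is.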
isoboroff / gist:424fcdf63fa760c1d1a7
Created May 20, 2014 17:06
Getting out of Solr Zookeeper /solr/overseer/queue hell
A large indexing job of mine crashed at about document 75M (this was on CDH 4.6). Afterward I could not list the /solr/overseer/queue directory in ZooKeeper because it had millions of entries.
Here are the steps I followed to avoid a re-index:
1. Shut down all solr-servers and zookeeper-servers
2. Run zookeeper-server-initialize --force --myid X on each ZK server. This should result in an empty ZK space.
3. solrctl init
4. hadoop fs -mv /solr/the-collection /hold
5. Restart solr-server instances
6. solrctl instancedir --create ... # re-upload the config info
7. solrctl collection --create # with the same number of nodes and replicas as before
isoboroff / gist:bca2aee7877567cf781b
Created July 17, 2015 19:25
Sometimes you mistakenly unpack tens of thousands of files into a single directory and maybe your OS/filesystem is unhappy about deleting it. This is recursive rm(1) in pure C.
#include <sys/types.h>
#include <dirent.h>
/**
* Usage: rm_all <dir-name>
* From lkml...
*/
#define u32 unsigned int
isoboroff / getTrueName.c
Created November 9, 2017 22:19
Dereference an OS X Finder alias
// getTrueName.c
//
// DESCRIPTION
// Resolve HFS and HFS+ aliased files (and soft links), and return the
// name of the "Original" or actual file. Directories have a "/"
// appended. The error number returned is 255 on error, 0 if the file
// was an alias, or 1 if the argument given was not an alias
//
// BUILD INSTRUCTIONS
// gcc-3.3 -o getTrueName -framework Carbon getTrueName.c
isoboroff / tweets-to-fortunes.py
Last active July 26, 2018 16:49
Convert a file of tweets into a file of fortunes as used by Unix fortune(1) and Emacs cookie-mode.
#!/usr/bin/env python3
import json
import argparse
import re
# source files from https://github.com/bpb27/trump_tweet_data_archive
# Removes URLs since M-x cookie-doctor gets confused by them
argparser = argparse.ArgumentParser(description='Convert from JSON array of condensed tweets to cookie format')
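A minimal sketch of the conversion this gist describes, assuming each tweet is a JSON object whose body lives under a `text` key (an assumption about the condensed archive format) and that entries are separated by a line containing only `%`, as `fortune(1)` data files are. The function name `tweets_to_fortunes` and the URL pattern are hypothetical, not taken from the gist.

```python
import json
import re

# Simple URL matcher for illustration; real tweets may need something stricter.
URL_RE = re.compile(r"https?://\S+")


def tweets_to_fortunes(tweets, text_key="text"):
    """Render a list of tweet dicts as fortune-style text, one entry per tweet."""
    entries = []
    for tweet in tweets:
        # Drop URLs, then trim the whitespace they leave behind.
        text = URL_RE.sub("", tweet[text_key]).strip()
        if text:  # skip tweets that were nothing but a link
            entries.append(text)
    # Each entry is followed by a line holding only "%".
    return "".join(entry + "\n%\n" for entry in entries)


if __name__ == "__main__":
    sample = json.loads('[{"text": "hello https://t.co/x"}, {"text": "bye"}]')
    print(tweets_to_fortunes(sample), end="")
```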
#!/usr/bin/env python3
if __name__ == "__main__":
    import json
    import argparse
    import spacy
    import dateparser
    import signal
    from contextlib import contextmanager
    from tqdm import tqdm
isoboroff / storage.py
Created February 18, 2021 14:08
A minimal Django custom storage backend to use GitPython to store revisions to uploaded files and disallow deletes
from django.conf import settings
from django.core.files.storage import FileSystemStorage
from django.core.files.base import File
from django.utils.deconstruct import deconstructible
from git import Repo
import io
@deconstructible
class VersionedStorage(FileSystemStorage):
isoboroff / gen-key.py
Created February 18, 2021 20:07
Django ./manage.py command to generate secret keys
from django.core.management.base import BaseCommand, CommandError
from django.utils.crypto import get_random_string
class Command(BaseCommand):
    help='''Generate a secret key the same way Django does.
    You will need to install it either in settings.py for dev or externally for production
    '''
    def add_arguments(self, parser):
        parser.add_argument('-l', '--length', type=int, default=50)
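For readers without a Django project at hand, the key generation can be approximated with the standard library alone. This is a sketch, not the gist's command: it assumes the same 50-character default and the lowercase/digit/punctuation alphabet that Django's `get_random_secret_key()` uses, and the name `generate_secret_key` is hypothetical.

```python
import secrets

# Alphabet assumed to match Django's get_random_secret_key().
SECRET_KEY_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)"


def generate_secret_key(length=50, allowed_chars=SECRET_KEY_CHARS):
    """Return a random string drawn from allowed_chars using a CSPRNG."""
    return "".join(secrets.choice(allowed_chars) for _ in range(length))


if __name__ == "__main__":
    print(generate_secret_key())
```

Inside a management command, the equivalent call would be `get_random_string(length, allowed_chars)` from `django.utils.crypto`, which the gist already imports.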
isoboroff / elastic-baseline.py
Created August 27, 2021 13:42
Do a TREC title-only run against an ElasticSearch index.
#!/usr/bin/env python3
from elasticsearch import Elasticsearch, TransportError
import argparse
import re
import sys
ap = argparse.ArgumentParser(description='Do a baseline run against an Elasticsearch index')
ap.add_argument('--host', default='localhost', help='Elasticsearch host')
ap.add_argument('--port', type=int, default=9200, help='Elasticsearch port')
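The output side of such a run can be sketched independently of the search call. The helper below formats hits as standard TREC run lines (`topic Q0 docno rank score tag`). It assumes each hit is a dict with `_id` and `_score`, as in the `hits.hits` array of an Elasticsearch search response, and that the document number is stored as `_id`. The name `format_trec_run` and the default run tag are hypothetical.

```python
def format_trec_run(topic_id, hits, run_tag="baseline"):
    """Return TREC run-file lines for one topic, ranked in the order given."""
    lines = []
    for rank, hit in enumerate(hits, start=1):
        # Assumption: the TREC docno is the Elasticsearch document _id.
        lines.append(
            f"{topic_id} Q0 {hit['_id']} {rank} {hit['_score']:.4f} {run_tag}"
        )
    return lines


if __name__ == "__main__":
    example_hits = [{"_id": "DOC-1", "_score": 12.5}, {"_id": "DOC-2", "_score": 9.25}]
    print("\n".join(format_trec_run("301", example_hits)))
```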
isoboroff / index.html
Created November 17, 2021 19:23
Updating times according to the viewer's timezone
<script src="moment.min.js"></script>
<script src="moment-timezone-with-data-10-year-range.js"></script>
<script>
const my_zone = moment.tz.guess(true);
// Set up timezone selector with all the zones.
// The user's current guessed zone is selected.
const sel = document.querySelector('select.timezone');
moment.tz.names().forEach( zone => {
    const option = document.createElement('option');