
xvda     202:0   0   300G  0 disk /
xvdf     202:80  0   350G  0 disk
├─xvdf1  202:81  0     8G  0 part
├─xvdf2  202:82  0   512M  0 part
└─xvdf3  202:83  0 341.5G  0 part
@selbyk
selbyk / fstab
Created April 25, 2015 00:35
How to create Linux swapfile
# Swap file created on DATE
/var/swapfile none swap sw 0 0
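The fstab entry above assumes /var/swapfile already exists; a minimal sketch of creating and enabling it (the 1 GiB size is an example):

```shell
# Create a 1 GiB file (example size) and restrict it to root
sudo dd if=/dev/zero of=/var/swapfile bs=1M count=1024
sudo chmod 600 /var/swapfile
# Format it as swap space and enable it for the current boot
sudo mkswap /var/swapfile
sudo swapon /var/swapfile
```

The fstab line then makes the swap file persist across reboots.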
@selbyk
selbyk / batch_resize.sh
Created May 7, 2015 07:11
Shell script to process a subset of facial recognition image database from UMass.
#!/bin/bash
# Usage: ./batch_resize.sh input_dir output_dir "10 30 50"
# ./batch_resize.sh help
# Setup our globals
PARAM_CHECK_FAIL=0
INPUT_DIR=""
OUTPUT_DIR=""
SIZES=()
@selbyk
selbyk / start_with_java_7.sh
Created May 16, 2015 20:59
Run program_to_run using Java 7
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk/jre
program_to_run
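Exporting JAVA_HOME only helps if program_to_run consults it; many programs simply run whatever `java` is first on PATH, so a common companion step (an assumption about the target program) is to put that JVM's bin directory in front:

```shell
# Make the Java 7 binaries take precedence over the system default
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk/jre
export PATH="$JAVA_HOME/bin:$PATH"
```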
# This is an example of the kind of things you can do in a configuration file.
# All flags used by the client can be configured here. Run Let's Encrypt with
# "--help" to learn more about the available options.
# Use a 4096 bit RSA key instead of 2048
rsa-key-size = 4096
# Use the production server (point this at
# https://acme-staging.api.letsencrypt.org/directory for staging/testing)
server = https://acme-v01.api.letsencrypt.org/directory
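A file like this is passed to the client with its -c/--config flag; the path and domain below are examples:

```shell
# Request a certificate using the options from the config file above
letsencrypt certonly -c /path/to/cli.ini -d example.com
```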
#!/bin/bash
# Remove leftovers from a previous run (-f: don't fail if none exist)
rm -f -- *.jpg *.png *.gif *.mp4
# Copy images from the parent directory modified within the last 7 days
find ../ -mindepth 1 -maxdepth 1 -mtime -7 -name "*.jpg" -exec cp -t . {} +
export MAGICK_THREAD_LIMIT=4
@selbyk
selbyk / movie.sh
Last active December 5, 2015 20:02
#!/bin/bash
# Set to the number of cores your machine has (or however many you want to use)
export MAGICK_THREAD_LIMIT=8
# Delete old files (-f: don't fail if none exist)
rm -f -- *.jpg *.png *.gif *.mp4
Scraper/Content Extraction Training
Goal: Fetch relevant information sources, extract only the appropriate content, and save it as documents usable both as training data and by Watson
Method:
Fetch a few pages from various data sources using Phantom.js, then parse and save each website’s HTML as JSON
Iterate over the text elements and extract features such as size, position, text, and CSS properties
Run the DBSCAN clustering algorithm over each document’s extracted feature data; similar elements such as titles, headers, and article content should fall into the same clusters
Manually tag a portion of the documents to use as training data
A support vector machine (SVM) with a linear kernel, evaluated with 4-fold cross-validation, should be capable of detecting the main content of a scraped page
@selbyk
selbyk / find_running_process.sh
Created March 17, 2016 16:13
Accepts a command-line argument for the process name; returns an exit code of 1 if the process is currently running
#!/bin/bash
# Usage: ./find_running_process.sh <process_name>
#        DEBUG=1 ./find_running_process.sh <process_name>
# Function to help with debug messages
debug_message () {
  if [ "${DEBUG:-0}" -eq 1 ]
  then
    echo "$1"
  fi
}
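The preview cuts off before the actual check; a minimal sketch of what the description implies, using pgrep (an assumption — the original may well inspect /proc or ps output instead):

```shell
# Exit-status convention from the gist description: 1 = running, 0 = not
check_process () {
  if pgrep -x "$1" > /dev/null
  then
    return 1  # process found
  fi
  return 0    # no such process
}
```

`check_process nginx; echo $?` would then print 1 while nginx is running.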
@selbyk
selbyk / process_pids.sh
Created March 17, 2016 16:34
iterate through all running Linux processes and just print the process ID
#!/bin/bash
# Usage: ./process_pids.sh
for proc in /proc/*
do
  # Strip the leading "/proc/", keeping just the entry name
  FILENAME=${proc##*/}
  # Numeric entries under /proc are process IDs
  if [[ $FILENAME =~ ^[0-9]+$ ]]
  then
    echo "$FILENAME"
  fi
done