@cwvhogue
cwvhogue / recompress_bz2.sh
Created March 14, 2014 22:04
Recompress bzip2 files with pbzip2 in place - mjob map phase contents
bzip2 -dc "$MANTA_INPUT_FILE" | pbzip2 -c > /var/tmp/file.tmp && mpipe "${MANTA_INPUT_OBJECT}" < /var/tmp/file.tmp
cwvhogue@manta # cowsay -f /tmp/manta.cow hello there humans
____________________
< hello there humans >
--------------------
\
\
_||___||_
_|{_______}|_
_/ -- -- \_
_ / -- -- \ _
#TESTING ASPERA CONNECT:
# Download
# 64-bit
curl -O http://download.asperasoft.com/download/sw/connect/3.5/aspera-connect-3.5.1.92523-linux-64.sh
chmod +x ./aspera-connect-3.5.1.92523-linux-64.sh
./aspera-connect-3.5.1.92523-linux-64.sh
# Download
* Lenovo ThinkStation D20 4158CM1
** Dual Intel Xeon X5650 2.67GHz - 12 cores, 24 threads total.
** Flashed MB BIOS to latest (01/14/14).
** Disable CStates (all!). Disable NUMA-aware! Enable Hyperthreading.
** Onboard Broadcom NICs are functioning.
** 48GB RAM (Kingston KTL-TS316S/8G, 240pin Reg ECC Single Rank DDR3-1600MHz PC3-12800 1.5V)
** 6 Sticks in CPU1 bank - CPU2 bank does not work with this RAM
** 96GB is unattainable with cheap RAM
** LSI MegaRAID 9211-8I 8-Port 6Gb/S PCIEx8 SAS/SATA HBA
** Disable onboard SAS from BIOS. Flash the LSI card to IT-mode firmware
cwvhogue / RamaPlot.R
Last active December 13, 2015 18:49
Simple R Ramachandran Angle Distribution Plot SmoothScatter function with an improved R Color Palette called icy.palette. I made this palette so I could better compare values across the entire height range on sheets of 20 plots. The palette uses subtle alternate banding shades in a hand-picked wavy gradient. The colors go from (low values) blue-…
RamaPlot <- function(Title, aa) {
  if (missing(aa)) aa <- read.csv(file.choose(), header=FALSE)
  if (missing(Title)) Title <- "Ramachandran Plot"
  Lab.palette <-
    colorRampPalette(c("white","lightyellow", "lightcyan","cyan", "lightskyblue", "lightseagreen", "yellowgreen" ,"yellow", "goldenrod", "orange", "orange4", "firebrick", "darkred", "red", "darkmagenta", "magenta", "hotpink", "pink","lightpink","white"), space = "Lab")
  par(las=1, pty="s")
  smoothScatter(aa$V3, aa$V4, xlab=expression(phi), ylab=expression(psi), asp=1, ylim=c(-180,180), xlim=c(-180,180), main=Title, xaxs="i", yaxs="i", axes=FALSE, frame.plot=FALSE, colramp=Lab.palette)
  prange <- seq(-180, 180, by=40)
cwvhogue / manyfiles
Last active December 14, 2015 15:29
SmartOS manyfiles.sh is a bash script to create N files of size M. It uses the Solaris mkfile(1M) command to create zero-padded files that can be small (1024) or rather large (100g for a 100 GB file). Files are named TEMP_ZEROS_0000001 to N. Afterward, pbzip2 can be used to load up the system CPU and compress these in parallel to equal-sized files (in …
Applies to SmartOS/illumos/Solaris machines with the mkfile command.
To test the effects of too many files, create 500,000 numbered 2k zero-padded files:
# ./manyfiles.sh 500000 2k
To max out file storage and then release it, this makes 20 zero-padded 100 GB files and bzips them:
# ./manyfiles.sh 20 100g
# pbzip2 TEMP_* &
The 20 resulting files should be all exactly the same size in bytes.
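The script body is not shown in this preview, so the following is only a rough sketch of what a manyfiles-style generator could look like, not the author's script. The function names `manyfiles` and `size_to_bytes` are hypothetical, and the `head -c` fallback for systems without mkfile is an assumption added for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a manyfiles-style generator; NOT the original script.

# Convert an mkfile-style size (plain bytes, or a k/m/g suffix) to bytes.
size_to_bytes() {
    case $1 in
        *[kK]) echo $(( ${1%[kK]} * 1024 )) ;;
        *[mM]) echo $(( ${1%[mM]} * 1024 * 1024 )) ;;
        *[gG]) echo $(( ${1%[gG]} * 1024 * 1024 * 1024 )) ;;
        *)     echo "$1" ;;
    esac
}

# Create COUNT zero-filled files of SIZE in DIR (default: current directory).
# Files are named TEMP_ZEROS_0000001, TEMP_ZEROS_0000002, ...
manyfiles() {
    mf_count=$1
    mf_size=$2
    mf_dir=${3:-.}
    mf_bytes=$(size_to_bytes "$mf_size")
    mf_i=1
    while [ "$mf_i" -le "$mf_count" ]; do
        mf_name=$(printf '%s/TEMP_ZEROS_%07d' "$mf_dir" "$mf_i")
        if command -v mkfile >/dev/null 2>&1; then
            # SmartOS/illumos/Solaris path, as described above.
            mkfile "$mf_size" "$mf_name"
        else
            # Assumed fallback: write the same number of zero bytes by hand.
            head -c "$mf_bytes" /dev/zero > "$mf_name"
        fi
        mf_i=$((mf_i + 1))
    done
}

# Example: manyfiles 500000 2k /var/tmp/many
```

Spawning `printf` in a subshell for every file is slow at 500,000 files; it is kept here only so the sketch stays portable to plain sh.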
cwvhogue / MapReduce_manta_image_identify
Created August 26, 2013 21:15
Open Getty Image verification on Joyent Manta using ImageMagick "identify" command in a one-line MapReduce job.
# MapReduce ImageMagick 'identify' command run over 4,599 jpg files in /public/art Manta directory
$ mfind /$MANTA_USER/public/art | mjob create -w -m 'identify $MANTA_INPUT_FILE' -r cat
added 1000 inputs to 6f2e6ac8-b6a5-4ed5-bad4-8371aea010dd
added 1000 inputs to 6f2e6ac8-b6a5-4ed5-bad4-8371aea010dd
added 1000 inputs to 6f2e6ac8-b6a5-4ed5-bad4-8371aea010dd
added 1000 inputs to 6f2e6ac8-b6a5-4ed5-bad4-8371aea010dd
added 599 inputs to 6f2e6ac8-b6a5-4ed5-bad4-8371aea010dd
# Retrieve the output file to my local machine
cwvhogue / ImageMagick_identify_output_to_R_histogram
Last active December 21, 2015 18:28
Plot distribution of image file sizes using ImageMagick 'identify' command - default output. 1. Reorganize the 'identify' output into a sorted .csv file. 2. Look for and remove broken images, regenerate .csv file with clean image set. 3. Plot histogram of image size distribution with R.
# image_identify.txt is a file with the default output from ImageMagick 'identify'
# run over a set of JPG files locally,
# or the equivalent Manta MapReduce 'identify' output from the previous Gist.
$ identify *.jpg > image_identify.txt
$ head -5 image_identify.txt
00000201.jpg JPEG 3295x5947 3295x5947+0+0 8-bit sRGB 23.99MB 0.010u 0:00.000
00000301.jpg[1] JPEG 4470x3126 4470x3126+0+0 8-bit sRGB 22.15MB 0.000u 0:00.009
00000401.jpg[2] JPEG 3115x4485 3115x4485+0+0 8-bit sRGB 19.41MB 0.000u 0:00.000
00000501.jpg[3] JPEG 3093x4515 3093x4515+0+0 8-bit sRGB 19.39MB 0.000u 0:00.000
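Step 1 of the description (reorganizing the 'identify' output into a sorted .csv file) is cut off in this preview. One possible sketch follows. It assumes the default nine-column output shown above and filenames without spaces; the function name `identify_to_csv`, the column choice, and sorting by filename are all assumptions rather than the author's method.

```shell
#!/bin/sh
# Hypothetical sketch: turn default 'identify' output into a sorted CSV
# with the columns filename,format,width,height,size.
identify_to_csv() {
    awk '
        NF >= 7 {
            name = $1
            sub(/\[[0-9]+\]$/, "", name)   # drop the scene index, e.g. "[1]"
            split($3, dim, "x")            # "4470x3126" -> width, height
            printf "%s,%s,%s,%s,%s\n", name, $2, dim[1], dim[2], $7
        }
    ' "$@" | sort -t, -k1,1
}

# Example: identify_to_csv image_identify.txt > image_identify.csv
```

Lines with fewer than seven fields are skipped, which is a crude filter for some malformed records but not a substitute for the broken-image check in step 2.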
cwvhogue / ImageMagick_identify_Manta_local_diff_validation
Last active December 21, 2015 18:29
Detailed image data set validation - diff-ing Manta ImageMagick 'identify' output with local copy.
# Start with the MapReduce version of ImageMagick 'identify' output from previous Gist
# With your local image directory (assume these are the good originals) run 'identify' as follows
$ identify *.jpg > master_identify.txt
# To match the MapReduce output, local job-specific information needs to be removed.
$ cat master_identify.txt | \
sed 's/\[\(.*\)\]//' | \
sed 's/ \(.\):\(..\).\(...\)//' | \
sed 's/ 0.\(...\)u//' | \
cwvhogue / gist:6674659
Created September 23, 2013 18:18
Rename a directory full of *.webp to *.jpg on Unix with find and -exec
find . -name '*.webp' -exec sh -c 'mv "$1" "$(dirname "$1")/$(basename "$1" webp)jpg"' sh {} ';'
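A variant of the same rename can be sketched with shell parameter expansion instead of dirname and basename, batching files with `{} +` so that one shell handles many names. The function name `rename_webp_to_jpg` is hypothetical. As in the one-liner, only the extension changes; the image data is not converted.

```shell
#!/bin/sh
# Hypothetical sketch: rename every *.webp under DIR (default: .) to *.jpg.
rename_webp_to_jpg() {
    find "${1:-.}" -type f -name '*.webp' -exec sh -c '
        for f in "$@"; do
            # ${f%.webp} strips the extension and keeps the directory part.
            mv -- "$f" "${f%.webp}.jpg"
        done
    ' sh {} +
}

# Example: rename_webp_to_jpg ./images
```

Passing the filenames as positional arguments, rather than embedding `{}` in the script text, keeps names with spaces or quotes from being reinterpreted by the inner shell.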