@mediaczar
mediaczar / remove_stopwords.pl
Last active December 10, 2015 08:48
Tokenise and run word counts
#!/usr/bin/perl
# takes a space-delimited word list (FILE) formatted
# <count> <word>
# and removes lines with stopwords supplied in STOPWORDS
# author: Mat Morrison (@mediaczar)
# date: 2013-01-01
use strict;
use warnings;
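
The header comment above describes filtering a `<count> <word>` list against a list of stopwords. A minimal sketch of that idea in shell follows. It is not the author's Perl implementation: the function name `remove_stopwords` is hypothetical, and it assumes STOPWORDS holds one word per line and that matching is exact and case-sensitive.

```shell
# Hypothetical helper (not from the gist): remove_stopwords STOPWORDS FILE
# Assumes STOPWORDS has one word per line and FILE has "<count> <word>" lines.
remove_stopwords() {
    # While reading the first file, store each stopword as an array key.
    # For the second file, print only lines whose second field is not a key.
    awk 'FILENAME == ARGV[1] { if (NF) stop[$1]; next } !($2 in stop)' "$1" "$2"
}

# Example call, with illustrative file names:
#   remove_stopwords stopwords.txt counts.txt
```

Testing `FILENAME` rather than the more common `NR == FNR` idiom keeps the filter correct when the stopword file is empty.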
mediaczar / munge_and_clean_sysomos.sh
Last active December 10, 2015 02:58
Munge and clean Tweet data downloaded from Sysomos
csvcut -c 27 17-Dec-2012\ orange.csv | csvlook | less
mediaczar / rename_sysomos.sh
Last active December 10, 2015 02:38
Rename downloaded Sysomos output files
for f in *.csv ; do head -n 5 "$f" | tail -n 1 ; done
mediaczar / perlcount.pl
Created November 26, 2012 15:43
Perl one-liner to replace grep -c (by Peteris Krumins)
perl -lne '/REGEX/ && $t++; END { print $t }' FILENAME
mediaczar / decompose.R
Created November 26, 2012 11:15
Experimenting with Holt Winters forecasting
# create the time series
search_index <- ts(c(48, 40, 34, 43, 31, 39, 46, 26, 33, 34, 28, 25, 25, 23, 22, 25, 25, 25, 19, 16, 14, 16, 19, 17, 18, 16, 21, 19, 19, 19, 15, 21, 28, 30, 15, 21, 30, 39, 44, 50, 34, 37, 38, 38, 30, 42, 39, 42, 42, 57, 58, 80, 65, 40, 56, 52, 54, 55, 57, 57, 50, 50, 35, 36, 33, 35, 33, 32, 28, 33, 19, 27, 26, 18, 20, 21, 19, 20, 19, 27, 19, 22, 26, 21, 24, 22, 21, 41, 34, 38, 45, 33, 36, 35, 46, 45, 39, 49, 53, 59, 48, 54, 57, 79, 57, 51, 40, 43, 39, 49, 45, 48, 24, 34, 33, 33, 35, 26, 25, 24, 27, 26, 24, 26, 37, 17, 21, 20, 21, 20, 21, 21, 22, 14, 15, 14, 15, 22, 24, 31, 34, 42, 46, 45, 50, 41, 38, 34, 42, 44, 51, 49, 48, 44, 54, 60, 69, 50, 45, 43, 45, 51, 43, 47, 43, 45, 26, 29, 30, 34, 32, 32, 33, 26, 21, 21, 27, 27, 24, 22, 23, 22, 17, 19, 22, 22, 22, 23, 23, 25, 27, 32, 33, 37, 34, 35, 38, 46, 35, 46, 50, 43, 59, 60, 54, 50, 64, 69, 77, 50, 57, 56, 62, 56, 59, 60, 43, 54, 39, 41, 40, 26, 29, 27, 27, 23, 25, 22, 27, 31, 17, 19, 25, 20, 20, 21, 20, 20, 17, 21, 23, 19, 29, 26, 4
mediaczar / chartparser.sh
Created November 22, 2012 17:44
Sed command for near-enough parsing of chart XML from a PowerPoint file
sed 's/<c:pt\ idx=\"\([0-9]\{1,\}\)\"><c:v>\([0-9]\{1,\}\.[0-9]\{1,\}\)<\/c:v><\/c:pt>/\1\'$'\t''\2\'$'\n''/g' chart1.xml
mediaczar / bar-and-column.R
Created October 4, 2012 08:53
bar and column chart in R
library(ggplot2)
categories <- c( "Mon","Tue","Wed","Thu","Fri","Sat", "Sun", "Foo", "Bar" )
# create some random data, rate.1 and rate.2
data <- data.frame( categories,
                    rate.1 = runif( length( categories )),
                    rate.2 = runif( length( categories )) )
levels( data$categories ) # wrong order (alphabetical)
mediaczar / get_follower_ids.sh
Last active October 5, 2015 08:08
Grab follower ids from a Twitter account
curl http://api.twitter.com/1/followers/ids/mediaczar.xml | sed 's/<[^>]*>//g' | sed '/^$/d'
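
The pipeline above strips every XML tag and then deletes empty lines. A minimal sketch of the same tag-stripping step as a reusable filter follows, so it can be tried on a saved response. The function name `strip_tags` is hypothetical, and the sketch assumes one value per line. Unlike the original pipeline, it also trims indentation, so lines that held only an indented tag are dropped too.

```shell
# Hypothetical helper (not from the gist): read XML on stdin, print bare values.
strip_tags() {
    # 1. delete anything that looks like a tag
    # 2. trim leading whitespace, 3. trim trailing whitespace
    # 4. drop lines that are now empty
    sed -e 's/<[^>]*>//g' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d'
}

# Example call on a saved response (illustrative file name):
#   strip_tags < followers.xml
```

This is still "near enough" text munging rather than real XML parsing: it does not decode entities or handle tags that span several lines.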
mediaczar / parsemse.pl
Created May 6, 2012 19:58
Parse MoneySavingExpert forums
#!/usr/bin/perl
# samples the Money Saving Expert forum membership database
# returns user ID, post count, join date, date last active
# does all the hard work
sub getpage {
    my ($page) = @_;    # arguments arrive in @_; "sub getpage($page)" is parsed as a prototype
    $mech->get( $page );
    @content = $mech->content;
mediaczar / TwitterEdgeFinder.pl
Created January 2, 2012 13:27
Twitter Edge Finder
#!/usr/bin/perl
# checks the Twitter API to find the friendships between
# a list of usernames. This should really use the NEW API
# call that would let us halve the number of calls
# author: Mat Morrison
# date: Friday July 10, 2009
use warnings;
use LWP::Simple;
# set up variables
# we're just using a whitespace delimited list for the moment