Esme Mora (alienrobotwizard)

  • Abl
  • San Francisco, CA
cat /data/anal/ford/ford_tweets| cut -f 3 | egrep -i "((FORD|MERCURY).*(FIESTA|MUSTANG|FUSION|TAURUS|FLEX|EDGE|ESCAPE|EXPEDITION|SPORT.*TRAC|EXPLORER|MILAN|MARINER|MOUNTAINEER|GRANDE?.*MARQUIS|TRACER)|#MAZDA|#FORDDRIVE|(@|#)?WEDDINGROADTRIP|(@|#)?INVISIBLEPEOPLE|(@|#)?PLAIDNATION)" | wc -l
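The same count can be sketched in Ruby, the language of most other snippets here. This is a minimal sketch, not part of the original gist. It assumes tab-separated input with the tweet text in the third field (what `cut -f 3` selects). The names MODELS, TAGS, FORD_PATTERN, third_field and count_matches are hypothetical.

```ruby
#!/usr/bin/env ruby
# Hypothetical Ruby port of the egrep count above (not from the original gist).
# Assumes tab-separated lines with the tweet text in field 3, like `cut -f 3`.

MODELS = %w[FIESTA MUSTANG FUSION TAURUS FLEX EDGE ESCAPE EXPEDITION SPORT.*TRAC
            EXPLORER MILAN MARINER MOUNTAINEER GRANDE?.*MARQUIS TRACER].freeze
TAGS = %w[#MAZDA #FORDDRIVE (@|#)?WEDDINGROADTRIP (@|#)?INVISIBLEPEOPLE
          (@|#)?PLAIDNATION].freeze

# Same alternation as the egrep pattern; Regexp::IGNORECASE plays the role of -i.
FORD_PATTERN = Regexp.new(
  "((FORD|MERCURY).*(#{MODELS.join('|')})|#{TAGS.join('|')})",
  Regexp::IGNORECASE
)

# Mimics `cut -f 3`: a line without any tab passes through whole, and a line
# with fewer than three fields yields an empty string.
def third_field(line)
  line = line.chomp
  return line unless line.include?("\t")
  line.split("\t", -1)[2] || ""
end

# Counts matching lines, like `egrep -i ... | wc -l`.
def count_matches(lines, pattern = FORD_PATTERN)
  lines.count { |line| pattern.match?(third_field(line)) }
end

# Only reads input when file names are given on the command line.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  puts count_matches(ARGF.each_line)
end
```

Run it as `ruby count_ford_tweets.rb /data/anal/ford/ford_tweets` (the script name is hypothetical). Unlike the shell pipeline, this sketch reads only from file arguments, not from stdin.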
 case
 when options[:map]
   mapper_klass.new(self.options).stream
+when options[:reduce] && options[:reduce_command]
+  system options[:reduce_command]
 when options[:reduce]
   reducer_klass.new(self.options).stream
 when options[:run]
#!/usr/bin/env ruby
require 'rubygems'
require 'wukong'

class LetterMapper < Wukong::Streamer::LineStreamer
  def map_text text
    h = {}
    idx = 0
<Keyspace Name="UserIDCache">
  <KeysCachedFraction>0.01</KeysCachedFraction>
  <ColumnFamily CompareWith="UTF8Type" Name="UserID"/>
  <ColumnFamily CompareWith="UTF8Type" Name="UserSearchID"/>
  <ColumnFamily CompareWith="UTF8Type" Name="UserScreenName"/>
  <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
  <ReplicationFactor>1</ReplicationFactor>
  <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>
g++ -I XOPSupport -l Madlib.dll XOPStandardHeaders.h MCL_ScanningStage.h Madlib.h MCL_ScanningStage.c MCL_ScanningStageWinCustom.rc MCL_ScanningStage.rc
#!/usr/bin/env ruby
#
# Installing rsruby gem:
#
# sudo apt-get install r-base
# sudo gem install rsruby -- --with-R-dir=/usr/lib/R --with-R-include=/usr/share/R/include
# sudo ln -s /usr/lib/R/lib/libR.so /usr/lib/libR.so
# export R_HOME=/usr/lib/R
#
#
# Example Azkaban job. Assumes you have two MapReduce jobs that must run sequentially, the second reading the first's output.
#
type=command
command=$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-*streaming*.jar -input /path/to/data -output /path/to/outputA -mapper mapperA.py -reducer reducerA.py -file mapperA.py -file reducerA.py
command.1=$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-*streaming*.jar -input /path/to/outputA -output /path/to/outputB -mapper mapperB.py -reducer reducerB.py -file mapperB.py -file reducerB.py
4.2.145 [[0,{"latitude":"38.0000","country_code":"US","longitude":"-97.0000"}],[159,{"latitude":"38.0000","country_code":"US","longitude":"-97.0000"}],[160,{"household_income":"41506","percent_hispanic":"17.98","city":"Irving","percent_semi_permanent":"18.61","percent_asian":"14.45","percent_under_18":"20.06","latitude":"32.8791","per_capita_income":"28214","percent_bs_graduate":"16.34","area_code":"623","country_code":"US","zip_code":"75038","percent_below_poverty":"10.61","people_per_household":"2.0","housing_unit_value":"149500","percent_homeownership":"11.27","housing_units":"13305","work_travel_time":"3.1","percent_dual_race":"3.09","percent_pacific":"0.0","percent_white":"50.03","population":"25191","region_code":"TX","percent_hs_graduate":"16.34","percent_non_english":"36.09","percent_foreign":"28.9","percent_black":"23.75","percent_over_65":"1.87","metro_code":"972","longitude":"-96.9898","households":"12466","percent_under_5":"7.3","percent_native":"0.56","percent_female":"48.11"}],[191,{"household_i
cat 200_twitspam_2.json| ruby -ne 'puts ["twitter_user_timeline_request", 15491144, 3, "0", "http://twitter.com/statuses/user_timeline/15491144.json?&page=1&count=200", 20100729, 200, "foobar", $_.strip].join("\t")' | ~/Programming/infochimps-data/social/network/twitter/base/parse/parse_twitter_api_requests.rb --map > 200_twitspam_parsed_2.tsv
bizmarketing4u 49108829 0.094361630 1
BrowneBig570 50190727 0.144720420 1
Megan___Fox 49509322 0.147560500 1
pen2netone 47910899 0.064650595 1
dextradyoung 47502802 0.146222870 2
mbainstitute 41130608 0.864213050 2
mcspartan76 17034645 0.111582670 2
Opereur2u 65572992 0.138023360 2
shaunaconway3 63460580 0.105175970 2
BrianaPitts 69789022 0.163385990 3