Created
January 22, 2009 03:36
The #L Challenge http://twitter.com/stephenfry/status/1136005076
== An elementary-as-hell solution to stephenfry's #L challenge: ==
-- Get the twitter scrape from infochimps.org
--   http://blog.infochimps.org/2008/12/29/massive-scrape-of-twitters-friend-graph/
--   (not available yet, but sometime soon)
--
-- Then run this pig (http://wiki.apache.org/pig/) code
-- to extract all twitter screen names with three or more L's.
-- It's not properly case insensitive (only lowercase l matches), but who cares.
-- Users is assumed to be loaded already from the scrape, with a screen_name field.
--
Ells_1 = FOREACH Users GENERATE screen_name;
Ells_2 = FILTER Ells_1 BY screen_name MATCHES '.*l.*l.*l.*';
STORE Ells_2 INTO 'foo/ells';
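# The filter above only sees lowercase l. As a minimal sketch of the same
# test done case-insensitively, here it is in plain Ruby. The helper name
# three_or_more_ls? is hypothetical and not part of the original code; in
# Pig itself, since MATCHES takes a Java regular expression, a pattern such
# as '.*[lL].*[lL].*[lL].*' should have the same effect (untested here).

```ruby
# Hypothetical helper (not in the original gist): true when a screen
# name contains at least three L's, counting "l" and "L" alike.
def three_or_more_ls?(screen_name)
  screen_name.downcase.count("l") >= 3
end

# A few sample names: three that qualify and one that does not.
names = ["I" + "l" * 13 + "I", "lll_lll", "llull", "stephenfry"]
names.each do |name|
  puts "#{name}\t#{three_or_more_ls?(name)}"
end
```

# Counting characters rather than matching '.*l.*l.*l.*' avoids regex
# backtracking and makes the "three or more" threshold explicit.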
# ====== SHELL CODE ========
# Whip up a ruby-from-command-line one-off to find long screen_names with a high fraction of L's
hadoop dfs -cat foo/ells/part\* | \
  ruby -ne '$_ = $_.downcase.chomp  # plain chomp: chomp! returns nil when there is no newline to strip
    ls = $_.count("l")              # number of L characters in the name
    lf = ls.to_f / $_.length        # fraction of the name made up of L characters
    puts "%7.3f\t%7d\t%7d\t%s" % [lf, ls, $_.length, $_]
  ' | sort -rn | head -n 50
# The output (fraction of L's, number of L's, name length, screen name):
#
#   1.000    6    6   llllll
#   1.000    5    5   lllll
#   1.000   14   14   llllllllllllll
#   1.000   10   10   llllllllll
#   0.867   13   15   illllllllllllli
#   0.867   13   15   hulllllllllllll
#   0.857    6    7   lll_lll
#   0.833    5    6   dlllll
#   0.800    8   10   alllllllly
#   0.800    4    5   llull
#   0.800    4    5   lllel
#   0.800    4    5   llill
#   0.800   12   15   lllvlllclllvlll
#   0.800   12   15   lll8lll8lll8lll
#   0.778    7    9   halllllll
#   0.750    6    8   llilllil
#
# The third through sixth users' names add up to exactly 50 L's
#
# Compose a message with no other L's except the #L hashtag:
#
#   @stephenfry - @llllllllllllll have you met @llllllllll @IlllllllllllllI
#   and @hulllllllllllll? Your orthographic unity = easy #L project win.
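
# A quick sanity check of the tally, as a sketch only: the helper l_stats
# is hypothetical (not part of the original one-liner) and simply repeats
# its arithmetic. The four names are rebuilt from the lengths shown in the
# listing so the L counts cannot be mistyped.

```ruby
# Hypothetical helper: [fraction of L's, number of L's, name length].
def l_stats(name)
  n  = name.downcase
  ls = n.count("l")
  [ls.to_f / n.length, ls, n.length]
end

# The third through sixth names from the listing (14, 10, 13 and 13 L's).
picked = ["l" * 14, "l" * 10, "I" + "l" * 13 + "I", "hu" + "l" * 13]
total  = picked.sum { |name| l_stats(name)[1] }
puts "L's in the four names: #{total}"

# Rebuild the message, then drop the @mentions and the #L hashtag
# and count whatever L's remain in the rest of the text.
a, b, c, d = picked
message = "@stephenfry - @#{a} have you met @#{b} @#{c} " \
          "and @#{d}? Your orthographic unity = easy #L project win."
rest = message.gsub(/@\w+/, "").gsub("#L", "")
puts "other L's in the message: #{rest.downcase.count('l')}"
```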