@miharekar
Created November 11, 2014 20:18
apparatus twitter usernames
require 'nokogiri'
require 'open-uri'
require 'csv'

# Paginated archive of show pages; %d is replaced with the page number.
URL = 'http://apparatus.si/oddaja/pogovor/page/%d/'
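
# Number of archive pages, taken from the second-to-last pagination link.
def last_page
  # URI.open is required on Ruby 3.0+, where Kernel#open no longer fetches URLs.
  doc = Nokogiri::HTML(URI.open(URL % 1))
  pagination = doc.css('.archive-pagination a')
  pagination[-2].attr('href').gsub(/\D/, '').to_i
end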
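
# Every show in the archive, in reverse archive order, as a hash with the
# show number, the guest's name, and the link to the show page.
def shows
  1.upto(last_page).flat_map { |page|
    Nokogiri::HTML(URI.open(URL % page)).css('h1.entry-title a').to_a
  }.map { |show|
    # Titles are expected to look like "<number>: <guest>"; skip any that do not.
    title = show.text.match(/^(.*): (.*)$/)
    next unless title
    {
      no: title[1].to_i,
      person: title[2],
      link: show.attr('href')
    }
  }.compact.reverse
end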
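
# Twitter username from a link: the first segment of the URL path
# (an empty string if the path has no segments).
def get_user(link)
  URI(link.attr('href')).path.split('/').reject(&:empty?).first.to_s
end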

# Twitter links on a page, minus the anzet and apparatus_si accounts and share links.
def get_twitter_links(html)
  html.css('a').select { |link|
    link.attr('href') =~ /twitter\.com/
  }.reject { |link|
    %w(anzet apparatus_si).include?(get_user(link).downcase) || link.attr('href') =~ /share/
  }
end

# Attach the first remaining Twitter username (or nil if there is none) to every show.
twitter = shows.map { |show|
  html = Nokogiri::HTML(URI.open(show[:link]))
  links = get_twitter_links(html)
  show.merge(twitter: links.first && get_user(links.first))
}
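
# Write one row per show. Fixed headers keep the columns aligned even when
# a show has no Twitter link.
headers = %i[no person link twitter]
CSV.open('twitter.csv', 'w') do |csv|
  csv << headers
  twitter.each do |show|
    csv << show.values_at(*headers)
  end
end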
@miharekar

Gets the Twitter usernames of everyone who has been a guest on the Apparatus podcast/Storming Mortal, for fun and profit.
It takes the first Twitter link on each show page that is not an anzet, apparatus_si, or share link. Like every 80/20 approach, it works for the majority but fails miserably for the minority, which includes me: I get parishilton 😆
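
To see why the heuristic can pick the wrong person, here is a minimal sketch of the same "first Twitter link that survives the filters" rule, applied to a plain list of href strings instead of a fetched page. The names `EXCLUDED_USERS`, `handle_from`, and `first_guest_handle` are made up for this example, and so are the sample links; only the standard library is used.

```ruby
require 'uri'

# Hypothetical constant mirroring the accounts the script excludes.
EXCLUDED_USERS = %w[anzet apparatus_si].freeze

# First path segment of a Twitter URL, e.g. "parishilton".
def handle_from(href)
  URI(href).path.split('/').reject(&:empty?).first.to_s
end

# First Twitter handle in page order that is neither excluded nor a share link.
# Returns nil when no link qualifies.
def first_guest_handle(hrefs)
  candidate = hrefs.find do |href|
    href =~ /twitter\.com/ &&
      href !~ /share/ &&
      !EXCLUDED_USERS.include?(handle_from(href).downcase)
  end
  candidate && handle_from(candidate)
end

# Made-up sample links, listed in the order they would appear on a page.
hrefs = [
  'https://twitter.com/share?url=http://apparatus.si/',
  'https://twitter.com/apparatus_si',
  'https://twitter.com/parishilton',
  'https://twitter.com/example_guest'
]

puts first_guest_handle(hrefs) # prints "parishilton"
```

Because the rule looks only at position, any account mentioned in the show notes before the guest's own link wins, which is presumably what happens on my page.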
