

@DrewWeth
Created July 18, 2015 21:08
# Scrape a YouTube page for video URLs and channel IDs.
# Usage: ruby <script> <page-url>
require 'open-uri'
require 'nokogiri'

YOUTUBE_ROOT = 'https://www.youtube.com'

# Relative "/watch?v=..." hrefs are made absolute; absolute ones are kept as-is.
def absolute_watch_url(href)
  href.start_with?('/watch?v=') ? YOUTUBE_ROOT + href : href
end

abort("usage: #{$PROGRAM_NAME} <page-url>") if ARGV.empty?
starting_url = ARGV[0]

video_urls  = []
channel_ids = []

# Ruby 3.0+ no longer lets Kernel#open fetch URLs; URI.open is the open-uri entry point.
page_source = Nokogiri::HTML(URI.open(starting_url))

# <a> tags: video links, plus "/channel/<id>" links whose third path segment is the ID.
page_source.css('a[href]').each do |a|
  href = a['href']
  video_urls << absolute_watch_url(href) if href.include?('/watch?v=')
  channel_ids << href.split('/')[2] if href.start_with?('/channel/')
end

# <link> tags can also point at watch pages.
page_source.css('link[href]').each do |link|
  href = link['href']
  video_urls << absolute_watch_url(href) if href.include?('/watch?v=')
end

# Some <span> elements carry the channel ID in a data attribute.
page_source.css('span[data-channel-external-id]').each do |span|
  channel_ids << span['data-channel-external-id']
end

result = { 'urls' => video_urls, 'ids' => channel_ids }
puts result
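The href rules the scraper applies (make relative `/watch?v=` links absolute, take the third path segment of `/channel/...` links) can be pulled into plain functions and checked without fetching a page or loading Nokogiri. This is a minimal sketch using only the Ruby standard library; `watch_url`, `channel_id` and `video_id` are hypothetical helper names, not part of the original gist, and the code assumes the `/watch?v=ID` and `/channel/ID` URL layouts the script relies on.

```ruby
require 'uri'

YOUTUBE_ROOT = 'https://www.youtube.com'

# Hypothetical helper: absolute watch URL for an href, or nil if it is not a video link.
def watch_url(href)
  return nil unless href.include?('/watch?v=')
  href.start_with?('/watch?v=') ? YOUTUBE_ROOT + href : href
end

# Hypothetical helper: channel ID from a "/channel/<id>" href, or nil.
# "/channel/UC_example".split('/') gives ["", "channel", "UC_example"].
def channel_id(href)
  return nil unless href.start_with?('/channel/')
  href.split('/')[2]
end

# Hypothetical helper: the value of the "v" query parameter of a watch URL, or nil.
def video_id(url)
  query = URI.parse(url).query
  return nil if query.nil?
  URI.decode_www_form(query).assoc('v')&.last
end

hrefs = ['/watch?v=abc123', 'https://www.youtube.com/watch?v=xyz&t=5', '/channel/UC_example', '/about']
urls = hrefs.map { |h| watch_url(h) }.compact
p urls
# => ["https://www.youtube.com/watch?v=abc123", "https://www.youtube.com/watch?v=xyz&t=5"]
p hrefs.map { |h| channel_id(h) }.compact
# => ["UC_example"]
p urls.map { |u| video_id(u) }
# => ["abc123", "xyz"]
```

Parsing the query with `URI.decode_www_form` rather than slicing the string keeps extra parameters such as `&t=5` out of the extracted ID.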