@gabubellon
Last active August 3, 2022 01:26
umi_2022_scrap_videos_urls.rb
require 'capybara/dsl'
require 'pry'
require 'csv'
include Capybara::DSL
# Register the Chrome driver
Capybara.register_driver :chrome do |app|
  Capybara::Selenium::Driver.new(app, browser: :chrome)
end
Capybara.default_driver = :chrome
visit "https://university.marxist.com/en/#Talks"
binding.pry # Log in through the browser window, then exit pry to continue
videos = []
# Collect title, page link, speaker, date, and time from each talk card
find('div.sprocket-mosaic').all('li[class^="sprocket-tags"]').each do |data|
  head = data.find('div.sprocket-mosaic-head')
  # Text block lines, in order: speaker, date, time
  details = data.find('div.sprocket-mosaic-text').text.split("\n")
  videos << {
    title: head.text,
    um_link: head.find_link['href'],
    speaker: details[0],
    date: details[1],
    time: details[2]
  }
end
# Visit each talk page and record the src of its embedded video iframe
videos.each do |video|
  puts video[:title]
  visit video[:um_link]
  video[:yt_link] = first('div.embed-container').find('iframe')['src']
end
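# Write the hash keys as the header row, then one CSV row per video
CSV.open("um.csv", "wb") do |csv|
  # Skip the header when nothing was scraped (videos.first would be nil)
  csv << videos.first.keys unless videos.empty?
  videos.each do |hash|
    csv << hash.values
  end
end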
Capybara.current_session.driver.quit
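
The row-building and CSV steps of the script can be tried without a browser or a login. The sketch below is a minimal illustration under stated assumptions, not part of the gist: `parse_card_text` and `videos_to_csv` are hypothetical helper names, the sample values are made up, and it assumes each card's text holds speaker, date, and time on three separate lines, as the script's `split("\n")` indexing implies.

```ruby
require 'csv'

# Hypothetical helper (not in the original gist): split a card's text block
# into the three fields the script reads by index. Missing lines become nil.
def parse_card_text(text)
  speaker, date, time = text.to_s.split("\n").map(&:strip)
  { speaker: speaker, date: date, time: time }
end

# Hypothetical helper: serialize an array of hashes to a CSV string, using the
# first hash's keys as the header row. Returns an empty string when there are
# no rows, so an empty scrape does not raise on `videos.first.keys`.
def videos_to_csv(videos)
  return '' if videos.empty?

  headers = videos.first.keys
  CSV.generate do |csv|
    csv << headers
    videos.each { |video| csv << video.values_at(*headers) }
  end
end

# Made-up sample data, for illustration only.
sample = parse_card_text("Jane Doe\nSaturday 23 July\n10:00").merge(title: 'Sample talk')
puts videos_to_csv([sample])
```

Using `values_at(*headers)` rather than `values` keeps the columns aligned with the header row even if a later hash was built with its keys in a different order.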