@ipedrazas
Created December 28, 2013 00:00
Little script that extracts the title, description and a screenshot of a URL. Based on https://gist.github.com/juanriaza/8144461 by @juanriaza.
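# -*- coding: utf-8 -*-
import os

import requests
from bs4 import BeautifulSoup
from pyvirtualdisplay import Display
from selenium import webdriver

# We use PyVirtualDisplay (a Python wrapper for Xvfb)
# to run headless WebDriver tests.
display = Display(visible=0, size=(800, 600))
display.start()

try:
    # URLs can be shortened URLs: requests follows redirects,
    # so req.url holds the final (expanded) URL.
    url = 'http://kcy.me/wge8'
    print(url)
    req = requests.get(url, timeout=10)
    req.raise_for_status()
    print(req.url)

    soup = BeautifulSoup(req.text, 'lxml')
    # Page title (None if the page has no <title>)
    print(soup.title.string if soup.title else None)
    # <meta name="description" content="..."> (None if absent)
    description = soup.find('meta', attrs={'name': 'description'})
    print(description.get('content') if description else None)

    driver = webdriver.Firefox()
    try:
        driver.get(req.url)
        fname = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'screenshot2.png')
        # Capture a screenshot of the URL
        driver.get_screenshot_as_file(fname)
    finally:
        # quit() closes every window and ends the WebDriver session
        driver.quit()
finally:
    # Closing down the virtual display even if something above failed
    display.stop()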
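If you only need the title and description (no screenshot), the parsing step can also be done without BeautifulSoup or lxml. The sketch below is an alternative, not the original author's method: it uses the standard library's `html.parser.HTMLParser`, and the names `extract_metadata` and `_MetaParser` are hypothetical. Unlike `soup.title.string` on a page with no `<title>`, it returns `None` for missing fields instead of raising.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class _MetaParser(HTMLParser):
    """Hypothetical helper: collects the <title> text and the first
    <meta name="description"> content while HTMLParser walks the page."""

    def __init__(self) -> None:
        super().__init__()
        self.title: Optional[str] = None
        self.description: Optional[str] = None
        self._in_title = False
        self._title_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not values.
        if tag == "title" and self.title is None:
            self._in_title = True
        elif tag == "meta" and self.description is None:
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "description":
                self.description = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._title_parts).strip()

    def handle_data(self, data):
        # Text chunks between <title> and </title> make up the title.
        if self._in_title:
            self._title_parts.append(data)


def extract_metadata(html: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (title, description); either is None if the page lacks it."""
    parser = _MetaParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.description
```

In the script above, `title, description = extract_metadata(req.text)` could stand in for the two BeautifulSoup lookups. The trade-off is robustness: BeautifulSoup with lxml copes better with badly broken markup, while this version avoids the extra dependencies.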