@econchick
Created March 3, 2013 18:56

Walkthrough of scraping a webpage and saving it to a database.

Initial Requirements:

  • Python 2.x
  • PostgreSQL
    • If you have a Mac, I highly suggest installing Postgres through Postgres.app for simple setup.
  • virtualenv: You can either download it directly, or:
    • Mac: $ sudo easy_install virtualenv
    • Ubuntu: $ sudo apt-get install python-virtualenv
    • Fedora: $ sudo yum install python-virtualenv
    • Windows: Download manually
  • virtualenvwrapper: You can either download it directly, or:
    • Mac: $ sudo easy_install virtualenvwrapper
    • Ubuntu: $ sudo apt-get install virtualenvwrapper
    • Fedora: $ sudo yum install python-virtualenvwrapper
    • For Mac, Ubuntu, and Fedora:
      • $ export WORKON_HOME=~/Envs
      • $ mkdir -p $WORKON_HOME
      • $ source /usr/local/bin/virtualenvwrapper.sh
    • Windows: Download manually and follow install instructions
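Before moving on, it can help to confirm that the tools listed above are actually on your PATH. The following is a minimal sketch, not part of the original walkthrough: the helper name check_tools is hypothetical, and the command names python, psql, and virtualenv are assumptions that may differ on your system (Postgres.app, for example, may need its bin directory added to PATH first).

```shell
#!/bin/sh
# check_tools (hypothetical helper): report whether each named command is on PATH.
# Prints one "ok: NAME" or "missing: NAME" line per tool and
# returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The tool names below are assumptions; adjust them to match your installation.
check_tools python psql virtualenv || echo "Install the missing tools before continuing."
```

Note that mkvirtualenv and workon are shell functions defined by virtualenvwrapper.sh, so they only exist in a shell that has sourced that script.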

Setup

Within your terminal:

  • $ cd new-coder/scrape Change into the Web Scraping project.

  • Make sure you've installed virtualenvwrapper and followed the steps above from Initial Requirements to set up your terminal correctly. More information can be found in virtualenvwrapper's docs.

  • $ mkvirtualenv ScrapeProj Make a virtual environment specific to your Web Scraping project. You should now see (ScrapeProj) before your prompt:

     (ScrapeProj) $
  • (ScrapeProj) $ pip install -r requirements.txt Install the package requirements for this project. Your virtual environment stores the required packages in a self-contained area so they do not interfere with other Python projects.
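
If you are unsure whether the environment is active before running pip, one way to check is to look at the VIRTUAL_ENV variable, which virtualenv's activate script exports. This is a small sketch under that assumption; the function name active_env is hypothetical and not something virtualenv or virtualenvwrapper provides.

```shell
# active_env (hypothetical helper): print the name of the active virtual
# environment, or fail with a message on stderr if none is active.
# Relies on the VIRTUAL_ENV variable exported by virtualenv's activate script.
active_env() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        basename "$VIRTUAL_ENV"
    else
        echo "no virtual environment active" >&2
        return 1
    fi
}
```

A possible use is to guard the install step, for example `active_env && pip install -r requirements.txt`, so that packages are not installed system-wide by accident. If no environment is active, `workon ScrapeProj` reactivates the one created above.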

alecxe commented Mar 3, 2013

Make a virtual environment specific to your Data Viz project.

You mean Web Scraping project, right?

alecxe commented Mar 5, 2013

I'd note that you need gcc to successfully install project requirements.

plus.. that at least on ubuntu you should install 'python2.x-dev', 'libxml2-dev' and 'libxslt-dev' packages system-wide via apt-get. On mac, I think, you also need some packages system-wide (at least 'libxml2' and 'libxslt').
