@gauden
Last active December 14, 2015 03:49

Preparation for the Data Visualization Course

Introduction

This will be, to a great extent, a hands-on course. Participants should download and install the applications listed below in order to ensure a common operating environment.

I have selected a core set of applications, all of which are free and work on Linux, Windows, or Mac. It is best if you install all these in advance to ensure that we are not held up with download and installation pauses in class.

It is assumed that you will bring your own laptop to which you have administrator access and can install your own software. This exercise will equip you with a set of tools that will be useful long after the course.

1. Standard Browser

We shall be using web-based interfaces a lot, and it is easier for me to assume that we are all using a browser that can handle modern standards. I will be using Google Chrome in class. If you prefer Firefox, Safari, or Opera, these should behave consistently enough for our purposes.

2. Text Editor

In data analysis, you will often need to use a text editor to work on data files or source code. Many standard word processors store their files in proprietary formats and add a lot of formatting information to the text. To handle data and text files cleanly, a good text editor becomes an important part of your toolset. If you have a preferred text editor, stick with it. If new to the field, I suggest:

3. Data Cleaner

Many real-world datasets have a lot of imperfections, and cleaning them up by hand is tedious. To help reduce the pain of data cleaning, we will try out OpenRefine, a free tool that until recently was called Google Refine. The program describes itself thus:

OpenRefine is a power tool that allows you to load data, understand it, clean it up, reconcile it internally, and augment it with data coming from Freebase or other web sources. All with the comfort and privacy of your own computer.

The OpenRefine wiki page has instructions for installation, links to introductory screencasts, and detailed documentation.
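
To give a feel for the kind of cleanup meant here, the following is a minimal Python sketch of grouping variant spellings of the same value. It is loosely modeled on the idea of key-based clustering and is not OpenRefine's own implementation. The function names and the sample values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a string to a normalized key so near-identical spellings match."""
    # Lowercase, then keep only letters, digits, and whitespace.
    cleaned = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    # Split into words, drop duplicates, and sort so word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group the original values under their shared fingerprint."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return dict(groups)


# Hypothetical messy column: stray spaces, inconsistent case, punctuation.
raw = ["New York", " new york ", "York, New", "Boston", "boston."]
clusters = cluster(raw)

for key, members in clusters.items():
    print(key, "<-", members)
```

Each group can then be reviewed and merged into one canonical spelling, which is the tedious step that a dedicated cleaning tool helps with.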

4. The R Project for Statistical Computing

R has become the workhorse of statistical computing. It has a steep learning curve for newcomers, but we will learn just enough R to make useful graphics and, along the way, get an introduction to the environment. According to the R Project website:

R is a free software environment for statistical computing and graphics.

  • Select a nearby download site, and download the right version for your computer and install R.
  • Roger Peng has an excellent video introduction to installing R on Windows and on Mac.
  • If you are totally new to R, then you may find this online course extremely useful: TryR is painless and quickly completed -- ideal if you have time on the weekend before the course.

5. RStudio

According to the RStudio website:

RStudio IDE is a powerful and productive user interface for R.

It is in fact a comprehensive collection of tools allowing newcomers a painless entry point to R and a one-stop computing environment for advanced work as well.

6. Drawing Package

Some of the graphics you produce will need further editing afterwards, and there are many high-end proprietary applications that you can use for this. For the course, we will use the standard free and open source application, Inkscape, which is available on all platforms.

7. Shared Folders

We will use Dropbox to share folders and resources and avoid playing thumbdrive tag: Dropbox sign-up and install.

8. Sharing and Q&A Sites

Once you start coding, you will find yourself needing to ask questions of a technical nature or to share experiences. It is useful to set yourself up with these websites in order to be able to ask, to answer, and to share experiences in statistical computing in general, and to share snippets of code:

  • StackOverflow: for asking and answering questions related to programming. Official tagline: "A language-independent collaboratively edited question and answer site for programmers."
  • GitHub Gists: for sharing snippets of code and public texts (such as this page).
  • Cross Validated: "a question and answer site for statisticians, data analysts, data miners and data visualization experts." The same account you use for StackOverflow.com can be linked to this community as well.

9. Spreadsheet

It is assumed that you already have a spreadsheet installed. If not, install LibreOffice -- it has an excellent spreadsheet component called Calc.


Footnotes

  • A PDF version of this page is also available.
  • In total, these programs represent a hefty download and this will take some time, so do not try to do it at the last minute or during the course as this will cause significant delays to the group.
  • All the programs listed are free, but not all are open source.
  • The choice of programs does not imply they are the "best" in any category -- they are simply the standard proposed for this course.
  • All trademarks are the property of their respective owners.
MaxdC commented Mar 1, 2013

Hello Gauden, did you already create a dropbox folder for our course?
Cheers, Max
