"metadata": {
"language": "Julia",
"name": ""
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
"cells": [
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"An Introduction to Gadfly"
"cell_type": "markdown",
"metadata": {},
"source": [
"Gadfly is an easy-to-use plotting package for Julia, the new high-level, high-performance language for technical computing. \n",
"Gadfly follows grammar-of-graphics principles to make it simpler to translate your ideas into plots: mapping how y changes with x across levels of z. This introduction aims to make Gadfly approachable through a series of examples.\n",
"Translating your ideas into plots is more efficient with DataFrames, but we'll start with one- and multi-dimensional arrays because your data may already be in that format."
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Starting Up"
"cell_type": "markdown",
"metadata": {},
"source": [
"One of the easiest ways to use Julia is in an IPython Notebook. It lets you edit code, add annotations, and keep your plots, just as I'm doing here (see Appendix One for installation instructions). This notebook is available on GitHub, so you can copy and paste from it or run it yourself. To start IJulia, open a terminal, change to the directory where you keep your notebooks (and perhaps your data), and enter this command:\n",
" ipython notebook --profile julia\n",
"That will open the IPython Dashboard, where you can open an existing notebook from that directory or start fresh with **New Notebook**. In Julia, to use a package you enter \"using PackageName\" and then wait a few seconds for it to load. Let's begin:"
"cell_type": "code",
"collapsed": false,
"input": [
"# we want to use Gadfly and DataFrames today\n",
"using Gadfly; using DataFrames"
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
"cell_type": "code",
"collapsed": false,
"input": [
"# First read in your files: three single-column files, one per variable\n",
"# readdlm splits on whitespace by default; for a comma-separated file use readdlm(\"file.csv\", ',')\n",
"# readdlm does not treat the first row as a header by default\n",
"# (DataFrames' readtable assumes commas and a header row; for a tab-separated file use\n",
"# mydat = readtable(\"filename.tsv\", separator='\\t'))\n",
"d_age = readdlm(\"f_age.csv\")\n",
"d_sex = readdlm(\"f_sex.csv\")\n",
"d_dbp = readdlm(\"f_dBP.csv\") ;\n",
"# a trailing semicolon suppresses the printed output when a line is the last in a cell\n",
"# let's check the size of what we read in\n",
"print(\"sa \", size(d_age), \" ss \", size(d_sex), \" sd \", size(d_dbp))\n",
"# and have a look at the first few rows of d_age\n",
"d_age[1:6]"
"language": "python",
"metadata": {},
"outputs": [
"output_type": "stream",
"stream": "stdout",
"text": [
"sa ("
"output_type": "stream",
"stream": "stdout",
"text": [
"50,1) ss (50,1) sd (50,1)"
"metadata": {},
"output_type": "pyout",
"prompt_number": 42,
"text": [
"6-element Array{Float64,1}:\n",
" 39.0\n",
" 46.0\n",
" 48.0\n",
" 61.0\n",
" 46.0\n",
" 43.0"
"prompt_number": 42
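The comments above describe readdlm's defaults. As a minimal sketch for current Julia (1.x), where readdlm lives in the DelimitedFiles standard library, here is one way to read a comma-separated file that has a header row; the inline sample data and the column names age, sex and dbp are hypothetical stand-ins for your own file.

```julia
using DelimitedFiles  # in Julia 1.x, readdlm is in this standard library

# Hypothetical sample data standing in for a comma-separated file with a header row
csv = """
age,sex,dbp
39,F,70
46,M,81
48,F,80
"""

# With header=true, readdlm returns a tuple: (data matrix, 1-row header matrix)
data, header = readdlm(IOBuffer(csv), ',', header=true)

ages = Float64.(data[:, 1])  # numeric column converted to a Float64 vector
sexes = String.(data[:, 2])  # text column converted to plain Strings
println(size(data), " ", vec(header))
```

To read a real file, pass its name instead of the IOBuffer, e.g. `readdlm("f_age.csv", ',', header=true)` if that file had a header row.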
"cell_type": "code",
"collapsed": false,
"input": [
"# I could show each one separately, but there is a shortcut:\n",
"# writing [array1 array2 array3] with spaces between the arrays\n",
"# concatenates them side by side into 3 columns (like hcat) and displays the result\n",
"[d_age[1:6] d_sex[1:6] d_dbp[1:6]]"
"language": "python",
"metadata": {},
"outputs": [
"metadata": {},
"output_type": "pyout",
"prompt_number": 43,
"text": [
"6x3 Array{Any,2}:\n",
" 39.0 \"F\" 70.0\n",
" 46.0 \"M\" 81.0\n",
" 48.0 \"F\" 80.0\n",
" 61.0 \"M\" 95.0\n",
" 46.0 \"M\" 84.0\n",
" 43.0 \"M\" 110.0"
"prompt_number": 43
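The bracket shortcut is horizontal concatenation. This small sketch, with hypothetical values, shows that `[a b c]` gives the same result as `hcat(a, b, c)`, and why the output above is `Array{Any,2}`: numbers and strings have no narrower common element type than `Any`.

```julia
# Hypothetical short columns mirroring the age, sex and DBP arrays
age = [39.0, 46.0, 48.0]
sex = ["F", "M", "F"]
dbp = [70.0, 81.0, 80.0]

m = [age sex dbp]            # spaces inside brackets call hcat
same = m == hcat(age, sex, dbp)
println(typeof(m), " ", size(m), " ", same)

nums = [age dbp]             # all-Float64 columns keep a Float64 element type
println(eltype(nums))
```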
"cell_type": "code",
"collapsed": false,
"input": [
"# I'm interested in the age distribution, so let's plot a histogram\n",
"# vec turns the 50x1 matrix from readdlm into a plain vector\n",
"plot(x=vec(d_age), Geom.histogram)"
"language": "python",
"metadata": {},
"outputs": [
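Geom.histogram groups the values into bins and draws the count in each bin as a bar. Gadfly picks its bins automatically. The sketch below is not Gadfly's code: `bin_counts` is a hypothetical helper that uses equal-width bins to show what the bar heights represent.

```julia
# Count how many values fall into each of nbins equal-width bins
function bin_counts(x::AbstractVector{<:Real}, nbins::Integer)
    lo, hi = extrema(x)
    width = (hi - lo) / nbins
    counts = zeros(Int, nbins)
    for v in x
        # min(...) puts the maximum value in the last bin instead of bin nbins+1
        i = width == 0 ? 1 : min(floor(Int, (v - lo) / width) + 1, nbins)
        counts[i] += 1
    end
    edges = [lo + k * width for k in 0:nbins]  # nbins + 1 bin boundaries
    return edges, counts
end

ages = [39.0, 46.0, 48.0, 61.0, 46.0, 43.0]  # the first six ages shown above
edges, counts = bin_counts(ages, 4)
println(edges, " ", counts)
```

Fewer bins smooth the shape, and more bins show finer detail but get noisy with only 50 observations.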
"cell_type": "code",
"collapsed": false,
"input": [
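"# mydat is read in by the readtable cell further down, so run that cell first;\n",
"# the BoundsError below means an index was out of range, e.g. mydat had fewer than 5 columns\n",
"# points coloured by column 2, a brown line at y=15, a blue line at x=25, log10 x axis\n",
"p = plot(x=mydat[1], y=mydat[5], color=mydat[2],\n",
"         yintercept=[15], Geom.hline(color=\"brown\"),\n",
"         xintercept=[25], Geom.vline(color=\"blue\", size=1mm),\n",
"         Scale.x_log10, Geom.point)"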
"language": "python",
"metadata": {},
"outputs": [
"ename": "BoundsError",
"evalue": "BoundsError()",
"output_type": "pyerr",
"traceback": [
" in getindex at array.jl:277"
"prompt_number": 20
"cell_type": "code",
"collapsed": false,
"input": [
"# save p as a 6 x 3 inch PNG and PDF (these backends need the Cairo package)\n",
"draw(PNG(\"myplot.png\", 6inch, 3inch), p)\n",
"draw(PDF(\"myplot.pdf\", 6inch, 3inch), p)"
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "code",
"collapsed": false,
"input": [
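"# split df4 (a DataFrame with a \"Sex\" column, not defined in this notebook) into male and female rows;\n",
"# groupby(df4, \"Sex\") returns a grouped object, not a pair of DataFrames, so index with a mask instead\n",
"dfm = df4[df4[\"Sex\"] .== \"Male\", :]\n",
"dff = df4[df4[\"Sex\"] .== \"Female\", :]"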
"language": "python",
"metadata": {},
"outputs": []
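The split above works by indexing with a Boolean mask. This sketch uses plain vectors with hypothetical values (using the "M"/"F" coding from the earlier data), so it shows the idea without needing DataFrames.

```julia
# Hypothetical columns standing in for a table's Sex and age columns
sex = ["M", "F", "M", "M", "F"]
age = [46.0, 39.0, 61.0, 43.0, 48.0]

ismale = sex .== "M"   # element-wise comparison gives a BitVector mask
age_m = age[ismale]    # keep the rows where the mask is true
age_f = age[.!ismale]  # negate the mask to keep the other rows
println(age_m, " ", age_f)
```

With a current version of DataFrames.jl, the same kind of mask selects whole rows, for example `df4[df4.Sex .== "Male", :]`.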
"cell_type": "code",
"collapsed": false,
"input": [
"# First read in your file\n",
"mydat = readtable(\"filename.csv\")\n",
"# readtable expects a comma separator by default, so you don't need to specify it\n",
"# if it was a tab-separated file with no header row we would use:\n",
"# mydat = readtable(\"filenameNoHeader.csv\", separator='\\t', header=false)\n",
"# let's check what we read into mydat\n",
"print(\"size is \", size(mydat))\n",
"# and have a look at the first few rows and columns\n",
"# (only a cell's last value is shown automatically, so use display here)\n",
"display(mydat[1:3, 1:6])\n",
"# you'll notice that mydat is a DataFrame, but we're going to ignore that for now:\n",
"# it's just like a spreadsheet with r rows and c columns, and if the sheet\n",
"# had headers it will have column names, which makes them more memorable.\n",
"# Later we'll look at how to add or change column names.\n",
"# A summary of the statistics for the age sample is easy to get\n",
"@printf(\"The mean is %.2f and the std deviation is %.2f \\n\\n\", mean(d_age), std(d_age))\n",
"# @printf prints with formatted variables interpolated; the second \"\\n\" adds a blank line"
"language": "python",
"metadata": {},
"outputs": []
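In Julia 1.x, `mean` and `std` come from the Statistics standard library and `@printf` from Printf, so both must be loaded first. Here is a minimal sketch using the six ages displayed earlier in place of the full sample.

```julia
using Statistics  # mean, std
using Printf      # @printf

ages = [39.0, 46.0, 48.0, 61.0, 46.0, 43.0]  # first six ages shown above
m = mean(ages)
s = std(ages)  # sample standard deviation: divides by n - 1
@printf("The mean is %.2f and the std deviation is %.2f\n", m, s)
```

`std` uses the n - 1 denominator by default. Pass `corrected=false` if you want the population formula instead.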
"cell_type": "markdown",
"metadata": {},
"source": [
"**Appendix One - Installing and Updating**\n",
"There are three components to install: Julia, IPython, and the Julia packages you need.\n",
"Instructions for installing Julia are [here](\n",
"Instructions for installing IPython to support Julia are [here](\n",
"Adding packages is simple. Here are a couple of lines that install the packages you're most likely to need (reinstalling does no harm if you're not sure what you have).\n",
" Pkg.add(\"IJulia\") ; Pkg.add(\"DataFrames\") ; Pkg.add(\"Gadfly\")\n",
" Pkg.add(\"Stats\") ; Pkg.add(\"GLM\") ; Pkg.add(\"Distributions\")\n",
"From time to time it pays to check that your packages are up to date. Do that now with:\n",
" Pkg.update() \n",
"That's it. If you have any issues, see the Julia page above, and if you're still confused, just ask at the [Julia Users Group](!forum/julia-users) "
"metadata": {}
