@jmMAGALLANES
Created March 29, 2014 04:12
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"<font color='blue'>A very basic Python and R Comparison</font>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<font color='darkred'><b>Dr. Jos\u00e9 Manuel MAGALLANES</b></font> \n",
"Researcher at Center for Social Complexity, George Mason University (jmagalla@gmu.edu) \n",
"Professor at Department of Social Sciences, Pontificia Universidad Catolica del Peru (jmagallanes@pucp.edu)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here you will find some ideas on how to make use of Python, a very powerful and flexible programming language. As a programming language, it gives you the means to implement algorithms, and as its community of users and developers keeps growing, the available tools will keep increasing. Here we want to compare Python with R, so we can start by showing some similarities:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* As a calculator"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"((30/5) * (2**2))-100"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 1,
"text": [
"-76"
]
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Many more functions are available in any Python installation, but they live in the **math** module, which must be imported first ([see here](http://docs.python.org/2/library/math.html)):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import math\n",
"\n",
"math.log(100) - math.sqrt(10)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 2,
"text": [
"1.4428925258197123"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"R does not need to load a library for standard math functions ([see here](http://www.statmethods.net/management/functions.html)), while to reach the same range of operations Python may need to import more libraries than **math**, such as **numpy**, **scipy** or **statsmodels**. However, Python can also call R functions, here through IPython's **rmagic** extension (in more recent setups this functionality is provided by rpy2 and loaded with `%load_ext rpy2.ipython`):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%load_ext rmagic\n",
"%R log(100) - sqrt(10)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
"array([ 1.44289253])"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* About data structures: \n",
"Having said that, we should pay attention to the basic data structures in Python that support good programming: lists, tuples and dictionaries:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# list: a mutable (alterable) collection of items that need not be of the same type\n",
"numbersInList=[1,2,3,4,'5']\n",
"numbersInList"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
"[1, 2, 3, 4, '5']"
]
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# tuple: since it cannot be altered, use it to represent collections of items that stay constant during a program\n",
"numbersInTuple=(1,2,3,4,'5')\n",
"numbersInTuple"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"(1, 2, 3, 4, '5')"
]
}
],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# dictionary: also an alterable collection, but each item consists of a key and a value:\n",
"numbersInDictionary={'first':1, 'second':2, 'third':'3'}\n",
"numbersInDictionary"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"{'first': 1, 'second': 2, 'third': '3'}"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In R you can manage collections very well using lists or data frames (a data frame is a more complex kind of list). These collections generally allow heterogeneous elements. However, keep in mind that some structures, such as atomic vectors built with c(), only allow one type of element; if an element of a different type is included, all elements are coerced to a common type (here, the numbers become character strings):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%R RnumberVector=c(1,2,3,4,\"5\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
"array(['1', '2', '3', '4', '5'], \n",
" dtype='|S1')"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python users may find interesting differences in R data structures: \n",
"<img src=http://i.imgur.com/SgK3khw.png>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is worth noticing a particularity of R lists and vectors that brings them close to a Python dictionary: as shown below, R lists and vectors accept names for their values, while Python tuples and lists do not:\n",
"\n",
"<img src=http://i.imgur.com/UWNjWbP.png>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In Python, only dictionaries behave similarly:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"PeterDictPython={'name':'Peter', 'lastname':'Jackson','age':30} #keys are written as quoted strings here, while R names can be written unquoted"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"PeterDictPython"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"{'age': 30, 'lastname': 'Jackson', 'name': 'Peter'}"
]
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"PeterDictPython[1] #python dictionaries do not use indexes"
],
"language": "python",
"metadata": {},
"outputs": [
{
"ename": "KeyError",
"evalue": "1",
"output_type": "pyerr",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-10-e084f0a1ca2c>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mPeterDictPython\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;31m#python dictionaries do not use indexes\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m: 1"
]
}
],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"PeterDictPython['age']"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
"30"
]
}
],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"PeterDictPython['name']"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"'Peter'"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Data frames are very important for basic R users, since that is usually where data sets reside once they are loaded:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=http://i.imgur.com/Obp1u99.png>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A standard Python installation cannot read a data file that easily; the file would need to be converted into a list of lists or some other combination of structures. However, the **pandas** library makes life easier for R users:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from pandas import read_csv\n",
"data = \"https://raw.github.com/JoseManuelMAGALLANES/Repository/master/blog2/hdi.csv\"\n",
"dataFramePython = read_csv(data)\n",
"dataFramePython.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>hdi</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> Afghanistan</td>\n",
" <td> 0.37</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> Albania</td>\n",
" <td> 0.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> Algeria</td>\n",
" <td> 0.71</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> Andorra</td>\n",
" <td> 0.85</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> Angola</td>\n",
" <td> 0.51</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 2 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
" Country hdi\n",
"0 Afghanistan 0.37\n",
"1 Albania 0.75\n",
"2 Algeria 0.71\n",
"3 Andorra 0.85\n",
"4 Angola 0.51\n",
"\n",
"[5 rows x 2 columns]"
]
}
],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dataFramePython.describe()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hdi</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td> 187.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td> 0.675091</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td> 0.170856</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td> 0.300000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td> 0.538000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td> 0.710000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td> 0.803000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td> 0.955000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>8 rows \u00d7 1 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
" hdi\n",
"count 187.000000\n",
"mean 0.675091\n",
"std 0.170856\n",
"min 0.300000\n",
"25% 0.538000\n",
"50% 0.710000\n",
"75% 0.803000\n",
"max 0.955000\n",
"\n",
"[8 rows x 1 columns]"
]
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The *describe()* method does not report the number of missing (NA) values (its *count* row only counts non-missing entries), but if you miss R you can push the pandas data frame into R:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%Rpush dataFramePython"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%R summary(dataFramePython)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 17,
"text": [
"array(['Afghanistan : 1 ', 'Albania : 1 ',\n",
" 'Algeria : 1 ', 'Andorra : 1 ',\n",
" 'Angola : 1 ', 'Antigua and Barbuda: 1 ',\n",
" '(Other) :188 ', 'Min. :0.3000 ', '1st Qu.:0.5380 ',\n",
" 'Median :0.7100 ', 'Mean :0.6751 ', '3rd Qu.:0.8030 ',\n",
" 'Max. :0.9550 ', \"NA's :7 \"], \n",
" dtype='|S25')"
]
}
],
"prompt_number": 17
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The general advice is to read the [R](http://cran.r-project.org/doc/manuals/R-intro.html) and [Python](http://docs.python.org/2/tutorial/datastructures.html) documentation on data structures to see their purpose, differences, methods, advantages and limitations. Also, if you wish to explore using R from Python a little more, you can check [here](http://rpy.sourceforge.net/rpy2/doc-2.0/html/index.html), and to use Python inside R you can check this [link](http://cran.r-project.org/web/packages/rPython/index.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* For coding"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python and R present a lower barrier to entry than some well-known languages such as Java, which require many more details for a correct compilation. Below we show what it takes to print \"Hello, World\" on the screen in Java:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"http://i.imgur.com/ODM0Sum.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While in Python:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print(\"Hello, World\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Hello, World\n"
]
}
],
"prompt_number": 18
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...or R:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%R print (\"Hello, World\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"text": [
"[1] \"Hello, World\"\n"
]
}
],
"prompt_number": 19
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...it is much easier, as you just saw above. \n",
"\n",
"Coding requires the frequent use of iterations and the creation of functions, and in both cases R and Python are very flexible, as can be seen here for R:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=http://i.imgur.com/qdoDsTN.png>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...and here for Python:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for i in range(1,6):\n",
"    print(i**2)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1\n",
"4\n",
"9\n",
"16\n",
"25\n"
]
}
],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"names = [\"John\", \"Mary\", \"Peter\"]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 21
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for name in names:\n",
"    print(name)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"John\n",
"Mary\n",
"Peter\n"
]
}
],
"prompt_number": 22
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"i=0\n",
"while i<len(names):\n",
"    print(names[i])\n",
" i+=1"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"John\n",
"Mary\n",
"Peter\n"
]
}
],
"prompt_number": 23
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One important **difference** when working with **indices** is that for a collection of **n** values, R uses indices from 1 to n, while Python uses indices from 0 to n-1. \n",
"Remember that dictionaries do not have indices:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# dictionaries can be traversed, stating explicitly whether you want the keys, the values or both:\n",
"for value in PeterDictPython:\n",
"    print(value)\n",
"\n",
"#is this what you want??"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"lastname\n",
"age\n",
"name\n"
]
}
],
"prompt_number": 24
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for value in PeterDictPython.values():\n",
"    print(value)\n",
"\n",
"#this is what you meant, right?"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Jackson\n",
"30\n",
"Peter\n"
]
}
],
"prompt_number": 25
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for key,value in PeterDictPython.items():\n",
"    print('%s %s' % (key, value))\n",
"\n",
"#what about this?"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"lastname Jackson\n",
"age 30\n",
"name Peter\n"
]
}
],
"prompt_number": 26
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for key in PeterDictPython.keys():\n",
"    print('%s %s' % (key, PeterDictPython[key]))\n",
"\n",
"#another alternative"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"lastname Jackson\n",
"age 30\n",
"name Peter\n"
]
}
],
"prompt_number": 27
}
],
"metadata": {}
}
]
}
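The calculator and **math** cells in the notebook were run under Python 2. As a minimal sketch, assuming a Python 3 interpreter, the same arithmetic behaves slightly differently: `/` always returns a float there, while `//` performs floor division. The variable names below are illustrative only, not part of the original notebook.

```python
import math

# Python 3: "/" is true division and always returns a float,
# so the calculator cell gives -76.0 rather than Python 2's -76
result = ((30 / 5) * (2 ** 2)) - 100

# "//" is floor division, which is what Python 2 did for two integers
floor_result = ((30 // 5) * (2 ** 2)) - 100

# math.log is the natural logarithm, like R's log();
# math.log10 corresponds to R's log10()
diff = math.log(100) - math.sqrt(10)
log_base10 = math.log10(100)

print(result, floor_result, diff, log_base10)
```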
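The list, tuple and dictionary cells, together with the R coercion example, can be condensed into a short Python 3 sketch (the snake_case names are our own, not the notebook's). It shows that a list can be altered in place while a tuple raises `TypeError`, and that a Python list, unlike an R atomic vector built with `c()`, keeps each element's own type instead of coercing everything to one type.

```python
numbers_in_list = [1, 2, 3, 4, '5']
numbers_in_tuple = (1, 2, 3, 4, '5')

# lists can be altered in place...
numbers_in_list[4] = 5

# ...while tuples cannot: item assignment raises TypeError
try:
    numbers_in_tuple[4] = 5
    tuple_changed = True
except TypeError:
    tuple_changed = False

# no coercion happens: the last element stays a string,
# whereas R's c(1, 2, 3, 4, "5") turns every element into a string
types_in_tuple = [type(x).__name__ for x in numbers_in_tuple]

print(numbers_in_list)
print(tuple_changed)
print(types_in_tuple)
```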
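The remark that `describe()` does not report missing values can also be checked without leaving Python. This is a sketch using a small, made-up data frame rather than the `hdi.csv` file (the missing value is invented for illustration): the `count` row of `describe()` only counts non-missing entries, and `isnull().sum()` gives the number of missing values per column, similar to the `NA's` line of R's `summary()`.

```python
import pandas as pd

# hypothetical stand-in for the hdi data; one hdi value is missing
df = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra'],
    'hdi': [0.37, 0.75, float('nan'), 0.85],
})

# describe() counts only the non-missing hdi values
summary = df['hdi'].describe()

# isnull().sum() counts missing values per column
missing = df.isnull().sum()

print(summary)
print(missing)
```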
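The loop and dictionary cells can likewise be rewritten for Python 3, as a minimal sketch with hypothetical names. Two things differ from the recorded Python 2 output: `print` is a function, and since Python 3.7 dictionaries keep insertion order, so the keys come out as `name`, `lastname`, `age`. The `dict()` keyword form also lets you write keys unquoted, somewhat like R's `list(name = "Peter")`.

```python
# keyword arguments give unquoted keys, somewhat like R's list(name = "Peter", ...)
peter = dict(name='Peter', lastname='Jackson', age=30)

# iterating over a dictionary directly yields its keys
keys = [key for key in peter]

# .values() and .items() make explicit that you want the values or the pairs
values = list(peter.values())
pairs = list(peter.items())

for key, value in peter.items():
    print(key, value)

names = ["John", "Mary", "Peter"]
# Python indices run from 0 to n-1: names[0] is R's names[1]
first = names[0]
# the last element sits at index n-1 (names[-1] is a shortcut for it)
last = names[len(names) - 1]

# a while loop over the indices, as in the notebook
i = 0
while i < len(names):
    print(names[i])
    i += 1
```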