@aparrish
Last active October 23, 2020 10:34
updated version of my dictionaries/web api notes!
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dictionaries\n",
"\n",
"The dictionary is a very useful data structure in Python. The easiest way to conceptualize a dictionary is that it's like a list, except you don't look up values in a dictionary by their index in a sequence---you look them up using a \"key,\" or a unique identifier for that value.\n",
"\n",
"We're going to focus here just on learning how to get data *out* of dictionaries, not how to build new dictionaries from existing data. We're also going to omit some of the nitty-gritty details about how dictionaries work internally. You'll learn a lot of those details in later courses, but for now it means that some of what I'm going to tell you will seem weird and magical. Be prepared!\n",
"\n",
"## Why dictionaries?\n",
"\n",
"For our purposes, the benefit of having data that can be parsed into dictionaries, as opposed to lists, is that dictionary keys tend to be *mnemonic*. That is, a dictionary key will usually tell you something about what its value is. (This is in opposition to parsing, say, CSV data, where we have to keep counting fields in the header row and translating that to the index that we want.)\n",
"\n",
"Lists and dictionaries work together and are used extensively to represent all different kinds of data. Often, when we get data from a remote source, or when we choose how to represent data internally, we'll use both in tandem. The most common form this will take is representing a table, or a database, as a *list* of records that are themselves represented as *dictionaries* (mapping the name of the column to the value for that column). We'll see an example of this when we access the New York Times API, below.\n",
"\n",
"Dictionaries are also good for storing *associations* or *mappings* for quick lookups. For example, if you wanted to write a program that was able to recall the capital city of every US state, you might use a dictionary whose keys are the names of the states, and whose values are the corresponding capitals. Dictionaries are also used for data analysis tasks, like keeping track of how many times a particular token occurs in an incoming data stream.\n",
"\n",
"## What dictionaries look like\n",
"\n",
"Dictionaries are written with curly brackets, surrounding a series of comma-separated pairs of *keys* and *values*. Here's a very simple dictionary, with one key, `Obama`, associated with a value, `Hawaii`:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'Barack Obama': 'Hawaii'}"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"{'Barack Obama': 'Hawaii'}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's another dictionary, with two more entries:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'Barack Obama': 'Hawaii',\n",
" 'Bill Clinton': 'Arkansas',\n",
" 'George W. Bush': 'Texas'}"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"{'Barack Obama': 'Hawaii', 'George W. Bush': 'Texas', 'Bill Clinton': 'Arkansas'}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, we're building a simple dictionary that associates the names of presidents to the home states of those presidents. (This is my version of JOURNALISM.)\n",
"\n",
"The association of a key with a value is sometimes called a *mapping*. (In fact, in other programming languages like Java, the dictionary data structure is called a \"Map.\") So, in the above dictionary for example, we might say that the key `Bill Clinton` *maps to* the value `Arkansas`.\n",
"\n",
"A dictionary is just like any other Python value. It has a type:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<type 'dict'>\n"
]
}
],
"source": [
"print(type({'Barack Obama': 'Hawaii', 'George W. Bush': 'Texas', 'Bill Clinton': 'Arkansas'}))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And you can assign a dictionary to a variable:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<type 'dict'>\n"
]
}
],
"source": [
"president_states = {'Barack Obama': 'Hawaii', 'George W. Bush': 'Texas', 'Bill Clinton': 'Arkansas'}\n",
"print(type(president_states))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Keys and values in dictionaries can be of any data type, not just strings. Here's a dictionary, for example, that maps integers to lists of floating point numbers:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{17: [1.6, 2.45], 42: [11.6, 19.4], 101: [0.123, 4.89]}\n"
]
}
],
"source": [
"print({17: [1.6, 2.45], 42: [11.6, 19.4], 101: [0.123, 4.89]})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> HEAD-SPINNING OPTIONAL ASIDE: Actually, \"any type\" above is a simplification: *values* can be of any type, but keys must be *hashable*---see [the Python glossary](https://docs.python.org/2/glossary.html#term-hashable) for more information. In practice, this limitation means you can't use lists (or dictionaries themselves) as keys in dictionaries. There are ways of getting around this, though!\n",
"\n",
"A dictionary can also be empty, containing no key/value pairs at all:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{}\n"
]
}
],
"source": [
"print({})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting values out of dictionaries\n",
"\n",
"The primary operation that we'll perform on dictionaries is writing an expression that evaluates to the value for a particular key. We do that with the same syntax we used to get a value at a particular index from a list. Except with dictionaries, instead of using a number, we use one of the keys that we had specified for the value when making the dictionary. For example, if we wanted to know what Bill Clinton's home state was, or, more precisely, what the value for the key `Bill Clinton` is, we would write this expression:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Arkansas\n"
]
}
],
"source": [
"print({'Barack Obama': 'Hawaii', 'George W. Bush': 'Texas', 'Bill Clinton': 'Arkansas'}['Bill Clinton'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"... or, using a variable that has previously been assigned to a dictionary:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Texas\n"
]
}
],
"source": [
"print(president_states['George W. Bush'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we put a key in those brackets that does not exist in the dictionary, we get an error similar to the one we get when trying to access an element of an array beyond the end of a list:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "KeyError",
"evalue": "'Benjamin Franklin'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-9-643190b7aea9>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpresident_states\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Benjamin Franklin'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m: 'Benjamin Franklin'"
]
}
],
"source": [
"print(president_states['Benjamin Franklin'])"
]
},
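{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to sidestep this error is to check whether a key is present before looking it up, using the `in` operator, or to use the dictionary's `.get()` method, which evaluates to `None` (or to a fallback value that you supply) when the key is missing. Here's a minimal sketch; it rebuilds the `president_states` dictionary from above so that the cell runs on its own:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"president_states = {'Barack Obama': 'Hawaii', 'George W. Bush': 'Texas', 'Bill Clinton': 'Arkansas'}\n",
"\n",
"# `in` evaluates to True if the key is present, False otherwise\n",
"print('Benjamin Franklin' in president_states)  # False\n",
"print('Bill Clinton' in president_states)  # True\n",
"\n",
"# .get() evaluates to None for a missing key instead of raising KeyError\n",
"print(president_states.get('Benjamin Franklin'))  # None\n",
"\n",
"# the optional second parameter is a fallback value for missing keys\n",
"print(president_states.get('Benjamin Franklin', 'unknown'))  # unknown"
]
},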
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you might suspect, the thing you put inside the brackets doesn't have to be a string; it can be any Python expression, as long as it evaluates to something that is a key in the dictionary:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hawaii\n"
]
}
],
"source": [
"president = 'Barack Obama'\n",
"print(president_states[president])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can get a list of all the keys in a dictionary using the dictionary's `.keys()` method:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Bill Clinton', 'George W. Bush', 'Barack Obama']\n"
]
}
],
"source": [
"print({'Barack Obama': 'Hawaii', 'George W. Bush': 'Texas', 'Bill Clinton': 'Arkansas'}.keys())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And a list of all the values with the `.values()` method:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Arkansas', 'Texas', 'Hawaii']\n"
]
}
],
"source": [
"print({'Barack Obama': 'Hawaii', 'George W. Bush': 'Texas', 'Bill Clinton': 'Arkansas'}.values())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want a list of all key/value pairs, you can call the `.items()` method:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[('Bill Clinton', 'Arkansas'), ('George W. Bush', 'Texas'), ('Barack Obama', 'Hawaii')]\n"
]
}
],
"source": [
"print({'Barack Obama': 'Hawaii', 'George W. Bush': 'Texas', 'Bill Clinton': 'Arkansas'}.items())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(The weird list-like things here that use parentheses instead of brackets are called *tuples*---we'll discuss those at a later date.)\n",
"\n",
"### Dictionaries can contain lists and other dictionaries\n",
"\n",
"As mentioned above, a dictionary can itself contain lists and dictionaries as values (and those lists and dictionaries can themselves contain other lists and dictionaries, etc. etc. until your computer runs out of memory). The syntax for getting a value out of a list inside of a dictionary looks very similar to the syntax for getting a value out of a list of lists:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"edam\n"
]
}
],
"source": [
"print({'cheeses': ['cheddar', 'edam', 'emmental']}['cheeses'][1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To explain in a bit more detail, observe here what the following expression evaluates to:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['cheddar', 'edam', 'emmental']\n"
]
}
],
"source": [
"print({'cheeses': ['cheddar', 'edam', 'emmental']}['cheeses'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It follows that putting a square bracket index at the end of *that* expression would evaluate to a single item inside the list.\n",
"\n",
"> BONUS EXERCISE: Devise a dictionary that has within it another dictionary for a value. Write the expression to get the value for a key inside of the inner dictionary."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Adding key/value pairs to a dictionary\n",
"\n",
"Once you've assigned a dictionary to a variable, like so:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"president_states = {'Barack Obama': 'Hawaii', 'George W. Bush': 'Texas', 'Bill Clinton': 'Arkansas'}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"... you can add another key/value pair to the dictionary by assigning a value to a new index, like so:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"president_states['Ronald Reagan'] = 'California'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Printing the dictionary shows that there's a new key/value pair in there:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'Bill Clinton': 'Arkansas', 'George W. Bush': 'Texas', 'Barack Obama': 'Hawaii', 'Ronald Reagan': 'California'}\n"
]
}
],
"source": [
"print(president_states)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dictionaries are unordered\n",
"\n",
"You may have noticed something in the previous examples, which is that sometimes the order in which we wrote our key/value pairs in our dictionaries is NOT the same order that those key/value pairs come out as when evaluating the dictionary as an expression or when using the `.keys()` and `.values()` methods. That's because dictionaries in Python are *unordered*. A dictionary consists of a number of key/value pairs, but that's it---Python has no concept of which pairs come \"before\" or \"after\" other the pairs in the dictionary.\n",
"\n",
"Here's a more forceful demonstration:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'a': 1, 'c': 3, 'b': 2, 'e': 5, 'd': 4, 'g': 7, 'f': 6, 'i': 9, 'h': 8, 'j': 10}\n"
]
}
],
"source": [
"print {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Chances are that when you run the code in the above cell, you'll get back a different ordering than the ordering you'd originally written. If you restart the iPython process (or your server), you might get an ordering back that is different from *that*.\n",
"\n",
"A better way of phrasing the idea that dictionaries are unordered is to say instead that \"two dictionaries are considered the same if they have the same keys mapped to the same values.\""
]
},
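{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see that idea in action, here's a small sketch (the variable names are just for illustration): two dictionaries written with their pairs in different orders compare as equal, and if you need a predictable order for display, you can sort the keys yourself."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"first = {'a': 1, 'b': 2, 'c': 3}\n",
"second = {'c': 3, 'a': 1, 'b': 2}\n",
"\n",
"# same keys mapped to the same values, so the dictionaries are equal\n",
"print(first == second)  # True\n",
"\n",
"# sorted() gives the keys back in alphabetical order: a, b, c\n",
"for key in sorted(first.keys()):\n",
"    print(key)"
]
},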
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dictionary keys are unique\n",
"\n",
"Another important fact about dictionaries is that you can't put the same key into one dictionary twice. If you try to write out a dictionary that has the same key used more than once, Python will silently ignore all but one of the key/value pairs. For example:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'a': 3}\n"
]
}
],
"source": [
"print({'a': 1, 'a': 2, 'a': 3})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, if we attempt to set the value for a key that already exists in the dictionary (using `=`), we won't add a second key/value pair for that key---we'll just overwrite the existing value:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"100\n"
]
}
],
"source": [
"test_dict = {'a': 1, 'b': 2}\n",
"print(test_dict['a'])\n",
"test_dict['a'] = 100\n",
"print(test_dict['a'])"
]
},
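{
"cell_type": "markdown",
"metadata": {},
"source": [
"Adding new keys and overwriting existing values are the two operations needed for the counting task mentioned near the top of these notes: keeping track of how many times each token occurs. Here's a minimal sketch, using a made-up list of tokens:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"tokens = ['a', 'b', 'a', 'c', 'a', 'b']  # made-up data for illustration\n",
"counts = {}\n",
"for token in tokens:\n",
"    if token in counts:\n",
"        # key already exists: overwrite its value with a bigger count\n",
"        counts[token] = counts[token] + 1\n",
"    else:\n",
"        # first time we've seen this token: add a new key/value pair\n",
"        counts[token] = 1\n",
"\n",
"print(counts['a'])  # 3\n",
"print(counts['b'])  # 2\n",
"print(counts['c'])  # 1"
]
},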
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the case where a key needs to map to multiple values, we might instead see a data structure in which the key maps to another kind of data structure that itself can contain multiple values, like a list:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'a': [1, 2, 3]}\n"
]
}
],
"source": [
"print({'a': [1, 2, 3]})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting data from the web\n",
"\n",
"At this point, we have enough programming scaffolding in place to start talking about how to access Web APIs. A web API is some collection of data, made available on the web, provided in a format easy for computers to parse. But in order to write programs to access web APIs, I need to talk about a few other things first."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### URLs\n",
"\n",
"A URL (\"uniform resource locator\") uniquely identifies a document on the web, and provides instructions for how to access it. It's the thing you type into your web browser's address bar. It's what you cut-and-paste when you want to e-mail an article to a friend. Most of what we do on the web---whether we're using a web browser or writing a program that accesses the web---boils down to manipulating URLs.\n",
"\n",
"So it's important for us to understand the structure of URLs, so we can take them apart and put them back together (both in our heads and programmatically). URLs have a conventional structure that is specified in Internet standards documentation, and many of the web APIs we'll be accessing assume knowledge of this structure. So let's take the following URL:\n",
"\n",
" http://www.example.com/foo/bar?arg1=baz&arg2=quux\n",
" \n",
"... and break it down into parts, so we have a common vocabulary.\n",
"\n",
"| Part | Name |\n",
"|------|------|\n",
"| `http` | scheme |\n",
"| `www.example.com` | host |\n",
"| `/foo/bar` | path |\n",
"| `?arg1=baz&arg2=quux` | query string |\n",
"\n",
"All of these parts are required, except for the query string, which is optional. Explanations:\n",
"\n",
"* The *scheme* determines what *protocol* will be used to access this resource. For our purposes, this will almost always be `http` (HyperText Transfer Protocol) or `https` (HTTP, but over an encrypted connection).\n",
"* The *host* specifies which server on the Internet we're going to talk to in order to retrieve the document we want.\n",
"* The *path* names a resource on the server, often using slashes (`/`) to represent hierarchical relationships between resources. (Sometimes this corresponds to actual files on the server, but just as often it does not.)\n",
"* The *query string* is a means to tell the server *how* we want the document delivered. (More examples of this soon.)\n",
"\n",
"Most of the work you'll do in learning how to use a web API is learning how to construct and manipulate URLs. A quick glance through [the documentation for, e.g., the New York Times API](http://developer.nytimes.com/docs) reveals that the bulk of the documentation is just a big list of URLs, with information on how to adjust those URLs to get the information you want."
]
},
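{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python's standard library can do this breakdown for you. The sketch below takes the example URL apart with `urlparse` and turns the query string into a dictionary with `parse_qs`. The two-step import is only there so that the cell runs under both Python 2 and Python 3. Note that Python calls the host `netloc`, and leaves the `?` off the front of the query string."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"try:\n",
"    from urlparse import urlparse, parse_qs  # Python 2\n",
"except ImportError:\n",
"    from urllib.parse import urlparse, parse_qs  # Python 3\n",
"\n",
"parts = urlparse('http://www.example.com/foo/bar?arg1=baz&arg2=quux')\n",
"print(parts.scheme)  # http\n",
"print(parts.netloc)  # www.example.com\n",
"print(parts.path)  # /foo/bar\n",
"print(parts.query)  # arg1=baz&arg2=quux\n",
"\n",
"# parse_qs maps each parameter name to a list of its values\n",
"params = parse_qs(parts.query)\n",
"print(params['arg1'])  # ['baz']"
]
},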
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### HTML, JSON and web APIs\n",
"\n",
"The most common format for documents on the web is HTML (HyperText Markup Language). Web browsers like HTML because they know how to render as human-readable documents---in fact, that's what web browsers are for: turning HTML from the web into visually splendid and exciting multimedia affairs.\n",
"\n",
"HTML was specifically designed to be a tool for creating web pages, and it excels at that, but it's not so great for describing structured data. Another popular format---and the format we'll be learning how to work with this week---is JSON (JavaScript Object Notation). Like HTML, JSON is a format for exchanging structured data between two computer programs. Unlike HTML, JSON is primarily intended to communicate content, rather than layout.\n",
"\n",
"Roughly speaking, whenever a web site exposes a URL for human readers, the document at that URL is in HTML. Whenever a web site exposes a URL for programmatic use, the document at that URL is in JSON. (There are other formats commonly used for computer-readable documents, like XML. But let's keep it simple for now.) As an example, Wordnik has a human-readable version of page for the definition of the word \"hello,\" found at this URL:\n",
"\n",
"> https://wordnik.com/words/hello\n",
"\n",
"But Wordnik also has a version of this data designed to be easily readable by computers. This is the URL, and it returns a document in JSON format:\n",
"\n",
 http://api.wordnik.com:80/v4/word.json/cheese">
"> http://api.wordnik.com:80/v4/word.json/hello/definitions\n",
"\n",
"Every web site makes available a number of URLs that return human-readable documents; many web sites (like Twitter) also make available URLs that return documents intended to be read by computer programs. Often---as is the case with Facebook, or with sites like Metafilter that make their content available through RSS feeds---these are just two views into the same data.\n",
"\n",
"You can think of a web API as the set of URLs, and rules for manipulating URLs, that a web site makes available and that are also intended to be read by computer programs. (API stands for \"application programming interface\"; a \"web API\" is an interface enables you to program applications that use the web site's data.)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### API Keys\n",
"\n",
"You may have noticed when you clicked on the \"computer-readable\" link above that you received the following message (or one like it) in your web browser:\n",
"\n",
" {\"message\": \"unauthorized\", \"type\": \"error\"}\n",
"\n",
"This message results from the fact that most web APIs (unlike most web pages) require some kind of *authentication*. \"Authentication\" here means some kind of information that associates the request with an individual. In many APIs, this takes the form of a \"token\" or \"key\" (also called a \"client ID\" and/or \"secret\")---most usually an extra parameter that you pass on the end of the URL (or in an HTTP header) that identifies the request as having come from a unique user. Some services (like Facebook) provide a subset of functionality to non-authenticated (\"anonymous\") requests; others require authentication for all requests.\n",
"\n",
"So how do you get \"keys\" or \"tokens\"? There's usually some kind of sign-up form in or near the developer documentation for the service in question. Sign up for Wordnik [here](http://developer.wordnik.com/), or Foursquare [here](https://developer.foursquare.com/) (click on \"My Apps\" in the header after you've logged in). The form may ask you for a description of your application; it's usually safe to leave this blank, or to put in some placeholder text. Only rarely is this text reviewed by an actual human being; your key is usually issued automatically.\n",
"\n",
"Different services have different requirements regarding how to include your keys in your request; you'll have to consult the documentation to know for sure.\n",
"\n",
"In the following examples, I'll assume you've acquired an access token for the APIs in question and have assigned them to variables in the programs as appropriate."
]
},
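{
"cell_type": "markdown",
"metadata": {},
"source": [
"When the key travels as an extra parameter on the end of the URL, building that URL might look something like the sketch below. The endpoint, the parameter name `api_key` and the key itself are all placeholders invented for illustration; the real names come from each service's documentation. The `urlencode` function takes care of escaping spaces and other special characters in the query string, and the two-step import is only there so that the cell runs under both Python 2 and Python 3."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"try:\n",
"    from urllib import urlencode  # Python 2\n",
"except ImportError:\n",
"    from urllib.parse import urlencode  # Python 3\n",
"\n",
"api_key = 'YOUR_KEY_HERE'  # placeholder, not a real key\n",
"base_url = 'http://api.example.com/v1/search'  # hypothetical endpoint\n",
"\n",
"# a list of pairs keeps the parameters in the order written\n",
"params = [('q', 'new york times'), ('api_key', api_key)]\n",
"url = base_url + '?' + urlencode(params)\n",
"print(url)  # http://api.example.com/v1/search?q=new+york+times&api_key=YOUR_KEY_HERE"
]
},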
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fetching web documents with `urllib`\n",
"\n",
"Python has a library called `urllib` built-in, which allows you to make requests to web servers in order to retrieve web documents. You give it a URL, and it gives back a string that contains the content of the document located at that URL. We used this earlier to fetch CSV files.\n",
"\n",
"Here's an example of how to use `urllib`. Our task here is to fetch the document at the URL given above, for the computer-readable version of Wordnik's definitions for the word \"hello\":"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[{\"textProns\":[],\"sourceDictionary\":\"ahd-legacy\",\"exampleUses\":[],\"relatedWords\":[],\"labels\":[],\"citations\":[],\"word\":\"hello\",\"attributionText\":\"from The American Heritage® Dictionary of the English Language, 4th Edition\",\"partOfSpeech\":\"interjection\",\"sequence\":\"0\",\"text\":\"Used to greet someone, answer the telephone, or express surprise.\",\"score\":0.0},{\"textProns\":[],\"sourceDictionary\":\"ahd-legacy\",\"exampleUses\":[],\"relatedWords\":[],\"labels\":[],\"citations\":[],\"word\":\"hello\",\"attributionText\":\"from The American Heritage® Dictionary of the English Language, 4th Edition\",\"partOfSpeech\":\"noun\",\"sequence\":\"1\",\"text\":\"A calling or greeting of \\\"hello.”\",\"score\":0.0},{\"textProns\":[],\"sourceDictionary\":\"ahd-legacy\",\"exampleUses\":[],\"relatedWords\":[],\"labels\":[],\"citations\":[],\"word\":\"hello\",\"attributionText\":\"from The American Heritage® Dictionary of the English Language, 4th Edition\",\"partOfSpeech\":\"verb-intransitive\",\"sequence\":\"2\",\"text\":\"To call \\\"hello.”\",\"score\":0.0}]\n"
]
}
],
"source": [
"import urllib\n",
"\n",
"api_key = \"a80a5131f7620c32a8919063dce09d01b6239543e3d0063bf\"\n",
"url = \"http://api.wordnik.com:80/v4/word.json/hello/definitions?api_key=\"+api_key\n",
"doc_str = urllib.urlopen(url).read()\n",
"print(doc_str)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Oh hey wow that's pretty rad! We'll break down *how* exactly to know what the URL for a particular resource is a bit later (and how to add the API to the request). But for now, just revel in the accomplishment of getting information from an API into your Python program.\n",
"\n",
"Don't worry for now about decomposing the `urllib.urlopen()` line---the important part is merely that you can put a string containing a URL in that first pair of parentheses and the whole expression will evaluate to a string containing the contents of the document fetched from that URL."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpreting JSON\n",
"\n",
"So we assigned the result of fetching that document from the Wordnik API to a variable called `doc_str`. When we printed it out, it looked like... a list of dictionaries? It has the structure of a list of dictionaries, at least: looks like we have square brackets that contain curly brackets with keys associated with values, comma-separated with colons in between. So how would we get the value for the first item from this list? It can't be as easy as it looks, right?!"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[\n"
]
}
],
"source": [
"print doc_str[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nope. That's not right. It *looks* like a list, but it's actually..."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<type 'str'>\n"
]
}
],
"source": [
"print type(doc_str)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"... a string! In fact, almost all of the data that gets returned from web APIs will arrive in the form of a string. Even though the string here *looks* like a Python data structure, it's actually a string in JSON format. So what we need is some kind of Python library that will allow us to write an expression that translates a JSON string into an actual Python data structure. Once we have that data structure, we can start writing other expressions to do with the data what we please.\n",
"\n",
"Fortunately, just such a library exists! Here's how to use it."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import json\n",
"\n",
"doc_data = json.loads(doc_str)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `json.loads` function takes some expression that evaluates to a string as its parameter (between the parentheses), and evaluates to whatever Python data structure is represented in the string. Snazzy. Let's check the type now:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<type 'list'>\n"
]
}
],
"source": [
"print type(doc_data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A list! Okay. Let's get the first item:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{u'word': u'hello', u'score': 0.0, u'sequence': u'0', u'relatedWords': [], u'text': u'Used to greet someone, answer the telephone, or express surprise.', u'textProns': [], u'labels': [], u'attributionText': u'from The American Heritage\\xae Dictionary of the English Language, 4th Edition', u'exampleUses': [], u'citations': [], u'partOfSpeech': u'interjection', u'sourceDictionary': u'ahd-legacy'}\n"
]
}
],
"source": [
"print doc_data[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Got it. In fact, we can do all of the things with this list that we can do with any Python dictionary:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Used to greet someone, answer the telephone, or express surprise.\n",
"A calling or greeting of \"hello.”\n",
"To call \"hello.”\n"
]
}
],
"source": [
"for item in doc_data:\n",
" print item['text']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> NOTE: Ignore the `u` in front of all of those strings---that's just Python's way of telling us that those are technically Unicode strings. For our purposes, Unicode strings behave exactly the same as any other kind of string."
]
},
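{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you don't have an API key handy, you can still practice the same steps on a JSON string written out by hand. The sketch below uses a made-up string shaped like a cut-down version of the Wordnik response; only `json.loads` is doing real work here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import json\n",
"\n",
"# a hand-written JSON string: a list containing two dictionaries\n",
"sample_str = '[{\"word\": \"hello\", \"partOfSpeech\": \"interjection\"}, {\"word\": \"hello\", \"partOfSpeech\": \"noun\"}]'\n",
"sample_data = json.loads(sample_str)\n",
"\n",
"print(type(sample_data) == list)  # True\n",
"print(len(sample_data))  # 2\n",
"\n",
"# index into the list first, then look up a key in the dictionary\n",
"print(sample_data[1]['partOfSpeech'])  # noun"
]
},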
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fun with the New York Times API\n",
"\n",
"Many web sites and organizations offer web APIs. We're going to go over how one API in particular works, or at least a subset of a particular web API---the New York Times API. The idea is that by introducing you to this one API, you'll learn the tools necessary to sign up for, query, and interpret APIs from other providers as well.\n",
"\n",
"### Signing up for an API key\n",
"\n",
"Before you can use the New York Times API, you need to sign up for an API key. Do so by going to [the NY Times API application registration site](http://developer.nytimes.com/) and following the instructions. You'll need to provide your e-mail address and select the API you want a key for. For the purposes of this exercise, choose the \"Article Search\" API.\n",
"\n",
"You'll see a form that looks like this:\n",
"\n",
"<img src=\"http://static.decontextualize.com/snaps/create-a-new-api-key.png\" alt=\"nytimesapi01\"/>\n",
"\n",
"You should momentarily receive a e-mail with your API key. The key is just a string of letters and numbers. It'll look something like this:\n",
"\n",
" 098f6bcd4621d373cade4e832627b4f6\n",
" \n",
"That's your \"key\" for that API. Whenever you make a request to that API, you'll need to include your key in the request. The exact methodology for including the key will be explained below. (Note: the key above is just something I made up; it's not a valid key; don't try using it in actual requests.)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using the API console\n",
"\n",
"We can start exploring how the New York Times API works, and what kind of data it provides, and what that data looks like using the [API console](http://developer.nytimes.com/article_search_v2.json#/Console/GET/articlesearch.json). When you click on that link, you'll see a screen that looks something like this:\n",
"\n",
"<img src=\"http://static.decontextualize.com/snaps/nytimes-api-console.png\" alt=\"nytimesapi02\"/>\n",
"\n",
"Here are the important moving parts:\n",
"\n",
"* The left-hand panel shows several sections that you can expand and collapse. The \"Credentials\" section lets you put in your API key, which you'll need for future requests.\n",
"* The \"Parameters\" section, when expanded, allows you to type in various parameters to be passed as part of the API request. The console itself doesn't tell you what these parameters mean; for that, you'll need to consult [the README](http://developer.nytimes.com/article_search_v2.json#/README) (also available by clicking on \"README\" at the top of the console.)\n",
"* Once you've filled in your API key and at least the `q` parameter, the right-hand panel will show the results of your request in JSON format. You can reload or repeat the request using the \"View Results\" button at the top of the page.\n",
"\n",
"There's no reason you should know what these fields mean without reading the documentation first. But it's okay to guess, and it's okay to play!\n",
"\n",
"The API tool is an invaluable way to \"try out\" your requests before writing Python to programmatically access the API. Most APIs make a similar tool available, so when you're trying to learn a new API, keep a look out."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Putting together the URL\n",
"\n",
"Unhelpfully, the API console doesn't just give you the URL to fetch in order to perform the API request. We have to do a little bit of backwards engineering, either based on the code examples provided, or on the documentation. In the \"Examples\" section of the README, we're told that the following is an example of a valid URL for a request to the API:\n",
"\n",
" http://api.nytimes.com/svc/search/v2/articlesearch.json?q=new+york+times&page=2&sort=oldest&api-key=####\n",
" \n",
"From this, we can deduce that the request URL consists of two parts:\n",
"\n",
"* `http://api.nytimes.com/svc/search/v2/articlesearch.json`, which will remain invariant for all requests; and\n",
"* a *query string* that follows the string above. (In the README example, the query string is `q=new+york+times&page=2&sort=oldest&api-key=####`. We're meant to replace `####` with our own API key.)\n",
"\n",
"So to make a request, we need to be able to build a query string."
]
},
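{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two-part structure described above can be checked mechanically. What follows is a minimal sketch, assuming Python 3 (where the URL helpers live in `urllib.parse`, unlike the Python 2 code used in the rest of these notes). `YOUR_KEY` is a placeholder, not a real key:\n",
"\n",
"```python\n",
"from urllib.parse import urlsplit\n",
"\n",
"# Example URL from the README, with a placeholder where the API key would go\n",
"url = ('http://api.nytimes.com/svc/search/v2/articlesearch.json'\n",
"       '?q=new+york+times&page=2&sort=oldest&api-key=YOUR_KEY')\n",
"\n",
"parts = urlsplit(url)\n",
"\n",
"# Everything before the `?` stays the same for every request\n",
"base_url = parts.scheme + '://' + parts.netloc + parts.path\n",
"\n",
"# Everything after the `?` is the query string\n",
"query_str = parts.query\n",
"\n",
"print(base_url)\n",
"print(query_str)\n",
"```"
]
},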
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Building query strings dynamically\n",
"\n",
"Let's review what a query string looks like. Here's the example query string from the section above about URL structure:\n",
"\n",
" ?arg1=baz&arg2=quux\n",
" \n",
"At first glance, it looks like garbage. For another perspective, let's look at the query string from the example request to the New York Times API:\n",
"\n",
" ?q=new+york+times&page=2&sort=oldest&api-key=####\n",
" \n",
"(I've written the `api-key` here as `####` so as to not give away my credentials.) A little bit of the structure becomes more apparent here! It looks like we have a series of key/value, separated by ampersands (`&`): `q=new+york+times`, `page=2`, `sort=oldest`, etc. The pairs themselves are separated by equal signs (`=`).\n",
"\n",
"> INTEPRETIVE QUESTION: What kind of data structure does this resemble? It's a data structure that we talked about earlier today. Think hard. Yes, it's a dictionary!\n",
"\n",
"What we'd like to have, then, is some kind of way to write an expression that turns a Python dictionary into a string formatted correctly to include in a query string. It turns out that the process of doing this is [kind of tricky](http://en.wikipedia.org/wiki/Percent-encoding), so Python provides a function to do the hard work for us. That function is `urllib.urlencode()`. If you give it a dictionary as a parameter, it evaluates to a string containing the contents of that dictionary, formatted as a URL query string. An example:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"q=python&api-key=12345\n"
]
}
],
"source": [
"import urllib\n",
"\n",
"print urllib.urlencode({'q': 'python', 'api-key': '12345'})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we include that query string at the end of the base URL (taking care to put a `?` between them), we get the full URL for making our request to the API:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"http://api.nytimes.com/svc/search/v2/articlesearch.json?q=python&api-key=12345\n"
]
}
],
"source": [
"base_url = \"http://api.nytimes.com/svc/search/v2/articlesearch.json?\"\n",
"query_str = urllib.urlencode({'q': 'python', 'api-key': '12345'})\n",
"request_url = base_url + query_str\n",
"print request_url"
]
},
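{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cells above are written for Python 2. As a minimal sketch, assuming Python 3, the same function is available as `urllib.parse.urlencode()`. The helper name `build_request_url` and the key `12345` are made up for illustration. The sketch also shows the encoding at work: the space in the search term comes out as `+`.\n",
"\n",
"```python\n",
"from urllib.parse import urlencode\n",
"\n",
"BASE_URL = 'http://api.nytimes.com/svc/search/v2/articlesearch.json'\n",
"\n",
"def build_request_url(params):\n",
"    # Encode the dictionary as a query string and join it to the base URL with `?`\n",
"    return BASE_URL + '?' + urlencode(params)\n",
"\n",
"request_url = build_request_url({'q': 'monty python', 'api-key': '12345'})\n",
"print(request_url)\n",
"```\n",
"\n",
"Keeping the `?` out of the base URL and adding it in one place makes it harder to forget (or double up) the separator."
]
},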
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we're ready to do some damage. Join me, friends."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Making API requests in Python\n",
"\n",
"Now that we can put a URL together with `urllib.urlencode()`, we can actually make an API request. To do this, we'll use a function called `urllib.urlopen()`. First, make sure you've set a variable called `api_key` somewhere with your API key:"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"api_key = \"paste your api key here\"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's make the request using `urllib.urlopen()`. This function takes a single argument—the URL to fetch—and evaluates to an object with a `.read()` method that will return the entire contents of the response as a string, which we'll call immediately. Here's what it looks like:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\"response\":{\"meta\":{\"hits\":4228,\"time\":7,\"offset\":0},\"docs\":[{\"web_url\":\"http:\\/\\/query.nytimes.com\\/gst\\/abstract.html?res=9D0CE1DF1E3EE433A25751C1A9609C94639FD7CF\",\"snippet\":\"Python's Eggs, a Meal of...\",\"lead_paragraph\":null,\"abstract\":\"Python's Eggs, a Meal of\",\"print_page\":\"5\",\"blog\":[],\"source\":\"The New York Times\",\"multimedia\":[],\"headline\":{\"main\":\"TOOTHSOME PYHON EGGS.\"},\"keywords\":[{\"name\":\"subject\",\"value\":\"PYTHON\"}],\"pub_date\":\"1882-06-12T00:03:58Z\",\"document_type\":\"article\",\"news_d\n"
]
}
],
"source": [
"import urllib\n",
"\n",
"base_url = \"http://api.nytimes.com/svc/search/v2/articlesearch.json?\"\n",
"query_str = urllib.urlencode({'q': 'python', 'api-key': api_key})\n",
"request_url = base_url + query_str\n",
"\n",
"response_str = urllib.urlopen(request_url).read()\n",
"print(response_str[:500])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I'm using string slice syntax here to print out the first five hundred characters of the response, just to verify that we've received the data we're expecting. The response is returned in JSON format, which we can convert to a Python data structure using `json.loads()`:"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[u'status', u'response', u'copyright']\n"
]
}
],
"source": [
"response_dict = json.loads(response_str)\n",
"print response_dict.keys() # just to quickly check if the data looks correct"
]
},
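{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what `json.loads()` does without making a request, here is a small sketch. The data is a hand-shortened imitation of the response printed above (it keeps only a few of the fields and is not real API output), and the variable names `sample_str` and `sample_dict` are made up for illustration:\n",
"\n",
"```python\n",
"import json\n",
"\n",
"# Build a JSON string with the same shape as (a tiny part of) the response\n",
"sample_str = json.dumps({\n",
"    'response': {\n",
"        'meta': {'hits': 4228},\n",
"        'docs': [{'headline': {'main': 'TOOTHSOME PYHON EGGS.'}}]\n",
"    }\n",
"})\n",
"\n",
"# json.loads() turns the string into nested dictionaries and lists\n",
"sample_dict = json.loads(sample_str)\n",
"\n",
"print(sample_dict['response']['meta']['hits'])\n",
"print(sample_dict['response']['docs'][0]['headline']['main'])\n",
"```\n",
"\n",
"Before parsing, `sample_str` is just text; after parsing, we can reach into it with keys and indexes."
]
},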
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you've formatted the URL incorrectly, or if there's another problem (such as your computer not having a connection to the network, or the API being unavailable), you might get an error like this:"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "IOError",
"evalue": "('http error', 401, 'Unauthorized', <httplib.HTTPMessage instance at 0x103c68bd8>)",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mIOError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-42-f75aa676e5e6>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0murl\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"http://api.nytimes.com/svc/search/v2/blarticle\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0merror_str\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0murllib\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0murlopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0merror_dict\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mjson\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mloads\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merror_str\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.pyc\u001b[0m in \u001b[0;36murlopen\u001b[0;34m(url, data, proxies)\u001b[0m\n\u001b[1;32m 85\u001b[0m \u001b[0mopener\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_urlopener\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 86\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mdata\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 87\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mopener\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 88\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 89\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mopener\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.pyc\u001b[0m in \u001b[0;36mopen\u001b[0;34m(self, fullurl, data)\u001b[0m\n\u001b[1;32m 206\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 207\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mdata\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 208\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 209\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 210\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.pyc\u001b[0m in \u001b[0;36mopen_http\u001b[0;34m(self, url, data)\u001b[0m\n\u001b[1;32m 357\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 358\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mdata\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 359\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhttp_error\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfp\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0merrcode\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0merrmsg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mheaders\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 360\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 361\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhttp_error\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfp\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0merrcode\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0merrmsg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mheaders\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.pyc\u001b[0m in \u001b[0;36mhttp_error\u001b[0;34m(self, url, fp, errcode, errmsg, headers, data)\u001b[0m\n\u001b[1;32m 370\u001b[0m \u001b[0mmethod\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 371\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mdata\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 372\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfp\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0merrcode\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0merrmsg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mheaders\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 373\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 374\u001b[0m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfp\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0merrcode\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0merrmsg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mheaders\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.pyc\u001b[0m in \u001b[0;36mhttp_error_401\u001b[0;34m(self, url, fp, errcode, errmsg, headers, data)\u001b[0m\n\u001b[1;32m 691\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mscheme\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlower\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0;34m'basic'\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 692\u001b[0m URLopener.http_error_default(self, url, fp,\n\u001b[0;32m--> 693\u001b[0;31m errcode, errmsg, headers)\n\u001b[0m\u001b[1;32m 694\u001b[0m \u001b[0mname\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'retry_'\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtype\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;34m'_basic_auth'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 695\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mdata\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.pyc\u001b[0m in \u001b[0;36mhttp_error_default\u001b[0;34m(self, url, fp, errcode, errmsg, headers)\u001b[0m\n\u001b[1;32m 379\u001b[0m \u001b[0;34m\"\"\"Default error handler: close the connection and raise IOError.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 380\u001b[0m \u001b[0mfp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 381\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mIOError\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;34m'http error'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0merrcode\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0merrmsg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mheaders\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 382\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 383\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0m_have_ssl\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mIOError\u001b[0m: ('http error', 401, 'Unauthorized', <httplib.HTTPMessage instance at 0x103c68bd8>)"
]
}
],
"source": [
"url = \"http://api.nytimes.com/svc/search/v2/blarticle\"\n",
"error_str = urllib.urlopen(url).read()\n",
"error_dict = json.loads(error_str)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We won't be focusing here on making our web API clients robust enough to handle errors gracefully, but for more information about the error, you can try printing out the string you received as a response, before you attempted to parse it as JSON:"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\"meta\":{\"code\":404,\"msg\":\"NOT FOUND.\"}}\n"
]
}
],
"source": [
"print error_str"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the string that the server returned in response to the malformed request. With some APIs, this response will be helpful; in this case it isn't (it tells us that something \"isn't found\" but doesn't give us a good hint as to how we might fix the problem)."
]
},
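{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small sketch of how such a response could be inspected in code, assuming the error body is JSON with the `meta` layout shown above (other APIs will lay out their errors differently, and `describe_error` is a hypothetical helper, not part of any library):\n",
"\n",
"```python\n",
"import json\n",
"\n",
"def describe_error(body):\n",
"    # Parse the error body and pull out the status code and message\n",
"    meta = json.loads(body)['meta']\n",
"    return '%s: %s' % (meta['code'], meta['msg'])\n",
"\n",
"# The error string printed in the cell above\n",
"error_body = '{\"meta\":{\"code\":404,\"msg\":\"NOT FOUND.\"}}'\n",
"print(describe_error(error_body))\n",
"```"
]
},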
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Working with responses\n",
"\n",
"Now we've gotten a response from the API, and we've parsed it into a Python data structure that we know how to use (a dictionary). But now what do we do with it? First off, let's look at the actual structure of the data that we have and try to characterize its structure, from both a syntactic (what are the parts and what's it made of) and semantic (what does it mean?) perspective.\n",
"\n",
"One of IPython Notebook's nice features is that if you make a code cell with a variable or expression on its own in a single line, it will do its best to format the value of that expression in a nice, readable way. Let's do this for the `response_dict` variable that we created in a cell above."
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{u'copyright': u'Copyright (c) 2013 The New York Times Company. All Rights Reserved.',\n",
" u'response': {u'docs': [{u'_id': u'4fc0032045c1498b0d0e1048',\n",
" u'abstract': u\"Python's Eggs, a Meal of\",\n",
" u'blog': [],\n",
" u'byline': None,\n",
" u'document_type': u'article',\n",
" u'headline': {u'main': u'TOOTHSOME PYHON EGGS.'},\n",
" u'keywords': [{u'name': u'subject', u'value': u'PYTHON'}],\n",
" u'lead_paragraph': None,\n",
" u'multimedia': [],\n",
" u'news_desk': None,\n",
" u'print_page': u'5',\n",
" u'pub_date': u'1882-06-12T00:03:58Z',\n",
" u'section_name': None,\n",
" u'slideshow_credits': None,\n",
" u'snippet': u\"Python's Eggs, a Meal of...\",\n",
" u'source': u'The New York Times',\n",
" u'subsection_name': None,\n",
" u'type_of_material': u'Article',\n",
" u'web_url': u'http://query.nytimes.com/gst/abstract.html?res=9D0CE1DF1E3EE433A25751C1A9609C94639FD7CF',\n",
" u'word_count': 323},\n",
" {u'_id': u'58a6f12a95d0e024746371fe',\n",
" u'abstract': None,\n",
" u'blog': [],\n",
" u'byline': {u'organization': u'THE ASSOCIATED PRESS',\n",
" u'original': u'By THE ASSOCIATED PRESS',\n",
" u'person': []},\n",
" u'document_type': u'article',\n",
" u'headline': {u'main': u\"11-Foot Python Slithers Into South Florida Student's Car\",\n",
" u'print_headline': u\"11-Foot Python Slithers Into South Florida Student's Car\"},\n",
" u'keywords': [],\n",
" u'lead_paragraph': u\"A South Florida college student says he was startled when he saw a large snake crawl under his roommate's car.\",\n",
" u'multimedia': [],\n",
" u'news_desk': u'None',\n",
" u'print_page': 0,\n",
" u'pub_date': u'2017-02-17T12:48:32+0000',\n",
" u'section_name': u'U.S.',\n",
" u'slideshow_credits': None,\n",
" u'snippet': u\"A South Florida college student says he was startled when he saw a large snake crawl under his roommate's car....\",\n",
" u'source': u'AP',\n",
" u'subsection_name': None,\n",
" u'type_of_material': u'News',\n",
" u'web_url': u'https://www.nytimes.com/aponline/2017/02/17/us/ap-us-odd-python-in-car.html',\n",
" u'word_count': 142},\n",
" {u'_id': u'54a44b1338f0d83a07dc39d5',\n",
" u'abstract': None,\n",
" u'blog': [],\n",
" u'byline': {u'original': u'Twentieth Century Fox Home Entertainment',\n",
" u'person': [{u'firstname': u'Twentieth',\n",
" u'lastname': u'Fox',\n",
" u'middlename': u'Century',\n",
" u'organization': u'',\n",
" u'rank': 1,\n",
" u'role': u'reported'}]},\n",
" u'document_type': u'multimedia',\n",
" u'headline': {u'main': u'Python'},\n",
" u'keywords': [],\n",
" u'lead_paragraph': u'',\n",
" u'multimedia': [{u'height': 126,\n",
" u'legacy': {u'wide': u'images/2014/05/12/movies/video-python/video-python-thumbWide.jpg',\n",
" u'wideheight': u'126',\n",
" u'widewidth': u'190'},\n",
" u'subtype': u'wide',\n",
" u'type': u'image',\n",
" u'url': u'images/2014/05/12/movies/video-python/video-python-thumbWide.jpg',\n",
" u'width': 190},\n",
" {u'height': 338,\n",
" u'legacy': {u'xlarge': u'images/2014/05/12/movies/video-python/video-python-articleLarge.jpg',\n",
" u'xlargeheight': u'338',\n",
" u'xlargewidth': u'600'},\n",
" u'subtype': u'xlarge',\n",
" u'type': u'image',\n",
" u'url': u'images/2014/05/12/movies/video-python/video-python-articleLarge.jpg',\n",
" u'width': 600},\n",
" {u'height': 75,\n",
" u'legacy': {u'thumbnail': u'images/2014/05/12/movies/video-python/video-python-thumbStandard.jpg',\n",
" u'thumbnailheight': u'75',\n",
" u'thumbnailwidth': u'75'},\n",
" u'subtype': u'thumbnail',\n",
" u'type': u'image',\n",
" u'url': u'images/2014/05/12/movies/video-python/video-python-thumbStandard.jpg',\n",
" u'width': 75}],\n",
" u'news_desk': u'Movies',\n",
" u'print_page': None,\n",
" u'pub_date': u'2014-12-31T14:14:19Z',\n",
" u'section_name': u'Movies',\n",
" u'slideshow_credits': None,\n",
" u'snippet': u'',\n",
" u'source': u'Internet Video Archive',\n",
" u'subsection_name': None,\n",
" u'type_of_material': u'Video',\n",
" u'web_url': u'http://www.nytimes.com/video/movies/100000003366293/python.html',\n",
" u'word_count': u'0'},\n",
" {u'_id': u'51d6cb197e0d9c0839d2e230',\n",
" u'abstract': u'Mark Forstater, a British film producer who was among those who made the 1975 comedy hit \\u201cMonty Python and the Holy Grail,\\u201d said there had been an agreement he would be \\u201ctreated as the seventh Python\\u201d when it came to income from merchandising and other spin-offs from that movie.',\n",
" u'blog': [],\n",
" u'byline': {u'original': u'By DAVE ITZKOFF',\n",
" u'person': [{u'firstname': u'Dave',\n",
" u'lastname': u'ITZKOFF',\n",
" u'organization': u'',\n",
" u'rank': 1,\n",
" u'role': u'reported'}]},\n",
" u'document_type': u'blogpost',\n",
" u'headline': {u'kicker': u'ArtsBeat',\n",
" u'main': u\"Monty Python Producer Wins Royalties in 'Spamalot' Lawsuit\",\n",
" u'print_headline': u'Monty Python Producer Wins in \\u2018Spamalot\\u2019 Suit'},\n",
" u'keywords': [{u'is_major': u'Y',\n",
" u'name': u'persons',\n",
" u'rank': u'1',\n",
" u'value': u'Forstater, Mark'},\n",
" {u'is_major': u'Y',\n",
" u'name': u'subject',\n",
" u'rank': u'2',\n",
" u'value': u'Theater'},\n",
" {u'name': u'organizations', u'rank': u'1', u'value': u'Monty Python'}],\n",
" u'lead_paragraph': None,\n",
" u'multimedia': [{u'height': 126,\n",
" u'legacy': {u'wide': u'images/2013/07/05/arts/spamalot/spamalot-thumbWide.jpg',\n",
" u'wideheight': u'126',\n",
" u'widewidth': u'190'},\n",
" u'subtype': u'wide',\n",
" u'type': u'image',\n",
" u'url': u'images/2013/07/05/arts/spamalot/spamalot-thumbWide.jpg',\n",
" u'width': 190},\n",
" {u'height': 75,\n",
" u'legacy': {u'thumbnail': u'images/2013/07/05/arts/spamalot/spamalot-thumbStandard.jpg',\n",
" u'thumbnailheight': u'75',\n",
" u'thumbnailwidth': u'75'},\n",
" u'subtype': u'thumbnail',\n",
" u'type': u'image',\n",
" u'url': u'images/2013/07/05/arts/spamalot/spamalot-thumbStandard.jpg',\n",
" u'width': 75}],\n",
" u'news_desk': None,\n",
" u'print_page': None,\n",
" u'pub_date': u'2013-07-05T09:29:05Z',\n",
" u'section_name': u'Arts',\n",
" u'slideshow_credits': None,\n",
" u'snippet': u'Mark Forstater, a British film producer who was among those who made the 1975 comedy hit \\u201cMonty Python and the Holy Grail,\\u201d said there had been an agreement he would be \\u201ctreated as the seventh Python\\u201d when it came to income from merchandis...',\n",
" u'source': u'The New York Times',\n",
" u'subsection_name': None,\n",
" u'type_of_material': u'Blog',\n",
" u'web_url': u'http://artsbeat.blogs.nytimes.com/2013/07/05/monty-python-producer-wins-royalties-in-spamalot-lawsuit/',\n",
" u'word_count': u'298'},\n",
" {u'_id': u'4fd2a2d68eb7c8105d8887f8',\n",
" u'abstract': u'Actors and assortment of other people audition in New York for part in Gin and Tonic, movie based on life of Graham Chapman, member of Monty Python, who died in 1989; people of all sizes, descriptions and mental states are inteviewed by director, David Eric Brenner; photo (M)',\n",
" u'blog': [],\n",
" u'byline': {u'original': u'By ALAN FEUER',\n",
" u'person': [{u'firstname': u'Alan',\n",
" u'lastname': u'FEUER',\n",
" u'organization': u'',\n",
" u'rank': 1,\n",
" u'role': u'reported'}]},\n",
" u'document_type': u'article',\n",
" u'headline': {u'main': u\"Silly Walks? Dead Birds? Yes, It's 42nd St.\"},\n",
" u'keywords': [{u'name': u'creative_works',\n",
" u'value': u'GIN AND TONIC (MOVIE)'},\n",
" {u'name': u'persons', u'value': u'BRENNER, DAVID ERIC'},\n",
" {u'name': u'persons', u'value': u'CHAPMAN, GRAHAM'},\n",
" {u'name': u'organizations', u'value': u'MONTY PYTHON'},\n",
" {u'name': u'subject', u'value': u'MOTION PICTURES'}],\n",
" u'lead_paragraph': u'The fat guy with the pheasant on his shoulder went next. He was painted red from head to toe. The pheasant was dead. The director asked his name.',\n",
" u'multimedia': [],\n",
" u'news_desk': u'Metropolitan Desk',\n",
" u'print_page': u'39',\n",
" u'pub_date': u'2004-06-06T00:00:00Z',\n",
" u'section_name': u'Movies; New York and Region',\n",
" u'slideshow_credits': None,\n",
" u'snippet': u'The fat guy with the pheasant on his shoulder went next. He was painted red from head to toe. The pheasant was dead. The director asked his name....',\n",
" u'source': u'The New York Times',\n",
" u'subsection_name': None,\n",
" u'type_of_material': u'News',\n",
" u'web_url': u'http://www.nytimes.com/2004/06/06/nyregion/silly-walks-dead-birds-yes-it-s-42nd-st.html',\n",
" u'word_count': 529},\n",
" {u'_id': u'4fd3a1718eb7c8105d8e821b',\n",
" u'abstract': u'New comment tools are introduced on The New York Times Web site.',\n",
" u'blog': [],\n",
" u'byline': {u'original': u'By ANDREW C. REVKIN',\n",
" u'person': [{u'firstname': u'Andrew',\n",
" u'lastname': u'REVKIN',\n",
" u'middlename': u'C.',\n",
" u'organization': u'',\n",
" u'rank': 1,\n",
" u'role': u'reported'}]},\n",
" u'document_type': u'blogpost',\n",
" u'headline': {u'kicker': u'Dot Earth',\n",
" u'main': u'A Better Room for an Argument?'},\n",
" u'keywords': [{u'name': u'organizations',\n",
" u'rank': u'4',\n",
" u'value': u'Monty Python'},\n",
" {u'name': u'subject', u'rank': u'3', u'value': u'New York Times'},\n",
" {u'name': u'subject',\n",
" u'rank': u'2',\n",
" u'value': u'Blogs and Blogging (Internet)'},\n",
" {u'name': u'type_of_material', u'rank': u'1', u'value': u'News'}],\n",
" u'lead_paragraph': u'[Dec. 1, 11:42 a.m. | Updated The system is experiencing growing pains today, so thanks for patience on comment moderation.]',\n",
" u'multimedia': [],\n",
" u'news_desk': None,\n",
" u'print_page': None,\n",
" u'pub_date': u'2011-11-30T22:22:19Z',\n",
" u'section_name': u'Opinion',\n",
" u'slideshow_credits': None,\n",
" u'snippet': u'New comment tools are introduced on The New York Times Web site....',\n",
" u'source': u'The New York Times',\n",
" u'subsection_name': None,\n",
" u'type_of_material': u'Blog',\n",
" u'web_url': u'http://dotearth.blogs.nytimes.com/2011/11/30/a-new-comment-process-on-new-york-times-blogs/',\n",
" u'word_count': 258},\n",
" {u'_id': u'5449391438f0d875ddacb400',\n",
" u'abstract': u'Before \\u201cMonty Python\\u2019s Flying Circus,\\u201d the comedians starred in \\u201cAt Last the 1948 Show,\\u201d a BBC series whose episodes were thought to be lost.',\n",
" u'blog': [],\n",
" u'byline': {u'original': u'By ALLAN KOZINN',\n",
" u'person': [{u'firstname': u'Allan',\n",
" u'lastname': u'KOZINN',\n",
" u'organization': u'',\n",
" u'rank': 1,\n",
" u'role': u'reported'}]},\n",
" u'document_type': u'blogpost',\n",
" u'headline': {u'kicker': u'ArtsBeat',\n",
" u'main': u'Newly Found Tapes Capture John Cleese and Graham Chapman, Pre-Python'},\n",
" u'keywords': [{u'name': u'persons',\n",
" u'rank': u'1',\n",
" u'value': u'Chapman, Graham'},\n",
" {u'name': u'persons', u'rank': u'2', u'value': u'Cleese, John'},\n",
" {u'name': u'organizations', u'rank': u'1', u'value': u'Monty Python'},\n",
" {u'name': u'subject', u'rank': u'1', u'value': u'Comedy and Humor'}],\n",
" u'lead_paragraph': None,\n",
" u'multimedia': [],\n",
" u'news_desk': u'Culture',\n",
" u'print_page': None,\n",
" u'pub_date': u'2014-10-23T13:20:03Z',\n",
" u'section_name': u'Arts',\n",
" u'slideshow_credits': None,\n",
" u'snippet': u'Before \\u201cMonty Python\\u2019s Flying Circus,\\u201d the comedians starred in \\u201cAt Last the 1948 Show,\\u201d a BBC series whose episodes were thought to be lost....',\n",
" u'source': u'The New York Times',\n",
" u'subsection_name': None,\n",
" u'type_of_material': u'Blog',\n",
" u'web_url': u'http://artsbeat.blogs.nytimes.com/2014/10/23/newly-found-tapes-capture-john-cleese-and-graham-chapman-pre-python/',\n",
" u'word_count': u'457'},\n",
" {u'_id': u'4fd25c858eb7c8105d80a6a1',\n",
" u'abstract': None,\n",
" u'blog': [],\n",
" u'byline': {u'original': u'By CHARLES McGRATH',\n",
" u'person': [{u'firstname': u'Charles',\n",
" u'lastname': u'McGRATH',\n",
" u'organization': u'',\n",
" u'rank': 1,\n",
" u'role': u'reported'}]},\n",
" u'document_type': u'article',\n",
" u'headline': {u'kicker': u'FILM',\n",
" u'main': u\"Terry Gilliam's Feel-Good Endings\"},\n",
" u'keywords': [{u'name': u'persons', u'value': u'GILLIAM, TERRY'},\n",
" {u'name': u'organizations', u'value': u'MONTY PYTHON'},\n",
" {u'name': u'subject', u'value': u'MOTION PICTURES'}],\n",
" u'lead_paragraph': u\"TERRY GILLIAM filmed his newest movie, ''Tideland,'' in Saskatchewan last fall, racing to complete the location shots before winter set in. The Mitch Cullin novel on which the film is based is mostly set in West Texas, but Mr. Gilliam had substituted the Canadian prairie instead. The evening after he wrapped, it started to snow, and the cast, crew and director all saw this as an omen. If this had been a Gilliam production beset by the kind of bad fortune that has sometimes clung to his movies, the snow would have blown in much sooner, five or six feet of it; the cameras would have frozen, turning the fingers of the cinematographer, Nicola Pecorini, black with frostbite; the stars would have been evacuated by chopper -- or rather, the budget having most likely run out, by dog sled.\",\n",
" u'multimedia': [{u'height': 325,\n",
" u'legacy': {u'MultimediaPopupHeight2': u'325',\n",
" u'MultimediaPopupWidth2': u'250',\n",
" u'MultimediaType2': u'Image',\n",
" u'MultimediaUrl2': u'imagepages/2005/08/14/arts/14mcgr.2.ready.html'},\n",
" u'rank': u'2',\n",
" u'type': u'Image',\n",
" u'url': u'imagepages/2005/08/14/arts/14mcgr.2.ready.html',\n",
" u'width': 250},\n",
" {u'height': 325,\n",
" u'legacy': {u'MultimediaPopupHeight1': u'325',\n",
" u'MultimediaPopupWidth1': u'250',\n",
" u'MultimediaType1': u'Image',\n",
" u'MultimediaUrl1': u'imagepages/2005/08/14/arts/14mcgr.ready.html'},\n",
" u'rank': u'1',\n",
" u'type': u'Image',\n",
" u'url': u'imagepages/2005/08/14/arts/14mcgr.ready.html',\n",
" u'width': 250},\n",
" {u'height': 75,\n",
" u'legacy': {u'hasthumbnail': u'Y',\n",
" u'thumbnail': u'images/2005/08/14/arts/14mcgr.1.75.jpg',\n",
" u'thumbnailheight': 75},\n",
" u'subtype': u'thumbnail',\n",
" u'type': u'image',\n",
" u'url': u'images/2005/08/14/arts/14mcgr.1.75.jpg'}],\n",
" u'news_desk': u'Arts and Leisure Desk',\n",
" u'print_page': u'1',\n",
" u'pub_date': u'2005-08-14T00:00:00Z',\n",
" u'section_name': u'Movies; Arts',\n",
" u'slideshow_credits': None,\n",
" u'snippet': u'The director Terry Gilliam has acquired a reputation for being both a visionary and \\x97 in Hollywood it amounts to the same thing \\x97 a bit of a madman....',\n",
" u'source': u'The New York Times',\n",
" u'subsection_name': None,\n",
" u'type_of_material': u'News',\n",
" u'web_url': u'http://www.nytimes.com/2005/08/14/movies/14mcgr.html',\n",
" u'word_count': 2075},\n",
" {u'_id': u'4fd21c9d8eb7c8105d79cf48',\n",
" u'abstract': u'Kevin Filipski article on the Monty Python films, television skits and shows now available on digital video discs, many packaged with extras; photo (M)',\n",
" u'blog': [],\n",
" u'byline': {u'original': u'By KEVIN FILIPSKI',\n",
" u'person': [{u'firstname': u'Kevin',\n",
" u'lastname': u'FILIPSKI',\n",
" u'organization': u'',\n",
" u'rank': 1,\n",
" u'role': u'reported'}]},\n",
" u'document_type': u'article',\n",
" u'headline': {u'main': u'Film; For Monty Python Fans, A Completely Digital Feast'},\n",
" u'keywords': [{u'name': u'organizations', u'value': u'MONTY PYTHON'},\n",
" {u'name': u'subject', u'value': u'RECORDINGS (VIDEO)'},\n",
" {u'name': u'subject', u'value': u'MOTION PICTURES'},\n",
" {u'name': u'subject', u'value': u'TELEVISION'}],\n",
" u'lead_paragraph': u\"AFTER 30 years, the five British boys and one American known as Monty Python have graduated from cult to media institution, now that all of the group's feature films -- save the filmed concert ''Monty Python Live at the Hollywood Bowl'' -- and the first two seasons of its television series, ''Monty Python's Flying Circus,'' have been released on digital video disc. If there is irony in the fact that such anarchic comedy has been forever preserved in the antiseptic digital medium, don't tell that to legions of understandably ecstatic Python fanatics. Of all the Python releases, the pinnacle is the eight DVDs -- in four volumes of two discs each, with more to come from further seasons -- that make up the ''Flying Circus'' set. (A&E Home Video, $44.95 a set). Each disc contains three or four episodes, and it is quite easy to skip right to one's favorite skits, since the best are instantly recognizable to any true Python fan: ''Nudge, Nudge,'' ''Lumberjack Song,'' ''The Ministry of Silly Walks,'' ''Dirty Hungarian Phrasebook'' and ''Spam,'' for starters.\",\n",
" u'multimedia': [],\n",
" u'news_desk': u'Arts and Leisure Desk',\n",
" u'print_page': u'30',\n",
" u'pub_date': u'2000-01-23T00:00:00Z',\n",
" u'section_name': u'Movies; Arts',\n",
" u'slideshow_credits': None,\n",
" u'snippet': u\"AFTER 30 years, the five British boys and one American known as Monty Python have graduated from cult to media institution, now that all of the group's feature films -- save the filmed concert ''Monty Python Live at the Hollywood Bowl'' -- and the...\",\n",
" u'source': u'The New York Times',\n",
" u'subsection_name': None,\n",
" u'type_of_material': u'News',\n",
" u'web_url': u'http://www.nytimes.com/2000/01/23/movies/film-for-monty-python-fans-a-completely-digital-feast.html',\n",
" u'word_count': 691},\n",
" {u'_id': u'55030af338f0d835da194dd6',\n",
" u'abstract': u'The comedians will be on hand for a 40th anniversary showing of \\u201cMonty Python and the Holy Grail.\\u201d',\n",
" u'blog': [],\n",
" u'byline': {u'original': u'By STEPHANIE GOODMAN',\n",
" u'person': [{u'firstname': u'Stephanie',\n",
" u'lastname': u'GOODMAN',\n",
" u'organization': u'',\n",
" u'rank': 1,\n",
" u'role': u'reported'}]},\n",
" u'document_type': u'blogpost',\n",
" u'headline': {u'kicker': u'ArtsBeat',\n",
" u'main': u'Monty Python Mini-Festival at Tribeca',\n",
" u'print_headline': u'Monty Python Reunion At Tribeca Film Festival'},\n",
" u'keywords': [{u'name': u'persons',\n",
" u'rank': u'1',\n",
" u'value': u'Cleese, John'},\n",
" {u'name': u'persons', u'rank': u'2', u'value': u'Gilliam, Terry'},\n",
" {u'name': u'persons', u'rank': u'3', u'value': u'Idle, Eric'},\n",
" {u'name': u'persons', u'rank': u'4', u'value': u'Jones, Terry'},\n",
" {u'name': u'persons', u'rank': u'5', u'value': u'Palin, Michael'},\n",
" {u'name': u'organizations', u'rank': u'1', u'value': u'Monty Python'},\n",
" {u'name': u'subject', u'rank': u'1', u'value': u'Comedy and Humor'},\n",
" {u'name': u'subject',\n",
" u'rank': u'2',\n",
" u'value': u'Tribeca Film Festival (NYC)'}],\n",
" u'lead_paragraph': None,\n",
" u'multimedia': [],\n",
" u'news_desk': u'Culture',\n",
" u'print_page': None,\n",
" u'pub_date': u'2015-03-11T13:08:13Z',\n",
" u'section_name': u'Arts',\n",
" u'slideshow_credits': None,\n",
" u'snippet': u'The comedians will be on hand for a 40th anniversary showing of \\u201cMonty Python and the Holy Grail.\\u201d...',\n",
" u'source': u'The New York Times',\n",
" u'subsection_name': None,\n",
" u'type_of_material': u'Blog',\n",
" u'web_url': u'http://artsbeat.blogs.nytimes.com/2015/03/11/monty-python-mini-festival-at-tribeca/',\n",
" u'word_count': u'207'}],\n",
" u'meta': {u'hits': 4228, u'offset': 0, u'time': 7}},\n",
" u'status': u'OK'}"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"response_dict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Okay, this is still a little bit unreadable. I'm going to do my best to parse it out.\n",
"\n",
"> NOTE: nearly every API has its own idiosyncratic way of structuring its responses. Part of the point of API documentation is to let programmers know how the response is structured and what the response means. There's no substitute for studying the API documentation, but with a bit of practice, you can usually heuristically pinpoint which parts of a response are interesting based merely on what's visible in the response. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Python data structure here is a dictionary. (Again, ignore the funky `u` in front of all of the strings for now.) We know that from looking at it, but also because of what happens when we evaluate this expression:"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<type 'dict'>\n"
]
}
],
"source": [
"print type(response_dict)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dictionary appears to have a number of keys. Here are my guesses about what their values mean, based on experience and on what's visible in the response:\n",
"\n",
"* The `response` key seems to be where the results are. That key's value is in turn another dictionary.\n",
"* The `docs` key inside of the `response` key appears to be a *list of dictionaries*. Whenever you see a list of dictionaries, chances are you've found the part of the response that has the meatiest data.\n",
"* Each of the dictionaries in that list has a number of keys (e.g., `snippet` and `headline`). These are the fields with the data we actually want!\n",
"* There's also a `meta` key inside the `response` dictionary, which points to another dictionary that has the number of \"hits,\" i.e., the total number of documents that matched the search (only some of which are returned in this response).\n",
"\n",
"Based on this analysis, I'm going to home in on the value for the `response` key as something to play with. Let's confirm our suspicion that the value for this key is a dictionary:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<type 'dict'>\n"
]
}
],
"source": [
"print type(response_dict['response'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Okay, so what type is the value for the `docs` key of that dictionary?"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<type 'list'>\n"
]
}
],
"source": [
"print type(response_dict['response']['docs'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...and what's the type of the first item in that list?"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<type 'dict'>\n"
]
}
],
"source": [
"print type(response_dict['response']['docs'][0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A dictionary! So we have a list of dictionaries. I'm going to assign that list to a separate variable so it's a bit easier to work with."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"articles = response_dict['response']['docs']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's take a look at what's actually in this list, using the third item as an example."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{u'_id': u'54a44b1338f0d83a07dc39d5',\n",
" u'abstract': None,\n",
" u'blog': [],\n",
" u'byline': {u'original': u'Twentieth Century Fox Home Entertainment',\n",
" u'person': [{u'firstname': u'Twentieth',\n",
" u'lastname': u'Fox',\n",
" u'middlename': u'Century',\n",
" u'organization': u'',\n",
" u'rank': 1,\n",
" u'role': u'reported'}]},\n",
" u'document_type': u'multimedia',\n",
" u'headline': {u'main': u'Python'},\n",
" u'keywords': [],\n",
" u'lead_paragraph': u'',\n",
" u'multimedia': [{u'height': 126,\n",
" u'legacy': {u'wide': u'images/2014/05/12/movies/video-python/video-python-thumbWide.jpg',\n",
" u'wideheight': u'126',\n",
" u'widewidth': u'190'},\n",
" u'subtype': u'wide',\n",
" u'type': u'image',\n",
" u'url': u'images/2014/05/12/movies/video-python/video-python-thumbWide.jpg',\n",
" u'width': 190},\n",
" {u'height': 338,\n",
" u'legacy': {u'xlarge': u'images/2014/05/12/movies/video-python/video-python-articleLarge.jpg',\n",
" u'xlargeheight': u'338',\n",
" u'xlargewidth': u'600'},\n",
" u'subtype': u'xlarge',\n",
" u'type': u'image',\n",
" u'url': u'images/2014/05/12/movies/video-python/video-python-articleLarge.jpg',\n",
" u'width': 600},\n",
" {u'height': 75,\n",
" u'legacy': {u'thumbnail': u'images/2014/05/12/movies/video-python/video-python-thumbStandard.jpg',\n",
" u'thumbnailheight': u'75',\n",
" u'thumbnailwidth': u'75'},\n",
" u'subtype': u'thumbnail',\n",
" u'type': u'image',\n",
" u'url': u'images/2014/05/12/movies/video-python/video-python-thumbStandard.jpg',\n",
" u'width': 75}],\n",
" u'news_desk': u'Movies',\n",
" u'print_page': None,\n",
" u'pub_date': u'2014-12-31T14:14:19Z',\n",
" u'section_name': u'Movies',\n",
" u'slideshow_credits': None,\n",
" u'snippet': u'',\n",
" u'source': u'Internet Video Archive',\n",
" u'subsection_name': None,\n",
" u'type_of_material': u'Video',\n",
" u'web_url': u'http://www.nytimes.com/video/movies/100000003366293/python.html',\n",
" u'word_count': u'0'}"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"articles[2]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This dictionary is easier to decipher. It's easy to imagine that the `snippet` value contains a snippet of the body text of the article, `pub_date` has the date that the article was published, `headline` is a dictionary with various parts of the headline, and `web_url` is a URL pointing to the article itself on nytimes.com.\n",
"\n",
"Now that we know we have a list, we can do some list-like things with it. We can, for example, see how many articles were returned in the response:"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10\n"
]
}
],
"source": [
"print len(articles)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How about writing a list comprehension to make a list of all of the titles of the articles that were returned?"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[u'TOOTHSOME PYHON EGGS.',\n",
" u\"11-Foot Python Slithers Into South Florida Student's Car\",\n",
" u'Python',\n",
" u\"Monty Python Producer Wins Royalties in 'Spamalot' Lawsuit\",\n",
" u\"Silly Walks? Dead Birds? Yes, It's 42nd St.\",\n",
" u'A Better Room for an Argument?',\n",
" u'Newly Found Tapes Capture John Cleese and Graham Chapman, Pre-Python',\n",
" u\"Terry Gilliam's Feel-Good Endings\",\n",
" u'Film; For Monty Python Fans, A Completely Digital Feast',\n",
" u'Monty Python Mini-Festival at Tribeca']"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[article['headline']['main'] for article in articles]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or the first twenty characters of each article's body summary:"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[u\"Python's Eggs, a Mea\",\n",
" u'A South Florida coll',\n",
" u'',\n",
" u'Mark Forstater, a Br',\n",
" u'The fat guy with the',\n",
" u'New comment tools ar',\n",
" u'Before \\u201cMonty Python',\n",
" u'The director Terry G',\n",
" u'AFTER 30 years, the ',\n",
" u'The comedians will b']"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[article['snippet'][:20] for article in articles]"
]
},
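{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some of the values in these dictionaries are `None` (look at `abstract` and `lead_paragraph` in the response above), and slicing `None` raises a `TypeError`. Here's a minimal sketch of one way to guard against that, using the dictionary's `.get()` method together with the `or` operator. The `sample_articles` list below is a made-up stand-in for `articles` (an assumption for illustration), so that the cell runs without making an API request."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Hypothetical sample that mimics the structure of `articles`.\n",
"sample_articles = [\n",
"    {'headline': {'main': 'Python'}, 'abstract': None, 'snippet': ''},\n",
"    {'headline': {'main': 'Monty Python Mini-Festival at Tribeca'},\n",
"     'abstract': 'The comedians will be on hand.',\n",
"     'snippet': 'The comedians will be on hand...'},\n",
"    {'headline': {}, 'snippet': None},\n",
"]\n",
"\n",
"# .get() returns None when a key is missing, and `or` substitutes a\n",
"# fallback whenever the value on its left is None or an empty string.\n",
"summaries = [(article.get('abstract') or article.get('snippet') or '')[:20]\n",
"             for article in sample_articles]\n",
"print(summaries)\n",
"\n",
"# The same idea works one level down, for a headline with no 'main' key.\n",
"titles = [article.get('headline', {}).get('main') or '(no headline)'\n",
"          for article in sample_articles]\n",
"print(titles)"
]
},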
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Full example: counting search results\n",
"\n",
"Note: In this example we're using a `for` loop, which hasn't been discussed previously in these notes, but has been covered in Foundations. Review your notes from that class if you need a refresher.\n",
"\n",
"Suppose we've set ourselves to the task of determining which fruit is the most popular fruit, based on how many times the name of the fruit has occurred in the New York Times. (This is also my version of JOURNALISM.) In order to do this, we're going to go through the process of taking a list like this:"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"topics = [\"apple\", \"banana\", \"cherry\", \"coconut\", \"durian\", \"lemon\", \"mango\", \"orange\", \"peach\", \"pear\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"... and turn it into a dictionary that has these strings as its keys, and values for each of these keys corresponding to the number of articles found in the New York Times Article Search API for that string. We're trying, in other words, to get a data structure that looks like this:\n",
"\n",
" {'apple': 4172,\n",
" 'banana': 19734,\n",
" 'cherry': 73358,\n",
" 'coconut': 37516,\n",
" 'durian': 96198,\n",
" 'lemon': 9808,\n",
" 'mango': 43265,\n",
" 'orange': 82419,\n",
" 'peach': 25389,\n",
" 'pear': 31081}\n",
" \n",
"(We'll probably get different numbers, though.)\n",
"\n",
"Our methodology:\n",
"\n",
"* Create an empty dictionary.\n",
"* Loop over each of the strings in list `topics`.\n",
"* Inside the loop: create a query string; perform an API request; store the value of the `hits` key (found in the `meta` dictionary of the response) as the value for the topic key (i.e., `apple`, `banana`, `cherry`, etc.).\n",
"* Display the dictionary that we made.\n",
"\n",
"Here's the code! Note: it may take some time for this code to complete."
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"apple\n",
"banana\n",
"cherry\n",
"coconut\n",
"durian\n",
"lemon\n",
"mango\n",
"orange\n",
"peach\n",
"pear\n",
"{'coconut': 13284, 'lemon': 47812, 'apple': 99525, 'peach': 20628, 'cherry': 65059, 'pear': 41795, 'banana': 19809, 'mango': 6458, 'orange': 274724, 'durian': 232}\n"
]
}
],
"source": [
"import urllib\n",
"import json\n",
"import time\n",
"\n",
"topics = [\"apple\", \"banana\", \"cherry\", \"coconut\", \"durian\", \"lemon\", \"mango\", \"orange\", \"peach\", \"pear\"]\n",
"topic_count = {}\n",
"for topic in topics:\n",
" print topic\n",
" query_str = urllib.urlencode({'q': topic, 'api-key': api_key})\n",
" request_url = \"http://api.nytimes.com/svc/search/v2/articlesearch.json?\" + query_str\n",
" response_str = urllib.urlopen(request_url).read()\n",
" response_dict = json.loads(response_str)\n",
" topic_count[topic] = response_dict['response']['meta']['hits']\n",
" time.sleep(0.5)\n",
" \n",
"print topic_count"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wait, what're these `import time` and `time.sleep(0.5)` things here? I'll tell you. The `import time` line makes Python's `time` module available, and the `time.sleep(0.5)` line tells Python to wait `0.5` seconds before proceeding. This is common practice when you're making multiple requests to an API in quick succession, to avoid overwhelming the server or going over your rate limit.\n",
"\n",
"> BONUS CHALLENGE: Write an expression that prints out the total number of articles for all topics searched for in this program."
]
},
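{
"cell_type": "markdown",
"metadata": {},
"source": [
"We set out to find the most popular fruit, so here's a rough sketch of one way to finish the job: ask `max()` for the key with the largest value, and use `sorted()` to rank all of the topics. The `sample_counts` dictionary is just a copy of the output printed above (your numbers will probably differ); substitute `topic_count` to use your own results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Counts copied from the output above; replace with topic_count for live data.\n",
"sample_counts = {'coconut': 13284, 'lemon': 47812, 'apple': 99525,\n",
"                 'peach': 20628, 'cherry': 65059, 'pear': 41795,\n",
"                 'banana': 19809, 'mango': 6458, 'orange': 274724,\n",
"                 'durian': 232}\n",
"\n",
"# max() with key=sample_counts.get returns the key whose value is largest.\n",
"most_popular = max(sample_counts, key=sample_counts.get)\n",
"print(most_popular)\n",
"\n",
"# sorted() with the same key function ranks topics from most to fewest hits.\n",
"for topic in sorted(sample_counts, key=sample_counts.get, reverse=True):\n",
"    print(topic + ': ' + str(sample_counts[topic]))"
]
},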
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"In this session, you've learned how to use the dictionary, a very powerful and ubiquitous data structure. You learned the basics of how to access a web API, and how to convert the \"raw data\" returned from such an API (in JSON format) to an actual Python data structure. You learned how to make URL query strings \"on the fly\" by constructing dictionaries with the desired keys and values. Finally, you put it all together and made a short program that does multiple API queries, combining the results from those queries into a dictionary. What will you accomplish next?!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}