
urllib2 vs requests

0_urllib2.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
 
import urllib2
 
gh_url = 'https://api.github.com'
 
req = urllib2.Request(gh_url)
 
password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, gh_url, 'user', 'pass')
 
auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
opener = urllib2.build_opener(auth_manager)
 
urllib2.install_opener(opener)
 
handler = urllib2.urlopen(req)
 
print handler.getcode()
print handler.headers.getheader('content-type')
 
# ------
# 200
# 'application/json'
1_requests.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
 
import requests
 
r = requests.get('https://api.github.com', auth=('user', 'pass'))
 
print r.status_code
print r.headers['content-type']
 
# ------
# 200
# 'application/json'

looks short as shit. now do py3
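
For reference, the urllib2 example translates to Python 3 almost name-for-name, since urllib2 was merged into urllib.request. A rough sketch (the commented-out lines are where the actual network round-trip would go):

```python
import urllib.request

gh_url = 'https://api.github.com'

password_manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, gh_url, 'user', 'pass')

auth_manager = urllib.request.HTTPBasicAuthHandler(password_manager)
opener = urllib.request.build_opener(auth_manager)

# The request itself needs network access:
# with opener.open(gh_url) as handler:
#     print(handler.getcode())
#     print(handler.headers['content-type'])
```

Note that print is a function in Python 3, and `opener.open()` avoids the global `install_opener()` side effect.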

And you could use httplib2, which has Python 3 support:

import httplib2

h = httplib2.Http(".cache")
h.add_credentials('user', 'pass')
r, content = h.request("https://api.github.com", "GET")

print r['status']
print r['content-type']

Python 3 support is planned.

I'm rather certain that httplib2 doesn't support multipart file uploads, and requires you to urlencode your POST data yourself. It doesn't support parameters at all. Or Unicode. Or PUT requests.

Don't get me wrong, httplib2 is awesome. It does have advanced caching support, which is extremely useful sometimes. I don't want to cache. I want to make requests.

Most importantly, it also requires more verbosity, which is the whole reason requests exists. Why do I have to create an Http object? When will I ever not use Http???

This isn't exactly fair -- you're using temporary variables (gh_user, gh_pass) in the urllib2 version but not in the requests version.

Also, your one variable in the requests version is a single character, compared to (for example) password_manager.

+1 about the temporary variables.

it's still massively shorter than httplib2, no matter how you name it

This example isn't a SLOC competition. It's intended to show the best possible way to make the same request with both libraries.

@justquick you meant massively shorter than urllib2, because it's about the same size as the httplib2 version

I don't think that it's misleading, but I've changed the urllib2 example anyway.

Why not a bit shorter?

import urllib2

gh_url = 'https://github.com'

auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(None, gh_url, 'user', 'passwd')

opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
handler = urllib2.urlopen(gh_url)

print handler.getcode()
print handler.headers.getheader('content-type')

Damn. Requests has a way better API design. I love that .get() is a method, auth is an optional parameter, and status is a field, not a method. So many little design decisions done right. Can we please get this into the standard/default python distribution? Thankssssss!

Seriously, I always put off doing HTTP requests in Python because the API is such a pain in the ass. I end up running wget at the command line, copying and pasting HTML into files, and loading the files in Python because open() is an easier API. But with requests I'd just do it all in Python! What a freaking improvement!

In httplib2, if you reuse the same Http object, it will reuse the connection, so you don't have the overhead of building up and tearing down connections. How do you reuse a connection with requests?

@espeed: requests doesn't have keep-alive support at this time, but it should be added in soon.

@kennethreitz I like your example, thanks!

I think that the urllib2 example is a straw man. Here is the code that I use to access GitHub using urllib2:

import urllib2
from base64 import encodestring

request = urllib2.Request('https://api.github.com/user')
base64string = encodestring('%s:%s' % ('user', 'pass')).replace('\n', '')
request.add_header('Authorization', 'Basic %s' % base64string)
r = urllib2.urlopen(request)

print r.getcode()
print r.headers["content-type"]
print r.headers["X-RateLimit-Limit"]

Here is the same code using requests:

import requests

r = requests.get('https://api.github.com/user', auth=('user', 'pass'))

print r.status_code
print r.headers['content-type']
print r.headers['X-RateLimit-Limit']

Both print (make sure you change your username and password):

200
application/json
5000

While the requests code is much simpler, the urllib2 code is much better than your original example: you only need to specify the URL once (not twice), and you access the headers the same way as in requests. And it's 4 lines (to open the URL), not 8 lines as in your original example. So one should be fair.

And it looks like that you can simplify it even further:

import urllib2
from base64 import b64encode

request = urllib2.Request('https://api.github.com/user')
request.add_header('Authorization', 'Basic ' + b64encode('user' + ':' + 'pass'))
r = urllib2.urlopen(request)

print r.getcode()
print r.headers["content-type"]
print r.headers["X-RateLimit-Limit"]

So it's only 3 lines and it's compatible with the usual urllib2 call. So while the API of urllib2 really isn't good, it isn't nearly as bad as your original example. I think you should use this 3-line version instead, to be fair.

EDIT: Make sure you use the b64encode function above. Then you don't need to remove the trailing '\n' from the string.

EDIT 2: Simplified a little more. I think it's as simple as it can get now.
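
(One caveat for anyone porting this trick forward: in Python 3, base64.b64encode operates on bytes, not str, so the header line needs explicit encoding and decoding. A sketch of the same idea:)

```python
from base64 import b64encode

# b64encode takes bytes and returns bytes; decode back to str for the header.
token = b64encode(b'user:pass').decode('ascii')
auth_header = 'Basic ' + token
```

The resulting string can then be passed to `add_header('Authorization', ...)` on a `urllib.request.Request` just as in the Python 2 version.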

@certik Very clever!
But for the purpose of this post, don't you think we should be comparing the common case? Even though you can make the call shorter by skipping use of the urllib password and auth manager and just adding a header manually with a base64 encode... you're basically NOT using the urllib api anymore.

So this isn't so much a comparison of urllib to requests, as a comparison of clever raw header hacking + urllib to requests.

@toomim: I think you are right. But I think that the original example installs some stuff into urllib2 with my passwords (!?), which I really don't feel comfortable with. So I would say that urllib2 is missing this functionality completely, and one simply has to include the header by hand. But as you can see, it's really simple (in this case).

How do you post xml data? I used session.post(url, data=xml_string), but that didn't work.
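
(A likely cause: passing a bare string as `data` sends it as the raw request body, but without a Content-Type header many servers won't parse it as XML. A sketch of the usual fix, using a hypothetical endpoint URL and payload; `.prepare()` builds the request without sending it:)

```python
import requests

xml_string = '<?xml version="1.0"?><ping/>'  # hypothetical payload

# Declare the body as XML explicitly via the Content-Type header.
req = requests.Request(
    'POST', 'https://example.com/endpoint',  # placeholder URL
    data=xml_string,
    headers={'Content-Type': 'application/xml'},
).prepare()

# req.body now holds the raw XML; a Session would send it unchanged:
# r = requests.Session().send(req)
```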

httplib2 does indeed support PUT requests.

This is very nice indeed. It can be very useful for small scripts that need "advanced" features of urllib2 (can someone please stop with the libname2 stuff already?)

Does it handle all authentication types? What about NTLM?

@certik: it doesn't actually "install" anything - install_opener is just a method to add an opener callback to the pre-request stack of urllib2:

http://docs.python.org/library/urllib2.html#urllib2.install_opener

Ambiguously named things like that are another reason to use Requests!

@bradleywright: It does install the opener for all future requests going through urllib2, which is a bad idea for authentication.
The no-side-effects (also shorter!) way is:

handler = opener.open(req)

Anyway, Requests API is much cleaner.

@kennethreitz Hi Kenneth - Did requests get keepalive (https://gist.github.com/973705#gistcomment-45426)?

@kennethreitz Cool. Then I'm going to look at switching Bulbs (https://github.com/espeed/bulbs) over to it, which currently uses httplib2. Let me run some performance tests and I'll let you know!

Another reason to use Requests: this urllib2 Gist will only work if the host returns HTTP error code 401 and a "WWW-Authenticate" header. Of course the base64 encoded method also works.

Congratulations on keep-alive support. It's kind of a "must have" feature. Could you elaborate on its usage in the docs? Unfortunately, in the documentation I only see how to disable keep-alive. Do you need to create a session object, like the Http object in httplib2?
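
(In modern versions of requests, the answer is yes: connection pooling and keep-alive are tied to a Session object, which plays roughly the role httplib2's Http object does. A sketch; the commented-out calls would hit the network:)

```python
import requests

# One Session = one connection pool; it can also hold shared settings
# such as credentials, applied to every request made through it.
s = requests.Session()
s.auth = ('user', 'pass')

# Requests made through the same Session reuse pooled connections:
# r1 = s.get('https://api.github.com/user')
# r2 = s.get('https://api.github.com/user/repos')
```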

Well done. I don't agree with the httplib2 argument; I think they just don't get the point. I do: it's lazier to use Requests.

Regardless of how excessive or standard the urllib2 example is, API usage and flow is everything. Do you want to spend brain cycles on deciphering your libraries or spend them on solving application problems?

Requests does not require as many brain cycles.

The problem with urllib2 is not that it's impossible to write a short script to do something, it's that it's impossible to do so from memory.

Having to google and copy-paste-modify an example every time I want to do something more complicated than a GET request with no parameters or authentication or cookies is not acceptable.

I hate external dependencies but for a lib like requests...i just cant even remotely bring myself to hate it...not even a little bit

You won me over with r.encoding and offering a unicode object. Handling that in urllib2 is a pain in the ass: you have to regex the charset out of the headers and use that to decode the response to unicode.

thank you.

Oh how we all love requests!

Move over guys, I need room in this circlejerk.

you can sit next to me mr garamonde

@Quest79: Nice Cyan!

I am a freshman; at first look, I like the style of requests.
PS: this is my first time reading a Python open-source project.

One thing that everyone misses in the arguments above is that urllib2 does not handle SSL connections in a sane way at all, which makes https pointless. requests makes this trivially easy and robust. I used it to create a custom transport for SUDS so I could do https, because SUDS uses urllib2. My custom SUDS transport is 1 file (170 lines). The standard SUDS transport is 472 lines and takes a long time to get your head around, in addition to its other deficiencies.

requests++
