urllib is a package in the Python standard library that collects several modules for handling URLs and performing network communication. It allows you to open URLs, retrieve data from web servers, parse URLs, encode and decode URL parameters, handle errors, manage cookies, and more.
The main module in urllib is urllib.request, which provides functions and classes for making HTTP requests, interacting with web servers, and handling responses. With urllib.request.urlopen(), you can open a URL and obtain a file-like object from which to read the response. The function supports both GET and POST requests: passing a data argument sends the request as a POST.
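As a minimal sketch of the GET/POST distinction described above (https://www.example.com stands in for a real server, and the `fetch` helper is a hypothetical wrapper that is only defined, not called, so nothing here touches the network):

```python
import urllib.parse
import urllib.request

# A GET request: urlopen() returns a file-like response object.
# Wrapped in a helper so no traffic is sent at import time.
def fetch(url):
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()

# Passing `data` (as bytes) switches the same request to POST:
form = urllib.parse.urlencode({"q": "python"}).encode("ascii")
request = urllib.request.Request("https://www.example.com/search", data=form)
print(request.get_method())  # data is present, so the method is POST
```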
urllib - Python Standard Library Documentation
| Function/Method | Description | Example |
|---|---|---|
| `urllib.request.urlopen(url[, data][, timeout])` | Opens the URL given by `url` and returns a file-like response object. Passing `data` sends a POST request; `timeout` sets the timeout in seconds. | `response = urllib.request.urlopen("https://www.example.com")` |
| `urllib.request.urlretrieve(url[, filename[, reporthook[, data]]])` | Retrieves the resource at `url` and saves it to the local file `filename`. `reporthook` is a callback for progress monitoring; `data` can be supplied for POST requests. | `urllib.request.urlretrieve("https://www.example.com/image.jpg", "image.jpg")` |
| `urllib.parse.urlencode(query[, doseq])` | Encodes a dictionary `query` into a URL-encoded query string. With `doseq` set, list or tuple values are encoded as separate `key=value` pairs. | `encoded_params = urllib.parse.urlencode({'key': 'value', 'foo': 'bar'})` |
| `urllib.parse.urljoin(base, url[, allow_fragments])` | Constructs an absolute URL by joining a base URL with a relative URL. `allow_fragments` controls whether fragment identifiers are recognized. | `full_url = urllib.parse.urljoin("https://www.example.com/base/", "page.html")` |
| `urllib.error.URLError` | Exception raised for URL-related errors. It is a subclass of `OSError`. | `try: response = urllib.request.urlopen("https://www.example.com")` `except urllib.error.URLError as e: print("Error:", e.reason)` |
| `urllib.robotparser.RobotFileParser` | Fetches and parses a `robots.txt` file. It provides methods to check whether a given user agent may access a specific URL. | `rp = urllib.robotparser.RobotFileParser()` `rp.set_url("https://www.example.com/robots.txt")` `rp.read()` `allowed = rp.can_fetch("mybot", "https://www.example.com/page.html")` |
| `urllib.request.ProxyHandler` | Handler class for routing requests through a proxy server. Combine it with `urllib.request.build_opener()` to create an opener object. | `proxy_handler = urllib.request.ProxyHandler({'http': 'http://proxy.example.com:8080'})` |
| `urllib.request.HTTPBasicAuthHandler` | Handler class for performing HTTP Basic authentication. It adds the necessary authentication headers. | `password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()` `password_mgr.add_password(None, "https://www.example.com", "username", "password")` `auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)` |
| `urllib.request.HTTPCookieProcessor` | Handler class for managing cookies. It automatically handles sending and storing cookies across requests. | `cookie_handler = urllib.request.HTTPCookieProcessor()` |
| `urllib.request.build_opener([handler, ...])` | Constructs an opener object from the given handlers, which customize request and response handling. | `opener = urllib.request.build_opener(proxy_handler, auth_handler, cookie_handler)` |
| `urllib.request.install_opener(opener)` | Installs `opener` as the global default, so subsequent calls to `urllib.request.urlopen()` use it. | `urllib.request.install_opener(opener)` |
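The urllib.parse helpers in the table are pure functions, so they can be tried without any network access. For instance, doseq expands list values into repeated keys, and urljoin resolves a relative reference against a base URL:

```python
import urllib.parse

# doseq=True turns a list value into repeated key=value pairs
params = urllib.parse.urlencode({"tag": ["a", "b"]}, doseq=True)
print(params)  # tag=a&tag=b

# urljoin resolves a relative reference against the base URL
full_url = urllib.parse.urljoin("https://www.example.com/base/", "page.html")
print(full_url)  # https://www.example.com/base/page.html
```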
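RobotFileParser can also be fed rules directly via its parse() method, which makes the access check easy to demonstrate without fetching anything (the rules below are made up for illustration; set_url() and read() would download a real robots.txt instead):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
# parse() accepts the file's lines directly, so no network is needed
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])
print(rp.can_fetch("mybot", "https://www.example.com/page.html"))   # True
print(rp.can_fetch("mybot", "https://www.example.com/private/x"))   # False
```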
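Putting the handler classes together, here is a minimal sketch of building and installing a custom opener; the proxy host and credentials are placeholders, not real endpoints:

```python
import http.cookiejar
import urllib.request

# Placeholder proxy and credentials for illustration only
proxy_handler = urllib.request.ProxyHandler(
    {"http": "http://proxy.example.com:8080"})

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://www.example.com",
                          "username", "password")
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# Back the cookie processor with an explicit jar so cookies can be inspected
cookie_jar = http.cookiejar.CookieJar()
cookie_handler = urllib.request.HTTPCookieProcessor(cookie_jar)

opener = urllib.request.build_opener(
    proxy_handler, auth_handler, cookie_handler)
urllib.request.install_opener(opener)
# From here on, urllib.request.urlopen() routes through this opener.
```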