@Denilson-Semedo
Created June 8, 2023 01:22
# urllib

urllib is a package in the Python standard library that bundles several modules for handling URLs and performing network communication. It lets you open URLs, retrieve data from web servers, parse URLs, encode and decode URL parameters, handle errors, manage cookies, and more.
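As a quick taste of the parsing and encoding tasks mentioned above, the sketch below splits a URL into its components and builds a query string (the URL and parameters are arbitrary examples):

```python
from urllib.parse import urlparse, urlencode

# Split a URL into scheme, host, path, query, and fragment.
parts = urlparse("https://www.example.com/search?q=python#top")
print(parts.netloc)  # www.example.com
print(parts.path)    # /search

# Build a URL-encoded query string from a dict; non-string
# values are converted with str() before quoting.
print(urlencode({"q": "python", "page": 2}))  # q=python&page=2
```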

The main module in urllib is urllib.request, which provides functions and classes for making HTTP requests, interacting with web servers, and handling responses. With urllib.request.urlopen(), you can open a URL and obtain a file-like object from which to read the response. urlopen() issues a GET request by default and switches to POST when you pass a data argument, allowing you to send a request body to the server if needed.
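A minimal sketch of the GET-versus-POST distinction (the live GET call is commented out to avoid a network round trip; the endpoint URLs are placeholders):

```python
import urllib.parse
import urllib.request

# GET: opening a URL returns a file-like response object.
# with urllib.request.urlopen("https://www.example.com", timeout=10) as response:
#     html = response.read().decode("utf-8")

# POST: passing `data` changes the request method. The body must be
# bytes, typically URL-encoded form data.
form = {"name": "Ada", "lang": "python"}
body = urllib.parse.urlencode(form).encode("ascii")
request = urllib.request.Request("https://www.example.com/submit", data=body)
print(request.get_method())  # POST, because data is present
```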

urllib - Python Standard Library Documentation

| Function/Method | Description | Example |
| --- | --- | --- |
| `urllib.request.urlopen(url[, data][, timeout])` | Opens the URL specified by `url` and returns a file-like object. Optionally, `data` can be passed for POST requests, and `timeout` sets the timeout value in seconds. | `response = urllib.request.urlopen("https://www.example.com")` |
| `urllib.request.urlretrieve(url[, filename[, reporthook[, data]]])` | Retrieves the URL specified by `url` and saves it to the file `filename`. `reporthook` can be used for progress monitoring, and `data` can be used for POST requests. | `urllib.request.urlretrieve("https://www.example.com/image.jpg", "image.jpg")` |
| `urllib.parse.urlencode(query[, doseq])` | Encodes a dictionary `query` into a URL-encoded string. `doseq` indicates whether each element of a list/tuple value is encoded as a separate parameter. | `params = {'key': 'value', 'foo': 'bar'}`<br>`encoded_params = urllib.parse.urlencode(params)` |
| `urllib.parse.urljoin(base, url[, allow_fragments])` | Constructs a full URL by joining a base URL with a relative URL. `allow_fragments` specifies whether fragments are recognized. | `full_url = urllib.parse.urljoin("https://www.example.com/base/", "page.html")` |
| `urllib.error.URLError` | Exception class raised for URL-related errors. It is a subclass of `OSError`. | `try:`<br>`    response = urllib.request.urlopen("https://www.example.com")`<br>`except urllib.error.URLError as e:`<br>`    print("Error:", e.reason)` |
| `urllib.robotparser.RobotFileParser` | Parses and stores a robots.txt file. It provides methods to check whether a given user agent can access a specific URL. | `rp = urllib.robotparser.RobotFileParser()`<br>`rp.set_url("https://www.example.com/robots.txt")`<br>`rp.read()`<br>`allowed = rp.can_fetch("mybot", "https://www.example.com/page.html")` |
| `urllib.request.ProxyHandler` | Handler class for routing requests through a proxy server. It can be used with `urllib.request.build_opener()` to create an opener object. | `proxy_handler = urllib.request.ProxyHandler({'http': 'http://proxy.example.com:8080'})` |
| `urllib.request.HTTPBasicAuthHandler` | Handler class for performing HTTP Basic authentication. It adds the necessary headers for authentication. | `password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()`<br>`password_mgr.add_password(None, "https://www.example.com", "username", "password")`<br>`auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)` |
| `urllib.request.HTTPCookieProcessor` | Handler class for managing cookies. It automatically handles sending and receiving cookies. | `cookie_handler = urllib.request.HTTPCookieProcessor()` |
| `urllib.request.build_opener([handler, ...])` | Constructs an opener object with the specified handlers. Handlers are used to customize the request and response handling. | `opener = urllib.request.build_opener(proxy_handler, auth_handler, cookie_handler)` |
| `urllib.request.install_opener(opener)` | Installs the opener object as the default opener. Subsequent requests made with `urllib.request.urlopen()` will use this opener. | `urllib.request.install_opener(opener)` |
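The handler classes above are designed to be combined. A minimal sketch wiring a proxy, Basic-auth credentials, and a cookie jar into one opener (the proxy address and credentials are placeholder values; no request is actually sent):

```python
import urllib.request
from http.cookiejar import CookieJar

# Route HTTP traffic through a (hypothetical) proxy.
proxy_handler = urllib.request.ProxyHandler({"http": "http://proxy.example.com:8080"})

# Register placeholder credentials for a (hypothetical) protected site.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://www.example.com", "username", "password")
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# Store and resend cookies automatically across requests.
cookie_handler = urllib.request.HTTPCookieProcessor(CookieJar())

# Combine the handlers and make the opener the process-wide default,
# so plain urllib.request.urlopen() calls use it from now on.
opener = urllib.request.build_opener(proxy_handler, auth_handler, cookie_handler)
urllib.request.install_opener(opener)
```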