April 17, 2023 at 9:03:59 AM
Prompt:
Write a Python script with no extra dependencies which can take a list of URLs and use a HEAD request to find the size of each one and then add those all up
ChatGPT:
You can achieve this using the `urllib.request` module from the Python standard library. Its `urlopen` function opens URLs; you can use it to send a HEAD request to each URL in the list and read the `Content-Length` header to determine the size of each resource. The following Python script demonstrates how to do this:
```python
import urllib.request


def get_total_size(urls):
    total_size = 0
    for url in urls:
        try:
            # Send a HEAD request to the URL
            req = urllib.request.Request(url, method='HEAD')
            with urllib.request.urlopen(req) as response:
                # Get the 'Content-Length' header value
                content_length = response.getheader('Content-Length')
                if content_length is not None:
                    # Convert the content length to an integer and add it to the total
                    total_size += int(content_length)
        except Exception as e:
            print(f"Error while processing URL '{url}': {e}")
    return total_size


# Example usage
urls = [
    'https://www.example.com/file1.txt',
    'https://www.example.com/file2.txt',
    # Add more URLs here
]

total_size = get_total_size(urls)
print(f"Total size of all URLs: {total_size} bytes")
```
In this script, the `get_total_size` function takes a list of URLs, sends a HEAD request to each one, reads the `Content-Length` header value, and returns the accumulated total size.

Note that this script does not handle all possible edge cases and error conditions. For example, some servers may not return a `Content-Length` header, in which case the script cannot determine that URL's size. It also gives no special treatment to redirects (which `urlopen` follows automatically) or to non-success HTTP response codes. Depending on your use case, you may need to add further error handling and functionality.
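One way to make that missing-header case visible, as a sketch (the `total_known_size` helper and its `(url, size)` tuple format are illustrative, not part of the original script): sum only the sizes the servers actually reported, and collect the URLs whose size is unknown so they can be listed separately.

```python
def total_known_size(results):
    """Sum known sizes from (url, size) pairs, where size is None
    when the server sent no Content-Length header."""
    total = 0
    unknown = []
    for url, size in results:
        if size is None:
            unknown.append(url)
        else:
            total += size
    return total, unknown


# Example usage with hypothetical results
total, unknown = total_known_size([
    ('https://www.example.com/file1.txt', 1024),
    ('https://www.example.com/file2.txt', None),
])
print(total, unknown)
```

This keeps the reported total honest: a URL with no `Content-Length` contributes nothing rather than silently vanishing.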
Prompt:
Send a Firefox user agent
ChatGPT:
To send a custom User-Agent header in your request, you can pass a `headers` dictionary to the `Request` object with the desired User-Agent string. Below is the updated script, which sends a HEAD request with a Firefox User-Agent header:
```python
import urllib.request


def get_total_size(urls):
    total_size = 0
    # Define a Firefox User-Agent string
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0'
    for url in urls:
        try:
            # Create a Request object with the desired User-Agent header
            req = urllib.request.Request(url, headers={'User-Agent': user_agent}, method='HEAD')
            with urllib.request.urlopen(req) as response:
                # Get the 'Content-Length' header value
                content_length = response.getheader('Content-Length')
                if content_length is not None:
                    # Convert the content length to an integer and add it to the total
                    total_size += int(content_length)
        except Exception as e:
            print(f"Error while processing URL '{url}': {e}")
    return total_size


# Example usage
urls = [
    'https://www.example.com/file1.txt',
    'https://www.example.com/file2.txt',
    # Add more URLs here
]

total_size = get_total_size(urls)
print(f"Total size of all URLs: {total_size} bytes")
```
In this updated script, the `user_agent` variable holds a Firefox User-Agent string, and the `Request` object is created with a `headers` parameter carrying it. The rest of the script is the same as before.

Please note that this is just one of many possible Firefox User-Agent strings; they vary by operating system, browser version, and other factors, so you can substitute a different one if needed.
Prompt:
Write it to use httpx instead
ChatGPT:
To use the `httpx` library instead of `urllib.request`, install it with `pip install httpx` and then modify the script as follows:
```python
import asyncio

import httpx


async def get_total_size(urls):
    total_size = 0
    # Define a Firefox User-Agent string
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0'
    headers = {'User-Agent': user_agent}
    async with httpx.AsyncClient() as client:
        for url in urls:
            try:
                # Send a HEAD request to the URL with the desired User-Agent header
                response = await client.head(url, headers=headers)
                # Get the 'Content-Length' header value
                content_length = response.headers.get('Content-Length')
                if content_length is not None:
                    # Convert the content length to an integer and add it to the total
                    total_size += int(content_length)
            except Exception as e:
                print(f"Error while processing URL '{url}': {e}")
    return total_size


# Example usage
urls = [
    'https://www.example.com/file1.txt',
    'https://www.example.com/file2.txt',
    # Add more URLs here
]

# Run the asynchronous function using an event loop
total_size = asyncio.run(get_total_size(urls))
print(f"Total size of all URLs: {total_size} bytes")
```
In this script, we use the `httpx` library to send HEAD requests to each URL. The `get_total_size` function is defined as a coroutine (`async def`); the `httpx.AsyncClient` context manager creates an HTTP client, and `client.head` sends each HEAD request (the requests are still awaited one at a time here, so they run sequentially). The `Content-Length` header value is retrieved with `response.headers.get('Content-Length')`; the rest of the script is similar to the previous version.

Please note that you need to install the `httpx` library before running this script:

`pip install httpx`

Also note that the script uses `asyncio.run` to drive the asynchronous `get_total_size` function.
Prompt:
Rewrite that to send 10 requests at a time and show a progress bar
ChatGPT:
To send 10 requests at a time and show a progress bar, we can use the `httpx` library's concurrency features along with the `tqdm` library for the progress bar. `httpx.AsyncClient` lets us send multiple requests concurrently, and `tqdm` provides a simple way to display a progress bar in the terminal.

Here's the modified script:
```python
import asyncio

import httpx
from tqdm import tqdm


async def get_total_size(urls):
    total_size = 0
    # Define a Firefox User-Agent string
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0'
    headers = {'User-Agent': user_agent}
    # Allow at most 10 requests in flight at any one time
    semaphore = asyncio.Semaphore(10)

    async with httpx.AsyncClient() as client:

        async def fetch_size(url):
            async with semaphore:
                try:
                    response = await client.head(url, headers=headers)
                    content_length = response.headers.get('Content-Length')
                    if content_length is not None:
                        return int(content_length)
                except Exception as e:
                    print(f"Error while processing URL '{url}': {e}")
                return 0

        # Create a progress bar using tqdm
        with tqdm(total=len(urls), desc="Fetching sizes", unit="url") as pbar:
            # Use asyncio.as_completed to process results as they arrive
            coros = [fetch_size(url) for url in urls]
            for coro in asyncio.as_completed(coros):
                total_size += await coro
                # Update the progress bar
                pbar.update(1)
    return total_size


# Example usage
urls = [
    'https://www.example.com/file1.txt',
    'https://www.example.com/file2.txt',
    # Add more URLs here
]

# Run the asynchronous function using an event loop
total_size = asyncio.run(get_total_size(urls))
print(f"Total size of all URLs: {total_size} bytes")
```
In this script, an inner coroutine `fetch_size` sends a HEAD request to a single URL and returns its size (or 0 on error). We then use `asyncio.as_completed` to process the results as they arrive, and `tqdm` to create and update the progress bar.

To install the `tqdm` library:

`pip install tqdm`

Please note that `httpx` does not cap you at 10 concurrent requests on its own: its default connection pool allows up to 100 connections. To enforce a limit of 10, wrap each request in an `asyncio.Semaphore(10)` as shown above, or pass `limits=httpx.Limits(max_connections=10)` when constructing `httpx.AsyncClient`.
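To see how a semaphore caps concurrency, here is a minimal self-contained sketch using only the standard library (the `asyncio.sleep` call stands in for a network request; no URLs are fetched, and the limit of 3 is arbitrary). It tracks the peak number of workers inside the guarded section, which never exceeds the semaphore's limit:

```python
import asyncio


async def main():
    semaphore = asyncio.Semaphore(3)  # at most 3 workers in the guarded section
    active = 0
    peak = 0

    async def worker(i):
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for a network request
            active -= 1

    # Launch 10 workers; the semaphore throttles them to 3 at a time
    await asyncio.gather(*(worker(i) for i in range(10)))
    return peak


peak = asyncio.run(main())
print(f"Peak concurrency: {peak}")
```

The same pattern applies to the script above: `fetch_size` acquires the semaphore before each `client.head` call, so no more than 10 requests are ever in flight.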
Prompt:
Python function to convert bytes to GB
ChatGPT:
You can create a Python function to convert bytes to gigabytes (GB) by dividing the number of bytes by 2**30 (which is equivalent to 1,073,741,824). Here's the function:
```python
def bytes_to_gb(num_bytes):
    # There are 2**30 bytes in a gigabyte
    return num_bytes / (2**30)


# Example usage
num_bytes = 1073741824  # 1 GB in bytes
gb = bytes_to_gb(num_bytes)
print(f"{num_bytes} bytes is equivalent to {gb} GB")
```
In this example, the `bytes_to_gb` function takes a number of bytes and returns the equivalent number of gigabytes by dividing by 2**30. The example usage converts 1 GB (expressed in bytes) and prints 1.0. Strictly speaking, 2**30 bytes is a binary gigabyte (a gibibyte, GiB); the decimal gigabyte is 10**9 bytes, so divide by 10**9 instead if that is the convention you need.

Note that the function returns a floating-point number. If you need the result with a specific number of decimal places, use the `round` function or string formatting; for example, `round(gb, 2)` rounds the result to two decimal places.
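To make the rounding note concrete, here is a small example (the 1536 MiB input value is arbitrary):

```python
def bytes_to_gb(num_bytes):
    # 2**30 bytes per (binary) gigabyte
    return num_bytes / (2**30)


size = bytes_to_gb(1536 * 2**20)  # 1536 MiB = 1.5 GiB
rounded = round(size, 2)          # rounds to 2 decimal places
formatted = f"{size:.2f} GB"      # formats with exactly 2 decimal places
print(rounded, formatted)
```

`round` trims trailing zeros (it returns the number 1.5), while the `:.2f` format spec always shows two decimal places ('1.50 GB'), which is usually what you want for display.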