
@simonw
Created April 17, 2023 18:49

URLs Size Calculator

April 17, 2023 at 9:03:59 AM

Prompt:

Write a Python script with no extra dependencies which can take a list of URLs and use a HEAD request to find the size of each one and then add those all up


ChatGPT:

You can achieve this with the urllib.request module from the Python standard library. Its urlopen function opens URLs, so you can use it to send a HEAD request to each URL in the list and read the Content-Length header to determine the size of each resource. The following Python script demonstrates how to do this:

import urllib.request

def get_total_size(urls):
    total_size = 0
    for url in urls:
        try:
            # Send a HEAD request to the URL
            req = urllib.request.Request(url, method='HEAD')
            with urllib.request.urlopen(req) as response:
                # Get the 'Content-Length' header value
                content_length = response.getheader('Content-Length')
                if content_length is not None:
                    # Convert the content length to an integer and add it to the total size
                    total_size += int(content_length)
        except Exception as e:
            print(f"Error while processing URL '{url}': {e}")
    return total_size

# Example usage
urls = [
    'https://www.example.com/file1.txt',
    'https://www.example.com/file2.txt',
    # Add more URLs here
]

total_size = get_total_size(urls)
print(f"Total size of all URLs: {total_size} bytes")

In this script, the get_total_size function takes a list of URLs as input and calculates the total size of all the URLs by sending a HEAD request to each URL and retrieving the Content-Length header value. The total size is then returned by the function.

Note that this script does not handle all possible edge cases and error conditions. For example, some servers may not return the Content-Length header, in which case the script will not be able to determine the size of the URL. Additionally, the script does not handle redirects or other HTTP response codes. Depending on your use case, you may need to add additional error handling and functionality to the script.
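One hedged way to handle a missing Content-Length header, using only the standard library, is to fall back to a streaming GET and count the bytes as they arrive. This is a sketch rather than part of the script above, and get_size_with_fallback is a hypothetical helper; the fallback downloads the full resource, so only use it when that cost is acceptable:

import urllib.request

def get_size_with_fallback(url, chunk_size=65536):
    # Try a HEAD request first and trust Content-Length if the server sends it
    head_req = urllib.request.Request(url, method='HEAD')
    with urllib.request.urlopen(head_req) as response:
        content_length = response.getheader('Content-Length')
        if content_length is not None:
            return int(content_length)

    # Fallback: stream the body with a GET and count the bytes ourselves
    size = 0
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
    return size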


Prompt:

Send a Firefox user agent


ChatGPT:

To send a custom User-Agent header in your request, you can modify the Request object to include the desired User-Agent string. Below is the updated script that sends a HEAD request with a Firefox User-Agent header:

import urllib.request

def get_total_size(urls):
    total_size = 0
    # Define a Firefox User-Agent string
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0'
    for url in urls:
        try:
            # Create a Request object with the desired User-Agent header
            req = urllib.request.Request(url, headers={'User-Agent': user_agent}, method='HEAD')
            with urllib.request.urlopen(req) as response:
                # Get the 'Content-Length' header value
                content_length = response.getheader('Content-Length')
                if content_length is not None:
                    # Convert the content length to an integer and add it to the total size
                    total_size += int(content_length)
        except Exception as e:
            print(f"Error while processing URL '{url}': {e}")
    return total_size

# Example usage
urls = [
    'https://www.example.com/file1.txt',
    'https://www.example.com/file2.txt',
    # Add more URLs here
]

total_size = get_total_size(urls)
print(f"Total size of all URLs: {total_size} bytes")

In this updated script, the user_agent variable contains a Firefox User-Agent string. When creating the Request object, we include the headers parameter with the desired User-Agent header. The rest of the script remains the same as before.

Please note that the User-Agent string provided in this example is just one of many possible Firefox User-Agent strings. User-Agent strings can vary based on the operating system, browser version, and other factors. You can use a different Firefox User-Agent string if needed.


Prompt:

Write it to use httpx instead


ChatGPT:

To use the httpx library instead of urllib.request, you can install the httpx library using pip install httpx and then modify the script as follows:

import httpx

async def get_total_size(urls):
    total_size = 0
    # Define a Firefox User-Agent string
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0'
    headers = {'User-Agent': user_agent}

    async with httpx.AsyncClient() as client:
        for url in urls:
            try:
                # Send a HEAD request to the URL with the desired User-Agent header
                response = await client.head(url, headers=headers)
                # Get the 'Content-Length' header value
                content_length = response.headers.get('Content-Length')
                if content_length is not None:
                    # Convert the content length to an integer and add it to the total size
                    total_size += int(content_length)
            except Exception as e:
                print(f"Error while processing URL '{url}': {e}")
    return total_size

# Example usage
urls = [
    'https://www.example.com/file1.txt',
    'https://www.example.com/file2.txt',
    # Add more URLs here
]

# Run the asynchronous function using an event loop
import asyncio
total_size = asyncio.run(get_total_size(urls))
print(f"Total size of all URLs: {total_size} bytes")

In this script, we use the httpx library to send asynchronous HEAD requests to each URL. The get_total_size function is defined as an asynchronous function (async def). We use the httpx.AsyncClient context manager to create an HTTP client, and we use the client.head method to send HEAD requests.

The Content-Length header value is retrieved using response.headers.get('Content-Length'). The rest of the script is similar to the previous version.

Please note that you need to install the httpx library before running this script. You can install it using the following command:

pip install httpx

Also, note that the script uses an event loop (asyncio.run) to run the asynchronous function get_total_size.
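If you don't need concurrency, httpx also offers a synchronous API. A minimal sketch of the same logic using httpx.Client (a variant for illustration, not part of the script above):

import httpx

def get_total_size_sync(urls):
    total_size = 0
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0'
    # httpx.Client is the synchronous counterpart of httpx.AsyncClient
    with httpx.Client(headers={'User-Agent': user_agent}) as client:
        for url in urls:
            try:
                response = client.head(url)
                content_length = response.headers.get('Content-Length')
                if content_length is not None:
                    total_size += int(content_length)
            except Exception as e:
                print(f"Error while processing URL '{url}': {e}")
    return total_size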


Prompt:

Rewrite that to send 10 requests at a time and show a progress bar


ChatGPT:

To send 10 requests at a time and show a progress bar, we can use the httpx library's concurrency features along with the tqdm library for the progress bar. The httpx library allows us to send multiple requests concurrently using httpx.AsyncClient, and the tqdm library provides a simple way to display a progress bar in the terminal.

Here's the modified script:

import httpx
from tqdm import tqdm
import asyncio

async def get_total_size(urls):
    total_size = 0
    # Define a Firefox User-Agent string
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0'
    headers = {'User-Agent': user_agent}

    async def fetch_size(url):
        try:
            response = await client.head(url, headers=headers)
            content_length = response.headers.get('Content-Length')
            if content_length is not None:
                return int(content_length)
        except Exception as e:
            print(f"Error while processing URL '{url}': {e}")
        return 0

    async with httpx.AsyncClient() as client:
        # Create a progress bar using tqdm
        with tqdm(total=len(urls), desc="Fetching sizes", unit="url") as pbar:
            # Use asyncio.as_completed to process results as they arrive
            coros = [fetch_size(url) for url in urls]
            for coro in asyncio.as_completed(coros):
                size = await coro
                total_size += size
                # Update the progress bar
                pbar.update(1)

    return total_size

# Example usage
urls = [
    'https://www.example.com/file1.txt',
    'https://www.example.com/file2.txt',
    # Add more URLs here
]

# Run the asynchronous function using an event loop
total_size = asyncio.run(get_total_size(urls))
print(f"Total size of all URLs: {total_size} bytes")

In this script, we define an inner asynchronous function fetch_size that sends a HEAD request to a single URL and returns its size. We then use asyncio.as_completed to process the results as they arrive. The tqdm library is used to create and update the progress bar.

To install the tqdm library, you can use the following command:

pip install tqdm

Please note that, as written, the script starts all of the HEAD requests at once: asyncio.as_completed schedules every coroutine as a task immediately, and httpx's default connection pool allows up to 100 concurrent connections, so nothing actually caps the concurrency at 10. To limit it, you can pass a limits argument to httpx.AsyncClient (for example, limits=httpx.Limits(max_connections=10)) or gate the requests with an asyncio.Semaphore, as sketched below.
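Here is one possible sketch (not from the original conversation) that keeps the tqdm progress bar but gates each request with an asyncio.Semaphore so that at most 10 HEAD requests are in flight at a time:

import asyncio

import httpx
from tqdm import tqdm

async def get_total_size(urls, concurrency=10):
    total_size = 0
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0'
    headers = {'User-Agent': user_agent}
    # The semaphore allows at most `concurrency` requests to run at once
    semaphore = asyncio.Semaphore(concurrency)

    async def fetch_size(client, url):
        async with semaphore:
            try:
                response = await client.head(url, headers=headers)
                content_length = response.headers.get('Content-Length')
                if content_length is not None:
                    return int(content_length)
            except Exception as e:
                print(f"Error while processing URL '{url}': {e}")
            return 0

    async with httpx.AsyncClient() as client:
        coros = [fetch_size(client, url) for url in urls]
        with tqdm(total=len(urls), desc="Fetching sizes", unit="url") as pbar:
            for coro in asyncio.as_completed(coros):
                total_size += await coro
                pbar.update(1)

    return total_size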


Prompt:

Python function to convert bytes to GB


ChatGPT:

You can create a Python function to convert bytes to gigabytes (GB) by dividing the number of bytes by 2**30 (which is equivalent to 1,073,741,824). Here's the function:

def bytes_to_gb(num_bytes):
    # There are 2**30 bytes in a gigabyte
    return num_bytes / (2**30)

# Example usage
num_bytes = 1073741824  # 1 GB in bytes
gb = bytes_to_gb(num_bytes)
print(f"{num_bytes} bytes is equivalent to {gb} GB")

In this example, the function bytes_to_gb takes the number of bytes as input and returns the equivalent number of gigabytes by dividing the input by 2**30. The example usage demonstrates how to use the function to convert 1 GB (expressed in bytes) to gigabytes. The result is 1.0 GB.

Note that the function returns a floating-point number. If you need to format the result with a specific number of decimal places, you can use the round function or string formatting. For example, to round the result to two decimal places, you can use round(gb, 2).
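If you want a more general formatter, a small sketch extending the same idea (human_readable_size is a hypothetical helper, not from the conversation above) divides repeatedly by 1024 and picks an appropriate unit:

def human_readable_size(num_bytes, decimals=2):
    # Walk up the binary units until the value drops below 1024
    for unit in ('bytes', 'KB', 'MB', 'GB', 'TB', 'PB'):
        if num_bytes < 1024 or unit == 'PB':
            return f"{round(num_bytes, decimals)} {unit}"
        num_bytes /= 1024

# Example usage
print(human_readable_size(1073741824))  # 1.0 GB
print(human_readable_size(123456789))   # 117.74 MB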

@amaarora

Hey @simonw ! Wondering how you gist the conversation?


simonw commented Jul 30, 2023
