@jccartwright
Created March 23, 2021 20:01
List files with a specific extension in an HTTP directory listing.
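import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin


def get_url_paths(url, ext='', params=None):
    """Return absolute URLs of links on the page at `url` whose href ends with `ext`."""
    # params=None avoids the shared-mutable-default pitfall of params={}
    response = requests.get(url, params=params)
    # Raises requests.HTTPError on a 4xx/5xx status instead of parsing an error page
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # href=True skips <a> tags without an href (node.get('href') would be None);
    # urljoin resolves relative hrefs such as "file.js" or "../" against the base URL
    return [urljoin(url, node['href'])
            for node in soup.find_all('a', href=True)
            if node['href'].endswith(ext)]


url = 'http://localhost/~jcc/bathymetry/'
ext = '.js'  # include the dot so names like "module.mjs" are not matched
result = get_url_paths(url, ext)
print(result)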
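If installing `requests` and `beautifulsoup4` is not an option, the same filtering can be sketched with only the standard library. This is a hypothetical variant, not the gist's method: `_LinkCollector`, `filter_links` and `SAMPLE` are illustrative names, and it assumes the directory listing is a plain HTML page of `<a href>` links (as Apache-style autoindex pages are). It takes the HTML as a string so the parsing can be tried without a server, and it matches `ext` against the URL path, so a query string such as `?v=2` does not prevent a match.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Hypothetical helper: collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def filter_links(html, base_url, ext=''):
    """Return absolute URLs from `html` whose path ends with `ext` (hypothetical)."""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    results = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        # Compare against the path only, ignoring any ?query or #fragment
        if urlparse(absolute).path.endswith(ext):
            results.append(absolute)
    return results


# Made-up listing page used only to exercise the parser
SAMPLE = '''
<html><body>
  <a href="../">Parent Directory</a>
  <a href="app.js">app.js</a>
  <a href="data.json">data.json</a>
  <a href="lib/util.js?v=2">util.js</a>
  <a name="no-href">x</a>
</body></html>
'''

print(filter_links(SAMPLE, 'http://localhost/~jcc/bathymetry/', '.js'))
# → ['http://localhost/~jcc/bathymetry/app.js', 'http://localhost/~jcc/bathymetry/lib/util.js?v=2']
```

To use it against a live server, fetch the page text first (for example with `urllib.request.urlopen(url).read().decode()`) and pass it in as `html`.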