@tosh1ki
Created June 27, 2015 05:55
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib.parse

import requests
from bs4 import BeautifulSoup

if __name__ == '__main__':
    # Get the URLs of the CSV files linked from the page
    url = 'http://www.city.osaka.lg.jp/shimin/page/0000298810.html'
    r = requests.get(url)
    # Decode the raw bytes as UTF-8 (the encoding the page is assumed to use)
    # instead of round-tripping r.text through ISO-8859-1
    html = r.content.decode('utf-8')
    soup = BeautifulSoup(html, 'html.parser')
    # Relative links are resolved against the page's <base href="...">
    base_url = soup.base.get('href')
    link_candidates = [link.get('href') for link in soup.find_all('a')
                       if link.get('href')]
    relative_paths = [href for href in link_candidates if href.endswith('csv')]
    absolute_paths = [urllib.parse.urljoin(base_url, href)
                      for href in relative_paths]
    # Write one URL per line
    with open('urllist.csv', 'w') as f:
        f.write('\n'.join(absolute_paths))