Created December 15, 2019 21:20
# Imports used below
from requests import get
from bs4 import BeautifulSoup

# The category list spans four pages, so we store the web link for each page in `domain`
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
domain = []
domain.append("https://en.wikipedia.org/wiki/Category:Suburbs_of_Sydney")
domain.append("https://en.wikipedia.org/w/index.php?title=Category:Suburbs_of_Sydney&pagefrom=Dharruk%2C+New+South+Wales#mw-pages")
domain.append("https://en.wikipedia.org/w/index.php?title=Category:Suburbs_of_Sydney&pagefrom=Macgraths+Hill%0AMcGraths+Hill%2C+New+South+Wales#mw-pages")
domain.append("https://en.wikipedia.org/w/index.php?title=Category:Suburbs_of_Sydney&pagefrom=Singletons+Mill%2C+New+South+Wales#mw-pages")
# Create an empty list to store the content
suburb_list = []
for i in range(len(domain)):
    response = get(domain[i], headers=headers)
    # Check whether we got information from the target website; "200" denotes OK
    print(response)
    html_soup = BeautifulSoup(response.text, 'html.parser')
    # After inspecting the first "find_all", the list we need is at index [1]; the result is a list of lists
    suburb_list.append(html_soup.find_all('div', class_="mw-category")[1].find_all('div', class_="mw-category-group"))
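After the loop, `suburb_list` holds one sub-list of `mw-category-group` elements per scraped page. A minimal sketch of flattening that nested result into a single flat list, using only the standard library (the sample data here is illustrative placeholder strings, not real scraped `Tag` objects):

```python
from itertools import chain

# Stand-in for suburb_list: one sub-list of group elements per scraped page
# (real entries would be BeautifulSoup Tag objects, not strings)
suburb_list = [["group-A", "group-B"], ["group-C"], ["group-D", "group-E"]]

# Flatten the list of lists into one flat list of groups
all_groups = list(chain.from_iterable(suburb_list))
print(all_groups)  # ['group-A', 'group-B', 'group-C', 'group-D', 'group-E']
```

From the flat list, the individual suburb links could then be pulled out of each group element in a second pass.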