@suriyadeepan
Created August 26, 2016 06:09
Scrape image URLs from a wiki page using Beautiful Soup
from bs4 import BeautifulSoup
import requests

url = 'https://en.wikipedia.org/wiki/Transhumanism'
# get contents from url
content = requests.get(url).content
# get soup
soup = BeautifulSoup(content, 'lxml')  # choose lxml parser (needs the lxml package installed)
# find every <img ...> tag
image_tags = soup.find_all('img')
# print out image urls
for image_tag in image_tags:
    print(image_tag.get('src'))
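
The src values printed by this snippet are not always complete URLs: on Wikipedia many of them are protocol-relative (they start with `//`) or site-relative (they start with `/`). A minimal sketch of how they could be resolved against the page URL with the standard library's `urljoin` follows. The helper name `absolute_image_urls` and the sample src values are made up for illustration and are not part of the gist.

```python
from urllib.parse import urljoin


def absolute_image_urls(srcs, base_url):
    """Resolve raw src values against base_url, skipping missing ones.

    Hypothetical helper: `srcs` is any iterable of src attribute values,
    e.g. [tag.get('src') for tag in image_tags].
    """
    return [urljoin(base_url, src) for src in srcs if src]


page_url = 'https://en.wikipedia.org/wiki/Transhumanism'
# Illustrative src values (not taken from the real page)
sample_srcs = [
    '//upload.wikimedia.org/example/a.png',  # protocol-relative
    '/static/images/b.png',                  # site-relative
    None,                                    # <img> tag without a src attribute
]

for image_url in absolute_image_urls(sample_srcs, page_url):
    print(image_url)
```

With the gist's variables, the call would look like `absolute_image_urls((tag.get('src') for tag in image_tags), url)`.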
@henriquepeixoto

Thanks man, this works perfectly!

@cindychen0204

This is really simple and useful.

@kchiran

kchiran commented Jun 16, 2020

Great code, works perfectly. THANK YOU.

@Zamby92

Zamby92 commented Jun 19, 2020

How can I download these images?

@kchiran

kchiran commented Aug 11, 2020

import urllib.request

# `image` is an absolute image URL; `filename` is the output name without extension
with open(filename + ".jpeg", 'wb') as imagefile:
    imagefile.write(urllib.request.urlopen(image).read())
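
For anyone who wants something slightly more complete, here is a rough sketch of the same idea wrapped in functions. It assumes the image URL is already absolute (starts with `http://` or `https://`). The names `filename_from_url` and `download_image`, the `images` output directory, and the User-Agent string are all made up for illustration. The custom header is there because some servers reject requests that lack a descriptive User-Agent.

```python
import os
import urllib.request
from urllib.parse import unquote, urlparse


def filename_from_url(url, fallback="image"):
    """Use the last path segment of the URL as the local file name."""
    name = os.path.basename(unquote(urlparse(url).path))
    return name or fallback


def download_image(url, out_dir="images"):
    """Fetch one absolute image URL and save it under out_dir.

    Returns the path of the written file. Hypothetical helper, not part
    of the original gist.
    """
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, filename_from_url(url))
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "image-scraper-example/0.1"},  # assumed value
    )
    with urllib.request.urlopen(request) as response, open(path, "wb") as out_file:
        out_file.write(response.read())
    return path
```

Calling `download_image(image)` for each absolute URL would then save the files under their original names, so the extension matches the actual image type instead of always being `.jpeg`.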
