Wikipedia scraping with Python
# Scrape a Wikipedia page according to your command-line input
import sys
import requests
import bs4
RED = '\033[31m'
END = '\033[0m'
ascii_art = RED \
+ """
iiii kkkkkkkk iiii
i::::i k::::::k i::::i
iiii k::::::k iiii
wwwwwww wwwww wwwwwwwiiiiiii k:::::k kkkkkkkiiiiiiippppp pppppppppyyyyyyy yyyyyyy
w:::::w w:::::w w:::::w i:::::i k:::::k k:::::k i:::::ip::::ppp:::::::::py:::::y y:::::y
w:::::w w:::::::w w:::::w i::::i k:::::k k:::::k i::::ip:::::::::::::::::py:::::y y:::::y
w:::::w w:::::::::w w:::::w i::::i k:::::k k:::::k i::::ipp::::::ppppp::::::py:::::y y:::::y
w:::::w w:::::w:::::w w:::::w i::::i k::::::k:::::k i::::i p:::::p p:::::p y:::::y y:::::y
w:::::w w:::::w w:::::w w:::::w i::::i k:::::::::::k i::::i p:::::p p:::::p y:::::y y:::::y
w:::::w:::::w w:::::w:::::w i::::i k:::::::::::k i::::i p:::::p p:::::p y:::::y:::::y
w:::::::::w w:::::::::w i::::i k::::::k:::::k i::::i p:::::p p::::::p y:::::::::y
w:::::::w w:::::::w i::::::ik::::::k k:::::k i::::::ip:::::ppppp:::::::p y:::::::y
w:::::w w:::::w i::::::ik::::::k k:::::k i::::::ip::::::::::::::::p y:::::y
w:::w w:::w i::::::ik::::::k k:::::k i::::::ip::::::::::::::pp y:::::y
www www iiiiiiiikkkkkkkk kkkkkkkiiiiiiiip::::::pppppppp y:::::y
p:::::p y:::::y
p:::::p y:::::y
p:::::::p y:::::y
p:::::::p y:::::y
p:::::::p yyyyyyy
[++] wikipy is simple wikipedia scraper [++]
Coded By: Ankit Dobhal
Let's Begin To Scrape..!
wikipy version 1.0
""" \
# Base URL assumed to be English Wikipedia; the page title comes from argv.
res = requests.get('https://en.wikipedia.org/wiki/' + ' '.join(sys.argv[1:]))
# Raise an exception if the request failed (e.g. 404 for a missing page)
res.raise_for_status()
wiki = bs4.BeautifulSoup(res.text, "lxml")
elems = wiki.select('p')
for i in range(len(elems)):
    print(elems[i].getText())
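A minimal offline sketch of how the request URL is assembled from the command-line words. The English-Wikipedia base URL and the `build_url` helper are illustrative, not part of the gist; percent-encoding the title is an extra safety step (`requests` would also accept the raw string):

```python
import urllib.parse

# Assumed base URL (the gist targets Wikipedia; en.wikipedia.org is a guess).
BASE = "https://en.wikipedia.org/wiki/"

def build_url(argv_words):
    """Join the command-line words into an article title and build the URL."""
    title = " ".join(argv_words)
    # Percent-encode so titles with spaces or special characters stay valid.
    return BASE + urllib.parse.quote(title)

print(build_url(["Alan", "Turing"]))  # https://en.wikipedia.org/wiki/Alan%20Turing
```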


@heelrayner commented May 3, 2020

Could this be used on other wikis?


@ankitdobhal (Owner) commented May 3, 2020

It was designed only for Wikipedia, so I don't think it will work on other wikis as-is.
But you can check the CSS of that wiki and change this code accordingly.
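For illustration, a sketch of that kind of adaptation, assuming a hypothetical wiki that keeps its body text inside a `div.content-body` container (the class name is invented; check the real wiki's markup):

```python
import bs4

# Hypothetical page layout: body text lives inside <div class="content-body">.
html = '<div class="content-body"><p>Article text.</p></div><p>footer junk</p>'
soup = bs4.BeautifulSoup(html, "html.parser")

# The original script uses soup.select('p'); a narrower selector skips
# paragraphs outside the article container.
for p in soup.select("div.content-body p"):
    print(p.getText())  # Article text.
```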



@danhowe0 commented Jul 15, 2020

I get the error:
Traceback (most recent call last):
  File "/data/user/0/ru.iiec.pydroid3/files/accomp_files/iiec_run/", line 31, in <module>
  File "/data/user/0/ru.iiec.pydroid3/files/accomp_files/iiec_run/", line 30, in start
    exec(open(mainpyfile).read(), __main__.__dict__)
  File "", line 48, in <module>
  File "/data/user/0/ru.iiec.pydroid3/files/aarch64-linux-android/lib/python3.8/site-packages/bs4/__init__.py", line 242, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
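That error means the lxml parser isn't installed; the quick fixes are `pip install lxml`, or passing `"html.parser"` (Python's built-in parser) to BeautifulSoup instead of `"lxml"`. As a last resort, paragraph text can even be extracted with the standard library alone; a rough sketch (the `ParagraphExtractor` class is made up for illustration):

```python
from html.parser import HTMLParser

# Pure-stdlib fallback sketch: pull the text of every <p> element
# without needing bs4's lxml tree builder at all.
class ParagraphExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Accumulate only text that appears inside an open <p> tag.
        if self.in_p:
            self.paragraphs[-1] += data

extractor = ParagraphExtractor()
extractor.feed("<p>Alpha</p><div>skip me</div><p>Beta</p>")
print(extractor.paragraphs)  # ['Alpha', 'Beta']
```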



@hugolpz commented Feb 25, 2021

@heelrayner, just change the line 47. APIs are common across wikis (except Wikidata). The question is more: where do we get the full list of a wiki's pages? See below:


  • 0: (main)
  • 1: Talk:
  • 2: User:
  • 3: User_talk:

Dumps & paths
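A small sketch of that idea: the MediaWiki API (shared across Wikimedia wikis) can enumerate pages per namespace via `list=allpages`. This only builds the request URL (no network call); the `api.php` endpoint shown is English Wikipedia's, and `allpages_url` is an illustrative helper:

```python
import urllib.parse

def allpages_url(api_base, namespace=0, limit=50):
    """Build a MediaWiki API URL listing pages in one namespace."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,  # 0 = (main), 1 = Talk:, 2 = User:, ...
        "aplimit": limit,
        "format": "json",
    }
    return api_base + "?" + urllib.parse.urlencode(params)

print(allpages_url("https://en.wikipedia.org/w/api.php"))
```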
