A set of functions that use the BeautifulSoup
module to scrape data from various websites.
In a fast-developing world, data is crucial! The webscrappers
repository contains a set of functions for easily retrieving data into
Python. It uses BeautifulSoup
at its core, along with modules like requests
for retrieving data.
The code is publicly available at webscrappers
by
ZenithClown. To use it, clone the repository with git and add the cloned directory to PYTHONPATH,
like:
git clone https://gist.github.com/ZenithClown/809642277fba2d8d2309e55ab307615f.git webscrappers
export PYTHONPATH="${PYTHONPATH}:webscrappers"
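The same effect can be had from inside Python when changing the shell environment is not convenient. The following is a minimal sketch, assuming the gist was cloned into a directory named webscrappers under the current working directory (the directory name comes from the clone command above; the variable name is only illustrative):

```python
import sys
from pathlib import Path

# Assumption: the gist was cloned into ./webscrappers
clone_dir = Path("webscrappers").resolve()

# Append the clone directory to the import search path once,
# mirroring what the PYTHONPATH export does in the shell.
if str(clone_dir) not in sys.path:
    sys.path.append(str(clone_dir))
```

Unlike the export, this change only lasts for the current Python process.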
Done. You can now import the individual modules you need. All the functions are parameterized as much as possible. Check the definitions and usage of the individual bots in the Web Scrapping BOTs section.
"Web scraping is the process of using bots to extract content and data from a website." Given an HTML page, a web scraper extracts information
from HTML tags
or elements
into a desired format. In Python, Beautiful Soup is a popular
package for parsing HTML and XML documents. Some good tutorials on bs4
that I personally followed:
- Beautiful Soup: Build a Web Scraper With Python - RealPython,
- A Practical Introduction to Web Scraping in Python - RealPython,
- Implementing Web Scraping in Python - Geeks for Geeks,
- Web Scraping with Python - Beautiful Soup Crash Course - freeCodeCamp.org
In addition, an introduction to Google Chrome DevTools or Microsoft Edge DevTools may be useful for inspecting the HTML of a page.
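To make the idea of extracting information from HTML tags into a desired format concrete, here is a small sketch using Beautiful Soup directly. It is not one of the functions from this repository: the HTML snippet, the table id and the function name `extract_rows` are all made up for illustration, and the page is an inline string, so no network request is made.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
HTML = """
<html><body>
  <table id="prices">
    <tr><th>Item</th><th>Price</th></tr>
    <tr><td>Apple</td><td>1.20</td></tr>
    <tr><td>Banana</td><td>0.50</td></tr>
  </table>
</body></html>
"""


def extract_rows(html):
    """Return each data row of the 'prices' table as a dict."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="prices")

    # Column names come from the <th> cells of the header row.
    headers = [th.get_text(strip=True) for th in table.find_all("th")]

    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has no <td> cells, so it is skipped
            rows.append(dict(zip(headers, cells)))
    return rows


rows = extract_rows(HTML)
print(rows)
```

For a real site, the HTML string would typically come from `requests.get(url).text`, and the tag names and attributes to search for are usually found by inspecting the page in the browser's developer tools.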
TODO: Documentation. Currently, check the function docs for more information.