@sarthakpranesh
Created February 16, 2021 09:26
Parallel Soup

High Performance Library to parallelize BeautifulSoup

A library that wraps BeautifulSoup to provide multithreaded scraping, reducing the total time the scraping process takes. The library should implement the following effectively (this list may be extended in the future):

  • Parallelization
  • Should have a generic interface that maps to beautiful soup
  • All parts of the library should be documented heavily
  • All parts of the library should have unit tests written for verification of their functionality
  • Showcase written examples for different sorts of scraping
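
As a rough illustration of the parallelization goal above, here is a minimal sketch using only the Python standard library. The name `parallel_parse` and its parameters are hypothetical and not part of any existing API; the parser is passed in as a callable so the same helper could wrap BeautifulSoup or anything else. Note that threads mainly help when the work is I/O-bound (for example, fetching pages); pure parsing is CPU-bound, so gains there may be limited by the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def parallel_parse(
    documents: Iterable[str],
    parse: Callable[[str], T],
    max_workers: int = 4,
) -> List[T]:
    """Apply `parse` to every document on a thread pool.

    Hypothetical helper: `parse` is any callable that turns one
    document string into a parsed result.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs,
        # regardless of which worker finishes first.
        return list(pool.map(parse, documents))
```

With BeautifulSoup installed, a call might look like `parallel_parse(pages, lambda html: BeautifulSoup(html, "html.parser"))`, where `pages` is a list of HTML strings. Keeping the parser pluggable also makes the helper easy to unit test without network access.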

Some Resources