@fastfingertips
Last active August 11, 2022 00:49
Scraping tools
LXML
SCRAPY
HTTPLIB
REQUESTS
SELENIUM
HTMLPARSER
BEAUTIFULSOUP
URLLIB / URLLIB2
https://lxml.de/
https://axiom.ai/
https://apify.com/
https://jsoup.org/
https://scrapy.org
https://texau.app/
https://page.rest/
https://80legs.com/
https://agenty.com/
http://gearman.org/
http://sikulix.com/
https://crawlee.dev/
https://serpapi.com/
https://wrapapi.com/
http://go-colly.org/
http://gigablast.com/
https://jaunt-api.com/
https://www.listly.io/
https://webscraping.ai/
https://cheerio.js.org/
https://playwright.dev/
https://gimmeproxy.com/
https://www.mixnode.com/
https://browserbird.com/
https://webautomation.io/
https://scrapingfish.com/
https://www.opengraph.io/
https://www.page2api.com/
https://www.parsehub.com/
https://serpapi.com/status
https://www.nongnu.org/txr/
https://www.browserless.io/
https://python-rq.org/docs/
https://www.botscraper.com/
https://www.scrapingbee.com/
https://pypi.org/project/sh/
https://www.rainforestqa.com/
https://chrome.browserless.io/
https://stedolan.github.io/jq/
https://videlibri.de/xidel.html
https://github.com/scrapy/scrapy
https://github.com/gocolly/colly
https://altilunium.my.id/psedex/
https://estela.bitmaker.la/docs/
https://github.com/scrapy/scrapyd
https://github.com/ericchiang/pup
https://github.com/dmi3kno/polite
https://www.zyte.com/scrapy-cloud/
https://github.com/featurist/coypu
https://pypi.org/project/explicit/
https://github.com/jahaynes/crawler
http://novosial.org/perl/one-liner/
https://github.com/AutomaApp/automa
http://docs.pyspider.org/en/latest/
https://codecanyon.net/user/ikajian
https://docs.celeryq.dev/en/stable/
https://github.com/bitmakerla/estela
https://github.com/cheeriojs/cheerio
https://github.com/AlexMili/Scraptory
https://substack.thewebscraping.club/
https://github.com/altilunium/wi-page
https://github.com/browserless/chrome
https://github.com/scrapinghub/portia
https://github.com/altilunium/wistalk
https://til.simonwillison.net/gpt3/jq
https://github.com/JCMais/node-libcurl
https://github.com/segmentio/nightmare
https://github.com/altilunium/arachnid
https://github.com/gotripod/ssscraper/
https://github.com/puppeteer/puppeteer
https://github.com/google/gumbo-parser
https://en.wikipedia.org/wiki/ISO_8601
https://github.com/clj-commons/hickory
https://github.com/microsoft/playwright
https://github.com/matthewmueller/x-ray
https://github.com/altilunium/makalahIF
https://github.com/kanishka-linux/hlspy
https://csvkit.readthedocs.io/en/latest/
https://mercury.postlight.com/web-parser/
https://en.wikipedia.org/wiki/Web_ARChive
https://github.com/sananth12/ImageScraper
https://github.com/sparklemotion/mechanize
https://github.com/brutuscat/medusa-crawler
https://cssselect.readthedocs.io/en/latest/
https://newspaper.readthedocs.io/en/latest/
https://github.com/prisma-archive/chromeless
https://github.com/lwthiker/curl-impersonate
https://github.com/alixaxel/chrome-aws-lambda
https://www.lambdatest.com/automation-testing
https://robobrowser.readthedocs.io/en/latest/
https://news.ycombinator.com/item?id=15694118
https://github.com/python-mechanize/mechanize
https://github.com/ruippeixotog/scala-scraper
https://github.com/vsupalov/docker-puppeteer-dev
https://github.com/MechanicalSoup/MechanicalSoup
https://www.drupal.org/project/example_web_scraper
https://splash.readthedocs.io/en/stable/index.html
https://developer.chrome.com/blog/headless-chrome/
https://github.com/sunra/php-simple-html-dom-parser
https://til.simonwillison.net/aws/boto-command-line
https://github.com/mherrmann/selenium-python-helium
https://developer.chrome.com/docs/devtools/recorder/
https://apify.com/petr_cermak/anti-captcha-recaptcha
https://splash.readthedocs.io/en/latest/install.html
https://brycematheson.io/webscraping-with-powershell/
https://blog.jeaye.com/2017/02/28/clojure-apartments/
https://vsupalov.com/headless-chrome-puppeteer-docker/
https://github.com/aaronhoffman/WebsiteContactHarvester
https://github.com/sambaiz/puppeteer-lambda-starter-kit
https://simplehtmldom.sourceforge.io/docs/1.9/index.html
https://dev.woob.tech/guides/module.html#parsing-of-pages
https://ui.vision/rpa/docs/selenium-ide/capturescreenshot
https://webarchive.jira.com/wiki/spaces/Heritrix/overview
https://sangaline.com/post/advanced-web-scraping-tutorial/
https://www.cloudflare.com/pg-lp/bot-mitigation-fight-mode/
https://bitmaker.la/blog/2022/06/24/estela-oss-release.html
https://www.npmjs.com/package/puppeteer-extra-plugin-stealth
https://github.com/ultrafunkamsterdam/undetected-chromedriver
https://www.chrismytton.com/2015/01/19/web-scraping-with-ruby/
https://franciskim.co/why-im-extremely-bullish-on-open-source-rpa/
https://www.imperva.com/blog/web-scraping-bots/?redirect=Incapsula
https://github.com/reanalytics-databoutique/webscraping-open-project
https://docs.browserflow.app/tutorials/tutorial-scrape-a-list-of-urls
https://cran.r-project.org/web/packages/httr/vignettes/quickstart.html
https://simonwillison.net/2022/Mar/14/scraping-web-pages-shot-scraper/
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
https://sites.google.com/site/scriptsexamples/learn-by-example/parsing-html
https://blitapp.com/blog/take-screenshots-of-multiple-pages-behind-a-login/
https://www.guru99.com/page-object-model-pom-page-factory-in-selenium-ultimate-guide.html
https://medium.com/phantombuster/web-scraping-in-2017-headless-chrome-tips-tricks-4d6521d695e8
https://developers.cloudflare.com/logs/get-started/enable-destinations/s3-compatible-endpoints
https://sourcegraph.com/search?q=context:global+repo:chromium/chromium+kHeadless&patternType=literal
https://stackoverflow.com/questions/33225947/can-a-website-detect-when-you-are-using-selenium-with-chromedriver/41220267#41220267