Last active: August 11, 2022 00:49
Gist: fastfingertips/8cc0fbb5c35c22dd7238297ca742ecf8
Scraping tools
LXML
SCRAPY
HTTPLIB
REQUESTS
SELENIUM
HTMLPARSER
BEAUTIFULSOUP
URLLIB / URLLIB2
https://lxml.de/
https://axiom.ai/
https://apify.com/
https://jsoup.org/
https://scrapy.org
https://texau.app/
https://page.rest/
https://80legs.com/
https://agenty.com/
http://gearman.org/
http://sikulix.com/
https://crawlee.dev/
https://serpapi.com/
https://wrapapi.com/
http://go-colly.org/
http://gigablast.com/
https://jaunt-api.com/
https://www.listly.io/
https://webscraping.ai/
https://cheerio.js.org/
https://playwright.dev/
https://gimmeproxy.com/
https://www.mixnode.com/
https://browserbird.com/
https://webautomation.io/
https://scrapingfish.com/
https://www.opengraph.io/
https://www.page2api.com/
https://www.parsehub.com/
https://serpapi.com/status
https://www.nongnu.org/txr/
https://www.browserless.io/
https://python-rq.org/docs/
https://www.botscraper.com/
https://www.scrapingbee.com/
https://pypi.org/project/sh/
https://www.rainforestqa.com/
https://chrome.browserless.io/
https://stedolan.github.io/jq/
https://videlibri.de/xidel.html
https://github.com/scrapy/scrapy
https://github.com/gocolly/colly
https://altilunium.my.id/psedex/
https://estela.bitmaker.la/docs/
https://github.com/scrapy/scrapyd
https://github.com/ericchiang/pup
https://github.com/dmi3kno/polite
https://www.zyte.com/scrapy-cloud/
https://github.com/featurist/coypu
https://pypi.org/project/explicit/
https://github.com/jahaynes/crawler
http://novosial.org/perl/one-liner/
https://github.com/AutomaApp/automa
http://docs.pyspider.org/en/latest/
https://codecanyon.net/user/ikajian
https://docs.celeryq.dev/en/stable/
https://github.com/bitmakerla/estela
https://github.com/cheeriojs/cheerio
https://github.com/AlexMili/Scraptory
https://substack.thewebscraping.club/
https://github.com/altilunium/wi-page
https://github.com/browserless/chrome
https://github.com/scrapinghub/portia
https://github.com/altilunium/wistalk
https://til.simonwillison.net/gpt3/jq
https://github.com/JCMais/node-libcurl
https://github.com/segmentio/nightmare
https://github.com/altilunium/arachnid
https://github.com/gotripod/ssscraper/
https://github.com/puppeteer/puppeteer
https://github.com/google/gumbo-parser
https://en.wikipedia.org/wiki/ISO_8601
https://github.com/clj-commons/hickory
https://github.com/microsoft/playwright
https://github.com/matthewmueller/x-ray
https://github.com/altilunium/makalahIF
https://github.com/kanishka-linux/hlspy
https://csvkit.readthedocs.io/en/latest/
https://mercury.postlight.com/web-parser/
https://en.wikipedia.org/wiki/Web_ARChive
https://github.com/sananth12/ImageScraper
https://github.com/sparklemotion/mechanize
https://github.com/brutuscat/medusa-crawler
https://cssselect.readthedocs.io/en/latest/
https://newspaper.readthedocs.io/en/latest/
https://github.com/prisma-archive/chromeless
https://github.com/lwthiker/curl-impersonate
https://github.com/alixaxel/chrome-aws-lambda
https://www.lambdatest.com/automation-testing
https://robobrowser.readthedocs.io/en/latest/
https://news.ycombinator.com/item?id=15694118
https://github.com/python-mechanize/mechanize
https://github.com/ruippeixotog/scala-scraper
https://github.com/vsupalov/docker-puppeteer-dev
https://github.com/MechanicalSoup/MechanicalSoup
https://www.drupal.org/project/example_web_scraper
https://splash.readthedocs.io/en/stable/index.html
https://developer.chrome.com/blog/headless-chrome/
https://github.com/sunra/php-simple-html-dom-parser
https://til.simonwillison.net/aws/boto-command-line
https://github.com/mherrmann/selenium-python-helium
https://developer.chrome.com/docs/devtools/recorder/
https://apify.com/petr_cermak/anti-captcha-recaptcha
https://splash.readthedocs.io/en/latest/install.html
https://brycematheson.io/webscraping-with-powershell/
https://blog.jeaye.com/2017/02/28/clojure-apartments/
https://vsupalov.com/headless-chrome-puppeteer-docker/
https://github.com/aaronhoffman/WebsiteContactHarvester
https://github.com/sambaiz/puppeteer-lambda-starter-kit
https://simplehtmldom.sourceforge.io/docs/1.9/index.html
https://dev.woob.tech/guides/module.html#parsing-of-pages
https://ui.vision/rpa/docs/selenium-ide/capturescreenshot
https://webarchive.jira.com/wiki/spaces/Heritrix/overview
https://sangaline.com/post/advanced-web-scraping-tutorial/
https://www.cloudflare.com/pg-lp/bot-mitigation-fight-mode/
https://bitmaker.la/blog/2022/06/24/estela-oss-release.html
https://www.npmjs.com/package/puppeteer-extra-plugin-stealth
https://github.com/ultrafunkamsterdam/undetected-chromedriver
https://www.chrismytton.com/2015/01/19/web-scraping-with-ruby/
https://franciskim.co/why-im-extremely-bullish-on-open-source-rpa/
https://www.imperva.com/blog/web-scraping-bots/?redirect=Incapsula
https://github.com/reanalytics-databoutique/webscraping-open-project
https://docs.browserflow.app/tutorials/tutorial-scrape-a-list-of-urls
https://cran.r-project.org/web/packages/httr/vignettes/quickstart.html
https://simonwillison.net/2022/Mar/14/scraping-web-pages-shot-scraper/
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
https://sites.google.com/site/scriptsexamples/learn-by-example/parsing-html
https://blitapp.com/blog/take-screenshots-of-multiple-pages-behind-a-login/
https://www.guru99.com/page-object-model-pom-page-factory-in-selenium-ultimate-guide.html
https://medium.com/phantombuster/web-scraping-in-2017-headless-chrome-tips-tricks-4d6521d695e8
https://developers.cloudflare.com/logs/get-started/enable-destinations/s3-compatible-endpoints
https://sourcegraph.com/search?q=context:global+repo:chromium/chromium+kHeadless&patternType=literal
https://stackoverflow.com/questions/33225947/can-a-website-detect-when-you-are-using-selenium-with-chromedriver/41220267#41220267