Skip to content

Instantly share code, notes, and snippets.

@Korving-F
Korving-F / regexlib-raw.txt
Created September 25, 2021 06:58 — forked from JamoCA/regexlib-raw.txt
RXXR2 regular expression static analyzer
# 20161122 https://github.com/ConradIrwin/rxxr2/blob/master/data/input/regexlib-raw.txt http://www.cs.bham.ac.uk/~hxt/research/rxxr2/
# This will find URLs in plain text. With or without protocol. It matches against all toplevel domains to find the URL in the text.
# ID: 1016
([\d\w-.]+?\.(a[cdefgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmnoz]|e[ceghrst]|f[ijkmnor]|g[abdefghilmnpqrstuwy]|h[kmnrtu]|i[delmnoqrst]|j[emop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrstwy]|qa|r[eouw]|s[abcdeghijklmnortuvyz]|t[cdfghjkmnoprtvwz]|u[augkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|aero|arpa|biz|com|coop|edu|info|int|gov|mil|museum|name|net|org|pro)(\b|\W(?<!&|=)(?!\.\s|\.{3}).*?))(\s|$)
# Retrieves all anchor links in a html document, useful for spidering. You will need to do a replace of " and ' after the regular expression, as the expression gets all links. As far as I know there is no way, even with \1 groupings, of getting a condition on whether the link
#!/bin/bash
# SPDX-License-Identifier: MIT
## Copyright (C) 2009 Przemyslaw Pawelczyk <przemoc@gmail.com>
##
## This script is licensed under the terms of the MIT license.
## https://opensource.org/licenses/MIT
#
# Lockable script boilerplate