@trscavo
Last active January 10, 2016 18:54
A bash script that probes a sequence of Shibboleth IdPs to determine which are based on the Shibboleth IdP V2 software
#!/bin/bash
#######################################################################
# Copyright 2015--2016 InCommon, LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#######################################################################
script_version="1.2"
user_agent_string="Shibboleth IdP Probe ${script_version}"
#######################################################################
# help message
#######################################################################
display_help () {
/bin/cat <<- HELP_MSG
${user_agent_string}
Given a list of identifiers (usually entityIDs), determine which
of those identifiers correspond to Shibboleth IdP V2 deployments.
Non-IdPs are ignored. Non-Shibboleth IdPs are also ignored. This
script probes Shibboleth IdP deployments only.
Usage: ${0##*/} [-hvq] [-t CONNECT_TIME -m MAX_TIME] (-u MDQ_BASE_URL | -f MD_PATH) [-b BIN_DIR] [-d OUT_DIR] [ID ...]
The script optionally takes a sequence of identifiers on the command
line. If none are given, the script takes its input from stdin.
The script iterates over all input identifiers. For each identifier,
if the corresponding entity is a Shibboleth IdP, the script sends a
Shibboleth IdP V2 Status request to a well-known endpoint location at
that IdP. If the HTTP response code is 200 and the response starts with
"ok", then we know the IdP is based on the Shibboleth IdP V2 software.
If, on the other hand, the HTTP response code is 404, the IdP is likely
based on the Shibboleth IdP V3 software (since V3 has no such Status endpoint).
Other results are inconclusive.
Options:
-h Display this message
-v Write verbose messages to stdout
-q Run quietly (i.e., write no messages to stdout)
-t Time (in secs) to connect to the host
-m Maximum time (in secs) of a complete probe
-u Base URL of a Metadata Query Server
-f Path to a local metadata file
-b Path to a directory containing one or more scripts
-d Path to an output directory
Option -h is mutually exclusive of all other options. Options
-q and -v are mutually exclusive of each other. Options -u and -f
are mutually exclusive of each other as well.
The argument of the -t option is the TCP connect time, that is,
the maximum time (in secs) allotted to the TCP connection. Note
that the TCP connect time includes the time it takes to do a
DNS name lookup. Since the latter is unconstrained, it may
consume all available TCP connect time. Thus the TCP connect
time should be kept small (on the order of a few seconds) since
larger values will slow this script considerably.
The argument of the -m option is the maximum total time (in secs)
allotted to each probe. A reasonable value is a few seconds
beyond the TCP connect time. Any value less than or equal to the TCP
connect time causes the script to fail immediately.
Entity metadata is required to process each identifier. Metadata is
obtained in one of two ways, by consulting a Metadata Query Server
just-in-time or by using a pre-provisioned metadata aggregate. These
correspond to options -u and -f, respectively. Exactly one of these
options is required.
Option -f takes a required file argument (MD_PATH), the absolute
path to a local SAML metadata file. The script searches this file for
a corresponding entity descriptor as it processes each identifier.
Option -u takes a required URI argument (MDQ_BASE_URL), the base
URL of a Metadata Query Server (i.e., a server that conforms to the
Metadata Query Protocol). The base URL is used to construct an MDQ
request URL, which the script uses to request entity metadata
just-in-time.
The script requires a helper script (md_tools.sh) to resolve entity
metadata. By default, the helper script is assumed to be in the same
directory as this script. If not, use option -b to specify the
directory containing the helper script.
STDOUT
By default, the script outputs an abbreviated log to stdout (but
this may be suppressed by use of the -q option). A line of
standard output has the following space-delimited fields:
1) code: a curl exit code
2) output: a curl output string
3) statusURL: the URL of the probed Status endpoint
4) SHIBV: Shibboleth version indicator
See the curl man page (http://linux.die.net/man/1/curl) for a
brief description of possible exit codes.
The output string has the following format:
response:999;dns:9.999;tcp:9.999;ssl:9.999;total:9.999
The response in the output string is the HTTP response code of the
probed web server. If the probe does not complete, the HTTP response
will be 000. The remaining four values in the output string are times
(in secs) computed by curl:
dns is the elapsed time up to and including the DNS lookup
(curl time_namelookup variable)
tcp is the elapsed time up to and including the TCP connection
(curl time_connect variable)
ssl is the elapsed time up to and including the SSL exchange
(curl time_appconnect variable) (only curl 7.19.0 and later)
total is the total elapsed time of the probe
(curl time_total variable)
See the curl man page (curl --write-out option) for detailed
explanations of these timings.
By definition, a probe succeeds if its exit code is 0. For our
purposes, a probe completely fails if its exit code is either 6
or 7. (Exit code 6 indicates a DNS lookup failure while code 7
means the host is unreachable on the network.) A probe that times
out (exit code 28) is labeled as nonresponsive. All other exit codes
are regarded as indeterminate.
The statusURL is the actual URL probed by this script. It is
computed from an HTTP endpoint location in metadata.
The Shibboleth version indicator (SHIBV) takes on one of three
values: SHIB2, SHIB3, or SHIB?. These strings indicate Shibboleth
IdP V2, Shibboleth IdP V3, or an unknown version of the Shibboleth
IdP software, respectively.
Note: This script detects Shibboleth IdP V2 deployments with high
probability, that is, there is little or no chance of a false
positive. However, some V2 deployments may evade this script (for
various reasons) and thus be reported as "SHIB?". Similarly, the
script detects Shibboleth IdP V3 deployments with reasonable
likelihood, but there is a significant chance of a false positive
in this case.
FILES
The script writes a number of output files if (and only if) the
-d option is specified on the command line. The output files are
written to the given OUT_DIR.
${NO_SAML2_HTTP_ENDPOINT_FILENAME}
A list of IdPs that do not expose a suitable SAML2 HTTP endpoint
location in metadata. A suitable endpoint supports one of the
following SAML2 HTTP bindings: HTTP-Redirect, HTTP-POST, or
HTTP-POST-SimpleSign. An IdP that supports SAML1 only will
necessarily appear on this list, and will therefore not be probed.
A line in the output file has the following space-delimited fields:
1) entityID: the entityID of the IdP
2) registrarID: the registrar ID
The entityID is the name of the IdP. An entityID is an arbitrary URI,
as given by the entityID XML attribute on the <md:EntityDescriptor>
element in SAML metadata.
The registrarID is the name of the registrar that registered the IdP
metadata in the first place. By convention, a registrar ID is an
arbitrary URI, as given by the registrationAuthority XML attribute
on the <mdrpi:RegistrationInfo> element in SAML metadata. Since the
latter element is optional in metadata, this field may be blank in
the log file (which is why it is always the last field on any given
output line).
${NOT_SHIB_FILENAME}
A list of non-Shibboleth IdPs, determined by inspecting a suitable
SAML2 HTTP endpoint location in metadata. Such IdPs are not probed
by this script.
A line in the output file has the following space-delimited fields:
1) location: a SAML2 HTTP endpoint location
2) entityID: the entityID of the IdP
3) registrarID: the registrar ID
The location field gives the HTTP endpoint location used to identify
the IdP. If the location URL indicates the IdP is a Shibboleth IdP,
the statusURL is computed from the HTTP location on the fly.
The entityID and the registrarID fields are the same as in the
previous output file.
${SHIB_LOG_FILENAME}
A log of each probe. Each line records the result of the probe of
a single Shibboleth IdP. A line in the log file has the following
space-delimited fields:
1) code: a curl exit code
2) output: a curl output string
3) statusURL: the URL of the probed Status endpoint
4) location: a SAML2 HTTP endpoint location
5) entityID: the entityID of the Shibboleth IdP
6) registrarID: the registrar ID
The code, output, and statusURL fields are the same as those printed
to stdout.
The location, entityID, and registrarID fields are the same as in the
previous output file.
${SHIB2_LOG_FILENAME}
A log of each probe made to a Shibboleth IdP V2 deployment. If the HTTP
response code is 200 and the response body starts with "ok", then we
know the deployment is based on the Shibboleth IdP V2 software.
The format of this file is identical to the format of the previous file.
${SHIB3_LOG_FILENAME}
A log of each probe made to a Shibboleth IdP V3 deployment. If the HTTP
response code is 404, the deployment is likely based on the Shibboleth
IdP V3 software.
The format of this file is identical to the format of the previous file.
${SHIB_UNKNOWN_LOG_FILENAME}
A log of each probe made to an IdP deployment based on an unknown
version of the Shibboleth IdP software.
The format of this file is identical to the format of the previous file.
Examples: ${0##*/} -h
${0##*/} -t ${connect_timeout_default} -m ${max_time_default} -u \$mdq_base_url \$id
cat \$id_file | ${0##*/} -v -t 4 -m 6 -u \$mdq_base_url
${0##*/} -q -f /path/to/md_file.xml \$id1 \$id2 \$id3
Note that the -t and -m values in the second example are the defaults,
so omitting both options has the same effect.
HELP_MSG
}
#######################################################################
# Bootstrap
#######################################################################
script_bin=$( dirname "$0" ) # directory containing this script ("." if $0 has no slash)
script_name=${0##*/} # equivalent to basename $0
connect_timeout_default=2
max_time_default=4
# output filenames
NO_SAML2_HTTP_ENDPOINT_FILENAME="idps-no-saml2-http-endpoint.txt"
NOT_SHIB_FILENAME="idps-not-shibboleth.txt"
SHIB_LOG_FILENAME="idps-shibboleth-log.txt"
SHIB2_LOG_FILENAME="idps-shibboleth2-log.txt"
SHIB3_LOG_FILENAME="idps-shibboleth3-log.txt"
SHIB_UNKNOWN_LOG_FILENAME="idps-shibboleth-version-unknown-log.txt"
init_out_files () {
local out_dir=$1 # TODO
local exit_status
# create the dir if necessary
if [ ! -d "$out_dir" ]; then
mkdir "$out_dir"
exit_status=$?
if [ $exit_status -ne 0 ]; then
echo "ERROR: $FUNCNAME failed to create dir: $out_dir" >&2
exit $exit_status
fi
fi
# output files
NO_SAML2_HTTP_ENDPOINT_FILE="$out_dir/$NO_SAML2_HTTP_ENDPOINT_FILENAME"
NOT_SHIB_FILE="$out_dir/$NOT_SHIB_FILENAME"
SHIB_LOG_FILE="$out_dir/$SHIB_LOG_FILENAME"
SHIB2_LOG_FILE="$out_dir/$SHIB2_LOG_FILENAME"
SHIB3_LOG_FILE="$out_dir/$SHIB3_LOG_FILENAME"
SHIB_UNKNOWN_LOG_FILE="$out_dir/$SHIB_UNKNOWN_LOG_FILENAME"
}
#######################################################################
# Process command-line options and arguments
#######################################################################
help_mode=false; quiet_mode=false; verbose_mode=false
md_query_mode=false; md_file_mode=false
local_opts=; connect_timeout=; max_time=
while getopts ":hqvt:m:u:f:b:d:" opt; do
case $opt in
h)
help_mode=true
;;
q)
quiet_mode=true
verbose_mode=false
#local_opts="$local_opts -$opt"
exec 1>/dev/null # redirect stdout to the bit bucket
;;
v)
quiet_mode=false
verbose_mode=true
local_opts="$local_opts -$opt"
;;
t)
connect_timeout="$OPTARG"
;;
m)
max_time="$OPTARG"
;;
u)
md_query_mode=true
md_file_mode=false
mdq_base_url="$OPTARG"
;;
f)
md_query_mode=false
md_file_mode=true
md_path="$OPTARG"
;;
b)
bin_dir="$OPTARG"
;;
d)
out_dir="$OPTARG"
;;
\?)
echo "ERROR: $script_name: Unrecognized option: -$OPTARG" >&2
exit 2
;;
:)
echo "ERROR: $script_name: Option -$OPTARG requires an argument" >&2
exit 2
;;
esac
done
if $help_mode; then
display_help
exit 0
fi
# determine the metadata source
if $md_query_mode; then
if [ -z "$mdq_base_url" ]; then
echo "ERROR: $script_name: option -u requires an argument" >&2
exit 2
fi
$verbose_mode && printf "$script_name using base URL: %s\n" "$mdq_base_url"
# global var for getEntityFromServer function
MDQ_BASE_URL="$mdq_base_url"
elif $md_file_mode; then
if [ -z "$md_path" ]; then
echo "ERROR: $script_name: option -f requires an argument" >&2
exit 2
fi
if [ ! -f "$md_path" ]; then
echo "ERROR: $script_name: file does not exist: $md_path" >&2
exit 2
fi
$verbose_mode && printf "$script_name using metadata file: %s\n" "$md_path"
# global var for getEntityFromFile function
MD_PATH="$md_path"
else
echo "ERROR: $script_name: one of options -u or -f required" >&2
exit 2
fi
# determine the bin directory
if [ -n "$bin_dir" ]; then
if [ ! -d "$bin_dir" ]; then
echo "ERROR: $script_name: directory does not exist: $bin_dir" >&2
exit 2
fi
BIN_DIR="$bin_dir"
else
BIN_DIR="$script_bin"
fi
$verbose_mode && printf "$script_name using bin directory: %s\n" "$BIN_DIR"
# determine the output directory
if [ -z "$out_dir" ]; then
DO_NOT_PRINT_OUT_FILES=true
$verbose_mode && printf "$script_name not printing output files\n"
else
DO_NOT_PRINT_OUT_FILES=false
init_out_files "$out_dir"
$verbose_mode && printf "$script_name using output dir: %s\n" "$out_dir"
fi
# check consistency of timeout options (both or neither are required)
if [ -z "$connect_timeout" -a -z "$max_time" ]; then
connect_timeout=$connect_timeout_default
max_time=$max_time_default
elif [ -n "$connect_timeout" -a -n "$max_time" ]; then
# negate outside the test so that a non-integer value is rejected as well
if ! [ "${connect_timeout}" -gt 0 ] 2>/dev/null; then
echo "ERROR: $script_name: connect timeout must be a positive integer: ${connect_timeout}" >&2
exit 2
fi
# negate outside the test so that a non-integer value is rejected as well
if ! [ "${max_time}" -gt "${connect_timeout}" ] 2>/dev/null; then
echo "ERROR: $script_name: max time must be greater than the connect timeout: ${max_time}" >&2
exit 2
fi
else
echo "ERROR: $script_name: both (or neither) options -t and -m are required" >&2
exit 2
fi
if $verbose_mode; then
printf "$script_name using connect timeout: %d secs\n" $connect_timeout
printf "$script_name using max time: %d secs\n" $max_time
fi
shift $(( OPTIND - 1 ))
#####################################################################
# Initialization
#####################################################################
# create a temporary directory
TMP_DIR=$( mktemp -d 2>/dev/null || mktemp -d -t "${script_name%%.*}" )
if [ ! -d "$TMP_DIR" ] ; then
printf "ERROR: Unable to create temporary dir\n" >&2
exit 2
fi
$verbose_mode && printf "$script_name creating temp dir: %s\n" "$TMP_DIR"
# temp files
HTTP_RESPONSE_FILE="${TMP_DIR}/http_response.txt"
# read the input into a temporary file
IN_FILE="${TMP_DIR}/tmp_infile.txt"
if [ "$#" -gt 0 ]; then
# read input from the command line
while (( "$#" )); do
# copy command-line arg into the temp file
echo "$1" >> "$IN_FILE"
shift
done
else
# read input from stdin
/bin/cat - > "$IN_FILE"
fi
$verbose_mode && printf "$script_name processing temp input file: %s\n" "$IN_FILE"
# load metadata tools
md_tools_script="$BIN_DIR/md_tools.sh"
source "$md_tools_script" >&2
exit_status=$?
if [ $exit_status -ne 0 ]; then
echo "ERROR: ${script_name} failed to source script ${md_tools_script}" >&2
exit $exit_status
fi
#####################################################################
# Functions
#####################################################################
clean_up_files () {
$DO_NOT_PRINT_OUT_FILES && return
# clean up
/bin/rm -f "$NO_SAML2_HTTP_ENDPOINT_FILE"
/bin/rm -f "$NOT_SHIB_FILE"
/bin/rm -f "$SHIB_LOG_FILE"
/bin/rm -f "$SHIB2_LOG_FILE"
/bin/rm -f "$SHIB3_LOG_FILE"
/bin/rm -f "$SHIB_UNKNOWN_LOG_FILE"
}
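# Illustrative sketch (not called by this script): the help message labels
# a probe by its curl exit code (0 succeeds, 6 or 7 fails completely, 28 is
# nonresponsive, anything else is indeterminate). The hypothetical function
# below maps an exit code to one of those labels. The function name and the
# label strings are assumptions made for this example.
probe_label_sketch () {
  local code=$1
  case "$code" in
    0) echo "success" ;;
    6|7) echo "failed" ;;
    28) echo "nonresponsive" ;;
    *) echo "indeterminate" ;;
  esac
}
# Usage: probe_label_sketch 28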
print_no_saml2_http_endpoint_logfile () {
$DO_NOT_PRINT_OUT_FILES && return
local entityID=$1
local registrarID=$2
printf "%s %s\n" "$entityID" "$registrarID" >> "$NO_SAML2_HTTP_ENDPOINT_FILE"
}
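# Illustrative sketch (not called by this script): the curl output string
# has the form response:999;dns:9.999;tcp:9.999;ssl:9.999;total:9.999, so a
# log consumer may want to pull out a single field by name. The hypothetical
# function below splits the string on semicolons and prints the value of the
# named field. The function name is an assumption made for this example.
output_field_sketch () {
  local output=$1
  local name=$2
  echo "$output" | tr ';' '\n' | sed -n -e 's/^'"$name"'://p'
}
# Usage: output_field_sketch "$output" total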
print_not_shib_logfile () {
$DO_NOT_PRINT_OUT_FILES && return
local location=$1
local entityID=$2
local registrarID=$3
printf "%s %s %s\n" "$location" "$entityID" "$registrarID" >> "$NOT_SHIB_FILE"
}
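# Illustrative sketch (not called by this script): the main loop derives the
# Status URL from a SAML2 endpoint location by stripping the trailing
# "<binding>/SSO", "SAML2/", and "Shibboleth/" path segments and appending
# "Status". The hypothetical function below applies the same sed expressions
# to its arguments. The function name and the example.org host used in the
# usage note are assumptions made for this example.
status_url_sketch () {
  local location=$1
  local binding=$2
  echo "$location" \
    | sed -e 's/'"$binding"'\/SSO$//' -e 's/SAML2\/$//' -e 's/Shibboleth\/$//' -e 's/\/$/\/Status/'
}
# Usage: status_url_sketch "https://idp.example.org/idp/profile/SAML2/Redirect/SSO" Redirect
# prints https://idp.example.org/idp/profile/Status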
print_logfile () {
$DO_NOT_PRINT_OUT_FILES && return
local logfile=$1
printf "%s %s %s " "$status_code" "$output" "$statusURL" >> "$logfile"
printf "%s %s %s\n" "$location" "$entityID" "$registrarID" >> "$logfile"
}
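# Illustrative sketch (not called by this script): the main loop decides the
# Shibboleth version indicator from the HTTP response code and the first line
# of the response body (200 plus a body starting with "ok" means SHIB2, 404
# means SHIB3, anything else is SHIB?). The hypothetical function below
# restates that decision in one place. The function name and its two
# positional parameters are assumptions made for this example.
shib_version_sketch () {
  local response_code=$1
  local first_line=$2
  if [[ "$response_code" == 200 && "$first_line" == ok* ]]; then
    echo "SHIB2"
  elif [[ "$response_code" == 404 ]]; then
    echo "SHIB3"
  else
    echo "SHIB?"
  fi
}
# Usage: shib_version_sketch "$response_code" "$( /usr/bin/head -n 1 "$HTTP_RESPONSE_FILE" )"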
#####################################################################
# Main processing
#####################################################################
clean_up_files
if $verbose_mode; then
num_entityIDs=$( wc -l < "$IN_FILE" )
printf "$script_name processing %d entityIDs\n" $num_entityIDs
fi
# compute curl command-line options
curl_opts="--connect-timeout ${connect_timeout} --max-time ${max_time}"
curl_opts="${curl_opts} --insecure --tlsv1"
# iterate over all entityIDs in the file
/bin/cat "$IN_FILE" | while read -r entityID; do
# get the entity descriptor for this entityID
if $md_file_mode; then
entityDescriptor=$( getEntityFromFile "$entityID" )
else
entityDescriptor=$( getEntityFromServer "$entityID" )
fi
return_code=$?
if [ "$return_code" -ne 0 ]; then
echo "ERROR: $script_name: unable to obtain metadata for entityID: $entityID" >&2
[ "$return_code" -gt 1 ] && exit 1
continue
fi
# short-circuit the while-loop if this is not an IdP
if ! echo "$entityDescriptor" | grep -Fq 'IDPSSODescriptor '; then
echo "WARNING: $script_name: entity is not an IdP: $entityID" >&2
continue
fi
# extract the registrar ID from the entity descriptor
registrarID=$( echo "$entityDescriptor" \
| grep -F -m 1 ' registrationAuthority=' \
| sed -e 's/^.* registrationAuthority="\([^"]*\)".*$/\1/'
)
# extract a SAML2 HTTP endpoint location from the entity descriptor
for binding in Redirect POST POST-SimpleSign; do
location=$( echo "$entityDescriptor" \
| grep -E '<(md:)?SingleSignOnService ' \
| grep -F -m 1 ' Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-'$binding'"' \
| sed -e 's/^.* Location="\([^"]*\)".*$/\1/'
)
# terminate the for-loop if a location was found
[ -n "$location" ] && break
done
# if there is no SAML2 HTTP endpoint location, short-circuit the while-loop
if [ -z "$location" ]; then
print_no_saml2_http_endpoint_logfile "$entityID" "$registrarID"
echo "INFO: $script_name: IdP has no SAML2 HTTP endpoint location: $entityID"
continue
fi
# if the endpoint location indicates non-Shibboleth, short-circuit the while-loop
if [[ "$location" != */$binding/SSO ]]; then
print_not_shib_logfile "$location" "$entityID" "$registrarID"
echo "INFO: $script_name: entity is not a Shibboleth IdP: $entityID"
continue
fi
# compute the Status URL for a typical Shibboleth V2 IdP
statusURL=$( echo "$location" \
| sed -e 's/'$binding'\/SSO$//' -e 's/SAML2\/$//' -e 's/Shibboleth\/$//' -e 's/\/$/\/Status/'
)
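# truncate the response file so a body left over from a previous probe is never reused
: > "$HTTP_RESPONSE_FILE"
# request the Status URL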
output=$( /usr/bin/curl --silent \
--output "$HTTP_RESPONSE_FILE" \
$curl_opts \
--write-out 'response:%{http_code};dns:%{time_namelookup};tcp:%{time_connect};ssl:%{time_appconnect};total:%{time_total}' \
"$statusURL"
)
status_code=$?
# If the response code is 200 and the response body starts with "ok", then
# the IdP is a Shibboleth V2 IdP. If the response code is 404, then the IdP
# is probably a Shibboleth V3 IdP. All other results are indeterminate.
response_code=$( echo "$output" | sed -e 's/^response:\([^;]*\).*$/\1/' )
if [[ "$response_code" == 200 ]]; then
if /usr/bin/head -n 1 "$HTTP_RESPONSE_FILE" | grep -q '^ok'; then
print_logfile "$SHIB2_LOG_FILE"
printf "%s %s %s %s\n" "$status_code" "$output" "$statusURL" SHIB2
else
print_logfile "$SHIB_UNKNOWN_LOG_FILE"
printf "%s %s %s %s\n" "$status_code" "$output" "$statusURL" 'SHIB?'
fi
elif [[ "$response_code" == 404 ]]; then
print_logfile "$SHIB3_LOG_FILE"
printf "%s %s %s %s\n" "$status_code" "$output" "$statusURL" SHIB3
else
print_logfile "$SHIB_UNKNOWN_LOG_FILE"
printf "%s %s %s %s\n" "$status_code" "$output" "$statusURL" 'SHIB?'
fi
print_logfile "$SHIB_LOG_FILE"
done
# the while-loop runs in a pipeline subshell, so propagate its exit status
exit $?