
@ernstki
Last active May 22, 2024 13:52
Query GitLab v4 API endpoints with curl and jq; handles record pagination
#!/usr/bin/env bash
##
## Query a GitLab v4 API endpoint, with pagination
##
## Author: Kevin Ernst <ernstki -at- mail.uc.edu>
## License: WTFPL
## Date: 22 May 2024
## Requires: jq (https://github.com/jqlang/jq)
## Homepage: https://gist.github.com/ernstki/3707675c8a4ddb06d128154947c49e29
##
ME=$(basename "${BASH_SOURCE[0]}")
_urlencode() (
# Author: Chris Down (https://gist.github.com/cdown/1163649)
# modified by me to support multiple arguments
# License: Unknown
LC_COLLATE=C
while (( $# )); do
for (( i = 0; i < ${#1}; i++ )); do
c=${1:i:1}
case $c in
[a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
*) printf '%%%02X' "'$c" ;;
esac
done
if (( $# > 1 )); then printf '+'; fi # delimit separate args w/ +'s
shift
done
)
_gitlab_api() (
# set TRACE=1 in the environment to enable execution tracing
if (( TRACE )); then set -x; fi
set -u
: ${GITLAB_URL:?Please define GITLAB_URL as the base URL for your GitLab instance}
: ${GITLAB_TOKEN:?Please define GITLAB_TOKEN with your private token for the GitLab API}
local api=$GITLAB_URL/api/v4
local endpoint=/search
local searchterms=()
local curlargs=(
--silent
--header "Authorization: Bearer $GITLAB_TOKEN"
)
local perpage=20
local queryargs= count= totals= all= pages= wantheader=
while (( $# )); do
case $1 in
-h|--help|--flags|-\?)
echo "
$ME - query GitLab v4 API endpoints with pagination
usage:
$ME [-h|--help]
$ME [-c|--count] [-t|--totals] { /endpoint | TERM [TERM…] }
$ME [-a|--all] [-p|--pages INT] [-pp|--per-page INT]
${ME//?/ } { /endpoint | TERM [TERM…] } [&qs_arg1[&qs_arg2…]]
where:
-h, --help shows this help
-c, --count just prints the number of results and returns
-t, --totals prints HTTP headers for # pages, # per page, total results
-a, --all returns all records instead of just the first page
-p, --pages limit results to this many pages (default: 1)
-pp, --per-page specifies page size (default: $perpage)
…and other options starting with a dash are passed through to \`curl\`
examples:
$ $ME search terms # do a code search for 'search' and 'terms'
$ $ME --all '\"exact phrase\"' # search for an exact phrase, all results
$ $ME -I '\"search phrase\"' # see HTTP headers for the above
$ $ME --count /projects # count how many projects
homepage:
https://gist.github.com/ernstki/3707675c8a4ddb06d128154947c49e29
"
return
;;
-c|--count)
count=1
;;
-t|--totals)
totals=1
;;
-a|--all)
all=1
;;
-p|--pages)
shift
pages=$1
;;
-pp|--pp|--per-page)
shift
perpage=$1
;;
-*)
# FIXME: should probably _only_ accept -I / --head
if [[ $1 =~ ^-(i|-include)$ ]]; then
# because it intersperses headers into JSON output which
# `jq` can't handle
echo 'Ignoring unsupported `-i` / `--include` curl option.' >&2
else
curlargs+=("$1")
fi
;;
/*)
endpoint=$1
;;
\&*)
queryargs+=$1
;;
*)
searchterms+=("$1")
;;
esac
shift
done
if [[ ${#searchterms[*]} -gt 0 ]]; then
if [[ $endpoint != /search ]]; then
echo 'ERROR: Bare search terms only accepted for `/search` endpoint.' >&2
return 1
fi
# otherwise
queryargs+="&scope=blobs&search=$(set +x; _urlencode "${searchterms[@]}")"
(( ${TRACE:-} )) && declare -p queryargs
fi
queryargs+="&per_page=$perpage"
if (( all && pages )); then
echo 'ERROR: The `--all` and `--pages` options are mutually-exclusive.' >&2
return 1
fi
if (( count )); then
# HTTP headers end with CR+LF, so make sure to get _only_ the digits;
# header names may arrive lowercase (e.g., over HTTP/2), hence [Xx] etc.
curl --head "${curlargs[@]}" "$api$endpoint?$queryargs" \
| sed -n 's/^[Xx]-[Tt]otal: \([[:digit:]][[:digit:]]*\).*/\1/p'
return
elif (( totals )); then
# print summary of results using HTTP response headers
curl --head "${curlargs[@]}" "$api$endpoint?$queryargs" \
| sed -nE '/^[Xx]-([Pp]age|[Pp]er-[Pp]age|[Tt]otal|[Tt]otal-[Pp]ages):/p'
return
elif (( all )); then
# get the total number of pages
pages=$(
curl --head "${curlargs[@]}" "$api$endpoint?$queryargs" \
| sed -n 's/^[Xx]-[Tt]otal-[Pp]ages: \([[:digit:]][[:digit:]]*\).*/\1/p'
)
if ! [[ $pages =~ ^[[:digit:]]+$ ]]; then
echo "ERROR: Problem fetching total pages; try TRACE=1." >&2
return 1
fi
else
if (( !pages )); then pages=1; fi
fi
local arg headonly=0
for arg in "${curlargs[@]}"; do
# compare whole arguments, so neither `--header` (a default from above)
# nor a token that happens to contain `-I` counts as a HEAD request
if [[ $arg == -I || $arg == --head ]]; then headonly=1; fi
done
if (( headonly )); then
# only need first page of results; don't pipe through `jq` because
# it'll be confused by the HTTP headers
curl "${curlargs[@]}" "$api$endpoint?$queryargs"
else
# the first `jq` unwraps each page's results array; the second (with
# `--slurp`) combines all the results back into a single array
for (( p=1; p<=pages; p++ )); do
if (( pages > 1 )); then echo "Fetching page ${p} of results…" >&2; fi
curl "${curlargs[@]}" "$api$endpoint?$queryargs&page=$p" | jq '.[]'
done | jq --slurp .
fi
)
# https://stackoverflow.com/a/28776166/785213
# works because `return` outside a function fails unless the file is sourced
(return 0 2>/dev/null) && sourced=1 || sourced=0
if (( !sourced )); then
_gitlab_api "$@"
fi
ernstki commented May 22, 2024

Born of the need to return all the results from a GitLab code search and run some simple summary stats on them. This seems like something jq or a competing utility would handle, but searching the InterWebs, I came up empty-handed. The usual suggestion was just to use a for loop in a shell script.

Xidel probably has some kind of support for pagination, but the way Xidel works is sometimes difficult to reason about. Shell script, though, I can do.
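The for-loop approach boils down to two jq passes: `jq '.[]'` unwraps each page's JSON array into a stream of records, and `jq --slurp .` gathers the whole stream back into one array. Here is a minimal offline sketch of that pattern; `fetch_page` and `merge_pages` are hypothetical names, and `fetch_page` only prints fake two-record pages in place of the real `curl` request.

```bash
#!/usr/bin/env bash
# Offline sketch of the "for loop + jq" pagination pattern.
# fetch_page is a HYPOTHETICAL stand-in for the real request,
#   curl "$api$endpoint?$queryargs&page=$p"
# and just prints a JSON array with two fake records for page $1.
fetch_page() {
    printf '[{"page":%d,"n":1},{"page":%d,"n":2}]\n' "$1" "$1"
}

# Fetch pages 1..$1 and print all of their records as ONE JSON array.
merge_pages() {
    local pages=$1 p
    for (( p = 1; p <= pages; p++ )); do
        # first pass: unwrap this page's array into a stream of records
        fetch_page "$p" | jq -c '.[]'
    done | jq -c --slurp .   # second pass: gather the stream into one array
}

merge_pages 3
```

Running it prints a single six-element array. Collecting with `--slurp` at the end, rather than gluing text together, keeps the result valid JSON no matter how many pages there are.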

Tested on macOS with the BSD version of sed, but I don't think I've done anything there that won't work with GNU sed on Linux. Feedback welcome.
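The header scraping is the part most sensitive to sed dialects. As a sketch (the helper name `header_value` is hypothetical), this pulls a numeric header such as X-Total-Pages out of a `curl --head` dump using only POSIX basic-regex features, with bracket expressions instead of a case-insensitive flag, since HTTP/2 responses carry lowercase header names. It then validates the result with an anchored regex, because an unanchored `[[:digit:]]*` also matches an empty string.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of gitlapi): print the numeric value of one
# header from a `curl --head` dump read on stdin.
#   $1  header name as a POSIX basic regex, e.g. '[Xx]-[Tt]otal-[Pp]ages'
# Bracket expressions stand in for a case-insensitive flag, because HTTP/2
# responses use lowercase header names. The trailing `.*` swallows the CR
# that ends every header line, so only the digits are printed.
header_value() {
    sed -n "s/^$1: *\([[:digit:]][[:digit:]]*\).*/\1/p"
}

# A canned header dump standing in for real curl output.
headers=$'HTTP/2 200\r\nx-total: 42\r\nx-total-pages: 3\r\nx-per-page: 20\r\n\r\n'
pages=$(printf '%s' "$headers" | header_value '[Xx]-[Tt]otal-[Pp]ages')

# Anchored check: a bare [[:digit:]]* would also match an empty string,
# i.e. a missing header.
if [[ $pages =~ ^[[:digit:]]+$ ]]; then
    echo "pages: $pages"
else
    echo "could not determine page count" >&2
fi
```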

Installation

Make sure you have jq (and curl) available in your search path.

Create a personal access token in your GitLab settings with the read_api scope.

Then:

GIST=https://gist.github.com/ernstki/3707675c8a4ddb06d128154947c49e29
mkdir -p ~/bin
( curl -fL "$GIST/raw" || wget -O - "$GIST/raw" ) > ~/bin/gitlapi
chmod a+x ~/bin/gitlapi

# define these in your login scripts, or the current shell session
export GITLAB_URL=https://url.to.your/gitlab
export GITLAB_TOKEN='personal access token for read-only API'

# make sure it works
gitlapi --help

On most modern Unix-like systems, ~/bin is already in your $PATH. If your ~/.profile (or similar) only adds ~/bin to $PATH when that directory exists at login, though, you may need to log out and back in after creating it.
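To confirm that ~/bin made it into your $PATH and that the other prerequisites are in place, a small sketch like the following may help. `path_has_dir` and `preflight` are hypothetical helper names, not part of the script; the checks only cover the commands and variables mentioned above (curl, jq, gitlapi, GITLAB_URL, GITLAB_TOKEN).

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight checks for a gitlapi install; not part of the script.

# Succeed if directory $1 is one of the colon-separated entries in $PATH.
path_has_dir() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report missing commands and unset variables; fail if anything is missing.
preflight() {
    local status=0 cmd var
    for cmd in curl jq gitlapi; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing command: $cmd" >&2
            status=1
        fi
    done
    for var in GITLAB_URL GITLAB_TOKEN; do
        # ${!var} is bash indirection: the value of the variable named by $var
        if [[ -z ${!var:-} ]]; then
            echo "not set: $var" >&2
            status=1
        fi
    done
    return "$status"
}

if ! path_has_dir "$HOME/bin"; then
    echo "$HOME/bin is not in \$PATH yet" >&2
fi
preflight && echo "looks good"
```

Wrapping $PATH in colons on both sides lets one pattern match the first, last, or a middle entry without false hits on longer names such as ~/bin2.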

Examples

$ gitlapi search terms            # do a code search for 'search' and 'terms'
$ gitlapi --all '"exact phrase"'  # search for an exact phrase, all results
$ gitlapi -I '"search phrase"'    # see HTTP headers for the above
$ gitlapi --count /projects       # count how many projects
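Bare search terms are percent-encoded by the script's `_urlencode` function and sent as `&scope=blobs&search=` followed by the encoded terms, with separate arguments joined by `+`. The standalone copy below (renamed `urlencode_terms`, a hypothetical name used only for illustration) shows what the first two examples turn into.

```bash
#!/usr/bin/env bash
# Standalone copy of the script's percent-encoding logic, renamed to the
# HYPOTHETICAL urlencode_terms so it can be tried on its own.
urlencode_terms() (
    LC_COLLATE=C   # make [a-zA-Z] mean plain ASCII letters
    while (( $# )); do
        for (( i = 0; i < ${#1}; i++ )); do
            c=${1:i:1}
            case $c in
                [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;   # unreserved: as-is
                *) printf '%%%02X' "'$c" ;;            # everything else: %XX
            esac
        done
        if (( $# > 1 )); then printf '+'; fi   # separate arguments with '+'
        shift
    done
)

urlencode_terms search terms; echo        # prints search+terms
urlencode_terms '"exact phrase"'; echo    # prints %22exact%20phrase%22
```

Because the script checks whether it is being sourced, you could also `source ~/bin/gitlapi` in an interactive bash session and call `_urlencode` directly.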

Bugs and misfeatures

There is no error handling if GITLAB_URL is wrong (remember to include the path prefix, e.g. /gitlab, if GitLab isn't served from the web root) or if GITLAB_TOKEN is invalid. Here's how you can troubleshoot that, though:

TRACE=1 gitlapi -I [other options]

Here -I is curl's -I / --head option, which the script passes through to curl. Other curl options, like -f / --fail, may work too.
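In the -I output, the status line is the quickest tell: a 401 usually means GitLab rejected the token, while a 404 often points to a wrong path prefix in GITLAB_URL. As a hedged sketch, this hypothetical `http_status` helper pulls the code out of a header dump. It strips the trailing CR that HTTP header lines carry, and keeps the last status line in case more than one response is printed (for example with curl's -L).

```bash
#!/usr/bin/env bash
# HYPOTHETICAL helper, not part of gitlapi: print the HTTP status code from
# a header dump on stdin, e.g.  gitlapi -I /projects | http_status
http_status() {
    # Drop the CR from each CR+LF line, then remember the second field of
    # every status line ("HTTP/2 401", "HTTP/1.1 200 OK"); print the last one.
    awk '{ sub(/\r$/, "") }
         /^HTTP\// { code = $2 }
         END { if (code != "") print code }'
}

printf 'HTTP/2 401\r\ncontent-type: application/json\r\n\r\n' | http_status   # prints 401
```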

