Last active
May 22, 2024 13:52
Query GitLab v4 API endpoints with curl and jq; handles record pagination
#!/usr/bin/env bash
##
##  Query a GitLab v4 API endpoint, with pagination
##
##  Author:    Kevin Ernst <ernstki -at- mail.uc.edu>
##  License:   WTFPL
##  Date:      22 May 2024
##  Requires:  jq (https://github.com/jqlang/jq)
##  Homepage:  https://gist.github.com/ernstki/3707675c8a4ddb06d128154947c49e29
##
ME=$(basename "${BASH_SOURCE[0]}")

_urlencode() (
    # Author:   Chris Down (https://gist.github.com/cdown/1163649)
    #           with changes to support multiple arguments by me
    # License:  Unknown
    LC_COLLATE=C
    while (( $# )); do
        for (( i = 0; i < ${#1}; i++ )); do
            c=${1:i:1}
            case $c in
                # unreserved characters pass through unchanged
                [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
                # everything else is percent-encoded
                *) printf '%%%02X' "'$c" ;;
            esac
        done
        if (( $# > 1 )); then printf '+'; fi  # delimit separate args w/ +'s
        shift
    done
)

_gitlab_api() (
    # set TRACE=1 in the environment to enable execution tracing
    if (( TRACE )); then set -x; fi
    set -u

    : "${GITLAB_URL:?Please define GITLAB_URL as the base URL for your GitLab instance}"
    : "${GITLAB_TOKEN:?Please define GITLAB_TOKEN with your private token for the GitLab API}"

    local api=$GITLAB_URL/api/v4
    local endpoint=/search
    local searchterms=()
    local curlargs=(
        --silent
        --header "Authorization: Bearer $GITLAB_TOKEN"
    )
    local perpage=20
    local queryargs= count= totals= all= pages= wantheader=

    while (( $# )); do
        case $1 in
            -h|--help|--flags|-\?)
                echo "
  $ME - query GitLab v4 API endpoints with pagination

  usage:
    $ME [-h|--help]
    $ME [-c|--count] [-t|--totals] { /endpoint | TERM [TERM…] }
    $ME [-a|--all] [-p|--pages INT] [-pp|--per-page INT]
    ${ME//?/ } { /endpoint | TERM [TERM…] } [&qs_arg1[&qs_arg2…]]

  where:
    -h, --help       shows this help
    -c, --count      just prints the number of results and returns
    -t, --totals     prints HTTP headers for # pages, # per page, total results
    -a, --all        returns all records instead of just the first page
    -p, --pages      limit results to this many pages (default: 1)
    -pp, --per-page  specifies page size (default: $perpage)

    …and other options starting with a dash are passed through to \`curl\`

  examples:
    $ $ME search terms              # do a code search for 'search' and 'terms'
    $ $ME --all '\"exact phrase\"'    # search for an exact phrase, all results
    $ $ME -I '\"search phrase\"'      # see HTTP headers for the above
    $ $ME --count /projects         # count how many projects

  homepage:
    https://gist.github.com/ernstki/3707675c8a4ddb06d128154947c49e29
"
                return
                ;;
            -c|--count)
                count=1
                ;;
            -t|--totals)
                totals=1
                ;;
            -a|--all)
                all=1
                ;;
            -p|--pages)
                shift
                pages=$1
                ;;
            -pp|--pp|--per-page)
                shift
                perpage=$1
                ;;
            -*)
                # FIXME: should probably _only_ accept -I / --head
                # anchored so that e.g. `--insecure` isn't rejected as well
                if [[ $1 =~ ^(-i|--include)$ ]]; then
                    # because it intersperses headers into JSON output which
                    # `jq` can't handle
                    echo 'Ignoring unsupported `-i` / `--include` curl option.' >&2
                else
                    curlargs+=("$1")
                fi
                ;;
            /*)
                endpoint=$1
                ;;
            \&*)
                queryargs+=$1
                ;;
            *)
                searchterms+=("$1")
                ;;
        esac
        shift
    done

    if [[ ${#searchterms[*]} -gt 0 ]]; then
        if [[ $endpoint != /search ]]; then
            echo 'ERROR: Bare search terms only accepted for `/search` endpoint.' >&2
            return 1
        fi
        # otherwise
        queryargs+="&scope=blobs&search=$(set +x; _urlencode "${searchterms[@]}")"
        (( ${TRACE:-0} )) && declare -p queryargs
    fi

    queryargs+="&per_page=$perpage"

    if (( all && pages )); then
        echo 'ERROR: The `--all` and `--pages` options are mutually-exclusive.' >&2
        return 1
    fi

    # Header names are matched in either case below, because curl prints them
    # in lowercase for HTTP/2 responses.
    if (( count )); then
        # HTTP headers end with CR+LF, so make sure to get _only_ the digits
        curl --head "${curlargs[@]}" "$api$endpoint?$queryargs" \
            | sed -n 's/^[Xx]-[Tt]otal: \([[:digit:]][[:digit:]]*\).*/\1/p'
        return
    elif (( totals )); then
        # print summary of results using HTTP response headers
        curl --head "${curlargs[@]}" "$api$endpoint?$queryargs" \
            | sed -nE '/^[Xx]-([Pp]age|[Pp]er-[Pp]age|[Tt]otal|[Tt]otal-[Pp]ages):/p'
        return
    elif (( all )); then
        # get the total number of pages
        pages=$(
            curl --head "${curlargs[@]}" "$api$endpoint?$queryargs" \
                | sed -n 's/^[Xx]-[Tt]otal-[Pp]ages: \([[:digit:]][[:digit:]]*\).*/\1/p'
        )
        # anchored and with `+`, so an empty or non-numeric value is an error
        if ! [[ $pages =~ ^[[:digit:]]+$ ]]; then
            echo "ERROR: Problem fetching total pages; try TRACE=1." >&2
            return 1
        fi
    else
        if (( !pages )); then pages=1; fi
    fi

    # match `-I` / `--head` only as whole arguments, so the regexp doesn't
    # match `--header` (a default from above)
    local headopt=' (-I|--head) '
    if [[ " ${curlargs[*]} " =~ $headopt ]]; then
        # only need first page of results; don't pipe through `jq` because
        # it'll be confused by the HTTP headers
        curl "${curlargs[@]}" "$api$endpoint?$queryargs"
    else
        # the first `jq` unwraps each page's results array, the second combines
        # all results back into a single array
        for (( p=1; p<=pages; p++ )); do
            if (( pages > 1 )); then echo "Fetching page ${p} of results…" >&2; fi
            curl "${curlargs[@]}" "$api$endpoint?$queryargs&page=$p" | jq '.[]'
        done | jq --slurp .
    fi
)

# https://stackoverflow.com/a/28776166/785213
# works because you can't `return` from a script
(return 0 2>/dev/null) && sourced=1 || sourced=0

if (( !sourced )); then
    _gitlab_api "$@"
fi

Born of the need to return all the results from a GitLab code search and run some simple summary stats on those results. This seems like something `jq` or a competing utility would do, but searching the InterWebs, I turned up empty-handed. The usual suggestion was to just use a `for` loop in a shell script.

Probably Xidel has some kind of support for pagination, but the way Xidel works is sometimes difficult to reason about. Shell script, though, I can do.
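
The loop-plus-`jq` idea can be tried offline. Below is a minimal sketch, assuming a hypothetical `fake_page` function that stands in for the paginated `curl` request; the data is invented for illustration.

```bash
# Stand-in for `curl ... "&page=$p"`: prints one JSON array per page.
# (Hypothetical data; the real script fetches pages from the GitLab API.)
fake_page() {
    case $1 in
        1) echo '[{"id": 1}, {"id": 2}]' ;;
        2) echo '[{"id": 3}]' ;;
        *) echo '[]' ;;
    esac
}

# Fetch pages 1..N, unwrap each page's array with `.[]`, then re-wrap every
# record into one array with `--slurp`.
all_pages() {
    pages=$1
    p=1
    while [ "$p" -le "$pages" ]; do
        fake_page "$p" | jq '.[]'
        p=$((p + 1))
    done | jq --slurp --compact-output .
}

# all_pages 2   # prints [{"id":1},{"id":2},{"id":3}]
```

Unwrapping first and slurping once at the end means the caller gets a single flat array, no matter how many pages were fetched.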

Tested on macOS, with the *BSD version of `sed`, but I don't think I've done anything there that won't work on Linux. Feedback welcome.

Installation

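Make sure you have `jq` available in your search path.

Create a personal access token in your GitLab settings with the `read_api` scope.

Then save the script into `~/bin`, make it executable, and set `GITLAB_URL` and `GITLAB_TOKEN` in your environment.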
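
The exact installation commands are not shown here, so the following is only a sketch of one way to do it. The function name `install_gitlab_api`, the command name `gitlab-api`, the download path, and the example URL and token are all assumptions made for illustration.

```bash
# Copy the downloaded script into a bin directory and make it executable.
# `gitlab-api` is a hypothetical command name; pick whatever you like.
install_gitlab_api() {
    src=$1                      # path to the downloaded script
    dest_dir=${2:-$HOME/bin}    # defaults to ~/bin
    mkdir -p "$dest_dir" &&
        cp "$src" "$dest_dir/gitlab-api" &&
        chmod +x "$dest_dir/gitlab-api"
}

# Example (commented out so that pasting this block changes nothing):
#   install_gitlab_api ~/Downloads/gitlab-api.sh
#   export GITLAB_URL=https://gitlab.example.com   # placeholder URL
#   export GITLAB_TOKEN=your-token-here            # placeholder token
```

Putting the two `export` lines in your shell profile saves retyping them, at the cost of keeping the token in a plain-text file.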

Your `~/bin` is typically already in your `$PATH` on most modern Unixes. You may need to log out and back in if your `~/.profile` or similar checks for the existence of `~/bin` on login, though.

Examples
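
As one illustration of the "simple summary stats" use case, here is a sketch that counts search hits per project. The sample records are invented, the `hits_per_project` name is hypothetical, and the only field it relies on is `project_id`, which I'm assuming is present on each blob search result.

```bash
# Sample records shaped like GitLab blob search results (values invented).
results='[
  {"project_id": 7, "path": "src/a.sh", "startline": 3},
  {"project_id": 7, "path": "src/b.sh", "startline": 10},
  {"project_id": 9, "path": "README.md", "startline": 1}
]'

# Read a JSON array of results on stdin; print hits per project, most first.
hits_per_project() {
    jq --compact-output '
        group_by(.project_id)
        | map({project_id: .[0].project_id, hits: length})
        | sort_by(-.hits)
    '
}

# printf '%s' "$results" | hits_per_project
#   prints [{"project_id":7,"hits":2},{"project_id":9,"hits":1}]
```

With the script installed under a name of your choosing, the same filter could be fed from a real search, e.g. `gitlab-api --all search terms | hits_per_project`.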
Bugs and misfeatures
There is no error handling if you mess up the `GITLAB_URL` (remember to include e.g. the `/gitlab` part of the URL if GitLab is not served from the root) or your `GITLAB_TOKEN` is wrong. You can troubleshoot that, though, by passing curl's `-I` / `--head` option to see the HTTP response headers. Other curl options like `-f` / `--fail` may work, too.
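
When reading those headers, the status line is usually the first thing to check. Here is a small sketch that pulls the status code out of a header block; `http_status` is a hypothetical helper, not part of the script, and the sample headers are made up.

```bash
# Print the status code from the first line of an HTTP response header block,
# such as the output of `curl --head`. Reads from stdin.
http_status() {
    sed -n '1s/^HTTP\/[0-9.]* \([0-9][0-9]*\).*/\1/p'
}

# Made-up sample responses (header lines end in CR+LF, as on the wire):
#   printf 'HTTP/1.1 401 Unauthorized\r\n\r\n' | http_status   # prints 401
#   printf 'HTTP/2 200 \r\nx-total: 42\r\n\r\n' | http_status  # prints 200
```

As a rough guide, a 401 usually points at the token, while a 404 often means the base URL is missing a path component, though your instance may behave differently.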