@meowsbits
Last active December 4, 2019 21:10
Github Discourse Archiver

This small collection of scripts can be used to archive GitHub discourse data (Pull Requests, Issues, and repository events) using Git version control.

How it works:

  • gh-clone* scripts use the GitHub v3 API to download discourse information:

    • Issues
      • Issue comments, with reactions
      • Issue events
    • Pull Requests
      • PR comments, with reactions
      • PR reviews
        • PR review comments
      • PR events
    • Repo Events
      • Includes events for both Issues and PRs, so an event can be recorded twice: once under its issue in the .gh-issues dir (or its PR in .gh-pullrequests), and once under .gh-events.

        I accept the duplicate data in exchange for a FULL PICTURE. For example, if a label is added to an issue, that doesn't necessarily show up in the per-issue events (as scripted) unless the Issue itself actually changes, because the scripts only process Issues updated since the last run rather than processing all of them every time.

  • scribe.sh is the "manager" script: it calls the gh-clone* scripts and does some trivial repository state management (fetch, commit, push).
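
The duplication mentioned in the list above can be pictured by looking at where a single event may end up on disk. The sketch below only mirrors the file-naming scheme the scripts use (zero-padded issue number plus event id). The helper names `issue_event_path` and `repo_event_path` are hypothetical and are not part of the scripts, and it assumes both endpoints report the same event id.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not in the original scripts) showing the two places
# where the same event can be stored.

# Path written by the per-issue events pass:
#   <dir>/<zero-padded issue number>_event_<event id>.json
issue_event_path() {
  local dir="$1" issue_number="$2" event_id="$3"
  printf '%s/%05d_event_%s.json\n' "$dir" "$issue_number" "$event_id"
}

# Path written by the repository-wide events pass:
#   .gh-events/<event id>.json
repo_event_path() {
  local event_id="$1"
  printf '.gh-events/%s.json\n' "$event_id"
}

issue_event_path .gh-issues 42 1234567
repo_event_path 1234567
```

Because both files are keyed by event id, duplicates are easy to spot later if they ever become a problem.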

You can run these as one-off scripts, schedule them as a cron job, or whatever.
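
For the cron case, one thing to watch is a slow run overlapping with the next scheduled one. A minimal sketch of a wrapper that guards against that follows. `LOCK_DIR`, `acquire_lock`, `release_lock`, `run_archive`, and the crontab path are all assumptions made for illustration, not part of the original scripts.

```shell
#!/usr/bin/env bash
# Sketch of a cron-friendly wrapper; every name here is hypothetical.
# Example crontab entry (assumed path), running once an hour:
#   0 * * * * /path/to/archive-wrapper.sh >>/tmp/gh-archive.log 2>&1

LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/gh-archive.lock}"

# mkdir fails if the directory already exists, so only one run can
# hold the lock at a time.
acquire_lock() {
  mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
  rmdir "$LOCK_DIR" 2>/dev/null
}

# Run the given command (e.g. scribe.sh) only if no other run is active.
run_archive() {
  if ! acquire_lock; then
    echo "Another archive run is in progress; skipping." >&2
    return 1
  fi
  "$@"
  local status=$?
  release_lock
  return "$status"
}

# One-off use:
#   run_archive scribe.sh
```

A lock directory is used here only because `mkdir` is atomic and needs no extra tools; any other locking approach would do.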

These are ad hoc and not intended to be super robust (or well coded), but they are a working proof of concept for my own personal use.
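
As noted in the list above, the scripts only process Issues updated since the last run. The bookkeeping behind that can be sketched roughly as follows. The timestamp format and the far-in-the-past default match the scripts, but `STATE_FILE`, `read_since`, and `write_state` are hypothetical names used only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch of the incremental "since" bookkeeping; names are hypothetical.

STATE_FILE="${STATE_FILE:-.gh-issues/.state}"

# Print the stored timestamp, or a date far in the past on the first run.
read_since() {
  if [[ -s "$STATE_FILE" ]]; then
    head -n1 "$STATE_FILE"
  else
    echo "2009-01-02T03:04:05Z"
  fi
}

# Record the time the run STARTED, so anything posted while the run was in
# progress is fetched again next time instead of being missed.
write_state() {
  local started_at="$1"
  mkdir -p "$(dirname "$STATE_FILE")"
  echo "$started_at" >"$STATE_FILE"
}

# Usage sketch:
#   start="$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
#   since="$(read_since)"   # passed as the API's 'since' parameter
#   ... fetch and store everything updated after $since ...
#   write_state "$start"
```

Stamping the start time rather than the completion time trades a little re-downloading for not losing updates made during a run.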

gh-cloneissues.sh:

#!/usr/bin/env bash
help() {
if [[ ! -z "$1" ]]
then
echo "Error: $1"
trap 'exit 1' RETURN
fi
cat <<EOF
Overview:
Queries the Github APIv3 to collect all issues and their comments from a repository.
The token you use must have read access to the repository.
Data will be referenced and stored as such:
${ISSUES_DIR}/.response.json <- temporary
${ISSUES_DIR}/.response-header <- temporary
${ISSUES_DIR}/.state
${ISSUES_DIR}/<issue_number>.json
${ISSUES_DIR}/<issue_number>_<issuecomment_id>.json
${ISSUES_DIR}/<issue_number>_event_<event_id>.json
The '${ISSUES_DIR}/.state' file will contain an ISO8601 datetime, which the script will use
as the 'since' parameter for its queries, to avoid a lot of redundancy and API use.
When the script is finished, it will update this value with the datetime at which
the script began to run.
Dependencies:
- jj , https://github.com/tidwall/jj , Must be in PATH
- Environment variable GITHUB_TOKEN must be set in order to access the Github API.
Basic use:
Run:
$0 :owner/:repo
Advanced use:
Force re-download.
rm ./${ISSUES_DIR}/.state
Download all issues+issuecomments since ____.
vim ./${ISSUES_DIR}/.state
EOF
}
ISSUES_DIR=".gh-issues"
owner_repo="$1"
[[ -z "$owner_repo" ]] && help "Invalid argument(s)"
[[ $# -gt 1 ]] && help "Invalid argument(s)"
[[ -z "$GITHUB_TOKEN" ]] && help "GITHUB_TOKEN not set"
command -v jj >/dev/null || { help "Dependency unmet: jj not found in PATH"; }
mkdir -p ${ISSUES_DIR}
[[ -f ${ISSUES_DIR}/.state && $(wc -l <${ISSUES_DIR}/.state) -gt 0 ]] || date --date="2009-01-02 03:04:05" +"%Y-%m-%dT%H:%M:%SZ" >${ISSUES_DIR}/.state
# Because we'll want to use a datetime for state that doesn't leave much
# abyss time;
# say this script took 12 minutes to run (which it doesn't, but bear with me),
# then if someone posted a comment during those 12 minutes and we were to
# stamp the state with the time of the script's completion -- and not its start --
# then unbeknownst to us, that comment would be permanently forsaken to an
# abysmal purgatory of unremembrance.
start="$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
process_issue_events() {
local _n=0
local _max; _max=$(jj -i ./${ISSUES_DIR}/.response-events.json '#')
while [[ $_n -lt $_max ]]; do
echo "Processing issue events issue_number: $1 index: $_n"
_j_cmd="$(which jj) -i ${ISSUES_DIR}/.response-events.json -n $_n"
[[ ! -z $($_j_cmd) ]] || break
_issue_number="$1"
# Ensure issue/pr for the id'd event resource actually exists.
# NOTE This assumes that Issues and PRs have been downloaded before their respective comments.
[[ ! -f "${ISSUES_DIR}/${_issue_number}.json" ]] && _n=$((_n + 1)) && continue
_issueevent_number="$($_j_cmd.id)"
$_j_cmd >"${ISSUES_DIR}/${_issue_number}_event_${_issueevent_number}.json"
_n=$((_n + 1))
done
}
get_issue_events() {
issue_number="$(echo $1 | sed -E 's/^0+//')"
# Sailor V preview: get lock reasons
curl >${ISSUES_DIR}/.response-events.json 2>&1 \
--silent --show-error \
-H "Authorization: token ${GITHUB_TOKEN}" \
-H "Accept: application/vnd.github.sailor-v-preview+json" \
-D "${ISSUES_DIR}/.response-events-header" \
'https://api.github.com/repos/'"${owner_repo}"'/issues/'${issue_number}'/events?page='$2'&per_page=100&sort=updated&since='"$(head -n1 <${ISSUES_DIR}/.state)"
echo "Finished issues events request issue_number ${issue_number}"
# Only process on an HTTP 200 status line; pass the zero-padded issue number (leading 0's NOT trimmed).
head -n1 ${ISSUES_DIR}/.response-events-header | grep -q ' 200' && process_issue_events ${1}
}
# It's possible these could be refactored to be DRYer.
# But there's something to be said for saying something.
process_issues() {
local _n=0
local _max; _max=$(jj -i ${ISSUES_DIR}/.response.json '#')
while [[ $_n -lt $_max ]]; do
echo "Processing issue index $_n"
_j_cmd="$(which jj) -i ${ISSUES_DIR}/.response.json -n $_n"
[[ ! -z $($_j_cmd) ]] || break
[[ ! -z $($_j_cmd.pull_request) ]] && _n=$((_n + 1)) && continue
_issue_number="$(printf '%05d' $($_j_cmd.number))"
$_j_cmd >"${ISSUES_DIR}/${_issue_number}.json"
local page=1 # local: must not clobber the top-level loop's page counter
while grep -q 'next' ${ISSUES_DIR}/.response-events-header || [[ $page == 1 ]]; do
get_issue_events ${_issue_number} ${page}
page=$((page + 1))
done
_n=$((_n + 1))
done
}
get_issues() {
# Squirrel girl alert: Developer preview for reactions summary
# https://developer.github.com/v3/issues/#reactions-summary
curl >${ISSUES_DIR}/.response.json 2>&1 \
--silent --show-error \
-H "Authorization: token ${GITHUB_TOKEN}" \
-H "Accept: application/vnd.github.squirrel-girl-preview" \
-D "${ISSUES_DIR}/.response-header" \
'https://api.github.com/repos/'"${owner_repo}"'/issues?state=all&page='$1'&per_page=100&sort=updated&since='"$(head -n1 <${ISSUES_DIR}/.state)"
echo "Finished issues request"
# Only process on an HTTP 200 status line.
head -n1 ${ISSUES_DIR}/.response-header | grep -q ' 200' && process_issues
}
process_issuecomments() {
local _n=0
local _max; _max=$(jj -i ./${ISSUES_DIR}/.response.json '#')
while [[ $_n -lt $_max ]]; do
echo "Processing issuecomment index $_n"
_j_cmd="$(which jj) -i ${ISSUES_DIR}/.response.json -n $_n"
[[ ! -z $($_j_cmd) ]] || break
_issue_number="$(printf '%05d' $(basename $($_j_cmd.issue_url)))" # HACK
# We need a way to tell Issue Comments vs. PR Comments
# This assumes that Issues have been downloaded before their respective comments.
[[ ! -f "${ISSUES_DIR}/${_issue_number}.json" ]] && _n=$((_n + 1)) && continue
_issuecomment_number="$($_j_cmd.id)"
$_j_cmd >"${ISSUES_DIR}/${_issue_number}_${_issuecomment_number}.json"
_n=$((_n + 1))
done
}
get_issuecomments() {
# Squirrel girl alert: Developer preview for reactions summary
# https://developer.github.com/v3/issues/comments/#reactions-summary-1
curl >${ISSUES_DIR}/.response.json 2>&1 \
--silent --show-error \
-H "Authorization: token ${GITHUB_TOKEN}" \
-H "Accept: application/vnd.github.squirrel-girl-preview" \
-D "${ISSUES_DIR}/.response-header" \
'https://api.github.com/repos/'"${owner_repo}"'/issues/comments?state=all&page='$1'&per_page=100&sort=updated&since='"$(head -n1 <${ISSUES_DIR}/.state)"
echo "Finished issuecomments request"
# Only process on an HTTP 200 status line.
head -n1 ${ISSUES_DIR}/.response-header | grep -q ' 200' && process_issuecomments
}
onexit() {
rm ${ISSUES_DIR}/.response{.json,-header}
rm -rf ${ISSUES_DIR}/.response-events{.json,-header} # use flags to allow fail
echo "${start}" >${ISSUES_DIR}/.state
}
trap onexit EXIT
touch ${ISSUES_DIR}/.response-header
touch ${ISSUES_DIR}/.response-events-header
page=1
while grep -q 'next' ${ISSUES_DIR}/.response-header || [[ $page == 1 ]]; do
get_issues ${page}
page=$((page + 1))
done
page=1
while grep -q 'next' ${ISSUES_DIR}/.response-header || [[ $page == 1 ]]; do
get_issuecomments ${page}
page=$((page + 1))
done

gh-clonepullrequests.sh:

#!/usr/bin/env bash
help() {
if [[ ! -z "$1" ]]
then
echo "Error: $1"
trap 'exit 1' RETURN
fi
cat <<EOF
Overview:
Queries the Github APIv3 to collect all pull requests and their comments from a repository.
The token you use must have read access to the repository.
Data will be referenced and stored as such:
${ISSUES_DIR}/.response.json <- temporary
${ISSUES_DIR}/.response-header <- temporary
${ISSUES_DIR}/.state
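${ISSUES_DIR}/<issue_number>.json
${ISSUES_DIR}/<issue_number>.patch
${ISSUES_DIR}/<issue_number>_<issuecomment_id>.json
${ISSUES_DIR}/<issue_number>_prcomment_<prcomment_id>.json
${ISSUES_DIR}/<issue_number>_event_<event_id>.json
${ISSUES_DIR}/<issue_number>_review_<review_id>.json
${ISSUES_DIR}/<issue_number>_review_<review_id>_<reviewcomment_id>.json
The '${ISSUES_DIR}/.state' file will contain an ISO8601 datetime, which the script will use
as the 'since' parameter for its queries, to avoid a lot of redundancy and API use.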
When the script is finished, it will update this value with the datetime at which
the script began to run.
Developer's note:
With Github's v3 API, all Pull Requests are Issues, but not
all Issues are Pull Requests. Since I'm reusing the script that clones Issues,
and since Pull Requests are (kind of) Issues, I'm going to leave the variable
and function names the same, changing as little as possible.
Dependencies:
- jj , https://github.com/tidwall/jj , Must be in PATH
- Environment variable GITHUB_TOKEN must be set in order to access the Github API.
Basic use:
Run:
$0 :owner/:repo
Advanced use:
Force re-download.
rm ./${ISSUES_DIR}/.state
Download all issues+issuecomments since ____.
vim ./${ISSUES_DIR}/.state
EOF
}
ISSUES_DIR=".gh-pullrequests"
owner_repo="$1"
[[ -z "$owner_repo" ]] && help "Invalid argument(s)"
[[ $# -gt 1 ]] && help "Invalid argument(s)"
[[ -z "$GITHUB_TOKEN" ]] && help "GITHUB_TOKEN not set"
command -v jj >/dev/null || { help "Dependency unmet: jj not found in PATH"; }
mkdir -p ${ISSUES_DIR}
[[ -f ${ISSUES_DIR}/.state && $(wc -l <${ISSUES_DIR}/.state) -gt 0 ]] || date --date="2009-01-02 03:04:05" +"%Y-%m-%dT%H:%M:%SZ" >${ISSUES_DIR}/.state
# Because we'll want to use a datetime for state that doesn't leave much
# abyss time;
# say this script took 12 minutes to run (which it doesn't, but bear with me),
# then if someone posted a comment during those 12 minutes and we were to
# stamp the state with the time of the script's completion -- and not its start --
# then unbeknownst to us, that comment would be permanently forsaken to an
# abysmal purgatory of unremembrance.
start="$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
process_issue_events() {
local _n=0
local _max; _max=$(jj -i ./${ISSUES_DIR}/.response-events.json '#')
while [[ $_n -lt $_max ]]; do
echo "Processing issue events issue_number: $1 index: $_n"
_j_cmd="$(which jj) -i ${ISSUES_DIR}/.response-events.json -n $_n"
[[ ! -z $($_j_cmd) ]] || break
_issue_number="$1"
# Ensure issue/pr for the id'd event resource actually exists.
# NOTE This assumes that Issues and PRs have been downloaded before their respective comments.
[[ ! -f "${ISSUES_DIR}/${_issue_number}.json" ]] && _n=$((_n + 1)) && continue
_issueevent_number="$($_j_cmd.id)"
$_j_cmd >"${ISSUES_DIR}/${_issue_number}_event_${_issueevent_number}.json"
_n=$((_n + 1))
done
}
get_issue_events() {
issue_number="$(echo $1 | sed -E 's/^0+//')"
# Sailor V preview: get lock reasons
curl >${ISSUES_DIR}/.response-events.json 2>&1 \
--silent --show-error \
-H "Authorization: token ${GITHUB_TOKEN}" \
-H "Accept: application/vnd.github.sailor-v-preview+json" \
-D "${ISSUES_DIR}/.response-events-header" \
'https://api.github.com/repos/'"${owner_repo}"'/issues/'${issue_number}'/events?page='$2'&per_page=100&sort=updated&since='"$(head -n1 <${ISSUES_DIR}/.state)"
echo "Finished issues events request issue_number ${issue_number}"
# Only process on an HTTP 200 status line; pass the zero-padded issue number (leading 0's NOT trimmed).
head -n1 ${ISSUES_DIR}/.response-events-header | grep -q ' 200' && process_issue_events ${1}
}
process_issue_reviewcomments() {
local _n=0
local _max; _max=$(jj -i ./${ISSUES_DIR}/.response-reviewcomments.json '#')
while [[ $_n -lt $_max ]]; do
echo "Processing issue review comments issue_number: $1 review number: $2 index: $_n"
_j_cmd="$(which jj) -i ${ISSUES_DIR}/.response-reviewcomments.json -n $_n"
[[ ! -z $($_j_cmd) ]] || break
_issue_number="$1"
_review_id="$2"
# Ensure issue/pr for the id'd reviews resource actually exists.
# NOTE This assumes that Issues and PRs have been downloaded before their respective comments.
[[ ! -f "${ISSUES_DIR}/${_issue_number}.json" ]] && _n=$((_n + 1)) && continue
_issuereviewcomment_id="$($_j_cmd.id)"
$_j_cmd >"${ISSUES_DIR}/${_issue_number}_review_${_review_id}_${_issuereviewcomment_id}.json"
_n=$((_n + 1))
done
}
get_issue_reviewcomments() {
issue_number="$1"
review_id="$2"
# Sailor V preview: get lock reasons
curl >${ISSUES_DIR}/.response-reviewcomments.json 2>&1 \
--silent --show-error \
-H "Authorization: token ${GITHUB_TOKEN}" \
-H "Accept: application/vnd.github.sailor-v-preview+json" \
-D "${ISSUES_DIR}/.response-reviewcomments-header" \
'https://api.github.com/repos/'"${owner_repo}"'/pulls/'$(echo $1 | sed -E 's/^0+//')'/reviews/'${review_id}'/comments?page='$3'&per_page=100&sort=updated&since='"$(head -n1 <${ISSUES_DIR}/.state)"
echo "Finished review comments request issue(pull) number: $1 review_id ${review_id}"
# Only process on an HTTP 200 status line.
head -n1 ${ISSUES_DIR}/.response-reviewcomments-header | grep -q ' 200' && process_issue_reviewcomments ${issue_number} ${review_id}
}
process_issue_reviews() {
local _n=0
local _max; _max=$(jj -i ./${ISSUES_DIR}/.response-reviews.json '#')
while [[ $_n -lt $_max ]]; do
echo "Processing issue reviews issue_number: $1 index: $_n"
_j_cmd="$(which jj) -i ${ISSUES_DIR}/.response-reviews.json -n $_n"
[[ ! -z $($_j_cmd) ]] || break
_issue_number="$1"
# Ensure issue/pr for the id'd reviews resource actually exists.
# NOTE This assumes that Issues and PRs have been downloaded before their respective comments.
[[ ! -f "${ISSUES_DIR}/${_issue_number}.json" ]] && _n=$((_n + 1)) && continue
_issuereview_id="$($_j_cmd.id)"
$_j_cmd >"${ISSUES_DIR}/${_issue_number}_review_${_issuereview_id}.json"
local page=1 # local: must not clobber the caller's reviews page counter
while grep -q 'next' ${ISSUES_DIR}/.response-reviewcomments-header || [[ $page == 1 ]]; do
get_issue_reviewcomments ${_issue_number} ${_issuereview_id} ${page}
page=$((page + 1))
done
_n=$((_n + 1))
done
}
get_issue_reviews() {
issue_number="$(echo $1 | sed -E 's/^0+//')"
# Sailor V preview: get lock reasons
curl >${ISSUES_DIR}/.response-reviews.json 2>&1 \
--silent --show-error \
-H "Authorization: token ${GITHUB_TOKEN}" \
-H "Accept: application/vnd.github.sailor-v-preview+json" \
-D "${ISSUES_DIR}/.response-reviews-header" \
'https://api.github.com/repos/'"${owner_repo}"'/pulls/'${issue_number}'/reviews?page='$2'&per_page=100&sort=updated&since='"$(head -n1 <${ISSUES_DIR}/.state)"
echo "Finished issues reviews request issue_number ${issue_number}"
# Only process on an HTTP 200 status line; pass the zero-padded issue number (leading 0's NOT trimmed).
head -n1 ${ISSUES_DIR}/.response-reviews-header | grep -q ' 200' && process_issue_reviews ${1}
}
# It's possible these could be refactored to be DRYer.
# But there's something to be said for saying something.
process_issues() {
local _n=0
local _max; _max=$(jj -i ${ISSUES_DIR}/.response.json '#')
while [[ $_n -lt $_max ]]; do
echo "Processing issue index $_n"
_j_cmd="$(which jj) -i ${ISSUES_DIR}/.response.json -n $_n"
[[ ! -z $($_j_cmd) ]] || break
[[ -z $($_j_cmd.pull_request) ]] && _n=$((_n + 1)) && continue
_issue_number="$(printf '%05d' $($_j_cmd.number))"
$_j_cmd >"${ISSUES_DIR}/${_issue_number}.json"
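# Dump the patch response headers to their own file so they don't overwrite
# .response-header, which the top-level pagination loop inspects for 'next'.
curl > ${ISSUES_DIR}/${_issue_number}.patch 2>&1 \
-L --silent --show-error \
-H "Authorization: token ${GITHUB_TOKEN}" \
-D "${ISSUES_DIR}/.response-patch-header" \
"$($_j_cmd.pull_request.patch_url)"
local page=1 # local: must not clobber the top-level loop's page counter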
while grep -q 'next' ${ISSUES_DIR}/.response-events-header || [[ $page == 1 ]]; do
get_issue_events ${_issue_number} ${page}
page=$((page + 1))
done
page=1
while grep -q 'next' ${ISSUES_DIR}/.response-reviews-header || [[ $page == 1 ]]; do
get_issue_reviews ${_issue_number} ${page}
page=$((page + 1))
done
_n=$((_n + 1))
done
}
get_issues() {
# Squirrel girl alert: Developer preview for reactions summary
# https://developer.github.com/v3/issues/#reactions-summary
curl >${ISSUES_DIR}/.response.json 2>&1 \
--silent --show-error \
-H "Authorization: token ${GITHUB_TOKEN}" \
-H "Accept: application/vnd.github.squirrel-girl-preview" \
-D "${ISSUES_DIR}/.response-header" \
'https://api.github.com/repos/'"${owner_repo}"'/issues?state=all&page='$1'&per_page=100&sort=updated&since='"$(head -n1 <${ISSUES_DIR}/.state)"
echo "Finished issues request"
# Only process on an HTTP 200 status line.
head -n1 ${ISSUES_DIR}/.response-header | grep -q ' 200' && process_issues
}
process_issuecomments() {
local _n=0
local _max; _max=$(jj -i ./${ISSUES_DIR}/.response.json '#')
while [[ $_n -lt $_max ]]; do
echo "Processing issuecomment index $_n"
_j_cmd="$(which jj) -i ${ISSUES_DIR}/.response.json -n $_n"
[[ ! -z $($_j_cmd) ]] || break
_issue_number="$(printf '%05d' $(basename $($_j_cmd.issue_url)))" # HACK
# We need a way to tell Issue Comments vs. PR Comments
# This assumes that Issues have been downloaded before their respective comments.
[[ ! -f "${ISSUES_DIR}/${_issue_number}.json" ]] && _n=$((_n + 1)) && continue
_issuecomment_number="$($_j_cmd.id)"
$_j_cmd >"${ISSUES_DIR}/${_issue_number}_${_issuecomment_number}.json"
_n=$((_n + 1))
done
}
get_issuecomments() {
# Squirrel girl alert: Developer preview for reactions summary
# https://developer.github.com/v3/issues/comments/#reactions-summary-1
curl >${ISSUES_DIR}/.response.json 2>&1 \
--silent --show-error \
-H "Authorization: token ${GITHUB_TOKEN}" \
-H "Accept: application/vnd.github.squirrel-girl-preview" \
-D "${ISSUES_DIR}/.response-header" \
'https://api.github.com/repos/'"${owner_repo}"'/issues/comments?state=all&page='$1'&per_page=100&sort=updated&since='"$(head -n1 <${ISSUES_DIR}/.state)"
echo "Finished issuecomments request"
# Only process on an HTTP 200 status line.
head -n1 ${ISSUES_DIR}/.response-header | grep -q ' 200' && process_issuecomments
}
process_prcomments() {
local _n=0
local _max; _max=$(jj -i ./${ISSUES_DIR}/.response.json '#')
while [[ $_n -lt $_max ]]; do
echo "Processing prcomment index $_n"
_j_cmd="$(which jj) -i ${ISSUES_DIR}/.response.json -n $_n"
[[ ! -z $($_j_cmd) ]] || break
_issue_number="$(printf '%05d' $(basename $($_j_cmd.pull_request_url)))" # HACK
# We need a way to tell Issue Comments vs. PR Comments
# This assumes that Issues have been downloaded before their respective comments.
[[ ! -f "${ISSUES_DIR}/${_issue_number}.json" ]] && _n=$((_n + 1)) && continue
_issuecomment_number="$($_j_cmd.id)"
$_j_cmd >"${ISSUES_DIR}/${_issue_number}_prcomment_${_issuecomment_number}.json"
_n=$((_n + 1))
done
}
get_prcomments() {
# Comfort fade alert: Developer preview for multi-line comments
# https://developer.github.com/v3/pulls/comments/#list-comments-in-a-repository
curl >${ISSUES_DIR}/.response.json 2>&1 \
--silent --show-error \
-H "Authorization: token ${GITHUB_TOKEN}" \
-H "Accept: application/vnd.github.comfort-fade-preview+json" \
-D "${ISSUES_DIR}/.response-header" \
'https://api.github.com/repos/'"${owner_repo}"'/pulls/comments?state=all&page='$1'&per_page=100&sort=updated&since='"$(head -n1 <${ISSUES_DIR}/.state)"
echo "Finished prcomments request"
# Only process on an HTTP 200 status line.
head -n1 ${ISSUES_DIR}/.response-header | grep -q ' 200' && process_prcomments
}
onexit() {
# rm ${ISSUES_DIR}/.response{.json,-header}
# rm -rf ${ISSUES_DIR}/.response-events{.json,-header} # use flags to allow fail
rm -rf ${ISSUES_DIR}/.response*
echo "${start}" >${ISSUES_DIR}/.state
}
trap onexit EXIT
touch ${ISSUES_DIR}/.response{,-events,-reviews,-reviewcomments}-header
# touch ${ISSUES_DIR}/.response-header
# touch ${ISSUES_DIR}/.response-events-header
# touch ${ISSUES_DIR}/.response-reviews-header
# touch ${ISSUES_DIR}/.response-reviewcomments-header
page=1
while grep -q 'next' ${ISSUES_DIR}/.response-header || [[ $page == 1 ]]; do
get_issues ${page}
page=$((page + 1))
done
page=1
while grep -q 'next' ${ISSUES_DIR}/.response-header || [[ $page == 1 ]]; do
get_issuecomments ${page}
page=$((page + 1))
done
page=1
while grep -q 'next' ${ISSUES_DIR}/.response-header || [[ $page == 1 ]]; do
get_prcomments ${page}
page=$((page + 1))
done

gh-clonerepoevents.sh:

#!/usr/bin/env bash
help() {
if [[ ! -z "$1" ]]
then
echo "Error: $1"
trap 'exit 1' RETURN
fi
cat <<EOF
Overview:
Queries the Github APIv3 to collect all events from a repository.
The token you use must have read access to the repository.
Data will be referenced and stored as such:
${EVENTS_DIR}/.response.json <- temporary
${EVENTS_DIR}/.response-header <- temporary
${EVENTS_DIR}/.state
${EVENTS_DIR}/.etag
${EVENTS_DIR}/<event_id>.json
The '${EVENTS_DIR}/.etag' file holds the ETag header of the last successful response; it is sent
as 'If-None-Match' so that an unchanged event list returns a 304 instead of a full download.
The '${EVENTS_DIR}/.state' file will contain an ISO8601 datetime. When the script is finished,
it will update this value with the datetime at which the script began to run
(unless the API answered 304).
Dependencies:
- jj , https://github.com/tidwall/jj , Must be in PATH
- Environment variable GITHUB_TOKEN must be set in order to access the Github API.
Basic use:
Run:
$0 :owner/:repo
Advanced use:
Force re-download.
rm ./${EVENTS_DIR}/.etag
EOF
}
EVENTS_DIR=".gh-events"
owner_repo="$1"
[[ -z "$owner_repo" ]] && help "Invalid argument(s)"
[[ $# -gt 1 ]] && help "Invalid argument(s)"
[[ -z "$GITHUB_TOKEN" ]] && help "GITHUB_TOKEN not set"
command -v jj >/dev/null || { help "Dependency unmet: jj not found in PATH"; }
mkdir -p ${EVENTS_DIR}
[[ -f ${EVENTS_DIR}/.state && $(wc -l <${EVENTS_DIR}/.state) -gt 0 ]] || date --date="2009-01-02 03:04:05" +"%Y-%m-%dT%H:%M:%SZ" >${EVENTS_DIR}/.state
# Because we'll want to use a datetime for state that doesn't leave much
# abyss time;
# say this script took 12 minutes to run (which it doesn't, but bear with me),
# then if someone posted a comment during those 12 minutes and we were to
# stamp the state with the time of the script's completion -- and not its start --
# then unbeknownst to us, that comment would be permanently forsaken to an
# abysmal purgatory of unremembrance.
start="$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
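# NOTE: despite its name, this flag is inverted: 0 means onexit WILL update
# .state, and it is set to 1 (on a 304 response) to SKIP the update.
should_update_state=0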
process_repo_events() {
local _n=0
local _max; _max=$(jj -i ./${EVENTS_DIR}/.response.json '#')
while [[ $_n -lt $_max ]]; do
echo "Processing repo event index: $_n"
_j_cmd="$(which jj) -i ${EVENTS_DIR}/.response.json -n $_n"
[[ ! -z $($_j_cmd) ]] || break
_event_number="$($_j_cmd.id)"
$_j_cmd >"${EVENTS_DIR}/${_event_number}.json"
_n=$((_n + 1))
done
}
get_repo_events() {
etag=""
if [[ -f ${EVENTS_DIR}/.etag ]]; then
etag="$(cat ${EVENTS_DIR}/.etag | sed 's/^ETag://g' | sed -E 's/\s//g' | head -n1)"
fi
# Sailor V preview: get lock reasons
curl >${EVENTS_DIR}/.response.json 2>&1 \
--silent --show-error \
-H "Authorization: token ${GITHUB_TOKEN}" \
-H "If-None-Match: ${etag}" \
-H "Accept: application/vnd.github.sailor-v-preview+json" \
-H "Accept: application/vnd.github.symmetra-preview+json" \
-D "${EVENTS_DIR}/.response-header" \
'https://api.github.com/repos/'"${owner_repo}"'/issues/events?page='$1'&per_page=100&sort=updated'
echo "Finished repo events request"
if grep -q "400" ${EVENTS_DIR}/.response-header; then
echo "Bad request."
exit 1
fi
if grep -q "304" ${EVENTS_DIR}/.response-header; then
should_update_state=1
echo "No news is good news."
exit 0
fi
if grep -q "200" ${EVENTS_DIR}/.response-header; then
process_repo_events
cat "${EVENTS_DIR}/.response-header" | grep -E '^ETag' > ${EVENTS_DIR}/.etag
else
echo Weird response.
exit 1
fi
}
onexit() {
rm -rf ${EVENTS_DIR}/.response{.json,-header}
if [[ $should_update_state -eq 0 ]]; then
echo "${start}" >${EVENTS_DIR}/.state
fi
}
trap onexit EXIT
touch ${EVENTS_DIR}/.response-header
page=1
while grep -q 'next' ${EVENTS_DIR}/.response-header || [[ $page == 1 ]]; do
get_repo_events ${page}
page=$((page + 1))
done

scribe.sh:

#!/usr/bin/env bash
set -e
target_path=${1:-"$HOME/dev/ethereumclassic/ECIPs"}
target_origin=${2:-'ethereumclassic/ECIPS'}
target_fork=${3:-'meowsbits/ECIPs'}
remote_ssh=${4:-'git@github.com'} # or an SSH config host alias, e.g. meows
if [[ ! -d "$target_path" ]]; then
mkdir -p "$(dirname "$target_path")"
git clone "$remote_ssh:$target_origin.git" "$target_path"
git --git-dir "$target_path/.git" remote add fork "$remote_ssh:$target_fork"
fi
pushd "$target_path"
git fetch --all
git checkout -B gh
git pull --no-edit fork gh || true
gh-clonepullrequests.sh "$target_origin"
gh-cloneissues.sh "$target_origin"
gh-clonerepoevents.sh "$target_origin"
if git diff-files --quiet; then
echo "Super quiet on the octocat front... maybe too quiet.. aborting."
exit 1
fi
files_diff_l="$(git diff-files | grep -v .state | grep -v .etag | wc -l)"
untracked_ls="$(git ls-files --exclude-standard --others | wc -l)"
if [[ $files_diff_l -gt 0 ]] || [[ $untracked_ls -gt 0 ]]; then
git add .
git -c 'user.name=meows' -c 'user.email=b5c6@protonmail.com' commit -m "master: $(git rev-parse origin/master)"
git push fork gh
else
echo "Quiet on the octocat front."
fi
popd