The following snippet helps copy a script from enterprise2 to a ghe-booted instance:
#!/bin/bash
#/ Usage: ./copy.sh <ghe-boot-host-name>
TARGET_HOST="${1}"
TARGET_USER="admin"
TARGET_PORT="122"
set -e
We initiated a systematic availability review process following our July 2024 offsite (see Revival of the GHES Availability Review Process). The first availability issue was then created on August 16th, marking almost a year since our previous review.
Our journey began by exploring what availability truly means for GHES. We recognized that an escalation's value extends beyond mere resolution - we aimed to foster deeper discussions, prevent recurrence through measured repair items, and share knowledge via comprehensive runbooks.
Over the past 6 months, we've made significant strides in our availability review processes:
You are an enterprise engineer!
Because GitHub Enterprise Server (GHES) drives our revenue and supports our largest and most recognizable clients, every engineer at GitHub, including yourself, is an enterprise engineer! This lab is an opportunity to practice a few of the concepts you'll need to test code that you write in a GHES environment.
As a prerequisite to this lab, you should watch each part of the Engineering for Enterprise Lecture(TODO). The lecture provides an overview of the tools and concepts that we will be practicing during this self-directed exercise. After watching the lecture, you should be familiar with the key concepts required to complete this lab:
{
  "incidentStatusedTime": "2024-10-07T11:30",
  // "resolutionTime": "2024-10-31T04:50", // Dotcom specific
  // "visibility": "public", // Dotcom specific
  // "mostSignificantServiceStatus": "red", // Dotcom specific
  // "impactedServices": [], // Dotcom specific
  "resolvingIncidentCommander": "hubot",
  // "incidentUrl": "https://status-staging.githubapp.com/incidents/27863", // Dotcom specific
  // "impactStartTime": "2024-10-31T03:40", // Dotcom specific
  // "impactDetectionTime": "2024-10-31T03:40", // Dotcom specific
In the past two weeks, we've held two Availability Review meetings featuring excellent presenters. These meetings facilitated fruitful discussions on how we can reflect on and learn from customer incidents. (In case you missed any, you can find the recordings for 08-27 and 09-04.)
To make our AR meetings more efficient, here's a guide to the current Availability Review process and how AR issues should be completed. We're also integrating GHES-specific requirements into the overall GitHub automations. Future improvements are expected to ease or eliminate more of the manual steps.
# | Step | Info |
---|---|---|
1 | Availability Review created at the end of a GHES SEV 1 | This will be automated in the future |
We would like to have runbooks to support on-call engineers.
A spreadsheet has been created to start with the runbooks that are over a year old.
#!/bin/bash
# Path to the extracted support bundle, passed as the first argument.
BUNDLE_PATH=$1
# Group completed Git operations by the ts= value up to its first colon (the hour, assuming ISO-style timestamps),
# take the busiest group, and convert it to a rate: clones per second (/3600), pushes per minute (/60).
git_ssh_clone=$(zgrep 'proto=ssh.*cmd=git-upload-pack.*op done' "$BUNDLE_PATH/babeld-logs/babeld.log.1.gz" | grep -o 'ts=[^:]*' | uniq -c | sort -nr | head -n1 | awk '{printf "%.2f", $1/3600}')
git_http_clone=$(zgrep 'proto=http.*cmd=git-upload-pack.*op done' "$BUNDLE_PATH/babeld-logs/babeld.log.1.gz" | grep -o 'ts=[^:]*' | uniq -c | sort -nr | head -n1 | awk '{printf "%.2f", $1/3600}')
git_ssh_push=$(zgrep 'proto=ssh.*cmd=git-receive-pack.*op done' "$BUNDLE_PATH/babeld-logs/babeld.log.1.gz" | grep -o 'ts=[^:]*' | uniq -c | sort -nr | head -n1 | awk '{printf "%.2f", $1/60}')
git_http_push=$(zgrep 'proto=http.*cmd=git-receive-pack.*op done' "$BUNDLE_PATH/babeld-logs/babeld.log.1.gz" | grep -o 'ts=[^:]*' | uniq -c | sort -nr | head -n1 | awk '{printf "%.2f", $1/60}')
# Group .zip archive downloads by the first 11 characters of each haproxy log line and divide the busiest group by 10.
code_zip=$(grep -E 'GET .*/archive/.*\.zip' "$BUNDLE_PATH/system-logs/haproxy.log.1" | cut -c 1-11 | uniq -c | sort -rn -k1 | head -n1 | awk '{printf "%.2f", $1/10}')
Since this is the third time a ticket has been created because of this check, which was added a while back, I think it is a good idea to summarize and clarify what will be allowed as a noproxy
IP address
rule | example | actual | expected | issue | checked by |
---|---|---|---|---|---|
full valid IPv4 is allowed | 192.168.10.10 | allowed | allowed | no | IPAddr.new(host_to_check) |
full valid IPv6 is allowed | 2001:42:4:0:0:1:34:0 | allowed | allowed | no | IPAddr.new(host_to_check) |
partial IP with preceding dot as wildcard is not allowed | .192.168 | not allowed | unknown | no | [IPAddr.new(host_to_check)](https://github.com/github/enterprise2/blob/master/enterprise-manage/lib/manage/validators/enterpri |
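
The behavior in the rows above can be reproduced with Ruby's standard `ipaddr` library. The following is a minimal sketch, not the actual enterprise2 validator: the helper name `full_ip_address?` is hypothetical, and only the `IPAddr.new(host_to_check)` call comes from the table. Note that `IPAddr.new` also accepts CIDR notation such as `10.0.0.0/8`, so this check alone does not distinguish a single address from a range.

```ruby
require "ipaddr"

# Hypothetical helper (not taken from enterprise2): returns true when
# host_to_check parses as a complete IPv4 or IPv6 address.
def full_ip_address?(host_to_check)
  IPAddr.new(host_to_check)
  true
rescue IPAddr::Error
  # IPAddr raises InvalidAddressError (a subclass of IPAddr::Error)
  # for input it cannot parse, such as a partial address like ".192.168".
  false
end

# The three examples from the table.
["192.168.10.10", "2001:42:4:0:0:1:34:0", ".192.168"].each do |host|
  puts "#{host}: #{full_ip_address?(host) ? 'allowed' : 'not allowed'}"
end
```

Under these assumptions, the two full addresses parse and the dot-prefixed partial address is rejected, which matches the "actual" column.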