Adding tremendous value

Steven Acreman sacreman

  • ThunderOps
  • United Kingdom
sacreman /
Last active July 12, 2019 13:07
Backup all K8s config to a timestamped location
#!/bin/bash -e
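# Each run writes to a new directory named after the current Unix timestamp.
BACKUP_DIR="/var/tmp/k8sbackup/$(date +%s)"
echo "Backing up cluster to ${BACKUP_DIR}"
# Print namespace names only; a bare {.items[*]} would print entire Namespace objects.
NAMESPACES=$(kubectl get ns -o jsonpath='{.items[*].metadata.name}')
# Both lists can be overridden from the environment.
RESOURCETYPES="${RESOURCETYPES:-"ingress deployment configmap secret svc rc ds networkpolicy statefulset cronjob pvc"}"
GLOBALRESOURCES="${GLOBALRESOURCES:-"namespace storageclass clusterrole clusterrolebinding customresourcedefinition"}"
mkdir -p "${BACKUP_DIR}"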
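The preview above stops after creating the backup directory. The following Python sketch shows one way the rest of such a backup could be organised: it turns the namespace list and the two resource lists into `kubectl get ... -o yaml` commands, each paired with an output file. The one-file-per-type directory layout and the function names are assumptions made for illustration, not the gist's own code.

```python
import os
import subprocess
import time

# Defaults copied from the script above. As with ${VAR:-default}, a set but empty
# RESOURCETYPES or GLOBALRESOURCES environment variable also falls back to these.
DEFAULT_RESOURCETYPES = "ingress deployment configmap secret svc rc ds networkpolicy statefulset cronjob pvc"
DEFAULT_GLOBALRESOURCES = "namespace storageclass clusterrole clusterrolebinding customresourcedefinition"


def list_namespaces():
    """Run the same query as the script's NAMESPACES line and return the names."""
    result = subprocess.run(
        ["kubectl", "get", "ns", "-o", "jsonpath={.items[*].metadata.name}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def plan_backup(namespaces, backup_root="/var/tmp/k8sbackup", timestamp=None):
    """Return (kubectl argv, output file) pairs without running anything.

    Hypothetical layout: one YAML file per global resource type in the backup
    directory, and one per namespaced type in a subdirectory per namespace.
    """
    stamp = str(int(time.time()) if timestamp is None else timestamp)  # like $(date +%s)
    backup_dir = os.path.join(backup_root, stamp)
    resource_types = (os.environ.get("RESOURCETYPES") or DEFAULT_RESOURCETYPES).split()
    global_resources = (os.environ.get("GLOBALRESOURCES") or DEFAULT_GLOBALRESOURCES).split()
    plan = [
        (["kubectl", "get", kind, "-o", "yaml"], os.path.join(backup_dir, f"{kind}.yaml"))
        for kind in global_resources
    ]
    for ns in namespaces:
        for kind in resource_types:
            argv = ["kubectl", "--namespace", ns, "get", kind, "-o", "yaml"]
            plan.append((argv, os.path.join(backup_dir, ns, f"{kind}.yaml")))
    return plan


def run_backup(plan):
    """Execute a plan, writing each command's YAML output to its file."""
    for argv, path in plan:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as out:
            subprocess.run(argv, stdout=out, check=True)
```

Calling `run_backup(plan_backup(list_namespaces()))` performs the backup. Keeping planning separate from execution makes it easy to print and review the commands before anything touches the cluster.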
sacreman / aks.txt
Created October 23, 2018 11:56
Dolos Results 2018-10-23
2018-10-22 08:39:01 INFO Starting test
2018-10-22 08:39:01 INFO Creating Resource Group
2018-10-22 08:39:03 INFO Creating the AKS cluster
2018-10-22 08:52:22 INFO Getting cluster credentials
2018-10-22 08:52:23 INFO Get Nodes
2018-10-22 08:52:26 INFO b'NAME STATUS ROLES AGE VERSION\naks-nodepool1-18093422-0 Ready agent 3m v1.9.11\n'
2018-10-22 08:52:26 INFO Applying Deployment
2018-10-22 08:52:32 INFO b'deployment.apps/azure-vote-back created\nservice/azure-vote-back created\ndeployment.apps/azure-vote-front created\nservice/azure-vote-front created\n'
2018-10-22 08:52:32 INFO Getting external IP
2018-10-22 08:56:18 INFO Getting web contents from
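The timestamps in this log show where the test spends its time. The Python sketch below parses the step lines (the `b'...'` command-output lines are left out) and reports how many seconds each step lasted before the next one was logged. For this run, creating the AKS cluster took 799 seconds (about 13 minutes) and getting the external IP took 226 seconds. The parser assumes the `date time LEVEL message` layout seen above.

```python
from datetime import datetime

# Step lines copied from aks.txt above; the command-output lines are omitted.
LOG = """\
2018-10-22 08:39:01 INFO Starting test
2018-10-22 08:39:01 INFO Creating Resource Group
2018-10-22 08:39:03 INFO Creating the AKS cluster
2018-10-22 08:52:22 INFO Getting cluster credentials
2018-10-22 08:52:23 INFO Get Nodes
2018-10-22 08:52:26 INFO Applying Deployment
2018-10-22 08:52:32 INFO Getting external IP
2018-10-22 08:56:18 INFO Getting web contents from
"""


def step_durations(log_text):
    """Return (step message, seconds until the next logged step) pairs."""
    entries = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        date, clock, _level, message = line.split(" ", 3)
        ts = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S")
        entries.append((ts, message))
    # The last step has no successor, so it gets no duration.
    return [
        (message, int((next_ts - ts).total_seconds()))
        for (ts, message), (next_ts, _) in zip(entries, entries[1:])
    ]


for step, seconds in step_durations(LOG):
    print(f"{seconds:5d}s  {step}")
```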
sacreman / hosts.json
"dashboard": {
"title": "Hosts",
"description": "Basic host stats: CPU, Memory Usage, Disk Utilisation, Filesystem usage and Predicted time to filesystems filling",
"id": null,
"rows": [{
"collapse": false,
"editable": true,
"height": "250px",
"panels": [{
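hosts.json wraps its dashboard in a top-level `"dashboard"` key with `"id": null`, which has the same shape as the request body of Grafana's `POST /api/dashboards/db` endpoint, where a null id creates a new dashboard. A minimal Python sketch of building that import request with only the standard library follows. The Grafana URL and API key are placeholders, and the request is only built, not sent.

```python
import json
import urllib.request


def build_import_request(dashboard, grafana_url, api_key, overwrite=False):
    """Build (but do not send) a dashboard import request for Grafana's HTTP API."""
    # A null id tells Grafana to create a new dashboard, as hosts.json already does.
    payload = {"dashboard": dict(dashboard, id=None), "overwrite": overwrite}
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


# Usage with hosts.json (file path, URL and key are placeholders):
# with open("hosts.json") as f:
#     doc = json.load(f)
# req = build_import_request(doc["dashboard"], "http://localhost:3000", "<api key>")
# urllib.request.urlopen(req)
```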
sacreman / kubernetes-dashboard.json
Created August 6, 2017 07:07
Kubernetes Dashboard for Grafana
"annotations": {
"list": []
},
"description": "Monitors Kubernetes cluster using Prometheus. Shows overall cluster CPU / Memory / Filesystem usage as well as individual pod, containers, systemd services statistics. Uses cAdvisor metrics only.",
"editable": true,
"gnetId": 315,
"graphTooltip": 0,
"hideControls": false,
"id": 2,
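The description says this dashboard reads cAdvisor metrics from Prometheus, so each panel ultimately runs a PromQL query against Prometheus. The sketch below issues one such query through the Prometheus HTTP API (`GET /api/v1/query`). The expression, which sums root-cgroup working-set memory across nodes, and the `localhost:9090` address are illustrative assumptions rather than queries taken from the dashboard JSON.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Illustrative cAdvisor expression (an assumption, not copied from the dashboard):
# working-set memory of the root cgroup (id="/"), summed over all nodes.
CLUSTER_MEMORY = 'sum(container_memory_working_set_bytes{id="/"})'


def instant_query_url(prometheus_url, promql):
    """URL for a Prometheus instant query; the expression is URL-encoded."""
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def instant_query(prometheus_url, promql, timeout=10):
    """Run the query and return the "result" list from the JSON response."""
    with urllib.request.urlopen(instant_query_url(prometheus_url, promql), timeout=timeout) as resp:
        body = json.load(resp)
    if body["status"] != "success":
        raise RuntimeError(body.get("error", "query failed"))
    return body["data"]["result"]


# Usage (needs a reachable Prometheus):
# print(instant_query("http://localhost:9090", CLUSTER_MEMORY))
```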
sacreman / prometheus.yml
Last active June 23, 2022 09:06
Prometheus configuration to scrape Kubernetes outside the cluster
# Prometheus configuration to scrape Kubernetes outside the cluster
# Change master_ip and api_password to match your master server address and admin password
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  # metrics for the prometheus server
  - job_name: 'prometheus'
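The comment at the top of prometheus.yml asks you to replace `master_ip` and `api_password` by hand. A small Python sketch can do the substitution instead. It matches whole words only, and it raises an error when a placeholder does not appear in the template, which usually means the template and the values have drifted apart. The `TEMPLATE` scrape job is hypothetical: it uses standard Prometheus fields such as `static_configs`, `basic_auth` and `tls_config`, but it is not the rest of the gist.

```python
import re

# Hypothetical scrape job containing the two placeholders named in prometheus.yml.
TEMPLATE = """\
  - job_name: 'kubernetes-apiserver'
    scheme: https
    static_configs:
      - targets: ['master_ip:443']
    basic_auth:
      username: admin
      password: api_password
    tls_config:
      insecure_skip_verify: true
"""


def render_config(template, values):
    """Substitute whole-word placeholder tokens; fail if one is missing from the template."""
    rendered = template
    for token, value in values.items():
        # A function replacement stops re.subn from interpreting backslashes in the value.
        rendered, count = re.subn(rf"\b{re.escape(token)}\b", lambda _m: value, rendered)
        if count == 0:
            raise ValueError(f"placeholder {token!r} not found in template")
    return rendered


print(render_config(TEMPLATE, {"master_ip": "203.0.113.10", "api_password": "s3cret"}))
```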

Running an online service isn't easy. Every day you make complex decisions about how to solve problems, and often there is no right or wrong answer, just different approaches with different results. On the infrastructure side you have to weigh up where everything will be hosted: on a cloud service like AWS, in your own data centres, on any number of other options, or perhaps on a mix.

Monitoring choices are equally hard. There are familiar tools that are a known quantity, newer ones that look interesting from reading blogs, and the option to buy any one of a number of SaaS products.

For the sake of brevity, let's imagine that you are moving from your traditional data centre into AWS and want to upgrade your Nagios, Graphite and StatsD stack to something newer. This is an incredibly common scenario that we see every day.

The first decision is whether to build or buy, and it is worth analysing up front. To make that decision properly you'll need to

  • aws
  • awsapi
  • metricsBrowser
  • aws-i


Welcome to the Dataloop API documentation!

To use the API you'll need an API key, which can be created in Dataloop under your user account settings. When integrating services, consider creating an application-specific user in Dataloop with access to accounts at the correct role level.

You will also need to know the organisation name and account name that you want to work with. These match the organisation and account names in Dataloop. Use these details where you see <org name> and <account name> in the examples.
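Rather than pasting the API key, `<org name>` and `<account name>` into each script, one option is to read them from the environment of the process running as the application-specific user. The Python sketch below does that. The variable names `DATALOOP_API_KEY`, `DATALOOP_ORG` and `DATALOOP_ACCOUNT` and the `DataloopSettings` class are hypothetical, and the sketch does not call the Dataloop API itself.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DataloopSettings:
    api_key: str
    org_name: str
    account_name: str


# Hypothetical environment variable names for each setting.
ENV_VARS = {"api_key": "DATALOOP_API_KEY", "org_name": "DATALOOP_ORG", "account_name": "DATALOOP_ACCOUNT"}


def settings_from_env(env=None):
    """Read the API key, org name and account name, reporting every missing variable at once."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return DataloopSettings(**{field: env[var] for field, var in ENV_VARS.items()})
```

Keeping the key out of source files also means rotating it, or switching to a different application user, needs no code change.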

Sum base.count metrics
#!/usr/bin/env python
import sys
from dlcli import api
Returns a sum of the number of agents that have returned the base.count metric in the last minute.
You will need to update the TAG, org, account and key variables in the settings below.
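The counting logic itself can be sketched without the API client. Assuming each agent reports `base.count` with a value of 1 (an assumption; the script's settings and API calls are not shown), summing the most recent sample from each agent within the last minute gives the number of reporting agents. The `(agent, timestamp, value)` tuple format is hypothetical and stands in for whatever `dlcli` returns.

```python
import time


def sum_latest_per_agent(points, window_seconds=60, now=None):
    """Sum the newest value per agent among samples from the last `window_seconds`.

    `points` is a hypothetical iterable of (agent_name, unix_timestamp, value) tuples.
    """
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    latest = {}  # agent -> (timestamp, value) of its newest sample inside the window
    for agent, ts, value in points:
        if cutoff <= ts <= now and (agent not in latest or ts > latest[agent][0]):
            latest[agent] = (ts, value)
    return sum(value for _ts, value in latest.values())


samples = [("web-1", 1000, 1), ("web-1", 1030, 1), ("web-2", 1010, 1), ("db-1", 900, 1)]
print(sum_latest_per_agent(samples, now=1050))
```

Keeping only the latest sample per agent avoids counting an agent twice if it reported more than once inside the window; here `db-1` falls outside the minute and is not counted.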