@benjvi
benjvi / hw2-bosh-alerts.yml
Last active April 20, 2023 10:50
BOSH System-level Alerts for Healthwatch2, based on BOSH-Prometheus alerts
groups:
- name: bosh-system
  rules:
  - alert: BOSHVMLowFreeRAM
    expr: avg(system_mem_percent{exported_job!~"^compilation.*",deployment!="bosh-health"}) by(system_domain, deployment, exported_job, index) > 90
    for: 10m
    labels:
      service: bosh-system
      severity: warning
    annotations:
benjvi / dashboard.json
Created February 22, 2022 13:06
Grafana Dashboard for Identifying Overprovisioned & Underprovisioned Workloads
{
  "__inputs": [
    {
      "name": "DS_PROMETHEUS",
      "label": "prometheus",
      "description": "",
      "type": "datasource",
      "pluginId": "prometheus",
      "pluginName": "Prometheus"
    }
benjvi / dashboard-apps-basic-checks-by-namespace-1643215963443.json
Created January 26, 2022 17:00
Apps in Namespace Basic Health Dashboard
{
  "__inputs": [
    {
      "name": "DS_PROMETHEUS",
      "label": "prometheus",
      "description": "",
      "type": "datasource",
      "pluginId": "prometheus",
      "pluginName": "Prometheus"
    }
benjvi / dashboard-backup-execution-1643216020334.json
Created January 26, 2022 16:58
Backup Execution Dashboard - CronJobs & Velero
{
  "__inputs": [
    {
      "name": "DS_PROMETHEUS",
      "label": "prometheus",
      "description": "",
      "type": "datasource",
      "pluginId": "prometheus",
      "pluginName": "Prometheus"
    }
benjvi / cert-manager-cert-alert.yml
Created January 26, 2022 16:56
Sample Alert For Expiring Cert Manager Certificates
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ingress-certificate-expiry
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  groups:
resources:
- name: test-file
  type: s3
  source:
    endpoint: http://MY_MINIO_ENDPOINT:9000
    access_key_id: MY_ACCESS_KEY
    bucket: concourse-resources
    secret_access_key: MY_SECRET
    regexp: test-(.*).txt
---
az-configuration:
- name: ((availability_zones.0))
- name: ((availability_zones.1))
- name: ((availability_zones.2))
network-assignment:
  network:
    name: management
  singleton_availability_zone:
    name: ((availability_zones.0))
---
opsman-configuration:
  vsphere:
    vcenter:
      url: ((vcenter_host))
      username: ((vcenter_username))
      password: ((vcenter_password))
      datastore: ((vcenter_datastore))
      # ca_cert: TODO: add to terraform if necessary
      host: example-host # vCenter host to deploy Ops Manager in
# dependencies
# cli: kubectl, pks, om, yq, jq
alias k=kubectl
alias kube=kubectl
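# single quotes defer $(...) until the alias is used, so it targets the context that is current at that time
alias kube-use-ns='kubectl config set-context $(kubectl config current-context) --namespace'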
pks-get-admin-pw() {
  om curl --path "https://example.com/api/v0/deployed/products/$pks_product_id/credentials/.properties.uaa_admin_password" | jq -r ".credential.value.secret" 2> /dev/null
}
# depends on:
# CLIs: bosh, fly, credhub, om
# aliases: bosh_env_aliases (another of my gists)
alias fly='fly -t concourse'
alias whichom="env | grep -i om_"
alias ch=credhub
fly-cp-login() {
  # arg[1] - control plane external url (no port)