This is my working Kubernetes Sentry configuration using the Helm chart.

Example configuration for a working Sentry deployment on bare-metal Kubernetes.

Special thanks to Kanadaj from the #self-hosted channel on the Sentry Discord, who helped me with a lot of the services I hadn't used before.

Caveats

This setup is customized in a number of ways, so you'll probably want or need to change things, but it's working for me and I just set it up again from scratch, so I thought I'd share.

PostgreSQL

I wanted a highly available PostgreSQL server, so I went with this operator: https://github.com/zalando/postgres-operator

Because I already had my own Postgres cluster running under that operator, I pointed Sentry at it rather than using the chart's bundled PostgreSQL.
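
For reference, a minimal postgresql resource for that operator looks roughly like the sketch below. The names are illustrative, chosen to line up with the acid-pgserver.patroni.svc host, sentry_new database, and sentry user referenced in the values file further down; the instance count, volume size, and Postgres version are assumptions you'd adjust.

apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: acid-pgserver          # must be prefixed with the teamId ("acid-" here)
  namespace: patroni
spec:
  teamId: acid
  numberOfInstances: 3
  volume:
    size: 50Gi                 # illustrative size
  users:
    sentry: []                 # creates a "sentry" role
  databases:
    sentry_new: sentry         # database "sentry_new" owned by "sentry"
  postgresql:
    version: "14"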

Clickhouse and Zookeeper

I had weird issues (likely stemming from not knowing what I was doing) with the included ClickHouse and ZooKeeper, so I set up my own.

ZooKeeper is just a StatefulSet; ClickHouse uses this operator: https://operatorhub.io/operator/clickhouse
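
I won't cover installing the operator itself here -- the OperatorHub page above has the current instructions (it goes through OLM) -- but the general shape is something like the following, where zookeeper.yaml and clickhouse.yaml are just placeholder names for the manifests included further down in this gist:

# install OLM + the clickhouse operator (see the OperatorHub page for the exact, current commands)
kubectl create -f https://operatorhub.io/install/clickhouse.yaml
# then apply the ZooKeeper StatefulSet and the ClickHouseInstallation from this gist
kubectl apply -n sentry -f zookeeper.yaml
kubectl apply -n sentry -f clickhouse.yaml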

To generate a new password for ClickHouse, use this:

PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'

You should generate your own ClickHouse passwords, but these are the values used in this example config (you can verify a password/digest pair with the one-liner after this list):

  • writer password: Cd9Sow5R
  • writer password sha256 hex: 9805dc31f357e8bef91cf6e5bbb080f7ae1747a9b1f981a3f987f4b9aefe4658
  • reader password: W13qXQQQ
  • reader password sha256 hex: 7da094325cc2d9e3a61cdc1982403b9797485623616f528919c59e6d827e37a2
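
To double-check that a password and digest pair match (using the example writer password above):

echo -n "Cd9Sow5R" | sha256sum
# should print the writer sha256 hex listed above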

storageClass

I'm using local-hostpath as my storageClass, which uses OpenEBS to allocate space on the node's local filesystem. That won't work for anything that needs shared storage, but it's ideal for anything with built-in replication, since it avoids the unneeded network activity and latency of putting data on the network when it doesn't need to be there.
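
As a sketch, any PVC that should land on local node storage just references that class (the StorageClass itself is defined in one of the manifests below); the name and size here are only illustrative:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-local-pvc      # hypothetical, just to show usage of the class
  namespace: sentry
spec:
  accessModes:
    - ReadWriteOnce            # hostpath volumes are node-local, so no shared access modes
  storageClassName: local-hostpath
  resources:
    requests:
      storage: 10Gi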

Customized helm chart

At the time of writing (Mar 21, 2023) the released Helm chart doesn't quite support Sentry 23.3.0 yet; there are pull requests that will fix that, they just aren't released. So I forked the chart and used my own version here: https://github.com/taxilian/sentry-k8s-charts/tree/develop/sentry

Then to install:

git clone <path to chart> sentry-k8s-charts
helm upgrade -n sentry --create-namespace --install sentry ./sentry-k8s-charts/sentry -f sentry-values.yaml
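
Once the release is installed, a quick sanity check looks like:

# watch the pods come up; web, worker, relay, snuba, kafka, etc. should all go Ready
kubectl get pods -n sentry -w
# check the release status and its notes
helm status sentry -n sentry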

Remaining TODO:

For this to actually be highly available, the Redis setup needs to be updated; currently it is not. Sentry doesn't seem to support Redis Sentinel directly, but there is a project around that I've used which proxies to Sentinel, so you can talk to it like a regular Redis server and always get routed to the right node -- I'll set that up one of these days.
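
When I get to it, I expect the chart side to just be disabling the bundled Redis and pointing Sentry at the proxy's Service. A rough sketch, assuming the chart's externalRedis values and a hypothetical proxy Service name:

redis:
  enabled: false
externalRedis:
  host: redis-sentinel-proxy.sentry.svc   # hypothetical Service in front of Sentinel
  port: 6379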

apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  name: clickhouse
  namespace: sentry
spec:
  configuration:
    clusters:
      - layout:
          replicasCount: 3
          shardsCount: 1
        name: sentry-main
        templates:
          podTemplate: clickhouse-21.11
        zookeeper:
          nodes:
            - host: zk-0.zk.sentry.svc.cluster.local
              port: 2181
            - host: zk-1.zk.sentry.svc.cluster.local
              port: 2181
            - host: zk-2.zk.sentry.svc.cluster.local
              port: 2181
    profiles:
      reader/max_memory_usage: '10000000000'
      reader/readonly: '2'
      writer/max_memory_usage: '10000000000'
      writer/readonly: '0'
    settings:
      mysql_port: '9004'
    users:
      reader/networks/host_regexp: .*?
      reader/networks/ip: 0.0.0.0/0
      reader/password_sha256_hex: 7da094325cc2d9e3a61cdc1982403b9797485623616f528919c59e6d827e37a2
      reader/quota: default
      reader/profile: reader
      reader/allow_databases/database:
        - system
        - default
      writer/password_sha256_hex: 9805dc31f357e8bef91cf6e5bbb080f7ae1747a9b1f981a3f987f4b9aefe4658
      writer/profile: writer
      writer/networks/host_regexp: .*?
      writer/networks/ip: 0.0.0.0/0
      writer/quota: default
      writer/allow_databases/database:
        - system
        - default
    zookeeper:
      nodes:
        - host: zk-0.zk.sentry.svc.cluster.local
          port: 2181
        - host: zk-1.zk.sentry.svc.cluster.local
          port: 2181
        - host: zk-2.zk.sentry.svc.cluster.local
          port: 2181
      operation_timeout_ms: 10000
      session_timeout_ms: 30000
  defaults:
    distributedDDL: {}
    templates:
      dataVolumeClaimTemplate: data
      serviceTemplate: svc-template
  reconciling: {}
  templates:
    podTemplates:
      - distribution: ''
        metadata:
          creationTimestamp: null
          name: clickhouse-21.11
        spec:
          containers:
            - image: 'yandex/clickhouse-server:21.11.4'
              name: clickhouse
              resources:
                limits:
                  cpu: '4'
                  memory: 24Gi
                requests:
                  cpu: 500m
                  memory: 512Mi
              volumeMounts:
                - mountPath: /var/lib/clickhouse
                  name: data
        zone: {}
    serviceTemplates:
      - metadata:
          creationTimestamp: null
          name: svc-template
        spec:
          ports:
            - name: http
              port: 8123
              targetPort: 0
            - name: tcp
              port: 9000
              targetPort: 0
            - name: mysql
              port: 9004
              targetPort: 0
          type: ClusterIP
    volumeClaimTemplates:
      - metadata:
          creationTimestamp: null
          name: data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 50Gi
          storageClassName: local-hostpath
  templating: {}
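
Once the operator has reconciled this, one way to confirm the cluster and users work is to hit the HTTP port from a throwaway pod; the clickhouse-clickhouse Service name here matches what externalClickhouse.host uses in the values file below:

kubectl run curl-test -n sentry --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s -u writer:Cd9Sow5R 'http://clickhouse-clickhouse:8123/?query=SELECT%20version()'
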
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    cas.openebs.io/config: |
      #hostpath type will create a PV by
      # creating a sub-directory under the
      # BASEPATH provided below.
      - name: StorageType
        value: "hostpath"
      # Specify the location (directory) where
      # where PV(volume) data will be saved.
      # A sub-directory with pv-name will be
      # created. When the volume is deleted,
      # the PV sub-directory will be deleted.
      # Default value is /var/openebs/local
      - name: BasePath
        value: "/data/localpv/"
    openebs.io/cas-type: local
  name: local-hostpath
provisioner: openebs.io/local
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
---
# These are the config options I used for sentry, with a few passwords and such redacted
images:
  relay:
    repository: getsentry/relay
    tag: 23.3.0
  sentry:
    repository: getsentry/sentry
    tag: 23.3.0
  snuba:
    repository: getsentry/snuba
    tag: 23.3.0
  symbolicator:
    repository: getsentry/symbolicator
    tag: 0.7.0
externalClickhouse:
  database: default
  host: clickhouse-clickhouse
  httpPort: 8123
  password: "Cd9Sow5R"
  singleNode: false
  tcpPort: 9000
  username: writer
  clusterName: sentry-main
externalPostgresql:
  sslMode: require
  database: sentry_new
  port: 5432
  username: sentry
  password: foobar # update this to match your postgres config
  host: acid-pgserver.patroni.svc
filestore:
  backend: filesystem
  filesystem:
    path: /var/lib/sentry/files
    persistence:
      accessMode: ReadWriteMany
      enabled: true
      persistentWorkers: true
      size: 30Gi
      storageClass: rook-ssdfs # I have a shared ceph filesystem
ingress:
  enabled: true
metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: kp
      team: cluster
redis:
  common:
    storageClass: local-hostpath
  master:
    persistence:
      enabled: true
      storageClass: local-hostpath
  replica:
    persistence:
      enabled: true
      storageClass: local-hostpath
sentry:
  clickhouse:
    cluster:
      name: sentry-main
  web:
    affinity: # It would be better to add support for topologySpreadConstraint
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: "app"
                  operator: In
                  values:
                    - sentry
                - key: "role"
                  operator: In
                  values:
                    - web
            topologyKey: "kubernetes.io/hostname"
    autoscaling:
      enabled: true
      maxReplicas: 8
      minReplicas: 3
      targetCPUUtilizationPercentage: 80
  worker:
    autoscaling:
      enabled: true
      maxReplicas: 5
      minReplicas: 3
      targetCPUUtilizationPercentage: 80
rabbitmq:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: kp
        team: cluster
  persistence:
    enabled: true
    size: 8Gi
    storageClass: local-hostpath
kafka:
  enabled: true
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: "app.kubernetes.io/name"
                operator: In
                values:
                  - kafka
              - key: "app.kubernetes.io/instance"
                operator: In
                values:
                  - sentry
          topologyKey: "kubernetes.io/hostname"
  persistence:
    enabled: true
    size: 8Gi
    storageClass: local-hostpath
  metrics:
    kafka:
      enabled: true
    serviceMonitor:
      enabled: true
      labels:
        release: kp
        team: cluster
  persistence:
    enabled: true
    size: 30Gi
    storageClass: local-hostpath
  zookeeper:
    enabled: false
  externalZookeeper:
    servers:
      - zk-0.zk.sentry.svc.cluster.local:2181
      - zk-1.zk.sentry.svc.cluster.local:2181
      - zk-2.zk.sentry.svc.cluster.local:2181
postgresql:
  enabled: false
clickhouse:
  enabled: false
relay:
  autoscaling:
    enabled: true
    maxReplicas: 5
    minReplicas: 3
    targetCPUUtilizationPercentage: 80
  mode: managed
  replicas: 3
mail:
  backend: smtp
  from: "no-reply@mydomain.com"
  host: "smtp.mydomain.com"
  password: "1234567890abcdefg"
  port: 465
  useSsl: true
  username: "thisismyusername"
sourcemaps:
  enabled: true
system:
  adminEmail: "me@mydomain.com"
  public: false
  url: "https://sentryio.mydomain.com"
symbolicator:
  enabled: true
  api:
    autoscaling:
      enabled: false
      maxReplicas: 4
      minReplicas: 2
      targetCPUUtilizationPercentage: 80
    replicas: 2
user:
  create: true
  email: me@mydomain.com
  password: foobarraboof
zookeeper:
  enabled: false
---
# Setup Headless Service for StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: zk
  namespace: sentry
spec:
  ports:
    - port: 2181
      name: client
      targetPort: client
    - port: 7000
      name: prometheus
      targetPort: prometheus
    - port: 2888
      name: server
      targetPort: server
    - port: 3888
      name: leader-election
      targetPort: leader-election
  clusterIP: None
  selector:
    app: zookeeper
---
# Setup max number of unavailable pods in StatefulSet
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zookeeper-pod-disruption-budget
  namespace: sentry
spec:
  selector:
    matchLabels:
      app: zookeeper
  maxUnavailable: 1
---
# Setup Zookeeper StatefulSet
# Possible params:
# 1. replicas
# 2. memory
# 3. cpu
# 4. storage
# 5. storageClassName
# 6. user to run app
apiVersion: apps/v1
kind: StatefulSet
metadata:
  # pods will be named zk-0, zk-1, zk-2
  name: zk
  labels:
    app: zookeeper
  namespace: sentry
spec:
  selector:
    matchLabels:
      app: zookeeper
  serviceName: zk
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: zookeeper
      annotations:
        prometheus.io/port: '7000'
        prometheus.io/scrape: 'true'
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                      - zookeeper
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: zookeeper
          imagePullPolicy: IfNotPresent
          image: "docker.io/zookeeper:3.7.0"
          resources:
            requests:
              memory: "512M"
              cpu: "1"
            limits:
              memory: "4Gi"
              cpu: "2"
          ports:
            - containerPort: 2181
              name: client
            - containerPort: 2888
              name: server
            - containerPort: 3888
              name: leader-election
            - containerPort: 7000
              name: prometheus
          # See those links for proper startup settings:
          # https://github.com/kow3ns/kubernetes-zookeeper/blob/master/docker/scripts/start-zookeeper
          # https://clickhouse.yandex/docs/en/operations/tips/#zookeeper
          # https://github.com/ClickHouse/ClickHouse/issues/11781
          command:
            - bash
            - -x
            - -c
            - |
              SERVERS=3 &&
              HOST=`hostname -s` &&
              DOMAIN=`hostname -d` &&
              CLIENT_PORT=2181 &&
              SERVER_PORT=2888 &&
              ELECTION_PORT=3888 &&
              PROMETHEUS_PORT=7000 &&
              ZOO_DATA_DIR=/var/lib/zookeeper/data &&
              ZOO_DATA_LOG_DIR=/var/lib/zookeeper/datalog &&
              {
                echo "clientPort=${CLIENT_PORT}"
                echo 'tickTime=2000'
                echo 'initLimit=300'
                echo 'syncLimit=10'
                echo 'maxClientCnxns=2000'
                echo 'maxSessionTimeout=60000000'
                echo "dataDir=${ZOO_DATA_DIR}"
                echo "dataLogDir=${ZOO_DATA_LOG_DIR}"
                echo 'autopurge.snapRetainCount=10'
                echo 'autopurge.purgeInterval=1'
                echo 'preAllocSize=131072'
                echo 'snapCount=3000000'
                echo 'leaderServes=yes'
                echo 'standaloneEnabled=false'
                echo '4lw.commands.whitelist=*'
                echo 'metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider'
                echo "metricsProvider.httpPort=${PROMETHEUS_PORT}"
              } > /conf/zoo.cfg &&
              {
                echo "zookeeper.root.logger=CONSOLE"
                echo "zookeeper.console.threshold=INFO"
                echo "log4j.rootLogger=\${zookeeper.root.logger}"
                echo "log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender"
                echo "log4j.appender.CONSOLE.Threshold=\${zookeeper.console.threshold}"
                echo "log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout"
                echo "log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n"
              } > /conf/log4j.properties &&
              echo 'JVMFLAGS="-Xms128M -Xmx4G -XX:+UseG1GC -XX:+CMSParallelRemarkEnabled -XX:ActiveProcessorCount=8"' > /conf/java.env &&
              if [[ $HOST =~ (.*)-([0-9]+)$ ]]; then
                NAME=${BASH_REMATCH[1]}
                ORD=${BASH_REMATCH[2]}
              else
                echo "Failed to parse name and ordinal of Pod"
                exit 1
              fi &&
              mkdir -p ${ZOO_DATA_DIR} &&
              mkdir -p ${ZOO_DATA_LOG_DIR} &&
              export MY_ID=$((ORD+1)) &&
              echo $MY_ID > $ZOO_DATA_DIR/myid &&
              for (( i=1; i<=$SERVERS; i++ )); do
                echo "server.$i=$NAME-$((i-1)).$DOMAIN:$SERVER_PORT:$ELECTION_PORT" >> /conf/zoo.cfg;
              done &&
              if [[ $SERVERS -eq 1 ]]; then
                echo "group.1=1" >> /conf/zoo.cfg;
              else
                echo "group.1=1:2:3" >> /conf/zoo.cfg;
              fi &&
              for (( i=1; i<=$SERVERS; i++ )); do
                WEIGHT=1
                if [[ $i == 1 ]]; then
                  WEIGHT=10
                fi
                echo "weight.$i=$WEIGHT" >> /conf/zoo.cfg;
              done &&
              chown -Rv zookeeper "$ZOO_DATA_DIR" "$ZOO_DATA_LOG_DIR" "$ZOO_LOG_DIR" "$ZOO_CONF_DIR" &&
              zkServer.sh start-foreground
          readinessProbe:
            exec:
              command:
                - bash
                - -c
                - "OK=$(echo ruok | nc 127.0.0.1 2181);
                  if [[ \"$OK\" == \"imok\" ]];
                  then
                    STATE=$(echo mntr | nc 127.0.0.1 2181 | grep zk_server_state | cut -d \" \" -f 2);
                    if [[ \"$STATE\" == \"leader\" ]]; then
                      SYNCED_FOLLOWERS=$(echo mntr | nc 127.0.0.1 2181 | grep zk_synced_followers | cut -d \" \" -f 2 | cut -d \".\" -f 1);
                      if [[ $SYNCED_FOLLOWERS == $(( $SERVERS - 1 )) ]]; then
                        ./bin/zkCli.sh ls /;
                        exit $?;
                      else
                        exit 1;
                      fi;
                    elif [[ \"$STATE\" == \"follower\" ]]; then
                      PEER_STATE=$(echo mntr | nc 127.0.0.1 2181 | grep zk_peer_state);
                      if [[ \"$PEER_STATE\" == \"following - broadcast\" ]]; then
                        ./bin/zkCli.sh ls /;
                        exit $?;
                      else
                        exit 1;
                      fi;
                    fi;
                  else
                    exit 1;
                  fi"
            initialDelaySeconds: 10
            timeoutSeconds: 5
          livenessProbe:
            exec:
              command:
                - bash
                - -c
                - "OK=$(echo ruok | nc 127.0.0.1 2181); if [[ \"$OK\" == \"imok\" ]]; then exit 0; else exit 1; fi"
            initialDelaySeconds: 10
            timeoutSeconds: 5
          volumeMounts:
            - name: datadir-volume
              mountPath: /var/lib/zookeeper
      # Run as a non-privileged user
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
  volumeClaimTemplates:
    - metadata:
        name: datadir-volume
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: local-hostpath
        resources:
          requests:
            storage: 25Gi
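
Since 4lw.commands.whitelist=* is enabled above, you can check the ensemble state directly from the pods, e.g.:

kubectl exec -n sentry zk-0 -- bash -c 'echo srvr | nc 127.0.0.1 2181'
kubectl exec -n sentry zk-1 -- bash -c 'echo mntr | nc 127.0.0.1 2181 | grep zk_server_state'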