Alexander Trost galexrt

@galexrt
galexrt / rook-ceph-mon50
Last active Aug 20, 2017
Rook Ceph mon dying
View rook-ceph-mon50
2017-08-20 15:13:12.638284 I | cephmon: parsing mon endpoints: rook-ceph-mon48=10.3.44.89:6790,rook-ceph-mon49=10.3.150.63:6790,rook-ceph-mon51=10.3.157.215:6790,rook-ceph-mon50=10.3.162.13:6790
2017-08-20 15:13:12.638615 I | cephmon: writing config file /var/lib/rook/rook/rook.config
2017-08-20 15:13:12.638724 I | cephmon: generated admin config in /var/lib/rook/rook
2017-08-20 15:13:12.638823 I | cephmon: writing config file /var/lib/rook/rook-ceph-mon49/rook.config
2017-08-20 15:13:12.638905 I | cephmon: initializing mon
2017-08-20 15:13:12.638913 I | exec: Running command: monmaptool /var/lib/rook/rook-ceph-mon49/monmap --create --clobber --fsid 2c50d7fc-8533-45bd-bf42-970c435e4a3d --add rook-ceph-mon50 10.3.162.13:6790 --add rook-ceph-mon48 10.3.44.89:6790 --add rook-ceph-mon49 10.3.150.63:6790 --add rook-ceph-mon51 10.3.157.215:6790
2017-08-20 15:13:12.657429 I | monmaptool: monmap file /var/lib/rook/rook-ceph-mon49/monmap
2017-08-20 15:13:12.657448 I | monmaptool: set fsid to 2c50d7fc-8533-45bd-bf42-97
View keybase.md

Keybase proof

I hereby claim:

  • I am galexrt on github.
  • I am galexrt (https://keybase.io/galexrt) on keybase.
  • I have a public key whose fingerprint is 0604 9CBE C64F 07E9 5098 7B6D 5CF1 D460 0D4C BFDF

To claim this, I am signing this object:

@galexrt
galexrt / with-fs-and-normal-pool
Last active Dec 28, 2017
Rook Ceph Mgr Metrics Output
View with-fs-and-normal-pool
# HELP ceph_mds_cache_recovery_completed File recoveries completed
# TYPE ceph_mds_cache_recovery_completed counter
ceph_mds_cache_recovery_completed{ceph_daemon="mds.mhgmp2"} 0.0
ceph_mds_cache_recovery_completed{ceph_daemon="mds.m7dwwf"} 0.0
# HELP ceph_osd_op_out_bytes Client operations total read size
# TYPE ceph_osd_op_out_bytes counter
ceph_osd_op_out_bytes{ceph_daemon="osd.0"} 5477.0
# HELP ceph_pg_incomplete PG incomplete
# TYPE ceph_pg_incomplete gauge
View clone-all-galexrt-github-repos-jq.sh
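# Clone every non-fork repository of a GitHub user over SSH.
# The GitHub API caps per_page at 100, so only the first 100 repositories are listed.
GITHUB_USERNAME="galexrt"
for repo in $(curl -s "https://api.github.com/users/${GITHUB_USERNAME}/repos?per_page=100" | jq --raw-output '.[] | select(.fork != true) | .ssh_url'); do
    git clone "${repo}"
done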
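Because the GitHub API returns at most 100 repositories per page, a user with more repositories needs pagination. The following is a minimal sketch, not part of the original gist: the function names `fetch_page` and `list_ssh_urls` are hypothetical, and it assumes `curl` and `jq` are installed and that the API's `page` query parameter is used to walk the list.

```shell
#!/bin/bash
# Sketch: list a user's non-fork repositories page by page.
GITHUB_USERNAME="${GITHUB_USERNAME:-galexrt}"

# fetch_page PAGE prints one page of the repository list as JSON.
# (Hypothetical helper, split out so it can be replaced in tests.)
fetch_page() {
    curl -s "https://api.github.com/users/${GITHUB_USERNAME}/repos?per_page=100&page=$1"
}

# list_ssh_urls prints the ssh_url of every non-fork repository across all pages.
list_ssh_urls() {
    local page=1 count body
    while :; do
        body="$(fetch_page "${page}")"
        # Stop on an empty page, or when the response is not a JSON array
        # (for example an API error object or invalid JSON).
        count="$(printf '%s' "${body}" | jq 'if type == "array" then length else 0 end' 2>/dev/null)"
        [ "${count:-0}" -gt 0 ] || break
        printf '%s' "${body}" | jq --raw-output '.[] | select(.fork != true) | .ssh_url'
        page=$((page + 1))
    done
}
```

A possible way to use it is `list_ssh_urls | while read -r repo; do git clone "${repo}"; done`, which keeps the cloning step identical to the loop above.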
View rook-cluster.yaml
apiVersion: rook.io/v1alpha1
kind: Cluster
metadata:
  name: rook
  namespace: rook
spec:
  versionTag: master
  dataDirHostPath: /var/lib/rook/config
  hostNetwork: true
  monCount: 3
@galexrt
galexrt / ceph-reboot-status-check.sh
Last active Mar 2, 2018
Improved version of the ceph reboot status check script from [rook/rook - Pull Request: Added ceph-reboot-script using the Container Linux Update Operator #1492](https://github.com/rook/rook/pull/1492).
View ceph-reboot-status-check.sh
#!/bin/bash
# preflightCheck checks for existence of "dependencies"
preflightCheck() {
    if [ ! -f "/var/run/secrets/kubernetes.io/serviceaccount/token" ]; then
        echo "$(date) | No Kubernetes ServiceAccount token found."
        exit 1
    fi
}
@galexrt
galexrt / ceph-mon-recovery.md
Last active Apr 5, 2018
A WIP doc page on Ceph Mon recovery when running Rook.
View ceph-mon-recovery.md

Rook <= 0.7.0 mon hostNetwork: true node IP issue

NOTE: If you need assistance with steps 3 through 5, let us know on the Rook Slack; we are happy to help.

WARNING: You should not need this section if you have hostNetwork: false (or have never set it).

WARNING: If you have or had multiple Filesystems created, this guide may not work for you because of a bug in Ceph that causes the mons to crash during the "FS Map" assertion.

  1. Scale rook-operator down (e.g. replicas: 0).
  2. Edit all rook-ceph-mon ReplicaSets to have command: ['sleep', '3600'] in the mon container, but first copy each mon's original args and command values somewhere safe.
  3. Exec into the first mon and run: monmaptool --print /var/lib/rook/rook-ceph-mon-$MON_ID/monmap.
     • Where MON_ID is the ID of the mon you execed into.
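The first three steps above can be sketched as `kubectl` commands. This is only an illustration under assumptions that the guide does not state: the operator is a Deployment named `rook-operator` in a `rook-system` namespace, the mons live in a `rook` namespace, and `monmaptool` is available in the mon container. The helper names (`run`, `scale_operator_down`, `edit_mon_replicaset`, `print_monmap`) are hypothetical, and the script defaults to a dry run that only prints the commands.

```shell
#!/bin/bash
# Assumptions (not from the guide): namespaces and the operator Deployment name.
OPERATOR_NAMESPACE="${OPERATOR_NAMESPACE:-rook-system}"
CLUSTER_NAMESPACE="${CLUSTER_NAMESPACE:-rook}"
DRY_RUN="${DRY_RUN:-1}"

# run prints the command in dry-run mode, otherwise executes it.
run() {
    if [ "${DRY_RUN}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: scale the operator down.
scale_operator_down() {
    run kubectl -n "${OPERATOR_NAMESPACE}" scale deployment rook-operator --replicas=0
}

# Step 2: open one mon ReplicaSet for editing. Setting
# command: ['sleep', '3600'] and saving the original command/args is done by hand.
edit_mon_replicaset() {
    run kubectl -n "${CLUSTER_NAMESPACE}" edit replicaset "$1"
}

# Step 3: print the monmap from inside a mon pod.
# $1 is the pod name, $2 is the MON_ID of that mon.
print_monmap() {
    local pod="$1" mon_id="$2"
    run kubectl -n "${CLUSTER_NAMESPACE}" exec "${pod}" -- \
        monmaptool --print "/var/lib/rook/rook-ceph-mon-${mon_id}/monmap"
}
```

Running `scale_operator_down` with the default `DRY_RUN=1` prints the command instead of executing it, which makes it easy to review each step before touching a live cluster.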
@galexrt
galexrt / ceph-mgr-failover.log
Created Jun 2, 2018
The new active mgr gets stuck in `active, starting`.
View ceph-mgr-failover.log
+ rook-ceph-mgr0-7fd45d88f5-w59tt › rook-ceph-mgr0
rook-ceph-mgr0-7fd45d88f5-w59tt rook-ceph-mgr0 2018-06-02 11:15:00.480868 I | rookcmd: starting Rook v0.7.0-163.g5d1d3b5a with arguments '/usr/local/bin/rook mgr --config-dir=/var/lib/rook'
rook-ceph-mgr0-7fd45d88f5-w59tt rook-ceph-mgr0 2018-06-02 11:15:00.480962 I | rookcmd: flag values: --admin-secret=*****, --ceph-config-override=/etc/rook/config/override.conf, --cluster-name=rook, --config-dir=/var/lib/rook, --fsid=, --help=false, --log-level=INFO, --mgr-keyring=AQDOXhJba4L5LBAADDhc0csZbyOME5RfzvALvg==, --mgr-name=rook-ceph-mgr0, --mon-endpoints=rook-ceph-mon0=10.96.149.29:6790,rook-ceph-mon1=10.108.88.252:6790,rook-ceph-mon2=10.101.36.178:6790, --mon-secret=*****, --private-ipv4=172.17.0.8, --public-ipv4=172.17.0.8
rook-ceph-mgr0-7fd45d88f5-w59tt rook-ceph-mgr0 2018-06-02 11:15:00.480972 I | cephmon: parsing mon endpoints: rook-ceph-mon0=10.96.149.29:6790,rook-ceph-mon1=10.108.88.252:6790,rook-ceph-mon2=10.101.36.178:6790
rook-ceph-mgr0-7fd45d88f5-w59tt
@galexrt
galexrt / rook-cluster.yaml
Last active Jul 16, 2018
CoreOS is used for the nodes. Every node has sda10 mounted at /var/lib/rook and an empty sdb that is used as a whole disk. The nodes have been labelled according to the placements.
View rook-cluster.yaml
apiVersion: rook.io/v1alpha1
kind: Cluster
metadata:
  name: rook
  namespace: rook
spec:
  versionTag: master
  dataDirHostPath: /var/lib/rook-config
  # toggle to use hostNetwork
  hostNetwork: true
@galexrt
galexrt / crio.ini
Last active May 26, 2019
My current CRI-O config file. The first file is actually named `crio.conf`; the gist uses the `ini` file extension so that the "correct" syntax highlighting is applied. The second file is `/etc/containers/policy.json`. See https://edenmal.moe/2018/03/09/CRI-O-Container-Linux-How-to-Install/#Step-4-Configure-CRI-O.
View crio.ini
# The CRI-O configuration file specifies all of the available configuration
# options and command-line flags for the crio(8) OCI Kubernetes Container Runtime
# daemon, but in a TOML format that can be more easily modified and versioned.
#
# Please refer to crio.conf(5) for details of all configuration options.
# CRI-O reads its storage defaults from the containers-storage.conf(5) file
# located at /etc/containers/storage.conf. Modify this storage configuration if
# you want to change the system's defaults. If you want to modify storage just
# for CRI-O, you can change the storage configuration options here.