one-more-prom-slides
version: '2.3'
services:
  prom:
    image: prom/prometheus:v2.23.0
    ports:
      - 9090:9090
    volumes:
      - "$PWD/configs:/etc/prometheus:ro"
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention=1d
  sample-app:
    build:
      dockerfile: Dockerfile
      context: sample-app
    # ports:
    #   - 8017:8000
  hey_summary:
    restart: always
    image: williamyeh/hey:latest
    command: -n 10 -c 1 http://sample-app:8000/summary/
  hey_histogram:
    restart: always
    image: williamyeh/hey:latest
    command: -n 10 -c 2 http://sample-app:8000/histogram/
  hey_counter:
    restart: always
    image: williamyeh/hey:latest
    command: -n 10 -c 1 http://sample-app:8000/counter/1/
  node_exporter:
    image: prom/node-exporter:v0.18.1
    ports:
      - 7080:9100
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.ignored-mount-points'
      - "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
---
type: slide
---
# Yet another Prometheus getting started
---
## before we begin
```shell
git clone \
git@....
cd prom-workshop-v2
docker-compose build
```
---
## docker-compose
```shell
sudo curl -L "https://github.com/docker/compose/releases/download/1.27.4/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
```
https://docs.docker.com/compose/install/
---
# monitoring models
---
## Push
``` mermaid
graph LR;
App[App] --> C[collector];
C[collector] --> S((storage));
G[grafana] --> S((storage));
```
---
## Pull
```mermaid
graph LR;
C[collector] --> A[App];
C[collector] --> S((storage));
G[grafana] --> S((storage));
```
---
# sample app
sample-app/app.py
---
# Metric types
* gauge
* counter
* summary
* histogram
---
## Gauge
* temperature
* status
* memory used
* disk used
---
## gauge typical operations
* show raw value
* deriv -- per second change
* delta -- delta between two points
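For example, against the node_exporter metrics scraped in this stack (a sketch; the 5m/1h ranges are arbitrary):
```
node_filesystem_free_bytes
deriv(node_filesystem_free_bytes[5m])
delta(node_filesystem_free_bytes[1h])
```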
Note:
the temperature rose by 10; this ignores spikes between the sampled values
---
```python
from prometheus_client import Gauge
g = Gauge('sensor_temperature', 'CPU temperature')
g.inc() # Increment by 1
g.dec(10) # Decrement by given value
g.set(4.2) # Set to a given value
```
---
## Counter
* increments each time an event happens
* http requests count
* error count
---
## Counter typical operations
* rate -- per second change
* increase -- delta between two points
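For example, with the sample app's counter (the Python client exposes it as sample_app_counter_total); a sketch:
```
rate(sample_app_counter_total[5m])
increase(sample_app_counter_total[1h])
```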
---
```python=
from prometheus_client import Counter
c = Counter('app_failures', 'Errors count')
c.inc() # Increment by 1
c.inc(1.6) # Increment by given value
```
---
# counter is the king!
## Use a counter whenever in doubt
---
# Seems we are ready to start
```shell
% docker-compose up -d
```
---
# Latency && sizes
Note:
Some things generate many events within a single scrape interval, and each event has several measurements, for example response size or response time
---
# Complex types
* summary
* histogram
---
## summary
* precalculated quantiles
* incomparable (cannot be aggregated across instances)
* quantiles are exposed as gauges
* 2/4 of people jump lower than 50 cm
* 3/4 of people jump lower than 90 cm
* 99/100 of people jump lower than 120 cm
---
## summary example
```
go_gc_duration_seconds{quantile="0"} 0.000008394
go_gc_duration_seconds{quantile="0.25"} 0.000010507
go_gc_duration_seconds{quantile="0.5"} 0.000011205
go_gc_duration_seconds{quantile="0.75"} 0.000012347
go_gc_duration_seconds{quantile="1"} 0.000040238
```
---
```shell
Latency distribution:
10% in 0.1564 secs
25% in 0.2939 secs
50% in 0.4126 secs
75% in 0.8355 secs
90% in 1.0241 secs
95% in 1.4024 secs
99% in 1.4448 secs
```
---
## summary typical operations
* show a specific quantile
* alert on it :)
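For example, picking one precalculated quantile from the summary shown earlier:
```
go_gc_duration_seconds{quantile="0.75"}
```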
---
```python=
from prometheus_client import Summary
s = Summary('request_latency_seconds', 'Description of summary')
s.observe(4.7) # Observe 4.7 (seconds in this case)
```
<span>Of limited use with the official Python client: it doesn't expose quantiles<!-- .element: class="fragment" data-fragment-index="1" --></span>
---
## histogram
* uses counters
* comparable (can be aggregated across instances)
---
## histogram example
```
sample_app_histogram_bucket{le="0.005"} 0
sample_app_histogram_bucket{le="0.01"} 0
sample_app_histogram_bucket{le="0.025"} 0
sample_app_histogram_bucket{le="0.05"} 0
sample_app_histogram_bucket{le="0.075"} 0
sample_app_histogram_bucket{le="0.1"} 0
...
sample_app_histogram_bucket{le="2.5"} 3
sample_app_histogram_bucket{le="5.0"} 8
sample_app_histogram_bucket{le="7.5"} 11
sample_app_histogram_bucket{le="10.0"} 19
sample_app_histogram_bucket{le="+Inf"} 20
```
---
```
Response time histogram:
0.069 [1] |■
0.214 [24] |■■■■■■■■■■■■■■
0.359 [71] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.505 [12] |■■■■■■■
0.650 [33] |■■■■■■■■■■■■■■■■■■■
0.795 [5] |■■■
0.941 [19] |■■■■■■■■■■■
1.086 [19] |■■■■■■■■■■■
1.231 [1] |■
1.376 [1] |■
1.522 [14] |■■■■■■■■
```
---
## histogram typical operations
* turn it into a quantile with some precision (histogram_quantile)
* rate it to get a distribution
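For example, with the sample app's histogram (a sketch; histogram_quantile gives an estimate whose precision depends on the bucket boundaries):
```
histogram_quantile(0.95, rate(sample_app_histogram_bucket[5m]))
rate(sample_app_histogram_bucket[5m])
```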
---
```python=
from prometheus_client import Histogram
h = Histogram('request_latency_seconds', 'Description of histogram')
h.observe(4.7) # Observe 4.7 (seconds in this case)
```
---
# Labels
* key/value pairs
* label values are always quoted
```{tier="prod"}```
---
## Labels with special meaning
* instance -- a single instance of a service
* job -- a group of instances
---
## label operations
* key = "value"
* key =~ "value"
* key != "value"
* key !~ "value"
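Applied to the up metric with the job names used in this workshop:
```
up{job="prometheus"}
up{job=~"prom.*"}
up{job!="prometheus"}
up{job!~"node.*"}
```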
---
# RE2 warning
https://github.com/google/re2/wiki/Syntax
Lots of entries marked (NOT SUPPORTED)
---
## typical re operations
```
{status=~"200|201|3.."}
{status!~"5.."}
```
---
# Let's get back to our workshop
---
```shell
% docker-compose up -d
```
---
## description
* http://0.0.0.0:9090 -- prometheus
* http://0.0.0.0:8017 -- sample app (disabled)
---
# basic querying
http://0.0.0.0:9090
---
## prepare
http://0.0.0.0:9090/targets
green?
---
## simple query
```plaintext
up
```
---
## add labels
```plaintext
up{job="prometheus"}
```
---
## apply function
```
count(up{})
```
---
## add group by
```
count(up{}) by (job)
count(up{}) by (job, instance)
```
Note:
all functions and help
---
# all metrics returned by app
```
{job="app"}
```
---
## tasks
1. show up instances for prometheus/node_exporter using a regexp
2. show disk space usage over the last minute. Which filesystem type is used? How did you figure that out?
3. make a distribution for prometheus_http_request_duration_seconds_bucket
4. availability of exporters
Note:
```
1. count(up) by (job)
2. deriv(node_filesystem_free_bytes[1m])
3. rate(prometheus_http_request_duration_seconds_bucket[1m])
4. sum_over_time(up[1h]) / count_over_time(up[1h])
```
---
# Questions?
---
# Next workshop
1. how to get an alert
2. how our monitoring relates to this
---
# homework
* latency buckets for your app; exclude static files (one possible approach is sketched below)
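
A minimal sketch of one possible approach, assuming a FastAPI app like sample-app/app.py: a middleware that records request latency into a histogram with custom buckets and skips static paths (the bucket boundaries and the /static prefix are assumptions):

```python
import time

from fastapi import FastAPI, Request
from prometheus_client import Histogram

app = FastAPI()

# Custom latency buckets in seconds -- tune them for your app.
REQUEST_LATENCY = Histogram(
    'sample_app_request_latency_seconds',
    'HTTP request latency',
    buckets=(0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0),
)


@app.middleware("http")
async def observe_latency(request: Request, call_next):
    # Skip static assets so they don't skew the latency distribution.
    if request.url.path.startswith("/static"):
        return await call_next(request)
    start = time.perf_counter()
    response = await call_next(request)
    REQUEST_LATENCY.observe(time.perf_counter() - start)
    return response
```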
from uvicorn import run
from fastapi import FastAPI
from fastapi.responses import PlainTextResponse
import random

from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram, Summary

app = FastAPI()

PROMETHEUS_COUNTER: Counter = Counter('sample_app_counter', 'count')
PROMETHEUS_GAUGE: Gauge = Gauge('sample_app_gauge', 'gauge')
PROMETHEUS_HISTOGRAM: Histogram = Histogram('sample_app_histogram', 'histogram')
PROMETHEUS_SUMMARY: Summary = Summary('sample_app_summary', 'summary')


# The Python client doesn't store or expose quantile information at this time.
@app.get('/summary/{num}')
def summary(num: int) -> None:
    PROMETHEUS_SUMMARY.observe(num)


@app.get('/histogram/')
def histogram() -> None:
    # Observe a random value to populate the histogram buckets.
    num = random.uniform(0, 11.0)
    PROMETHEUS_HISTOGRAM.observe(num)


@app.get('/gauge/+/{num}')
def gauge_inc(num: int) -> None:
    PROMETHEUS_GAUGE.inc(num)


@app.get('/gauge/-/{num}')
def gauge_dec(num: int) -> None:
    PROMETHEUS_GAUGE.dec(num)


@app.get('/gauge/=/{num}')
def gauge_set(num: int) -> None:
    PROMETHEUS_GAUGE.set(num)


@app.get('/counter/{num}')
def inc(num: int) -> None:
    PROMETHEUS_COUNTER.inc(num)


# Expose all registered metrics in the Prometheus text format.
@app.get('/metrics', response_class=PlainTextResponse)
def metrics():
    return generate_latest(REGISTRY)