one-more-prom-slides
version: '2.3'
services:
  prom:
    image: prom/prometheus:v2.23.0
    ports:
      - 9090:9090
    volumes:
      - "$PWD/configs:/etc/prometheus:ro"
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention=1d
  sample-app:
    build:
      dockerfile: Dockerfile
      context: sample-app
    # ports:
    #   - 8017:8000
  hey_summary:
    restart: always
    image: williamyeh/hey:latest
    command: -n 10 -c 1 http://sample-app:8000/summary/
  hey_histogram:
    restart: always
    image: williamyeh/hey:latest
    command: -n 10 -c 2 http://sample-app:8000/histogram/
  hey_counter:
    restart: always
    image: williamyeh/hey:latest
    command: -n 10 -c 1 http://sample-app:8000/counter/1/
  node_exporter:
    image: prom/node-exporter:v0.18.1
    ports:
      - 7080:9100
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.ignored-mount-points'
      - "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
---
type: slide
---
# Yet another Prometheus getting started
---
## before we begin
```shell
git clone \
git@....
cd prom-workshop-v2
docker-compose build
```
---
## docker-compose
```shell
sudo curl -L "https://github.com/docker/compose/releases/download/1.27.4/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
```
https://docs.docker.com/compose/install/
---
# monitoring models
---
## Push
``` mermaid
graph LR;
App[App] --> C[collector];
C[collector] --> S((storage));
G[grafana] --> S((storage));
```
---
## Pull
```mermaid
graph LR;
C[collector] --> A[App];
C[collector] --> S((storage));
G[grafana] --> S((storage));
```
---
# sample app
sample-app/app.py
---
# Metric types
* gauge
* counter
* summary
* histogram
---
## Gauge
* temperature
* status
* memory used
* disk used
---
## gauge typical operations
* show raw value
* deriv -- per second change
* delta -- delta between two points
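For example, against the node_exporter metrics scraped in this stack (a sketch; the 5m/1h ranges are arbitrary):
```
node_filesystem_free_bytes
deriv(node_filesystem_free_bytes[5m])
delta(node_filesystem_free_bytes[1h])
```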
Note:
the temperature rose by 10; this ignores spikes between the sampled values
---
```python
from prometheus_client import Gauge
g = Gauge('sensor_temperature', 'CPU temperature')
g.inc() # Increment by 1
g.dec(10) # Decrement by given value
g.set(4.2) # Set to a given value
```
---
## Counter
* increments each time an event happens
* http requests count
* error count
---
## Counter typical operations
* rate -- per second change
* increase -- delta between two points
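For example, with the sample app's counter (the Python client exposes it as sample_app_counter_total); a sketch:
```
rate(sample_app_counter_total[5m])
increase(sample_app_counter_total[1h])
```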
---
```python=
from prometheus_client import Counter
c = Counter('app_failures', 'Errors count')
c.inc() # Increment by 1
c.inc(1.6) # Increment by given value
```
---
# counter is the king!
## Use a counter whenever in doubt
---
# Seems we are ready to start
```shell
% docker-compose up -d
```
---
# Latency && sizes
Note:
Some things generate many events within a single scrape interval, and each event has several measurements, for example response size or response time
---
# Complex types
* summary
* histogram
---
## summary
* precalculated quantiles
* incomparable (cannot be aggregated across instances)
* quantiles are exposed as gauges
* 2/4 of people jump lower than 50 cm
* 3/4 of people jump lower than 90 cm
* 99/100 of people jump lower than 120 cm
---
## summary example
```
go_gc_duration_seconds{quantile="0"} 0.000008394
go_gc_duration_seconds{quantile="0.25"} 0.000010507
go_gc_duration_seconds{quantile="0.5"} 0.000011205
go_gc_duration_seconds{quantile="0.75"} 0.000012347
go_gc_duration_seconds{quantile="1"} 0.000040238
```
---
```shell
Latency distribution:
10% in 0.1564 secs
25% in 0.2939 secs
50% in 0.4126 secs
75% in 0.8355 secs
90% in 1.0241 secs
95% in 1.4024 secs
99% in 1.4448 secs
```
---
## summary typical operations
* show a specific quantile
* alert on it :)
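For example, picking one precalculated quantile from the summary shown earlier:
```
go_gc_duration_seconds{quantile="0.75"}
```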
---
```python=
from prometheus_client import Summary
s = Summary('request_latency_seconds', 'Description of summary')
s.observe(4.7) # Observe 4.7 (seconds in this case)
```
<span>Of limited use with the official Python client: it doesn't expose quantiles<!-- .element: class="fragment" data-fragment-index="1" --></span>
---
## histogram
* uses counters
* comparable (can be aggregated across instances)
---
## histogram example
```
sample_app_histogram_bucket{le="0.005"} 0
sample_app_histogram_bucket{le="0.01"} 0
sample_app_histogram_bucket{le="0.025"} 0
sample_app_histogram_bucket{le="0.05"} 0
sample_app_histogram_bucket{le="0.075"} 0
sample_app_histogram_bucket{le="0.1"} 0
...
sample_app_histogram_bucket{le="2.5"} 3
sample_app_histogram_bucket{le="5.0"} 8
sample_app_histogram_bucket{le="7.5"} 11
sample_app_histogram_bucket{le="10.0"} 19
sample_app_histogram_bucket{le="+Inf"} 20
```
---
```
Response time histogram:
0.069 [1] |■
0.214 [24] |■■■■■■■■■■■■■■
0.359 [71] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.505 [12] |■■■■■■■
0.650 [33] |■■■■■■■■■■■■■■■■■■■
0.795 [5] |■■■
0.941 [19] |■■■■■■■■■■■
1.086 [19] |■■■■■■■■■■■
1.231 [1] |■
1.376 [1] |■
1.522 [14] |■■■■■■■■
```
---
## histogram typical operations
* turn it into a quantile with some precision (histogram_quantile)
* rate it to get a distribution
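For example, with the sample app's histogram (a sketch; histogram_quantile gives an estimate whose precision depends on the bucket boundaries):
```
histogram_quantile(0.95, rate(sample_app_histogram_bucket[5m]))
rate(sample_app_histogram_bucket[5m])
```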
---
```python=
from prometheus_client import Histogram
h = Histogram('request_latency_seconds', 'Description of histogram')
h.observe(4.7) # Observe 4.7 (seconds in this case)
```
---
# Labels
* key/value pairs
* label values are always quoted
```{tier="prod"}```
---
## Labels with special meaning
* instance -- a single instance of a service
* job -- a group of instances
---
## label operations
* key = "value"
* key =~ "value"
* key != "value"
* key !~ "value"
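Applied to the up metric with the job names used in this workshop:
```
up{job="prometheus"}
up{job=~"prom.*"}
up{job!="prometheus"}
up{job!~"node.*"}
```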
---
# RE2 warning
https://github.com/google/re2/wiki/Syntax
Lots of entries marked (NOT SUPPORTED)
---
## typical re operations
```
{status=~"200|201|3.."}
{status!~"5.."}
```
---
# Let's get back to our workshop
---
```shell
% docker-compose up -d
```
---
## description
* http://0.0.0.0:9090 -- prometheus
* http://0.0.0.0:8017 -- sample app (disabled)
---
# basic querying
http://0.0.0.0:9090
---
## prepare
http://0.0.0.0:9090/targets
green?
---
## simple query
```plaintext
up
```
---
## add labels
```plaintext
up{job="prometheus"}
```
---
## apply function
```
count(up{})
```
---
## add group by
```
count(up{}) by (job)
count(up{}) by (job, instance)
```
Note:
all functions and help
---
# all metrics returned by app
```
{job="app"}
```
---
## tasks
1. show up instances for prometheus/node_exporter using a regexp
2. show disk space usage over the last minute. Which filesystem type is used? How did you figure that out?
3. make a distribution for prometheus_http_request_duration_seconds_bucket
4. availability of exporters
Note:
```
1. count(up) by (job)
2. deriv(node_filesystem_free_bytes[1m])
3. rate(prometheus_http_request_duration_seconds_bucket[1m])
4. sum_over_time(up[1h]) / count_over_time(up[1h])
```
---
# Questions?
---
# Next workshop
1. how to get an alert
2. how our monitoring relates to this
---
# homework
* latency buckets for your app; exclude static files (one possible approach is sketched below)
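
A minimal sketch of one possible approach, assuming a FastAPI app like sample-app/app.py: a middleware that records request latency into a histogram with custom buckets and skips static paths (the bucket boundaries and the /static prefix are assumptions):

```python
import time

from fastapi import FastAPI, Request
from prometheus_client import Histogram

app = FastAPI()

# Custom latency buckets in seconds -- tune them for your app.
REQUEST_LATENCY = Histogram(
    'sample_app_request_latency_seconds',
    'HTTP request latency',
    buckets=(0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0),
)


@app.middleware("http")
async def observe_latency(request: Request, call_next):
    # Skip static assets so they don't skew the latency distribution.
    if request.url.path.startswith("/static"):
        return await call_next(request)
    start = time.perf_counter()
    response = await call_next(request)
    REQUEST_LATENCY.observe(time.perf_counter() - start)
    return response
```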
from uvicorn import run
from fastapi import FastAPI
from fastapi.responses import PlainTextResponse
import random

from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram, Summary

app = FastAPI()

PROMETHEUS_COUNTER: Counter = Counter('sample_app_counter', 'count')
PROMETHEUS_GAUGE: Gauge = Gauge('sample_app_gauge', 'gauge')
PROMETHEUS_HISTOGRAM: Histogram = Histogram('sample_app_histogram', 'histogram')
PROMETHEUS_SUMMARY: Summary = Summary('sample_app_summary', 'summary')


# The Python client doesn't store or expose quantile information at this time.
@app.get('/summary/{num}')
def summary(num: int) -> None:
    PROMETHEUS_SUMMARY.observe(num)


@app.get('/histogram/')
def histogram() -> None:
    # Observe a random value to populate the histogram buckets.
    num = random.uniform(0, 11.0)
    PROMETHEUS_HISTOGRAM.observe(num)


@app.get('/gauge/+/{num}')
def gauge_inc(num: int) -> None:
    PROMETHEUS_GAUGE.inc(num)


@app.get('/gauge/-/{num}')
def gauge_dec(num: int) -> None:
    PROMETHEUS_GAUGE.dec(num)


@app.get('/gauge/=/{num}')
def gauge_set(num: int) -> None:
    PROMETHEUS_GAUGE.set(num)


@app.get('/counter/{num}')
def inc(num: int) -> None:
    PROMETHEUS_COUNTER.inc(num)


# Expose all registered metrics in the Prometheus text format.
@app.get('/metrics', response_class=PlainTextResponse)
def metrics():
    return generate_latest(REGISTRY)