nodox/0_monitoring.md

## 0_monitoring.md

      
Monitoring applications with Prometheus

Today you are going to dive deep into observability and build a monitoring solution for your container environment using Prometheus and Grafana.
What is Observability?

You know what really grinds my gears? Incomplete software tutorials. Everyone wants to teach the world about that shiny new tool or that brand new JavaScript framework to build the application of your dreams. Developers will often write these elaborate tutorials to teach you how to write X feature or implement Y functionality. The tutorial then ends with a disclaimer that you wouldn't want to run this in production. Or, my favorite, they will tell you how to deploy the application but not how to maintain the new piece of software that you released into the wild. I mean, how are you so sure the new piece of software you released won't take down the us-east-1 North Virginia data center, AKA the cloud? That's right, you're not sure. Well, today that's all going to change.
In this tutorial we're going to take a look at things from the post-production world. We're going to build a monitoring solution that will allow us to see what our applications are doing and what resources they are consuming once they are deployed to their production environment. That is where we get the term observability.
Observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. The more information we get about the internal states of the system, the more visibility we have into it, which increases the observability of our system. There are 4 pillars of observability:

Metrics aggregation
Log aggregation
Tracing
Visualization and Alerting

When we cover all four, we get a better picture of what our systems are doing in production. You might be asking, why do I care? Well, with these pillars in place you can do a couple of interesting things. You can:

visualize the state of your systems and alert when a system goes into an undesired state
improve your disaster recovery efforts during on-call incidents
measure how reliable your systems are, and determine with confidence when you can take a system down for planned downtime

You're not limited to these benefits, but surely you can see how knowing more about what your system is doing when live can give you the confidence to take better risks.
Now, to increase visibility you need the proper tools to touch each one of those pillars. Today we're going to focus on the metrics collection and visualization components of observability. Let's get started with Prometheus.
Working with Prometheus

Prometheus is an open source time-series database, written in Go, used for metric collection. Prometheus has risen to high popularity due to its highly dimensional data model and powerful query language, among other things. The Prometheus model for metric collection differs from most others because it relies on pulling metrics from targets by scraping them at a configured interval. Prometheus also integrates with other services to complete its monitoring infrastructure.

Prometheus components include:

the main Prometheus server which scrapes and stores time series data
client libraries for instrumenting application code
a push gateway for supporting short-lived jobs
special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
an alertmanager to handle alerts
various support tools

Monitoring with Prometheus means every component can run as a container, you get alerting and visualization services, and you get a large ecosystem of exporters to collect metrics from commonly used applications and databases (PostgreSQL, MongoDB, NGINX, AWS, and more). Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts. Grafana or other API consumers can be used to visualize the collected data.
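
To make the pull model concrete, here is a minimal sketch of what the `prometheus.yml` mounted at `/etc/prometheus/prometheus.yml` might contain. The target host names and ports come from the services in the Compose file in this document. The job names, the 15-second intervals, and the `alert.rules.yml` file name are assumptions chosen for illustration, and it assumes every target serves metrics on the default `/metrics` path.

```yaml
# Sketch of ./prometheus/prometheus.yml (illustrative values, not the author's file)
global:
  scrape_interval: 15s      # assumed: how often each target is scraped
  evaluation_interval: 15s  # assumed: how often rules are evaluated

# Hypothetical rule file name; any recording or alerting rules would live here
rule_files:
  - /etc/prometheus/alert.rules.yml

# Where Prometheus sends firing alerts (service name and port from the Compose file)
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

scrape_configs:
  # Prometheus scraping its own metrics
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Host metrics
  - job_name: 'nodeexporter'
    static_configs:
      - targets: ['nodeexporter:9100']

  # Container metrics
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  # GitHub repo metrics (assumes the exporter serves /metrics on its exposed port)
  - job_name: 'github-exporter'
    static_configs:
      - targets: ['github-exporter:5001']
```

Because the Compose file starts Prometheus with `--web.enable-lifecycle`, an edited config can be reloaded with an HTTP POST to `/-/reload` on port 9090 instead of restarting the container.
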
Outline.

Talk about prometheus, architecture
Show the dockerfile
Walk through each section piece by piece

======

It's never about release strategies or what a production environment for the tutorial product looks like.
And I know for a fact we don't even begin to talk about how to monitor the tutorial app we release into North Virginia, i.e. the internet or the cloud.

Spongebob pic of imagination meme with caption the cloud


Well, I'm here to take you down the road less travelled. We're going to talk about how we monitor the bad pieces of software we release into the world. Cause c'mon, you know every line of code you wrote was shitty, especially if you're working with JavaScript, where, for better or worse, the ecosystem changes every second.
We are going to build a kickass operational monitoring stack that will give you visibility into what your app is doing and how it's behaving. No more push-and-pray deployments.
We're going to deploy a Grafana and Prometheus stack to collect metrics and visualize the data we get from our apps and servers. Let's get started.

Let’s collect metrics with Prometheus

OK, time to talk about the M word. No, not money, you greedy bastard. I'm talking metrics. In order to monitor an app or an API you need to query that thing for some metrics and then store them for later analysis. For this use case we will use Prometheus.

Let’s monitor some things using metrics - exporters

Machine/Host Metrics - Node exporter

OK, so first and foremost we need to collect metrics from the compute nodes in our closet.
This is good for knowing how exhausted each host's resources are.
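
As a sketch of how host exhaustion can be turned into a number, the rule file below records per-host CPU utilisation from node exporter data. It assumes the `node_cpu_seconds_total` metric name used by node exporter v0.16.0 and later; the group name and the recorded series name are hypothetical.

```yaml
# Sketch of a Prometheus 2.x recording rule file (names are illustrative)
groups:
  - name: host
    rules:
      # Fraction of CPU time not spent idle, averaged across cores, per host.
      # 0 means fully idle, 1 means every core is busy.
      - record: instance:node_cpu_utilisation:avg_rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
```

The recorded series can then be graphed or alerted on without recomputing the `rate()` every time.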


Container metrics - cadvisor exporter

Now we can collect container metrics. cAdvisor analyzes and exposes resource usage and performance data from running containers.


Github repo metrics - github exporter

For added fun, let's monitor something more interesting, like metrics from a GitHub repo.
This exporter lets us monitor, for example, repo traffic, stars, issues, and pull requests.


Is Production still running? - Alerting

It's not enough to collect metrics; you have to be alerted when something goes wrong. Last deploy killed your site? You should know about it immediately.
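
One way to get that "know about it immediately" behaviour is an alerting rule on the `up` series, which Prometheus sets to 0 for any target whose scrape fails. The following is a minimal sketch in the Prometheus 2.x rule file format; the group name, alert name, one-minute wait, and `severity` label are assumptions, not settings taken from this stack.

```yaml
# Sketch of an alerting rule file (names and thresholds are illustrative)
groups:
  - name: targets
    rules:
      - alert: TargetDown
        # up is 1 when the last scrape of a target succeeded, 0 when it failed
        expr: up == 0
        # only fire if the target has stayed down for a full minute
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} of job {{ $labels.job }} is down"
```

Prometheus only evaluates the rule and marks the alert as firing; routing it to email, Slack, or a pager is the job of the Alertmanager service.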

What about short lived jobs - Pushgateway

The Pushgateway is an intermediary service which allows you to push metrics from jobs which cannot be scraped.
You might not need this right now, but let's say you have short-lived jobs like a Lambda function that fires every time a user buys a product.
You still want those metrics even though the job doesn't have a server endpoint to query.
That's where we use the Pushgateway. Instead of relying on the normal pull architecture Prometheus uses, we push our metrics to the Pushgateway as the job finishes. The Pushgateway holds on to them, and Prometheus can then scrape the Pushgateway.
Trust me, you'll thank me later. Plus, it's no work with Docker Compose.
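
On the Prometheus side, the Pushgateway is just one more scrape target. The fragment below is a sketch of the scrape job that would be added under `scrape_configs` in `prometheus.yml`; the job name is an assumption, while the host name and port match the `pushgateway` service in the Compose file.

```yaml
# Sketch of a scrape job for the Pushgateway (goes under scrape_configs)
scrape_configs:
  - job_name: 'pushgateway'
    # Keep the job/instance labels that the pushing job attached,
    # instead of overwriting them with the Pushgateway's own labels
    honor_labels: true
    static_configs:
      - targets: ['pushgateway:9091']
```

Without `honor_labels: true`, every pushed metric would look like it came from the Pushgateway itself, which makes it hard to tell the short-lived jobs apart.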


## Dockerfile
version: '3.7'

networks:
  monitor-net:
    driver: bridge

volumes:
    prometheus_data: {}
    grafana_data: {}

services:

  prometheus:
    image: prom/prometheus:v2.3.2
    container_name: prometheus
    volumes:
      - ./prometheus/:/etc/prometheus/
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    restart: unless-stopped
    expose:
      - 9090
    ports:
      - 9090:9090
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"
    depends_on:
      - influxdb

  alertmanager:
    image: prom/alertmanager:v0.15.1
    container_name: alertmanager
    volumes:
      - ./alertmanager/:/etc/alertmanager/
    command:
      - '--config.file=/etc/alertmanager/config.yml'
      - '--storage.path=/alertmanager'
    restart: unless-stopped
    expose:
      - 9093
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"

  nodeexporter:
    image: prom/node-exporter:v0.16.0
    container_name: nodeexporter
    user: root
    privileged: true
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
    restart: unless-stopped
    expose:
      - 9100
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"

  github-exporter:
    image: gatsbymanor/github-exporter:latest
    container_name: github-exporter
    tty: true
    stdin_open: true
    restart: unless-stopped
    expose:
      - 5001
    networks:
      - monitor-net
    environment:
      - REPO=gatsbyjs/gatsby
      # supply the token via the shell environment or an .env file; don't commit it
      - GITHUB_TOKEN=${GITHUB_TOKEN}
    labels:
      org.label-schema.group: "monitoring"

  cadvisor:
    image: google/cadvisor:v0.28.3
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      #- /cgroup:/cgroup:ro #doesn't work on MacOS only for Linux
    restart: unless-stopped
    expose:
      - 8080
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"

  grafana:
    image: grafana/grafana:5.2.2
    container_name: grafana
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/datasources:/etc/grafana/datasources
      - ./grafana/dashboards:/etc/grafana/dashboards
      - ./grafana/setup.sh:/setup.sh
    entrypoint: /setup.sh
    environment:
      - GF_SECURITY_ADMIN_USER=${ADMIN_USER:-admin}
      - GF_SECURITY_ADMIN_PASSWORD=${ADMIN_PASSWORD:-admin}
      - GF_USERS_ALLOW_SIGN_UP=false
    restart: unless-stopped
    expose:
      - 3000
    ports:
      - 3000:3000
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"

  pushgateway:
    image: prom/pushgateway
    container_name: pushgateway
    restart: unless-stopped
    expose:
      - 9091
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"

  influxdb:
    image: influxdb:1.6
    volumes:
      - ./influxdb/:/etc/influxdb/
    environment:
      - INFLUXDB_DB=prometheus
      - INFLUXDB_HTTP_AUTH_ENABLED=false
    ports:
      - 8086:8086
      - 8083:8083
    tty: true
    networks:
      - monitor-net