Monitoring applications with Prometheus
Today you are going to dive deep into observability and build a monitoring solution for your container environment using Prometheus and Grafana.
What is Observability?
In this tutorial we're going to take a look at things from the post-production world. We're going to build a monitoring solution that lets us see what our applications are doing and what resources they are consuming once they are deployed to their production environment. That is where the term observability comes in.
Observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. The more information we get about the internal states of the system, the more visibility we have into it, which increases the observability of our system. There are 4 pillars of observability:
- Metrics aggregation
- Log aggregation
- Visualization
- Alerting
When we measure them, we get a better picture of what our systems are doing in production. You might be asking, why should I care? Well, with these principles you can do a couple of interesting things when your system is in an undesired state. You can:
- visualize the state of your systems and alert when a system goes into an undesired state
- improve your disaster recovery efforts during on-call incidents
- measure how reliable your systems are, and determine with great confidence when you can take a system down for planned downtime
You're not limited to these benefits, but surely you can see how knowing more about what your system is doing when live can give you the confidence to take better risks.
Now, to increase visibility you need the proper tools to touch each one of those pillars. Today we're going to focus on the metrics collection and visualization components of observability. Let's get started with Prometheus.
Working with Prometheus
Prometheus is an open source monitoring system and time-series database, written in Go, used for metric collection. Prometheus has risen to high popularity due to its highly dimensional data model and powerful query language (PromQL), among other things. The Prometheus model for metric collection differs from most others because it relies on pulling metrics from targets by scraping them over HTTP at a configured interval. Prometheus also integrates with other services to complete its monitoring infrastructure.
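To make the pull model concrete, here is a minimal sketch of what a scrape target looks like from the application side. It uses only the Python standard library; a real app would normally use an official Prometheus client library instead. The metric name `app_requests_total`, the `REQUESTS_TOTAL` dict, and the `demo` helper are made up for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a client library would manage these for you.
REQUESTS_TOTAL = {"GET": 0, "POST": 0}


def render_metrics():
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Requests handled, by HTTP method.",
        "# TYPE app_requests_total counter",
    ]
    for method, count in sorted(REQUESTS_TOTAL.items()):
        lines.append(f'app_requests_total{{method="{method}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current metric values whenever something asks for /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def scrape(url):
    """What the server side of the pull model does: an HTTP GET on the target."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def demo():
    """Start the target on a free local port, bump a counter, scrape it once."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        REQUESTS_TOTAL["GET"] += 3
        return scrape(f"http://127.0.0.1:{server.server_port}/metrics")
    finally:
        server.shutdown()
        server.server_close()
```

The point to notice is that the app never sends anything anywhere. It just keeps its numbers up to date and answers when it gets scraped.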
Prometheus components include:
- the main Prometheus server which scrapes and stores time series data
- client libraries for instrumenting application code
- a push gateway for supporting short-lived jobs
- special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
- an alertmanager to handle alerts
- various support tools
Monitoring with Prometheus means all components can run as containers, you get alerting and visualization services, and you get a large ecosystem of exporters to collect metrics from commonly used applications and databases (Postgres, MongoDB, nginx, AWS, and more). Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts. Grafana or other API consumers can be used to visualize the collected data.
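The "store samples, then run rules over them" idea can be sketched in a few lines. This is a toy, not how Prometheus is implemented: `TinyTSDB`, `parse_samples`, and `simple_rate` are hypothetical names, the parser assumes no timestamps and no spaces inside label values, and the rate calculation ignores counter resets, which the real PromQL `rate()` handles.

```python
from collections import defaultdict


def parse_samples(text):
    """Parse 'series value' lines from a scrape; skip comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


class TinyTSDB:
    """Keeps every scraped sample as a (timestamp, value) pair per series."""

    def __init__(self):
        self.series = defaultdict(list)

    def ingest(self, scrape_time, text):
        """Store one scrape's worth of samples under the scrape timestamp."""
        for series, value in parse_samples(text).items():
            self.series[series].append((scrape_time, value))

    def simple_rate(self, series):
        """Per-second increase between the first and last stored sample."""
        points = self.series.get(series, [])
        if len(points) < 2:
            return None
        (t0, v0), (t1, v1) = points[0], points[-1]
        return (v1 - v0) / (t1 - t0)


db = TinyTSDB()
db.ingest(0, 'app_requests_total{method="GET"} 10\n')
db.ingest(15, 'app_requests_total{method="GET"} 40\n')
print(db.simple_rate('app_requests_total{method="GET"}'))  # 2.0 requests per second
```

A recording rule does roughly this on a schedule and saves the result as a new time series; an alerting rule does the same but compares the result against a condition.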
- Talk about Prometheus architecture
- Show the Dockerfile
- Walk through each section piece by piece
- Tutorials are never about release strategies or what a production environment for the tutorial product looks like
- And I know for a fact we don’t even begin to talk about how to monitor the tutorial app we release into North Virginia, i.e. the internet or the cloud
- SpongeBob imagination meme pic with the caption "the cloud"
- We are going to build a kickass operational monitoring stack that will give you visibility into what your app is doing and how it’s behaving. No more push-and-pray deployments.
- We’re going to deploy a Grafana and Prometheus stack to collect metrics and visualize the data we get from our apps and servers. Let’s get started
Let’s collect metrics with Prometheus
- OK, time to talk about the M word. No, not money, you greedy bastard. I’m talking metrics. In order to monitor an app or an API you need to query that thing for some metrics and then store them for later analysis. For this use case we will use Prometheus.
Let’s monitor some things using metrics - exporters
- Machine/Host Metrics - Node exporter
- OK, so first and foremost we need to collect metrics from the computer nodes in our closet
- This is good for knowing how exhausted our machines' resources are
- Container metrics - cAdvisor exporter
- cAdvisor analyzes and exposes resource usage and performance data from running containers, so now we can collect container metrics too
- GitHub repo metrics - GitHub exporter
- for added fun let’s monitor something more interesting like metrics from a GitHub repo
- this exporter lets us monitor for example repo traffic, stars, issues and pull requests
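All three exporters above follow the same pattern: read numbers from something that doesn't speak Prometheus (a host, a container runtime, the GitHub API) and translate them into metrics Prometheus can scrape. Here is a rough stdlib-only sketch of that translation step for a couple of host values. The `demo_*` metric names and both functions are invented for this example and are not what Node exporter actually exposes.

```python
import os
import shutil


def collect_host_metrics(path="/"):
    """Read a few host facts from the OS; every value is a point-in-time gauge."""
    usage = shutil.disk_usage(path)
    return {
        "demo_cpu_count": os.cpu_count() or 0,
        "demo_disk_total_bytes": usage.total,
        "demo_disk_free_bytes": usage.free,
    }


def to_exposition(metrics):
    """Translate a name -> value dict into Prometheus text exposition lines."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_exposition(collect_host_metrics()), end="")
```

A real exporter wraps this in an HTTP server with a `/metrics` endpoint and collects fresh values on every scrape.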
Is Production still running? - Alerting
- It’s not enough to collect metrics; you have to be alerted when something goes wrong. Last deploy killed your site? You should know about it immediately
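To give a feel for how an alert goes from "something looks off" to "page someone", here is a small sketch of the inactive, pending, firing life cycle that Prometheus alerting rules follow when they have a `for` duration. The `AlertRule` class is hypothetical and only mimics the idea; real rules are written in YAML with a PromQL expression. The `up` value used below mirrors the built-in `up` metric, which is 1 when a scrape succeeds and 0 when it fails.

```python
class AlertRule:
    """An alert fires only after its condition has held for `for_seconds`."""

    def __init__(self, name, condition, for_seconds):
        self.name = name
        self.condition = condition
        self.for_seconds = for_seconds
        self.active_since = None  # when the condition first became true

    def evaluate(self, now, value):
        """Return 'inactive', 'pending', or 'firing' for this evaluation."""
        if not self.condition(value):
            self.active_since = None  # condition cleared, start over
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


rule = AlertRule("SiteDown", lambda up: up == 0, for_seconds=60)
print(rule.evaluate(0, 1))   # inactive: last scrape succeeded
print(rule.evaluate(15, 0))  # pending: down, but not for long enough yet
print(rule.evaluate(75, 0))  # firing: down for 60 seconds straight
print(rule.evaluate(90, 1))  # inactive: the site is back
```

The waiting period is a design choice: it keeps a single failed scrape from waking anyone up, at the cost of a short delay before a real outage is reported.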
What about short-lived jobs? - Pushgateway
- An intermediary service which allows you to push metrics from jobs which cannot be scraped.
- You might not need this right now, but let’s say you have short-lived jobs like a Lambda function that fires every time a user buys a product.
- You still want those metrics even though it doesn’t have a server endpoint to query.
- That’s where we use Pushgateway. Instead of relying on the normal pull model Prometheus uses, the job pushes its metrics to Pushgateway as it finishes. Pushgateway holds on to the last pushed values (it is a metrics cache, not an aggregator), and Prometheus scrapes Pushgateway like any other target
- Trust me, you’ll thank me later, plus it’s almost no work with Docker Compose
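Here is roughly what the push side looks like from a short-lived job, using only the Python standard library. Pushgateway accepts metrics in the text exposition format on `/metrics/job/<job_name>`. The gateway address `http://localhost:9091` (Pushgateway's default port), the job name `checkout`, and the metric name are assumptions for this sketch, and the official client libraries offer helpers that do the same thing.

```python
import urllib.request
from urllib.parse import quote


def build_push_request(gateway_url, job, metrics_text):
    """Build an HTTP PUT that replaces all metrics stored for this job's group."""
    url = f"{gateway_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=metrics_text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push(request):
    """Send the request; needs a reachable Pushgateway, so it is not called here."""
    with urllib.request.urlopen(request, timeout=5) as resp:
        return resp.status


# Hypothetical metric recorded by a job that runs once per purchase.
# The body has to end with a newline.
payload = "checkout_last_success_timestamp_seconds 1700000000\n"
request = build_push_request("http://localhost:9091", "checkout", payload)
print(request.get_method(), request.full_url)
```

Calling `push(request)` as the last step of the job hands the values over before the process exits, and Prometheus picks them up on its next scrape of Pushgateway.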