Application Observability in Kubernetes with Datadog APM and Logging - A simple and actionable example

Last year I shared an example of how to implement application tracing in Kubernetes with Istio and Jaeger. Since then, the industry has made substantial headway on this front, and we are seeing more vendor support as a result. At Buffer, since we primarily use Datadog for Kubernetes and application monitoring, it's only fitting to complete the circle with Datadog APM and Logging. I had a chance to create a small example for the team and would love to share it with the community.

Okay, without further ado, let's dive in!

Installing Datadog agent

First things first: in order to collect metrics and logs from Kubernetes, a Datadog agent has to be installed in the cluster. The Datadog team ma

Tainting and Labeling Kubernetes Nodes to Run Special Workload - A quick guide that is finally NOT confusing

All right folks, I intend to keep this one short and that's what I will do. I mean, it's supposed to be easy, but the official documentation (1, 2) makes it unnecessarily confusing. So I think maybe I can help fill in the gap.

I will be using one of our business requirements at Buffer in this project, as the example for this blog post.

Quick recap

So, we need a few nodes that are dedicated to running cronjobs, and nothing else. At the same time we want to make sure the cronjobs are scheduled to these nodes, and nowhere else. This means we need 2 things
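As a rough sketch of how those two requirements are commonly met: a `NoSchedule` taint keeps other workloads off the dedicated nodes, while a matching toleration plus a `nodeSelector` on the cronjob pods pins them to those nodes. The key `dedicated` and value `cronjobs` below are hypothetical names chosen for illustration, not necessarily the ones used at Buffer.

```javascript
// Hypothetical taint/label key and value, chosen only for illustration.
const DEDICATED_KEY = 'dedicated';
const DEDICATED_VALUE = 'cronjobs';

// Node side: the kubectl commands to run for each dedicated node.
// The taint repels pods without a matching toleration; the label gives
// cronjob pods something to select on.
function nodeSetupCommands(nodeName) {
  return [
    `kubectl taint nodes ${nodeName} ${DEDICATED_KEY}=${DEDICATED_VALUE}:NoSchedule`,
    `kubectl label nodes ${nodeName} ${DEDICATED_KEY}=${DEDICATED_VALUE}`,
  ];
}

// Pod side: a fragment to merge into the CronJob's pod spec
// (spec.jobTemplate.spec.template.spec).
function cronJobPodSpecFragment() {
  return {
    // Only schedule onto nodes carrying the label.
    nodeSelector: { [DEDICATED_KEY]: DEDICATED_VALUE },
    // Allow scheduling onto nodes carrying the taint.
    tolerations: [
      {
        key: DEDICATED_KEY,
        operator: 'Equal',
        value: DEDICATED_VALUE,
        effect: 'NoSchedule',
      },
    ],
  };
}

console.log(nodeSetupCommands('node-1').join('\n'));
console.log(JSON.stringify(cronJobPodSpecFragment(), null, 2));
```

The toleration alone only permits the cronjobs to land on the tainted nodes; it is the `nodeSelector` that stops them from landing anywhere else, which is why both pieces are needed.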

When Istio Meets Jaeger - An Example of End-to-end Distributed Tracing

Kubernetes is great! It helps many engineering teams realize the dream of SOA (Service Oriented Architecture). For the longest time, we built our applications around a monolithic mindset, which essentially means having one large computational instance run all the services an application provides. Things like account management, billing, and report generation all ran from a shared resource. This worked pretty well until SOA came along and promised us a much brighter future: break applications down into smaller components and have them talk to each other using REST or gRPC. We expected things to only get better from there, only to realize that a new set of challenges awaited. How about cross-service communication? How about observability between microservices, such as logging or tracing? This post demonstrates how to set up OpenTracing inside a Kubernetes cluster that enables end-to-end tracing between serv

stevenc81 / Kubernetes Master Nodes Backup for Kops on AWS - A step-by-step
Created August 27, 2019 01:03
Kubernetes Master Nodes Backup for Kops on AWS - A step-by-step Guide

Those who have been using kops for a while should know that the upgrade from 1.11 to 1.12 poses a greater risk than usual, as it upgrades etcd2 to etcd3.

This upgrade is disruptive to the control plane (master nodes). Although the disruption is brief, it's still something we take very seriously, because nearly all of Buffer's production services run on this single cluster. We felt we needed a more thorough backup process than the Heptio Velero setup we had in place.

To my surprise, my Google searches didn't yield any useful results on how to carry out the backup steps. To be fair, there are a few articles specifically about backing up master nodes created by kubeadm, but nothing too concrete for `kop

Upgrading Kubernetes Cluster with Kops, and Things to Watch Out For

Alright! I'd like to apologize for the inactivity for over a year. Very embarrassingly, I totally dropped the good habit. Anyways, today I'd like to share a not so advanced and much shorter walkthrough on how to upgrade Kubernetes with kops.

At Buffer, we host our own k8s (Kubernetes for short) cluster on AWS EC2 instances, since we started our journey before AWS EKS. To do this effectively, we use kops. It's an amazing tool that handles pretty much every aspect of cluster management: creation, upgrades, updates, and deletion. It has never failed us.

How to start?

Okay, upgrading a cluster always makes people nervous, especially a production cluster. Trust me, I've been there! There is a saying: hope is not a strategy. So instead of hoping things will go smoothly, I always assume that shit will hit the fan if you skip testing. Plus, good luck explaining to people

The Minimalist DeFi Liquidity Alert - An effective way to manage DeFi bank run risk

What the hell are you babbling about?

For those who have no idea what the title says, I offer my apologies here. Finance by itself is a deep topic with many trade-offs, similar to the world of engineering. According to the great economist Thomas Sowell

There are no solutions, only trade-offs

In the DeFi (Decentralized Finance) sense, as much as we all love the high interest rates, the trade-offs aren't always clear in this nascent industry. The underlying concept is really simple and sound. It goes something like this:

How to Set Kubernetes Resource Requests and Limits - A Saga to Improve Cluster Stability and Efficiency

A mystery

So, it all started on September 1st, right after our cluster upgrade from 1.11 to 1.12. Almost the very next day, we began to see kubelet alerts reported by Datadog. On some days we would get a few (3 - 5) of them; on other days we would get more than 10. The alert monitor is based on the Datadog check kubernetes.kubelet.check, and it's triggered whenever the kubelet process is down on a node.

We know kubelet plays an important role in Kubernetes scheduling. When it isn't running properly on a node, that node is effectively removed from the functional cluster. The more nodes with a problematic kubelet, the more the cluster degrades. Now, imagine waking up to

stevenc81 / dd-tracing-logging-examples-nodejs.js
Created May 13, 2019 00:04
A quick example for Datadog APM tracing and logging
hostname: process.env.DD_AGENT_HOST,
port: 8126,
env: 'development',
logInjection: true,
analytics: true,
const { createLogger, format, transports } = require('winston');
const addAppNameFormat = format(info => {
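The preview cuts off before the body of `addAppNameFormat`, so its actual contents are unknown. As a hedged illustration only: a winston custom format wraps a transform function of the shape `(info, opts) => info`, and a format with this name plausibly stamps each log record with the application name. The sketch below shows such a transform in isolation, without requiring winston; the field name `appName` and the `APP_NAME` environment variable are assumptions, not taken from the original gist.

```javascript
// Hypothetical sketch: the real body of addAppNameFormat is not shown
// in the preview. This is the kind of transform winston's format() wraps.
function addAppName(info, opts = {}) {
  // `appName` and APP_NAME are assumed names, used here for illustration.
  info.appName = opts.appName || process.env.APP_NAME || 'unknown-app';
  return info;
}

// With winston installed, the transform would be wrapped and used like so:
//   const { createLogger, format, transports } = require('winston');
//   const addAppNameFormat = format(addAppName);
//   const logger = createLogger({
//     format: format.combine(
//       addAppNameFormat({ appName: 'my-service' }),
//       format.json()
//     ),
//     transports: [new transports.Console()],
//   });

// Standalone demonstration of what the transform does to one log record.
const record = addAppName(
  { level: 'info', message: 'hello' },
  { appName: 'my-service' }
);
console.log(JSON.stringify(record));
```

Keeping the transform as a plain function makes it easy to unit test on its own, separate from the logger and tracer setup.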
docker run -d --name ipfs-node -p 8080:8080 -p 8081:8081 -p 4001:4001 ipfs/go-ipfs:v0.4.22
docker run -it -p 28000:27017 --name mongoContainer mongo:3.6.13 mongo "mongodb+srv://<HOST>/<DB>" --username <USERNAME>