Table of Contents
- General
- Infrastructure as Code
- Logging
- Tracing
- Monitoring & Observability
- Networking
- PaaS: Platform as a Service
- Data visualization and dashboards
- Querying
- Data Analytics
- Data Science and Machine Learning
- Security
- Testing
- Feature toggles/flags
- Development and ephemeral environments
- Internal Developer Platform
- Serverless
- Databases
- REST APIs
- Marketing
- Tools for online events
- Social networks
- Hotspots and code analysis
- Playgrounds
- Books and other resources
- DevOps Lifecycle Mesh
- Kubernetes (k8s)
- Tips, tricks, tools, etc.: https://twitter.com/patrickdebois/status/1221114733186682885
- Flux
- k8s deployment tool: https://github.com/slok/kahoy
- Kustomize
- Popeye: cluster resource sanitizer
- Skaffold
- Cilium: Network policies
- https://github.com/cilium/hubble
- Local k8s:
- minikube
- https://kind.sigs.k8s.io/
- NATS (Cloud messaging)
- AWS
- https://awsclibuilder.com/home
- https://github.com/open-guides/og-aws
- AWS Security Best Practices Assessment, Auditing, Hardening and Forensics Readiness Tool
- Recommended certifications: AWS Certified Solutions Architect - Associate (CSAA) and AWS Certified SysOps Administrator - Associate (CSysOpA)
- AWS mock for local testing:
- https://github.com/localstack/localstack
- https://github.com/p4tin/goaws (faster for SQS/SNS)
- Moto
- Write a blog post about Terratest + LocalStack: https://github.com/gruntwork-io/terratest/blob/master/test-docker-images/moto/README.md#localstack
- Codebase with localstack examples
- Data centers
- 12factor
- Consul
- Consul/etcd registrator: https://github.com/gliderlabs/registrator
- Consul snapshots: https://github.com/pshima/consul-snapshot
- Linkerd
- Service Mesh
- Load balancing
- Service discovery: we used Consul
- Circuit breaking: detect and remove unhealthy nodes.
- Retries and timeouts
- Traffic split: e.g. for canary releases and blue/green deployments
- No built-in ingress controller
- Observability: it shows what is happening between services.
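The traffic-split item above comes down to sending a fixed share of requests to each version. Below is a minimal Python sketch of one way to honor split weights, smooth weighted round-robin. It is not necessarily what Linkerd does internally; the class name and the 90/10 stable/canary weights are illustrative assumptions.

```python
class SmoothWeightedRoundRobin:
    """Pick backends so each one gets exactly its weight's share of every cycle."""

    def __init__(self, weights: dict[str, int]) -> None:
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be a non-empty dict of positive integers")
        self.weights = dict(weights)
        self.total = sum(self.weights.values())
        # Running score per backend; the highest score is picked next.
        self.current = {name: 0 for name in self.weights}

    def pick(self) -> str:
        # Every backend earns its weight...
        for name, weight in self.weights.items():
            self.current[name] += weight
        chosen = max(self.current, key=self.current.__getitem__)
        # ...and the chosen one pays back the total, so shares even out over a cycle.
        self.current[chosen] -= self.total
        return chosen


# Hypothetical canary rollout: 10% of requests go to the new version.
router = SmoothWeightedRoundRobin({"stable": 90, "canary": 10})
first_cycle = [router.pick() for _ in range(100)]
print(first_cycle.count("stable"), first_cycle.count("canary"))  # prints: 90 10
```

Compared with picking at random by weight, this interleaves canary requests evenly instead of in bursts, and the proportions are exact over every cycle of `sum(weights)` picks.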
- Kong:
- Open-source API gateway
- Authentication
- Traffic control: restrict inbound/outbound traffic
- Load balancing
- Healthchecks and circuit breakers
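Circuit breaking, which is listed for both the service mesh and Kong, can be sketched in a few lines of Python. This is a simplified, hypothetical implementation, not Kong's or Linkerd's. After `failure_threshold` consecutive failures the breaker opens and fails fast. After `reset_timeout` seconds it goes half-open and lets calls through again: a success closes it, a failure reopens it. The injectable `clock` exists only to make it testable.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the protected function while the circuit is open."""


class CircuitBreaker:
    """Minimal breaker: closed -> open after N consecutive failures -> half-open after a timeout."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # cool-down elapsed: let a trial call through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            raise CircuitOpenError("circuit open: failing fast without calling the backend")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A production breaker would usually allow only one probe request while half-open and track failure rates over a time window instead of a simple consecutive count.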
- Squid (HTTP proxy)
- Varnish: HTTP cache to improve performance, e.g. placed behind nginx.
- HAProxy
- PagerDuty
- Distributed tracing
- Zipkin
- Jaeger
- NATS: Cloud-native messaging system.
- OpenEBS: open-source, cloud-native storage solution.
- DNS:
- VPN
- https://docs.google.com/presentation/d/1TUz8TtLu6Y-UdOsXgZwanjqcIMPu-LRqyoMyEikTuvc/edit?ts=5ef10fa0#slide=id.p
- https://github.com/sshuttle/sshuttle
- https://www.wireguard.com/
- https://openvpn.net/
- Openswan
- ProtonVPN
- Windscribe
- https://tunnelblick.net/ (OpenVPN client for Mac)
- Git
- Bash scripting
- Terraform
- Atlantis
- Alternative to Terraform Cloud
- Pulumi
- Packer: build immutable machine images
- Docker
- lazydocker
- Docker platforms
- Ansible
- Chef
- Puppet
- fluentd
- systemd-journald
- Logz.io
- Logstash
- Splunk
- ELK
- Sentry
- https://grafana.com/oss/loki/
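Log shippers and stores such as fluentd, Logstash or Loki are easiest to feed with structured logs, one JSON object per line. Below is a minimal sketch using Python's standard `logging` module. The field names (`ts`, `level`, `msg`, etc.) and the `extra={"fields": ...}` convention are assumptions of this example, not a format any of those tools require.

```python
import json
import logging
import sys
from datetime import datetime, timezone
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Format each record as a single JSON object (one per line)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Assumed convention: structured fields are passed as extra={"fields": {...}}.
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            payload.update(fields)
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def make_logger(name: str, stream: TextIO = sys.stdout) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stdout by default)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


log = make_logger("checkout")
log.info("order placed", extra={"fields": {"order_id": 42, "amount_eur": 19.99}})
```

Writing to stdout and letting the platform's log agent collect and ship the stream keeps the application unaware of where logs end up.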
- Zipkin
- Jaeger
- OpenTelemetry
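Zipkin, Jaeger and OpenTelemetry stitch spans from different services into one trace by propagating a trace context with each request. By default, OpenTelemetry uses the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`). The sketch below parses and creates that header by hand to show the mechanics. It accepts only version `00` and ignores `tracestate`, and the `SpanContext`/`start_span` names are hypothetical. In practice the OpenTelemetry SDK's propagators do this for you.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# version "00" - 32-hex trace-id - 16-hex parent-id - 2-hex trace-flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # shared by every span in the trace
    span_id: str   # unique to this span; sent downstream as the "parent-id"
    sampled: bool  # bit 0 of trace-flags

    def to_traceparent(self) -> str:
        return f"00-{self.trace_id}-{self.span_id}-{'01' if self.sampled else '00'}"


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Parse an incoming header; return None if it is missing or malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:  # all-zero IDs are invalid
        return None
    return SpanContext(trace_id, parent_id, sampled=bool(int(flags, 16) & 0x01))


def start_span(parent: Optional[SpanContext] = None, sampled: bool = True) -> SpanContext:
    """Continue the caller's trace if there is one, otherwise start a new trace."""
    if parent is None:
        return SpanContext(secrets.token_hex(16), secrets.token_hex(8), sampled)
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)
```

A service would call `parse_traceparent` on the incoming request header, `start_span` for its own work, and send `to_traceparent()` on every outgoing call, so all spans share one trace ID.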
- Prometheus
- https://prometheus.io/docs/prometheus/latest/getting_started/
- Cortex: horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus
- promtool: unit testing for Prometheus rules
- Prometheus Formatter (Chrome extension): https://chrome.google.com/webstore/detail/prometheus-formatter/jhfbpphccndhifmpfbnpobpclhedckbb?hl=en
- Grafana
- ELK
- Logz.io
- Statuspage
- Pingdom
- Runscope: health checks from the external customer's point of view, connected to PagerDuty, Slack, etc.
- Consul dashboards: internal healthcheck status
- Grafana + Prometheus
- AWS CloudWatch + Lambda
- Postman
- Lightweight monitoring (uptime, healthcheck): https://jvns.ca/blog/2022/07/09/monitoring-small-web-services/
- Uptime
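The lightweight uptime checks mentioned above (Pingdom, Runscope, the jvns.ca post) come down to periodically requesting a health endpoint with a timeout and alerting when it fails. Below is a standard-library Python sketch of the probe itself. The `/health` path and the 3-second default timeout are assumptions, and scheduling and alerting (e.g. to PagerDuty or Slack) are left out.

```python
import http.client
import urllib.request


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if GET `url` answers with a 2xx status within `timeout` seconds."""
    try:
        # urlopen follows redirects and raises HTTPError for 4xx/5xx responses.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # OSError covers URLError/HTTPError, refused connections, DNS failures and timeouts.
        return False
```

For example, a cron job could run `is_healthy("http://localhost:8080/health")` every minute (hypothetical URL) and page someone after a few consecutive `False` results.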
- Traffic interceptor:
- Wireshark
- https://httptoolkit.com/
- https://proxyman.io/
- Heroku
- Render
- https://fly.io/ (used by jvns)
- SQL for the cloud: https://steampipe.io/
- Tools: https://twitter.com/episuarez/status/1338035772608360451
- Amplitude
- Mixpanel
- https://posthog.com/
- Debezium: Change Data Capture (CDC)
- Streamlit: Streamlit turns data scripts into shareable web apps in minutes. All in pure Python. No front‑end experience required.
- Hugging Face Spaces: Hugging Face Spaces offer a simple way to host ML demo apps directly on your profile or your organization's profile. [Backend]
- Gradio: Gradio is the fastest way to demo your machine learning model with a friendly web interface so that anyone can use it, anywhere! [Frontend]
- MLFlow: MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
- MLOps: end-to-end machine learning development process to design, build and manage reproducible, testable, and evolvable ML-powered software.
- Kubeflow: Kubeflow is an open-source platform for machine learning and MLOps on Kubernetes.
- Vertex AI (GCP): machine learning (ML) platform to train and deploy ML models and AI applications, and to customize large language models (LLMs) for use in AI-powered applications.
- Model Cards (Google Cloud): the README.md for AI models. Model cards aim to provide a concise, holistic picture of a machine learning model, so that everyone shares the same understanding of it.
- Amazon SageMaker: cloud platform to build, train, and deploy machine learning models for any use case, with fully managed infrastructure, tools, and workflows. It can also deploy ML models to embedded systems and edge devices.
- Auth0
- Cloudflare
- Fail2ban
- Hashicorp Vault: secrets management
- Snyk
- ZAProxy (OWASP ZAP): automated vulnerability scanning of web applications (Internet-facing applications and systems)
- Amazon GuardDuty: Automate network intrusion detection
- https://github.com/zricethezav/gitleaks
- https://snyk.io/product/container-vulnerability-management/
- https://www.trendmicro.com/en_us/business/products/hybrid-cloud/deep-security.html
- https://www.hackthebox.com/
- https://www.iriusrisk.com/
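Secret scanners such as gitleaks work by matching file contents (and git history) against a catalog of patterns. Below is a toy Python version of the idea. The three rules are illustrative assumptions and are far less thorough than gitleaks' real rule set, which also uses entropy checks and allowlists.

```python
import re
from typing import Iterator, NamedTuple


class Finding(NamedTuple):
    rule: str
    line_no: int
    match: str


# Illustrative rules only (an assumption of this sketch, not gitleaks' rule set).
RULES = {
    # AWS access key IDs: AKIA (long-term) or ASIA (temporary) + 16 uppercase letters/digits.
    "aws-access-key-id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private-key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    # password = "..." or password: '...' with at least 8 characters.
    "hardcoded-password": re.compile(r"""(?i)\bpassword\s*[:=]\s*['"][^'"]{8,}['"]"""),
}


def scan(text: str) -> Iterator[Finding]:
    """Yield every rule match in `text`, with 1-based line numbers."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in RULES.items():
            for match in pattern.finditer(line):
                yield Finding(rule, line_no, match.group(0))
```

Running a scanner like this as a pre-commit hook or CI step catches a key before it reaches the remote repository, which is much cheaper than rotating it afterwards.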
- Postman
- Load testing
- Pact
- Hey: HTTP load generator
- ToxiProxy: to simulate network and system conditions for chaos and resiliency testing.
- Fake SMTP server for testing email: https://github.com/mailhog/MailHog
- Visual testing:
- https://applitools.com/
- https://percy.io/
- Oculow
- BrowserStack
- https://twitter.com/morvader/status/1452584938482634757
- https://github.com/Netflix/SimianArmy
- https://github.com/netflix/chaosmonkey
- https://medium.com/@adhorn/chaos-engineering-part-3-61579e41edd8
- https://netflix.github.io/chaosmonkey/
- https://chaostoolkit.org/
- https://github.com/KTH/royal-chaos
- https://docs.litmuschaos.io/
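ToxiProxy and the chaos-engineering tools above inject failures so you can verify that timeouts, retries and circuit breakers actually work. ToxiProxy does this at the TCP proxy level. The sketch below shows the same idea in-process, as a Python wrapper that adds latency and randomly raises errors. The names and parameters are hypothetical, and the RNG and `sleep` are injectable so a test can run deterministically and fast.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class InjectedFault(ConnectionError):
    """Simulated network failure raised by the chaos wrapper."""


def with_faults(fn: Callable[..., T], *, error_rate: float = 0.0, latency_s: float = 0.0,
                rng: Optional[random.Random] = None,
                sleep: Callable[[float], None] = time.sleep) -> Callable[..., T]:
    """Wrap `fn` so each call is delayed by `latency_s` and fails with probability `error_rate`."""
    rng = rng or random.Random()

    def wrapper(*args, **kwargs) -> T:
        if latency_s > 0:
            sleep(latency_s)  # similar to ToxiProxy's latency toxic
        if rng.random() < error_rate:
            raise InjectedFault("chaos: injected connection failure")
        return fn(*args, **kwargs)

    return wrapper
```

Wrapping a client call with, say, `error_rate=0.2` in a staging test shows quickly whether the caller retries, times out or falls back as intended.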
- OpenFeature: CNCF initiative
- https://twitter.com/mpjme/status/1301127511967961089
- GitLab Feature Flags
- https://flagger.app/
- https://www.split.io/
- Togglz
- https://launchdarkly.com/
- Unleash
- GitLab/Azure feature flags
- https://learn.hashicorp.com/tutorials/terraform/blue-green-canary-tests-deployments
- https://groundcontrol.sh/
- Flipper Cloud (used at Devengo, for example)
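Most of the feature-flag tools above support gradual (percentage) rollouts. A common way to implement one is to hash the flag name plus a user ID into a stable bucket. That way the same user always gets the same answer, and raising the percentage only adds users. The Python sketch below is one assumed way this can work, not the hashing scheme of Unleash, LaunchDarkly or any other specific tool.

```python
import hashlib


def rollout_bucket(flag: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """True for roughly `rollout_percent`% of users, always the same users for a given flag."""
    return rollout_bucket(flag, user_id) < rollout_percent


# Hypothetical flag: show the new checkout to about 20% of users.
print(is_enabled("new-checkout", "user-123", 20))
```

Including the flag name in the hash means each flag selects a different subset of users, so the same early adopters do not end up in every experiment.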
- Tools
- https://code.visualstudio.com/docs/remote/containers
- https://github.com/features/codespaces
- https://www.gitpod.io/: Spin up fresh, automated dev environments for each task, in the cloud, in seconds (IDE as a Service).
- https://www.bunnyshell.com/: Full-stack production-like replicas on any cloud.
- https://www.okteto.com/: Instantly spin up pre-configured environments in the cloud and start developing within seconds
- https://localstack.cloud/
- Readings
- Backstage
- Roadie (based on Backstage)
- Humanitec
- https://www.cortex.io/
- Other alternatives: https://internaldeveloperplatform.org/developer-portals/
- Appsmith: low-code platform for building custom internal applications
- https://www.serverless.com/
- https://tech.genial.ly/en-cualquier-aplicaci%C3%B3n-o-sistema-existen-una-serie-de-acciones-que-t%C3%ADpicamente-nos-permiten-797615db17b6
- https://homeschool.dev/class/production-ready-serverless
- "How we used serverless to speed up our servers" by Jessica Kerr and Ian Wilkes
- https://mockoon.com/
- OpenAPI
- Liquibase
- Flyway
- https://dev.to/juanvegadev/language-and-framework-agnostic-database-migrations-56bj
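Liquibase and Flyway both record which schema changes have already run in a history table and apply only the pending ones, in order. Below is a minimal Python/SQLite sketch of that versioned-migration idea. The `schema_history` table, the `MIGRATIONS` dict and the `migrate` function are hypothetical stand-ins for Flyway's versioned SQL files (e.g. `V1__create_users.sql`). Each migration runs in its own transaction, which works here because SQLite supports transactional DDL.

```python
import sqlite3

# Hypothetical migrations; Flyway would read these from files such as V1__create_users.sql.
MIGRATIONS: dict[int, str] = {
    1: "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)",
    2: "ALTER TABLE users ADD COLUMN created_at TEXT",
}


def migrate(conn: sqlite3.Connection, migrations: dict[int, str]) -> list[int]:
    """Apply pending migrations in version order and return the versions applied."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_history (version INTEGER PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT version FROM schema_history")}
    applied = []
    for version in sorted(migrations):
        if version in done:
            continue
        # One transaction per migration: the schema change and its history row commit together.
        conn.execute("BEGIN")
        try:
            conn.execute(migrations[version])
            conn.execute("INSERT INTO schema_history (version) VALUES (?)", (version,))
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
        applied.append(version)
    return applied


# isolation_level=None lets us manage transactions explicitly with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
print(migrate(conn, MIGRATIONS))  # prints: [1, 2]
print(migrate(conn, MIGRATIONS))  # prints: [] (already up to date)
```

Because the history row is written in the same transaction as the schema change, a failed migration leaves neither behind and can simply be fixed and re-run.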
- Hotjar: website heatmaps and behavior analytics tools
- Microsoft Clarity: free user behavior analytics tool with heatmaps and session recordings
- LogRocket: session replay, product analytics, and error tracking. It uses AI to identify technical and UX issues, quantifies their impact with analytics, and lets you watch session replays to see exactly what went wrong.
- Google Tag Manager: manage tracking tags, e.g. to measure your advertising ROI
- Product updates announcement: https://announcekit.app/
- https://www.hoppier.com/blog/best-virtual-event-platforms-and-tools-for-2021
- https://streamyard.com/
- https://vidiv.com/ (used in Tarugoconf)
- Fishbowl: https://www.stooa.com/es
- SpatialChat
- Survey: https://www.mentimeter.com/enterprise
- GetStream: build in-app chat, video and audio, and activity feeds
- Codescene
- Glean (System for collecting, deriving and querying facts about source code):
- https://next.github.com/projects/repo-visualization
- https://understandlegacycode.com/blog/focus-refactoring-with-hotspots-analysis/
- https://github.com/smontanari/code-forensics
- https://github.com/pbmiguel/behavioural-code-analyser
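Hotspot analysis (CodeScene, code-forensics and the understandlegacycode article above) ranks files by how often they change combined with how complex they are. Below is a crude Python sketch. It counts changes per file from the output of `git log --format=format: --name-only` and uses lines of code as the complexity proxy. Both choices, and the function names, are assumptions for illustration.

```python
from collections import Counter


def change_counts(git_log_output: str) -> Counter:
    """Count changes per file in the output of `git log --format=format: --name-only`."""
    return Counter(line.strip() for line in git_log_output.splitlines() if line.strip())


def hotspots(git_log_output: str, lines_of_code: dict[str, int], top: int = 10) -> list[tuple[str, int]]:
    """Rank files by changes x lines of code; files no longer present (0 LOC) are dropped."""
    scores = {
        path: changes * lines_of_code.get(path, 0)
        for path, changes in change_counts(git_log_output).items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(path, score) for path, score in ranked if score > 0][:top]


# Hypothetical input: three commits' worth of file names, plus current line counts.
log_output = "src/app.py\nsrc/db.py\n\nsrc/app.py\nREADME.md\n\nsrc/app.py\n"
loc = {"src/app.py": 400, "src/db.py": 900, "README.md": 20}
print(hotspots(log_output, loc))
# prints: [('src/app.py', 1200), ('src/db.py', 900), ('README.md', 20)]
```

In a real repository, `lines_of_code` could come from a tool such as cloc; a real complexity metric would give a better signal than raw line counts.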
- nginx playground
- SQL playground
- https://jvns.ca/blog/2023/04/17/a-list-of-programming-playgrounds/
- https://github.com/marcosnils/awesome-playgrounds
- Books
- https://www.goodreads.com/review/list/6102002-isidro-l-pez?shelf=systems
- "Infrastructure as Code"
- "Release it!"
- "The DevOps Handbook"
- "Site Reliability Engineering" (SRE)
- https://gumroad.com/l/aws-good-parts/released
- https://www.amazon.com/Designing-Distributed-Systems-Patterns-Paradigms
- Posts
- Training
- Workshops, exercises and examples: