Setup Docker on Amazon Linux 2023

The following guide is for setting up Docker with Docker Compose v2 on Amazon Linux 2023. The steps are intended for AL2023 on EC2 but should mostly work for AL2023 VMs running on other hypervisors.

Install and configure Docker on Amazon Linux 2023

Check for new updates

Get the current release:

rpm -q system-release --qf "%{VERSION}\n"

Find out the latest release:

sudo dnf check-release-update --latest-only --version-only

# Use the following for more verbose output
#sudo dnf check-release-update

To upgrade the host within the current release:

sudo dnf check-update --refresh
sudo dnf upgrade --refresh

To upgrade the host to the latest release:

#sudo touch /etc/dnf/vars/releasever && echo 'latest' | sudo tee /etc/dnf/vars/releasever
sudo dnf check-update --refresh --releasever=latest
sudo dnf upgrade --refresh --releasever=latest

Install base os packages

Install the following packages, which are generally good to have on the host:

sudo dnf install --allowerasing -y \
  kernel-modules-extra \
  dnf-plugins-core \
  dnf-utils \
  dnf-plugin-support-info \
  git-core \
  git-lfs \
  grubby \
  kexec-tools \
  chrony \
  audit \
  dbus \
  dbus-daemon \
  polkit \
  systemd-pam \
  systemd-container \
  udisks2 \
  nss-util \
  nss-tools \
  dmidecode \
  nvme-cli \
  lvm2 \
  dosfstools \
  e2fsprogs \
  xfsprogs \
  xfsprogs-xfs_scrub \
  attr \
  acl \
  shadow-utils \
  shadow-utils-subid \
  fuse3 \
  squashfs-tools \
  star \
  gzip \
  pigz \
  bzip2 \
  zstd \
  xz \
  unzip \
  p7zip \
  numactl \
  iproute \
  iproute-tc \
  iptables-nft \
  nftables \
  conntrack-tools \
  ipset \
  ethtool \
  net-tools \
  iputils \
  traceroute \
  mtr \
  telnet \
  whois \
  socat \
  bind-utils \
  tcpdump \
  cifs-utils \
  nfsv4-client-utils \
  nfs4-acl-tools \
  libseccomp \
  psutils \
  python3 \
  python3-pip \
  python3-policycoreutils \
  policycoreutils-python-utils \
  bash-completion \
  vim-minimal \
  wget \
  jq \
  awscli-2 \
  ec2rl \
  ec2-utils \
  htop \
  sysstat \
  fio \
  inotify-tools \
  rsync

(Optional) Install the EC2 Instance Connect Utility

sudo dnf install --allowerasing -y ec2-instance-connect ec2-instance-connect-selinux

(Optional) Install the Amazon EFS Utils helper tool

sudo dnf install --allowerasing -y amazon-efs-utils

(Optional) Install the smart-restart utility package

Amazon Linux now ships with the smart-restart package. The smart-restart utility restarts systemd services after system updates, whenever a package is installed or removed with the system's package manager, i.e. whenever a dnf <update|upgrade|downgrade> transaction runs.

smart-restart uses needs-restarting from the dnf-utils package together with a custom denylisting mechanism to determine which services need to be restarted and whether a system reboot is advised. If a system reboot is advised, a reboot hint marker file is generated (/run/smart-restart/reboot-hint-marker).

sudo dnf install --allowerasing -y smart-restart python3-dnf-plugin-post-transaction-actions

After the installation, the subsequent transactions will trigger the smart-restart logic.
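
For example, after a dnf transaction you can check for the reboot hint marker mentioned above (a minimal sketch using the documented marker path):

if [ -f /run/smart-restart/reboot-hint-marker ]; then
  echo "smart-restart advises a system reboot"
fi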

(Optional) Install and enable the Kernel Live Patching (KLP) feature

Run the following command to install and enable the kernel live patching feature:

sudo dnf install --allowerasing -y kpatch-dnf kpatch-runtime
sudo dnf kernel-livepatch -y auto
sudo systemctl enable --now kpatch.service
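
To check which live patches are currently installed and loaded, you can run the following (assuming the kpatch CLI from kpatch-runtime is available):

sudo kpatch list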

(Optional) Remove the EC2 Hibernation Agent

Run the following command to remove the EC2 Hibernation Agent:

sudo dnf remove -y ec2-hibinit-agent

(Optional) Install and setup the Amazon SSM Agent

Install the Amazon SSM Agent:

sudo dnf install --allowerasing -y amazon-ssm-agent

The following tweak should resolve a reported issue where the agent starts before the instance is fully ready.

Add the following drop-in to make sure networking is up, DNS resolution works and cloud-init has finished before the Amazon SSM Agent is started.

sudo mkdir -p /etc/systemd/system/amazon-ssm-agent.service.d

cat <<'EOF' | sudo tee /etc/systemd/system/amazon-ssm-agent.service.d/00-override.conf
[Unit]
# To have a service start after cloud-init.target, it needs
# DefaultDependencies=no, because the default (DefaultDependencies=yes)
# makes the default target, e.g. multi-user.target, depend on the
# service.
#
# See the following for more details: https://serverfault.com/a/973985
Wants=network-online.target
After=network-online.target nss-lookup.target cloud-init.target
DefaultDependencies=no
ConditionFileIsExecutable=/usr/bin/amazon-ssm-agent

EOF

sudo systemctl daemon-reload
sudo systemctl enable --now amazon-ssm-agent.service
sudo systemctl try-reload-or-restart amazon-ssm-agent.service
sudo systemctl status amazon-ssm-agent.service
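
You can confirm that the drop-in is being picked up by printing the merged unit definition:

systemctl cat amazon-ssm-agent.service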

(Optional) Install and setup the Unified CloudWatch Agent

Install the Unified CloudWatch Agent:

sudo dnf install --allowerasing -y amazon-cloudwatch-agent collectd

Add the following drop-in to make sure networking is up, DNS resolution works and cloud-init has finished before the Unified CloudWatch Agent is started.

sudo mkdir -p /etc/systemd/system/amazon-cloudwatch-agent.service.d

cat <<'EOF' | sudo tee /etc/systemd/system/amazon-cloudwatch-agent.service.d/00-override.conf
[Unit]
# To have a service start after cloud-init.target, it needs
# DefaultDependencies=no, because the default (DefaultDependencies=yes)
# makes the default target, e.g. multi-user.target, depend on the
# service.
#
# See the following for more details: https://serverfault.com/a/973985
Wants=network-online.target
After=network-online.target nss-lookup.target cloud-init.target
DefaultDependencies=no
ConditionFileIsExecutable=/opt/aws/amazon-cloudwatch-agent/bin/start-amazon-cloudwatch-agent

EOF

sudo systemctl daemon-reload
sudo systemctl enable --now amazon-cloudwatch-agent.service
sudo systemctl try-reload-or-restart amazon-cloudwatch-agent.service
sudo systemctl status amazon-cloudwatch-agent.service
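
Once you have an agent configuration file, you can load it with the agent control script; the config path below is just a placeholder for this sketch:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s \
  -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json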

The current version of the CloudWatchAgentServerPolicy:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "cloudwatch:PutMetricData",
                    "ec2:DescribeVolumes",
                    "ec2:DescribeTags",
                    "logs:PutLogEvents",
                    "logs:DescribeLogStreams",
                    "logs:DescribeLogGroups",
                    "logs:CreateLogStream",
                    "logs:CreateLogGroup"
                ],
                "Resource": "*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "ssm:GetParameter"
                ],
                "Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*"
            }
        ]
    }
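
The above is the AWS managed CloudWatchAgentServerPolicy. One way to grant it is to attach it to the role used by the instance profile (the role name below is a placeholder):

aws iam attach-role-policy \
  --role-name <instance-role-name> \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy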

(Optional) Install Ansible on the host

Run the following to install ansible on the host:

sudo dnf install -y \
  python3-psutil \
  ansible \
  ansible-core \
  sshpass
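
Verify the installation and run a quick ad-hoc module against the local host:

ansible --version
ansible localhost -m ping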

Configure some sane OS default settings

Locale:

sudo localectl set-locale LANG=en_US.UTF-8
localectl

Hostname:

sudo hostnamectl set-hostname <hostname>
sudo hostnamectl set-chassis vm
hostnamectl

Set the system timezone to UTC and ensure chronyd is enabled and started:

sudo timedatectl set-timezone Etc/UTC
sudo systemctl enable --now chronyd
sudo timedatectl set-ntp true
timedatectl

Logging:

sudo mkdir -p /etc/systemd/journald.conf.d
cat <<'EOF' | sudo tee /etc/systemd/journald.conf.d/00-override.conf
[Journal]
SystemMaxUse=100M
RuntimeMaxUse=100M
RuntimeMaxFileSize=10M

RateLimitIntervalSec=1s
RateLimitBurst=10000

EOF

sudo systemctl daemon-reload
sudo systemctl restart systemd-journald.service
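
Verify the new journal limits and check current disk usage:

journalctl --disk-usage
systemd-analyze cat-config systemd/journald.conf | grep -E 'MaxUse|MaxFileSize|RateLimit'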

Configure a sane user environment for the current user e.g. ec2-user

touch ~/.{profile,bashrc,bash_profile,bash_login,bash_logout,hushlogin}
mkdir -pv "${HOME}/bin"
mkdir -pv "${HOME}/.config/environment.d"
mkdir -pv "${HOME}/.config/systemd/user"
mkdir -pv "${HOME}/.config/systemd/user/sockets.target.wants"
mkdir -pv "${HOME}/.local/share/systemd/user"
mkdir -pv "${HOME}/.local/bin"
#cat <<'EOF' | tee ~/.config/environment.d/environment_vars.conf
#PATH="${HOME}/bin:${HOME}/.local/bin:${PATH}"
#
#EOF
loginctl enable-linger $(whoami)
systemctl --user daemon-reload

If you need to switch to the root user, use the following instead of sudo su - <user>:

# sudo machinectl shell <username>@
sudo machinectl shell root@

Install and configure Moby aka Docker on the host

Run the following command to install moby aka docker:

sudo dnf install --allowerasing -y \
  docker \
  containerd \
  runc \
  container-selinux \
  cni-plugins \
  oci-add-hooks \
  amazon-ecr-credential-helper \
  udica

Configure the following docker daemon settings:

sudo mkdir -p /etc/docker

cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "debug": false,
  "experimental": false,
  "exec-opts": ["native.cgroupdriver=systemd"],
  "userland-proxy": false,
  "live-restore": true,
  "log-level": "warn",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
EOF

Add the current user e.g. ec2-user to the docker group:

sudo usermod -aG docker $USER
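
Note: group membership changes only apply to new login sessions. To pick up the docker group in the current shell without logging out, you can run:

newgrp docker
id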

Enable and start the docker service:

sudo systemctl enable --now docker
sudo systemctl status docker
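
Verify the daemon picked up the settings from daemon.json and can run containers:

docker info --format '{{.CgroupDriver}} {{.LoggingDriver}}'
docker run --rm hello-world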

Install the Docker Compose v2 Plugin

Install the Docker Compose plugin with the following commands:

# Install the docker compose plugin for all users
sudo mkdir -p /usr/local/lib/docker/cli-plugins

sudo curl -sL https://github.com/docker/compose/releases/latest/download/docker-compose-linux-"$(uname -m)" \
  -o /usr/local/lib/docker/cli-plugins/docker-compose

# Set ownership to root and make executable
test -f /usr/local/lib/docker/cli-plugins/docker-compose \
  && sudo chown root:root /usr/local/lib/docker/cli-plugins/docker-compose
test -f /usr/local/lib/docker/cli-plugins/docker-compose \
  && sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose

(Optional) To install for the local user, run the following commands:

mkdir -p "${HOME}/.docker/cli-plugins" \
  && touch "${HOME}/.docker/config.json"

cp /usr/local/lib/docker/cli-plugins/docker-compose "${HOME}/.docker/cli-plugins/docker-compose"

cat <<'EOF' | tee -a "${HOME}/.bashrc"

XDG_CONFIG_HOME="${HOME}/.config"
XDG_DATA_HOME="${HOME}/.local/share"
XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
DBUS_SESSION_BUS_ADDRESS="unix:path=${XDG_RUNTIME_DIR}/bus"
export XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR DBUS_SESSION_BUS_ADDRESS 

#DOCKER_CONFIG=/usr/local/lib/docker
DOCKER_CONFIG="${DOCKER_CONFIG:-$HOME/.docker}"
DOCKER_TLS_VERIFY=1
export DOCKER_CONFIG DOCKER_TLS_VERIFY

#DOCKER_HOST="unix:///run/user/$(id -u)/docker.sock"
#export DOCKER_HOST

EOF

Verify the plugin is installed correctly with the following command(s):

docker compose version
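
As a quick smoke test, you can bring up and tear down a throwaway compose project (the compose file below is a hypothetical example):

mkdir -p ~/compose-demo && cd ~/compose-demo

cat <<'EOF' | tee compose.yaml
services:
  web:
    image: nginx:alpine
    ports:
      - "8080:80"
EOF

docker compose up -d
curl -fsS http://localhost:8080 >/dev/null && echo "compose demo is up"
docker compose down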

(Optional) Install the Docker Scout Plugin

Install the Docker Scout plugin with the following commands (one possible approach is sketched below):

<commands go here>
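
One possible approach (an assumption, not verified in this guide) is the upstream install script from the docker/scout-cli repository, which typically installs the plugin under ~/.docker/cli-plugins:

curl -fsSL https://raw.githubusercontent.com/docker/scout-cli/main/install.sh -o install-scout.sh
sh install-scout.sh

docker scout version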

(Skip) Install the Docker Buildx Plugin

Note: You can safely skip this step; it should not be necessary because the version of Moby shipped in AL2023 bundles the buildx plugin by default.

(Optional) Install the docker buildx plugin with the following commands:

sudo curl -sSfL 'https://github.com/docker/buildx/releases/download/v0.14.0/buildx-v0.14.0.linux-amd64' \
  -o /usr/local/lib/docker/cli-plugins/docker-buildx

# Note: buildx release assets are versioned, e.g. buildx-v0.14.0.linux-amd64,
# so unlike compose there is no unversioned "latest" asset name to download.

# Set ownership to root and make executable
test -f /usr/local/lib/docker/cli-plugins/docker-buildx \
  && sudo chown root:root /usr/local/lib/docker/cli-plugins/docker-buildx
test -f /usr/local/lib/docker/cli-plugins/docker-buildx \
  && sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-buildx

cp /usr/local/lib/docker/cli-plugins/docker-buildx "${HOME}/.docker/cli-plugins/docker-buildx"

docker buildx install
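
Verify the buildx plugin and list the available builders:

docker buildx version
docker buildx ls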

(Optional) Install the EC2 Nitro Enclave CLI tool

This is optional; install it only if you need Nitro Enclaves, otherwise skip this step.

sudo dnf install --allowerasing -y aws-nitro-enclaves-cli aws-nitro-enclaves-cli-devel
sudo usermod -aG ne $USER
sudo systemctl enable --now nitro-enclaves-allocator.service
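
The allocator reserves memory and vCPUs for enclaves at boot. You can adjust the defaults in its configuration file (the values below are placeholders; size them to your enclave requirements):

cat <<'EOF' | sudo tee /etc/nitro_enclaves/allocator.yaml
---
# memory (in MiB) and vCPUs reserved for enclaves
memory_mib: 512
cpu_count: 2
EOF

sudo systemctl restart nitro-enclaves-allocator.service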

(Optional) Install the Nvidia Drivers

To install the Nvidia drivers:

sudo dnf install -y wget kernel-modules-extra kernel-devel gcc

Download the driver install script, run it and verify:

curl -sL 'https://us.download.nvidia.com/tesla/535.161.08/NVIDIA-Linux-x86_64-535.161.08.run' -O
sudo sh NVIDIA-Linux-x86_64-535.161.08.run -a -s --ui=none -m=kernel-open
nvidia-smi

For the Nvidia container runtime:

curl -sL 'https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo' | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf check-update
sudo dnf install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

To create an Ubuntu based container with access to the host GPUs:

docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

(Optional) Configure the aws-cli for the ec2-user

# configure region
aws configure set default.region $(curl --noproxy '*' -w "\n" -s -H "X-aws-ec2-metadata-token: $(curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")" http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region)
# use regional endpoints
aws configure set default.sts_regional_endpoints regional
# get credentials from imds
aws configure set default.credential_source Ec2InstanceMetadata
# get credentials last for 1hr
aws configure set default.duration_seconds 3600
# set default pager
aws configure set default.cli_pager ""
# set output to json
aws configure set default.output json

Verify:

aws configure list
aws sts get-caller-identity

(Optional) Create your first Amazon Linux 2023 based container(s)

Login to the AWS ECR service:

aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws

To create an AL2023 based container:

docker pull public.ecr.aws/amazonlinux/amazonlinux:2023
docker run -it --security-opt seccomp=unconfined public.ecr.aws/amazonlinux/amazonlinux:2023 /bin/bash
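
To build your own image on top of the AL2023 base image, a minimal Dockerfile (hypothetical example) could look like this:

cat <<'EOF' | tee Dockerfile
FROM public.ecr.aws/amazonlinux/amazonlinux:2023
RUN dnf install -y python3 && dnf clean all
CMD ["/bin/bash"]
EOF

docker build -t my-al2023:latest .
docker run --rm -it my-al2023:latest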


Performance Tuning for Amazon Linux 2023

EC2 Bandwidth Limits

ethtool -S eth0 | grep -E 'err|exceeded|missed'

NIC Tuning

#sudo ethtool -G eth0 tx 1024 rx 4096
sudo ethtool -G eth0 tx 1024 rx 8192
ethtool -c eth0

# Interrupt coalescing notes: adaptive-rx on, rx-usecs 20, tx-usecs 64 (default)
#ethtool -C eth0 adaptive-rx off rx-usecs 0 tx-usecs 0
cat /proc/interrupts | grep Tx-Rx
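
To inspect the current and maximum supported ring buffer sizes before changing them:

ethtool -g eth0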

GRUB Configuration

uname -sr
cat /proc/cmdline
sudo grubby --update-kernel=ALL --args="intel_idle.max_cstate=1 processor.max_cstate=1 cpufreq.default_governor=performance swapaccount=1 psi=1"
sudo grubby --info=ALL
#sudo systemctl reboot

sysctl

# start with 50-70
echo 50 | sudo tee /proc/sys/net/core/busy_read
echo 50 | sudo tee /proc/sys/net/core/busy_poll
echo 0 | sudo tee /proc/sys/net/ipv4/tcp_sack

cat <<'EOF' | sudo tee /etc/sysctl.d/99-custom-tuning.conf
# Custom kernel sysctl configuration file
#
# Disclaimer: These settings are not one size fits all; you will need to test and validate them in your environment.
#
# https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
# https://www.kernel.org/doc/Documentation/sysctl/net.txt
# https://www.kernel.org/doc/Documentation/networking/proc_net_tcp.txt
# https://www.kernel.org/doc/Documentation/networking/scaling.txt
#
# For binary values, 0 is typically disabled, 1 is enabled.
#
# See sysctl(8) and sysctl.conf(5) for more details.
#
# AWS References:
#
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ena-nitro-perf.html#ena-nitro-perf-considerations
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ena-improve-network-latency-linux.html#ena-latency-kernel-config
# https://github.com/amzn/amzn-drivers/blob/master/kernel/linux/ena/ENA_Linux_Best_Practices.rst#performance-optimizations-faqs
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html
# https://github.com/amzn/amzn-ec2-ena-utilities/tree/main
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-and-configure-cloudwatch-agent-using-ec2-console.html
#
# Misc References:
#
# https://github.com/leandromoreira/linux-network-performance-parameters
# https://oxnz.github.io/2016/05/03/performance-tuning-networking/
# https://www.speedguide.net/articles/linux-tweaking-121
# https://www.tweaked.io/guide/kernel/
# http://rhaas.blogspot.co.at/2014/06/linux-disables-vmzonereclaimmode-by.html
# https://fasterdata.es.net/host-tuning/linux/
# https://documentation.suse.com/sles/15-SP5/html/SLES-all/cha-tuning-network.html
# https://blog.packagecloud.io/monitoring-tuning-linux-networking-stack-receiving-data/
# https://github.com/myllynen/rhel-performance-guide
# https://github.com/myllynen/rhel-troubleshooting-guide

# Minimize console logging level for kernel printk messages.
# The defaults are very verbose and have a performance impact.
# Note: 4 4 1 7 is also fine and works too
kernel.printk=3 4 1 7

# A feature aimed at improving system responsiveness under load by
# automatically grouping task groups with similar execution patterns.
# While beneficial for desktop responsiveness, in server environments,
# especially those running Kubernetes, this behavior might not always
# be desirable as it could lead to uneven distribution of CPU resources
# among pods. 
#
# man 7 sched
# 
# The use of the cgroups(7) CPU controller to place processes in cgroups
# other than the root CPU cgroup overrides the effect of autogrouping.
#
# Autogrouping improves interactivity for desktop workloads but is not
# generally suitable for many server workloads, e.g. a PostgreSQL database.
#
# https://cateee.net/lkddb/web-lkddb/SCHED_AUTOGROUP.html
# https://www.postgresql.org/message-id/50E4AAB1.9040902@optionshouse.com
kernel.sched_autogroup_enabled=0

# This affects how long a process may stay on a CPU core since its last run.
# It is a heuristic for estimating cache misses: when there are many tasks to
# run, those that still have a lot of their data in cache are cheaper to wait
# briefly for and then run on the core they last ran on, since cache misses
# cost considerably more CPU cycles.
#
# Tasks that have little or no data in the local CPU caches (L1, and also L2)
# may be faster to simply migrate to a less busy CPU, since they must re-cache
# their data anyway.
#
# The heuristic uses, among other things, the time since the task last ran to
# estimate how much of its data is probably still cached: the longer a task
# has not run, the more likely its data was evicted to make room for other
# tasks. Setting this too high has downsides (cache penalties add up), but
# setting it too low is not ideal either, since task migration is not free
# (inter-CPU locks often must be acquired to move a task to another core's
# run queue). The default may be a bit too low for modern systems and
# hypervisor workloads, so you could try 5 ms instead of 0.5 ms and observe
# how your system is affected. A higher CPU load can be desirable here, as the
# CPU simply gets used more efficiently (less time wasted on task migrations)
# and you can achieve more throughput.
#
# A lower value e.g. like 500000 (0.5 ms) may improve the responsiveness for certain workloads.
#kernel.sched_migration_cost_ns=500000

# For rngd
#kernel.random.write_wakeup_threshold=3072

# Prevent ebpf privilege escalation, see the following:
# https://lwn.net/Articles/742170
# https://www.suse.com/support/kb/doc/?id=000020545
# https://discourse.ubuntu.com/t/unprivileged-ebpf-disabled-by-default-for-ubuntu-20-04-lts-18-04-lts-16-04-esm/27047
# 0 = re-enable, 1 = disable, 2 = disable but allow admin to re-enable without a reboot
#kernel.unprivileged_bpf_disabled=0

# https://github.com/containers/podman/blob/main/docs/tutorials/rootless_tutorial.md
user.max_user_namespaces=28633

# https://lwn.net/Articles/782745/
#vm.unprivileged_userfaultfd=1

# Ensure that your reserved kernel memory is sufficient to sustain a
# high rate of packet buffer allocations (the default value may be too small).
#
# As a rule of thumb, you should set this value to between 1-3% of available
# system memory and adjust this value up or down to meet the needs of your
# application requirements.
vm.min_free_kbytes=1048576

# Maximum number of memory map areas a process may have (memory map areas are used
# as a side-effect of calling malloc, directly by mmap and mprotect, and also when
# loading shared libraries).
vm.max_map_count=262144
vm.overcommit_memory=1

# Make sure the host doesn't swap too early
vm.swappiness=10

# Maximum percentage of dirty system memory
# Note: On SLES 12 and 15 the default is 20.
# https://www.suse.com/support/kb/doc/?id=000017857
vm.dirty_ratio = 10

# Percentage of dirty system memory at which background 
# writeback will start (default 10).
vm.dirty_background_ratio=5

# Some kernels won't allow dirty_ratio to be set below 5%.
# Therefore, in dealing with larger amounts of RAM, percentage ratios 
# might not be granular enough. If that is the case, then use the
# below instead of the settings above.
#
# Configure 600 MB maximum dirty cache
#vm.dirty_bytes=629145600

# Spawn background write threads once the cache holds 300 MB
#vm.dirty_background_bytes=314572800

# The value in file-max denotes the maximum number of file handles that the Linux kernel will allocate.
# When you get lots of error messages about running out of file handles, you might want to increase this limit.
# Attempts to allocate more file descriptors than file-max are reported with printk; look for messages like
# "VFS: file-max limit <number> reached" in the kernel logs.
fs.file-max=1048576

# Maximum number of concurrent asynchronous I/O operations (you might need to
# increase this limit further if you have a lot of workloads that use the AIO
# subsystem, e.g. MySQL).
fs.aio-max-nr=524288
#fs.aio-max-nr=1048576

# Upper limit on the number of inotify watches that can be created per real user ID.
# Raise the limit to 524,288.
# https://man7.org/linux/man-pages/man7/inotify.7.html
fs.inotify.max_user_watches=524288

# Suppress logging of net_ratelimit callback
#net.core.message_cost=0

# The maximum number of "backlogged sockets, accept and syn queues are governed by 
# net.core.somaxconn and net.ipv4.tcp_max_syn_backlog. The maximum number of 
# "backlogged sockets". The net.core.somaxconn setting caps both queue sizes.
# Ensure that net.core.somaxconn is always set to a value equal to or greater than 
# tcp_backlog e.g. net.core.somaxconn >= 4096.

# Increase number of incoming connections
#net.core.somaxconn = 1024
#net.ipv4.tcp_max_syn_backlog = 2048
#net.core.somaxconn = 4096
#net.ipv4.tcp_max_syn_backlog = 8192

# Increasing this value for high speed cards may help prevent losing packets
#net.core.netdev_max_backlog=16384

# Increase UDP buffer size
# https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes
# https://medium.com/@CameronSparr/increase-os-udp-buffers-to-improve-performance-51d167bb1360
# The default socket receive buffer (size in bytes)
#net.core.rmem_default=31457280
# The maximum receive socket buffer (size in bytes)
#net.core.rmem_max=7500000
# The maximum send socket buffer (size in bytes)
#net.core.wmem_max=7500000

# OR allow testing with buffers up to 16MB
#net.core.rmem_max=16777216
#net.core.wmem_max=16777216

# Increase linux auto-tuning of TCP buffer limits to 16MB
# https://blog.cloudflare.com/the-story-of-one-latency-spike/
#net.ipv4.tcp_rmem=4096 87380 16777216
#net.ipv4.tcp_wmem=4096 65536 16777216

#net.core.busy_read=50
#net.core.busy_poll=50

# It's recommended to use a 'fair queueing' qdisc e.g. fq or fq_codel. 
# For queue management, sch_fq is/was recommended instead of fq_codel as of linux 3.12.
#
# http://man7.org/linux/man-pages/man8/tc-fq.8.html
# https://www.bufferbloat.net/projects/codel/wiki/
#
# Note: fq can be safely used as a drop-in replacement for pfifo_fast.
# Note: fq is required to use tcp_bbr as it requires fair queuing.
# Note: fq is best for fat servers with tcp-heavy workloads and particularly at 10GigE and above.
# Note: fq-codel is a better choice for forwarding/routers which don't originate local traffic, 
#       hypervisors and best general purpose qdisc.
net.core.default_qdisc=fq
#net.core.default_qdisc=fq_codel

# https://research.google/pubs/pub45646/
# https://github.com/google/bbr/blob/master/README
#
# Note: This is not an official Google product.. lol
# Note: BBR will support fq_codel after linux-4.13.
# Note: BBR must be used with fq qdisc with pacing enabled, since pacing is integral to the BBR design 
#        and implementation. BBR without pacing would not function properly and may incur unnecessary 
#        high packet loss rates.
#net.ipv4.tcp_congestion_control = bbr

# Negotiate TCP ECN for active and passive connections
#
# Turn on ECN as this will let AQM sort out the congestion backpressure 
# without incurring packet losses and retransmissions.
#
# To make the best use of this, end hosts really need ECN enabled via the
# net.ipv4.tcp_ecn sysctl.
#
# https://github.com/systemd/systemd/pull/9143
# https://github.com/systemd/systemd/issues/9748
#net.ipv4.tcp_ecn=1
net.ipv4.tcp_ecn=2
net.ipv4.tcp_ecn_fallback=1

# Enable forwarding so that docker networking works as expected

# Enable IPv4 forwarding
net.ipv4.ip_forward=1
net.ipv4.conf.all.forwarding=1

# Enable IPv6 forwarding
net.ipv6.conf.default.forwarding=1
net.ipv6.conf.all.forwarding=1

# Increase the outgoing port range
net.ipv4.ip_local_port_range=10000 65535
#net.ipv4.ip_local_reserved_ports=

# Enable Multipath TCP
net.mptcp.enabled=1

# Enable TCP Window Scaling
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_adv_win_scale=1

# Bump the TTL from the default i.e. 64 to 127 on AWS
net.ipv4.ip_default_ttl=127

# Recommended for hosts with jumbo frames enabled
net.ipv4.tcp_mtu_probing=1

# Protect Against TCP Time-Wait Assassination Attacks
net.ipv4.tcp_rfc1337=1

#net.ipv4.tcp_tw_reuse=1

# Disables ICMP redirect sending
net.ipv4.conf.eth0.send_redirects=0
net.ipv4.conf.all.send_redirects=0
net.ipv4.conf.default.send_redirects=0

# Disables ICMP redirect acceptance
net.ipv4.conf.all.accept_redirects=0
net.ipv4.conf.default.accept_redirects=0
net.ipv6.conf.all.accept_redirects=0
net.ipv6.conf.default.accept_redirects=0

net.ipv4.conf.all.secure_redirects=0
net.ipv4.conf.default.secure_redirects=0

# https://blog.cloudflare.com/optimizing-the-linux-stack-for-mobile-web-per/
# https://access.redhat.com/solutions/168483
# Use this parameter to ensure that the maximum speed is used from beginning
# also for previously idle TCP connections. Avoid falling back to slow start
# after a connection goes idle keeps our cwnd large with the keep alive
# connections (kernel > 3.6).
net.ipv4.tcp_slow_start_after_idle = 0

# Decrease the time default value for connections to keep alive
#net.ipv4.tcp_keepalive_time = 300
#net.ipv4.tcp_keepalive_probes = 5
#net.ipv4.tcp_keepalive_intvl = 15

# Decrease the time default value for tcp_fin_timeout connection, FIN-WAIT-2
#net.ipv4.tcp_fin_timeout = 15

# Reduce TIME_WAIT from the 120s default to 30-60s
#net.netfilter.nf_conntrack_tcp_timeout_time_wait=30

# Reduce FIN_WAIT from the 120s default to 30-60s
#net.netfilter.nf_conntrack_tcp_timeout_fin_wait=30

EOF

To apply these settings:

sudo systemctl daemon-reload
sudo sysctl --system
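
Spot-check a few of the key settings to confirm they were applied:

sysctl net.ipv4.ip_forward net.core.default_qdisc vm.swappiness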

Setup a local DNS caching service on Amazon Linux 2023

The following steps set up dnsmasq as a local DNS caching service on AL2023.

sudo dnf install -y dnsmasq bind-utils

Backup the default configuration:

sudo cp /etc/dnsmasq.conf{,.bak}

Configure dnsmasq:

cat <<'EOF' | sudo tee /etc/dnsmasq.conf
# https://thekelleys.org.uk/dnsmasq/docs/dnsmasq-man.html
# https://thekelleys.org.uk/gitweb/?p=dnsmasq.git

## Server Configuration

# The alternative would be just 127.0.0.1 without ::1
listen-address=::1,127.0.0.1

# Port 53
port=53

# For a local only DNS resolver use interface=lo + bind-interfaces
# See for more details: https://serverfault.com/a/830737

# Listen only on the specified interface(s).
interface=lo

# dnsmasq binds to the wildcard address, even if it is listening 
# on only some interfaces. It then discards requests that it 
# shouldn't reply to. This has the advantage of working even 
# when interfaces come and go and change address.
bind-interfaces

#bind-dynamic

# Do not listen on the specified interface. 
#except-interface=eth0
#except-interface=eth1

# Turn off DHCP and TFTP Server features
#no-dhcp-interface=eth0

# The user to which dnsmasq will change to after startup
user=dnsmasq

# The group which dnsmasq will run as
group=dnsmasq

# PID file
pid-file=/var/run/dnsmasq.pid

# Whenever /etc/resolv.conf is re-read or the upstream servers are set via DBus, clear the 
# DNS cache. This is useful when new nameservers may have different data than that held in cache. 
#clear-on-reload

## Name resolution options

# Specify the upstream AWS VPC Resolver within this config file
# https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html#AmazonDNS
# Setting this does not suppress reading of /etc/resolv.conf, use --no-resolv to do that.
#server=169.254.169.253
#server=fd00:ec2::253

# Specify upstream servers directly
#server=/ec2.internal/169.254.169.253
#server=/compute.internal/169.254.169.253

# IPv6 addresses may include an %interface scope-id
#server=/ec2.internal/fd00:ec2::253%eth0
#server=/compute.internal/fd00:ec2::253%eth0

# https://tailscale.com/kb/1081/magicdns
# https://tailscale.com/kb/1217/tailnet-name
#server=/beta.tailscale.net/100.100.100.100@tailscale0

# To query all upstream servers simultaneously
#all-servers

# Query upstream servers in order
strict-order

# Later versions of windows make periodic DNS requests which don't get sensible answers 
# from the public DNS and can cause problems by triggering dial-on-demand links.
# This flag turns on an option to filter such requests. 
#filterwin2k

# Specify the upstream resolver within another file
resolv-file=/etc/resolv.dnsmasq

# Uncomment if you specify the upstream servers in here so that dnsmasq no
# longer polls the /etc/resolv.conf file for changes.
#no-poll

# Uncomment if you specify the upstream server in here so you don't read 
# /etc/resolv.conf. Get upstream servers only from cli or dnsmasq conf.
#no-resolv

# Additional hosts files to include
#addn-hosts=/etc/dnsmasq-blocklist

# Send queries for internal domain to another internal resolver
#address=/int.example.com/10.10.10.10

# Examples of blocking TLDs or subdomains
#address=/.local/0.0.0.0
#address=/.example.com/0.0.0.0

# Return answers to DNS queries from /etc/hosts and --interface-name and
# --dynamic-host which depend on the interface over which the query was received.
#localise-queries

# Never forward addresses in the non-routed address spaces
bogus-priv

# Never forward plain names
domain-needed

# Reject private addresses from upstream nameservers
stop-dns-rebind

# Disable the above entirely by commenting out the option OR allow RFC1918 responses 
# from specific domains by commenting out and/or adding additional internal domains.
#rebind-domain-ok=/int.example.com/
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-naming.html
rebind-domain-ok=/ec2.internal/compute.internal/local/

# Exempt 127.0.0.0/8 and ::1 from rebinding checks
rebind-localhost-ok

# Set the maximum number of concurrent DNS queries.
# The default value is 150. Adjust to your needs.
#dns-forward-max=150

# Set the size of dnsmasq's cache, default is 150 names
cache-size=1000

# Without this option being set, the cache statistics are also available in 
# the DNS as answers to queries of class CHAOS and type TXT in domain bind. 
no-ident

# The following directive controls whether negative caching 
# should be enabled or not. Negative caching allows dnsmasq 
# to remember “no such domain” answers from the parent 
# nameservers, so it does not query for the same non-existent 
# hostnames again and again.
#no-negcache

# Negative replies from upstream servers normally contain 
# time-to-live information in SOA records which dnsmasq uses 
# for caching. If the replies from upstream servers omit this 
# information, dnsmasq does not cache the reply. This option 
# gives a default value for time-to-live (in seconds) which 
# dnsmasq uses to cache negative replies even in the absence 
# of an SOA record.  
neg-ttl=60

# Uncomment to enable validation of DNS replies and cache DNSSEC data.

# Validate DNS replies and cache DNSSEC data.
#dnssec

# As a default, dnsmasq checks that unsigned DNS replies are legitimate: this entails 
# possible extra queries even for the majority of DNS zones which are not, at the moment,
# signed. 
#dnssec-check-unsigned

# Copy the DNSSEC Authenticated Data bit from upstream servers to downstream clients.
#proxy-dnssec

# https://data.iana.org/root-anchors/root-anchors.xml
#conf-file=/usr/share/dnsmasq/trust-anchors.conf

# The root DNSSEC trust anchor
#
# Note that this is a DS record (ie a hash of the root Zone Signing Key)
# It was downloaded from https://data.iana.org/root-anchors/root-anchors.xml
#trust-anchor=.,19036,8,2,49AAC11D7B6F6446702E54A1607371607A1A41855200FD2CE1CDDE32F24E8FB5

## Logging directives

#log-async
#log-dhcp

# Uncomment to log all queries
#log-queries

# Uncomment to log to stdout
#log-facility=-

# Uncomment to log to /var/log/dnsmasq.log
log-facility=/var/log/dnsmasq.log

EOF

Create the following file with the upstream resolvers:

cat <<'EOF' | sudo tee /etc/resolv.dnsmasq
nameserver 169.254.169.253
#nameserver fd00:ec2::253

EOF

Validate the configuration

sudo dnsmasq --test

Make sure that systemd-resolved is not configured to be a stub resolver:

sudo mkdir -pv /etc/systemd/resolved.conf.d

cat <<'EOF' | sudo tee /etc/systemd/resolved.conf.d/00-override.conf
[Resolve]
DNS=127.0.0.1
DNSStubListener=no
FallbackDNS=
MulticastDNS=no
LLMNR=no

EOF

sudo systemctl daemon-reload
sudo systemctl restart systemd-resolved

Unlink the stub and re-create the /etc/resolv.conf file:

sudo unlink /etc/resolv.conf

cat <<'EOF' | sudo tee /etc/resolv.conf
nameserver ::1
nameserver 127.0.0.1
search ec2.internal
options edns0 timeout:1 attempts:5
#options trust-ad

EOF

Enable and start the service:

sudo systemctl enable --now dnsmasq.service
sudo systemctl restart dnsmasq.service

Verify:

dig aws.amazon.com @127.0.0.1

Setup Homebrew on Amazon Linux 2023

The following script installs Homebrew (Linuxbrew) on AL2023:

#!/bin/bash
# https://brew.sh/
# https://docs.brew.sh/Homebrew-on-Linux#install
sudo dnf groupinstall -y 'Development Tools'
sudo dnf install --allowerasing -y procps-ng curl file git git-lfs
# set a password, as the homebrew install script won't allow running as root i.e. via sudo
sudo passwd ec2-user
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# add brew to the bash environment
(echo; echo 'eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"') >> /home/ec2-user/.bashrc
eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
# Ensure `/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin` is in your PATH
echo $PATH
# View formulae details for a homebrew package:
#brew info jq
# Install packages via homebrew:
#brew install gcc

Setup DataDog Vector on Amazon Linux 2023

Note: This section is a work in progress.

Install DataDog Vector for exporting journaled logs to CloudWatch Logs

Either use the setup script or manually add the repo and install the package.

Setup script:

bash -c "$(curl -L https://setup.vector.dev)"

Manually add repo:

cat <<'EOF' | sudo tee /etc/yum.repos.d/vector.repo
[vector]
name = Vector
baseurl = https://yum.vector.dev/stable/vector-0/$basearch/
enabled=1
gpgcheck=1
repo_gpgcheck=1
priority=1
gpgkey=https://keys.datadoghq.com/DATADOG_RPM_KEY_CURRENT.public
       https://keys.datadoghq.com/DATADOG_RPM_KEY_B01082D3.public
       https://keys.datadoghq.com/DATADOG_RPM_KEY_FD4BF915.public

EOF

Install the Vector package:

sudo dnf install -y vector

Backup the default configuration file and then configure the Vector service:

sudo mv /etc/vector/vector.yaml{,.bak}
cat <<'EOF' | sudo tee /etc/vector/vector.yaml
sources:
  my_journald_source:
    type: "journald"

sinks:
  my_cloudwatch_sink:
    type: "aws_cloudwatch_logs"
    inputs:
      - "my_journald_source"
    compression: "gzip"
    encoding:
      codec: "json"
    #create_missing_group: true
    #create_missing_stream: true
    #endpoint: http://127.0.0.0:5000/path/to/service
    group_name: "prodenv"
    region: "us-east-1"
    stream_name: "prodsite/{{ host }}"

EOF

Verify the configuration file is valid:

vector validate
#vector --config /etc/vector/vector.yaml --require-healthy
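
If the configuration validates, enable and start the service that ships with the package (assuming the RPM installs a vector.service unit):

sudo systemctl enable --now vector.service
sudo systemctl status vector.service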

Example cloud-init user data for Amazon Linux 2023

The following cloud-config shows how some of the above customizations can be applied at launch via EC2 user data:

#cloud-config
# vim:syntax=yaml
disable_ec2_metadata: false
datasource:
  Ec2:
    timeout: 50
    max_wait: 120
    metadata_urls:
      - http://169.254.169.254:80
      - http://[fd00:ec2::254]:80
      #- http://instance-data:8773
    apply_full_imds_network_config: true
# boot commands
# default: none
# This is very similar to runcmd, but commands run very early
# in the boot process, only slightly after a 'boothook' would run.
# - bootcmd will run on every boot
# - the INSTANCE_ID variable will be set to the current instance ID
# - the 'cloud-init-per' command can be used to make bootcmd run exactly once
bootcmd:
  - systemctl stop amazon-ssm-agent
package_update: false
package_upgrade: false
package_reboot_if_required: false
packages:
  - docker
manage_resolv_conf: true
resolv_conf:
  nameservers: ['169.254.169.253']
  searchdomains:
    - ec2.internal
  domain: ec2.internal
  options:
    timeout: 5
# set the locale to a given locale
# default: en_US.UTF-8
locale: en_US.UTF-8
# disable ssh access as root.
# if you want to be able to ssh in to the system as the root user
# rather than as the default user, e.g. ec2-user, then you must set
# this to false
# default: true
disable_root: true
write_files:
  - path: /etc/systemd/system/amazon-ssm-agent.service.d/00-override.conf
    permissions: "0644"
    content: |
      [Unit]
      # To have a service start after cloud-init.target, it needs
      # DefaultDependencies=no, because the default (DefaultDependencies=yes)
      # makes the default target, e.g. multi-user.target, depend on the
      # service.
      #
      # See the following for more details: https://serverfault.com/a/973985
      Wants=network-online.target
      After=network-online.target nss-lookup.target cloud-init.target
      DefaultDependencies=no
      ConditionFileIsExecutable=/usr/bin/amazon-ssm-agent

Comments

@banthaherder:

Thank you I found this super helpful for configuring AL2023 nodes with docker-compose.

For the docker-compose install we could extract the node's cpu arch x86_64 vs aarch64 to download the appropriate binary

# determine cpu architecture
CPU_ARCH=$(lscpu | grep Architecture | awk '{print $2}')

# download the correct binary
sudo curl -sL https://github.com/docker/compose/releases/latest/download/docker-compose-linux-$CPU_ARCH -o /usr/local/bin/docker-compose

# make executable
sudo chmod +x /usr/local/bin/docker-compose

# verify 
docker-compose --version

@The-Alchemist:

I would personally recommend putting the docker-compose binary into /usr/local/lib/docker/cli-plugins/, as per the official Docker Compose instructions, so you can invoke it with docker compose (note, no hyphen), which seems to be the "new way".

@thimslugga (author):

@banthaherder I'm glad to hear that it helped out. I've updated the steps based on your feedback.

@The-Alchemist Thanks for the feedback. I've updated the steps based on your feedback.

@MSoliven:

This didn't work for me:

sudo curl -sL https://github.com/docker/compose/releases/latest/download/docker-compose-linux-$(uname -m) ......

It should be the following, i.e. download comes first before the version tag:

sudo curl -sL https://github.com/docker/compose/releases/download/latest/docker-compose-linux-$(uname -m)

But in my case, I had to substitute "latest" with v2.26.1.

@thimslugga (author):

@MSoliven ty for sharing, I've updated the doc to address your feedback.
