
@mvazquezc
Last active February 13, 2023 21:18
Container Security, an introduction to capabilities and seccomp profiles demos

Capabilities on Containers demos

Demo 1 - Run a container and get its thread capabilities

  1. Let's run a test container. This container has an application that listens on a given port, but that's not important for now:

    podman run -d --rm --name reversewords-test quay.io/mavazque/reversewords:latest
  2. We can always get capabilities for a process by querying the /proc filesystem:

    # Get container's PID
    CONTAINER_PID=$(podman inspect reversewords-test --format {{.State.Pid}})
    # Get caps for a given PID
    grep Cap /proc/${CONTAINER_PID}/status
  3. We get the capability sets in hex format; we can decode them with the capsh tool:

    capsh --decode=00000000800405fb
  4. We can use podman inspect as well:

    podman inspect reversewords-test --format {{.EffectiveCaps}}
  5. Stop the container:

    podman stop reversewords-test
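
Under the hood, capsh simply tests each bit of the hex mask against the capability numbers defined in `<linux/capability.h>`. A minimal pure-bash sketch of the same decoding (only a few bit/name pairs shown; the full table lives in the header):

```shell
# Decode a capability mask with shell arithmetic instead of capsh.
# Bit numbers come from <linux/capability.h>, e.g. CAP_NET_BIND_SERVICE=10.
mask=0x00000000800405fb   # CapEff value captured in the demo above
for entry in 0:cap_chown 1:cap_dac_override 10:cap_net_bind_service 18:cap_sys_chroot; do
  bit=${entry%%:*} name=${entry#*:}
  if (( (mask >> bit) & 1 )); then
    echo "${name} is set"
  fi
done
```

capsh --decode performs essentially this lookup across every capability name the tool knows about.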

Demo 2 - Container running with UID 0 vs container running with nonroot UID

  1. Run our test container with a root UID and get its capabilities:

    podman run --rm -it --user 0 --entrypoint /bin/bash --name reversewords-test quay.io/mavazque/reversewords:ubi8
    grep Cap /proc/1/status
  2. We can see the thread's permitted and effective capability sets populated; let's decode them:

    capsh --decode=00000000800405fb
  3. Exit the container:

    exit
  4. Same test but running the container with a nonroot UID:

    podman run --rm -it --user 1024 --entrypoint /bin/bash --name reversewords-test quay.io/mavazque/reversewords:ubi8 
    grep Cap /proc/1/status
  5. We can see the thread's permitted and effective capability sets cleared. We can exit our container now:

    exit
  6. We can request extra capabilities, and those will be assigned to the corresponding sets:

    podman run --rm -it --user 1024 --cap-add=cap_net_bind_service --entrypoint /bin/bash --name reversewords-test quay.io/mavazque/reversewords:ubi8
    grep Cap /proc/1/status
  7. Since Podman supports ambient capabilities, you can see how the NET_BIND_SERVICE cap got into the ambient, permitted and effective sets.

  8. We can exit the container now:

    exit
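
The Cap* lines we grep inside the containers exist for every Linux process, so the same check can be tried against the current shell (a sketch; on an unprivileged shell CapPrm and CapEff show up as all zeroes, just like the --user 1024 container above):

```shell
# Show the five capability sets (CapInh, CapPrm, CapEff, CapBnd, CapAmb)
# for the current shell process.
grep Cap /proc/self/status

# Extract just the effective mask, ready to feed to capsh --decode:
eff=$(awk '/^CapEff/ {print $2}' /proc/self/status)
echo "effective mask: 0x${eff}"
```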

Demo 3 - Real world scenario

Using thread capabilities

  1. We can control which port our application listens on by using the APP_PORT environment variable. Let's try to run our application on a non-privileged port with a non-privileged user:

    podman run --rm --user 1024 -e APP_PORT=8080 --name reversewords-test quay.io/mavazque/reversewords:ubi8
  2. Stop the container with Ctrl+C and try to bind to port 80 this time:

    podman run --rm --user 1024 -e APP_PORT=80 --name reversewords-test quay.io/mavazque/reversewords:ubi8
  3. This time it fails. Remember that since we're running as nonroot, the permitted and effective capability sets were cleared (so the NET_BIND_SERVICE cap present in podman's default capability set is not available).

  4. We know that the NET_BIND_SERVICE capability allows unprivileged processes to bind to ports under 1024. Let's assign this capability to the container and see what happens:

    podman run --rm --user 1024 -e APP_PORT=80 --cap-add=cap_net_bind_service --name reversewords-test quay.io/mavazque/reversewords:ubi8
  5. This time it worked because the NET_BIND_SERVICE cap was added to the ambient, permitted and effective sets.

  6. You can stop the container using Ctrl+C.

Using file capabilities

  1. We added the NET_BIND_SERVICE capability to our binary when we built the image:

    setcap 'cap_net_bind_service+ep' /usr/bin/reverse-words
  2. Let's take a look inside the container:

    podman run --rm -it --entrypoint /bin/bash --user 1024 -e APP_PORT=80 --name reversewords-test quay.io/mavazque/reversewords-captest:latest
    getcap /usr/bin/reverse-words
  3. The capability is added to the effective and permitted file capability sets.

  4. Let's review the thread capabilities:

    grep Cap /proc/1/status 
  5. As you can see, the effective and permitted sets are cleared, but the inheritable and bounding sets do have NET_BIND_SERVICE.

  6. Let's run our app:

    /usr/bin/reverse-words &
  7. We were able to bind to port 80: the binary had the file capability required to do that, and since it was present in the inheritable and bounding sets, the thread acquired the capability in its effective set. We can check the effective and permitted sets:

    grep Cap /proc/<app_pid>/status
  8. We can exit the container now.

    exit
  9. Does this mean that we can bypass thread capabilities? - Let's see:

    podman run --rm -it --entrypoint /bin/bash --user 1024 --cap-drop=all -e APP_PORT=80 --name reversewords-test quay.io/mavazque/reversewords-captest:latest
  10. Check the container thread capabilities:

    grep Cap /proc/1/status
  11. All sets are zeroed, let's try to run our app:

    /usr/bin/reverse-words
  12. The kernel blocked the execution, since NET_BIND_SERVICE capability cannot be acquired.

  13. That answers the question: no. Now we can exit the container:

    exit
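
Both outcomes above follow the exec-time capability transformation rules documented in capabilities(7). A minimal bash sketch of the two relevant formulas, with illustrative masks (the function and values here are ours, not part of the demo image):

```shell
# capabilities(7) exec-time rules, modeled with bit masks:
#   P'(permitted) = (P(inheritable) & F(inheritable))
#                 | (F(permitted) & P(bounding)) | P'(ambient)
#   P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
NET_BIND=$(( 1 << 10 ))   # CAP_NET_BIND_SERVICE

exec_caps() {  # args: p_inh p_bnd p_amb f_inh f_prm f_eff_flag
  local p_prm=$(( ($1 & $4) | ($5 & $2) | $3 ))
  local p_eff=$(( $6 ? p_prm : $3 ))
  echo "$p_prm $p_eff"
}

# First run: the bounding set still holds NET_BIND_SERVICE and the binary
# carries cap_net_bind_service+ep -> permitted/effective gain the cap.
exec_caps "$NET_BIND" "$NET_BIND" 0 0 "$NET_BIND" 1   # -> "1024 1024"

# Second run (--cap-drop=all): every thread set is zero, so nothing can be
# acquired; the kernel refuses to exec a +ep binary it cannot satisfy.
exec_caps 0 0 0 0 "$NET_BIND" 1                       # -> "0 0"
```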

Seccomp on Containers demos

Demo 1 - Create your own seccomp profile

  1. We will use the OCI Hook project to generate the seccomp profile for our app.

  2. Create a container with the OCI Hook which runs our application:

    sudo podman run --rm --annotation io.containers.trace-syscall="of:/tmp/ls.json" fedora:32 ls / > /dev/null
  3. The hook wrote the seccomp profile to /tmp/ls.json; let's review it:

    jq < /tmp/ls.json
  4. We can now run our app with this profile:

    podman run --rm --security-opt seccomp=/tmp/ls.json fedora:32 ls /
  5. What happens if we change the command?

    podman run --rm --security-opt seccomp=/tmp/ls.json fedora:32 ls -l /
  6. The required syscalls are not allowed, so it fails. Let's use the hook to append the ones we're missing:

    sudo podman run --rm --annotation io.containers.trace-syscall="if:/tmp/ls.json;of:/tmp/lsl.json" fedora:32 ls -l / > /dev/null
  7. We have an updated seccomp profile now, let's diff them:

    diff <(jq -S . /tmp/ls.json) <(jq -S . /tmp/lsl.json)
  8. We can use this new profile to run our app:

    podman run --rm --security-opt seccomp=/tmp/lsl.json fedora:32 ls -l /
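
The `diff <(...) <(...)` form used in step 7 relies on bash process substitution to compare two command outputs without temporary files. A tiny standalone illustration, with two inline syscall lists standing in for the jq output of the real profiles:

```shell
# Lines prefixed with '>' exist only in the second (larger) allow list,
# the same way extra syscalls show up when diffing ls.json vs lsl.json.
diff <(printf 'read\nwrite\n') <(printf 'lstat\nread\nwrite\n') || true
```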

Capabilities on Kubernetes demos

Demo 1 - Pod running with UID 0 vs pod running with nonroot UID

The cluster was created with the following command:

    kcli create kube generic -P masters=1 -P workers=1 -P master_memory=4096 -P numcpus=2 -P worker_memory=4096 -P sdn=calico -P version=1.24 -P ingress=true -P ingress_method=nginx -P metallb=true -P engine=crio -P domain=linuxera.org caps-cluster

  1. Create a namespace

    NAMESPACE=test-capabilities
    kubectl create ns ${NAMESPACE}
  2. Create a pod running our application with UID 0:

    cat <<EOF | kubectl -n ${NAMESPACE} create -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: reversewords-app-captest-root
    spec:
      containers:
      - image: quay.io/mavazque/reversewords:ubi8
        name: reversewords
        securityContext:
          runAsUser: 0
      dnsPolicy: ClusterFirst
      restartPolicy: Never
    status: {}
    EOF
  3. Let's review the thread capability sets:

    kubectl -n ${NAMESPACE} exec -ti reversewords-app-captest-root -- grep Cap /proc/1/status
  4. We can see that the permitted and effective sets have some capabilities; if we decode them, this is what we get:

    capsh --decode=00000000000005fb
  5. Now, let's run the same application pod but with a nonroot UID:

    cat <<EOF | kubectl -n ${NAMESPACE} create -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: reversewords-app-captest-nonroot
    spec:
      containers:
      - image: quay.io/mavazque/reversewords:ubi8
        name: reversewords
        securityContext:
          runAsUser: 1024
      dnsPolicy: ClusterFirst
      restartPolicy: Never
    status: {}
    EOF
  6. If we review the thread capability sets this is what we get:

    kubectl -n ${NAMESPACE} exec -ti reversewords-app-captest-nonroot -- grep Cap /proc/1/status
  7. The permitted and effective sets got cleared which, if you remember, is expected. The problem on Kubernetes is that it doesn't support ambient capabilities; as you can see, the ambient set is cleared. That leaves us with only two options: file capabilities or capability-aware applications.
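
As an aside, the mask decoded in step 4 (00000000000005fb, the runtime default here) is not the same as podman's 00000000800405fb from the earlier demos. A quick sketch to see which capability bits two masks disagree on, using shell arithmetic (bit-to-name mapping per `<linux/capability.h>`):

```shell
a=0x00000000800405fb   # effective set seen with podman earlier
b=0x00000000000005fb   # effective set decoded in this demo
for bit in $(seq 0 39); do
  if (( ((a ^ b) >> bit) & 1 )); then
    # bit 18 = CAP_SYS_CHROOT, bit 31 = CAP_SETFCAP
    echo "capability bit $bit differs"
  fi
done
```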

Demo 2 - Application with NET_BIND_SERVICE

  1. In this first deployment we are going to run our app with a root UID and drop every runtime capability but NET_BIND_SERVICE:

    cat <<EOF | kubectl -n ${NAMESPACE} create -f -
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      creationTimestamp: null
      labels:
        app: reversewords-app-rootuid
      name: reversewords-app-rootuid
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: reversewords-app-rootuid
      strategy: {}
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: reversewords-app-rootuid
        spec:
          containers:
          - image: quay.io/mavazque/reversewords:ubi8
            name: reversewords
            resources: {}
            env:
            - name: APP_PORT
              value: "80"
            securityContext:
              runAsUser: 0
              capabilities:
                drop:
                - all
                add:
                - NET_BIND_SERVICE
    status: {}
    EOF
  2. If we get the application logs, we can see that it started properly:

    kubectl -n ${NAMESPACE} logs deployment/reversewords-app-rootuid
  3. If we look at the capability sets this is what we get:

    kubectl -n ${NAMESPACE} exec -ti deployment/reversewords-app-rootuid -- grep Cap /proc/1/status
  4. We have NET_BIND_SERVICE available, so it worked as expected.

  5. Now we drop all of the runtime's default capabilities, add the NET_BIND_SERVICE capability on top, and request that the app run with a non-root UID. In the environment variables we configure our app to listen on port 80.

    cat <<EOF | kubectl -n ${NAMESPACE} create -f -
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      creationTimestamp: null
      labels:
        app: reversewords-app-nonrootuid
      name: reversewords-app-nonrootuid
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: reversewords-app-nonrootuid
      strategy: {}
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: reversewords-app-nonrootuid
        spec:
          containers:
          - image: quay.io/mavazque/reversewords:ubi8
            name: reversewords
            resources: {}
            env:
            - name: APP_PORT
              value: "80"
            securityContext:
              runAsUser: 1024
              capabilities:
                drop:
                - all
                add:
                - NET_BIND_SERVICE
    status: {}
    EOF
  6. Let's check the logs:

    kubectl -n ${NAMESPACE} logs deployment/reversewords-app-nonrootuid
  7. The application failed to bind to port 80. Let's update the configuration so we can access the pod and check the capability sets:

    # Patch the app so it binds to port 8080
    kubectl -n ${NAMESPACE} patch deployment reversewords-app-nonrootuid -p '{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"reversewords"}],"containers":[{"$setElementOrder/env":[{"name":"APP_PORT"}],"env":[{"name":"APP_PORT","value":"8080"}],"name":"reversewords"}]}}}}'
    # Get capability sets
    kubectl -n ${NAMESPACE} exec -ti deployment/reversewords-app-nonrootuid -- grep Cap /proc/1/status
  8. We don't have NET_BIND_SERVICE in the effective and permitted sets. That means that for this to work the capability would need to be in the ambient set, which is not supported yet on Kubernetes, so we will need to make use of file capabilities.

  9. We have an image with the file capabilities configured. Let's update the deployment to use port 80 and this new image:

    kubectl -n ${NAMESPACE} patch deployment reversewords-app-nonrootuid -p '{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"reversewords"}],"containers":[{"$setElementOrder/env":[{"name":"APP_PORT"}],"env":[{"name":"APP_PORT","value":"80"}],"image":"quay.io/mavazque/reversewords-captest:latest","name":"reversewords"}]}}}}'
  10. Let's check the logs for the app:

    kubectl -n ${NAMESPACE} logs deployment/reversewords-app-nonrootuid
  11. If we check the capabilities now this is what we get:

    kubectl -n ${NAMESPACE} exec -ti deployment/reversewords-app-nonrootuid -- grep Cap /proc/1/status
  12. We can check the file capabilities configured in our binary as well:

    kubectl -n ${NAMESPACE} exec -ti deployment/reversewords-app-nonrootuid -- getcap /usr/bin/reverse-words
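
The inline JSON patches in steps 7 and 9 are hard to read. An equivalent, arguably clearer approach (sketched here; the file name is illustrative) is to keep the change in a strategic-merge patch file and apply it with `kubectl patch --patch-file`:

```yaml
# port-patch.yaml (hypothetical name): strategic merge patch that sets
# APP_PORT on the reversewords container; containers are merged by name.
spec:
  template:
    spec:
      containers:
      - name: reversewords
        env:
        - name: APP_PORT
          value: "8080"
```

Applied with `kubectl -n ${NAMESPACE} patch deployment reversewords-app-nonrootuid --patch-file port-patch.yaml`.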

Seccomp Profiles on Kubernetes demos

Demo 1 - Running a workload with a custom seccomp profile

  1. Add the seccomp profile below to your Kubernetes nodes under /var/lib/kubelet/seccomp/centos8-ls.json:

    {
      "defaultAction": "SCMP_ACT_ERRNO",
      "architectures": [
        "SCMP_ARCH_X86_64"
      ],
      "syscalls": [
        {
          "names": [
            "access",
            "arch_prctl",
            "brk",
            "capget",
            "capset",
            "chdir",
            "close",
            "epoll_ctl",
            "epoll_pwait",
            "execve",
            "exit_group",
            "fchown",
            "fcntl",
            "fstat",
            "fstatfs",
            "futex",
            "getdents64",
            "getpid",
            "getppid",
            "ioctl",
            "mmap",
            "mprotect",
            "munmap",
            "nanosleep",
            "newfstatat",
            "openat",
            "prctl",
            "pread64",
            "prlimit64",
            "read",
            "rt_sigaction",
            "rt_sigprocmask",
            "rt_sigreturn",
            "sched_yield",
            "seccomp",
            "set_robust_list",
            "set_tid_address",
            "setgid",
            "setgroups",
            "setuid",
            "stat",
            "statfs",
            "tgkill",
            "write"
          ],
          "action": "SCMP_ACT_ALLOW",
          "args": [],
          "comment": "",
          "includes": {},
          "excludes": {}
        }
      ]
    }
  2. Create a namespace for our workload:

    NAMESPACE=test-seccomp
    kubectl create ns ${NAMESPACE}
  3. We can configure seccomp profiles at the pod or container level; this time we're going to configure it at the pod level:

    cat <<EOF | kubectl -n ${NAMESPACE} create -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: seccomp-ls-test
    spec:
      securityContext:
        seccompProfile:
          type: Localhost
          localhostProfile: centos8-ls.json
      containers:
      - image: registry.centos.org/centos:8
        name: seccomp-ls-test
        command: ["ls", "/"]
      dnsPolicy: ClusterFirst
      restartPolicy: Never
    status: {}
    EOF
  4. We can check pod logs:

    kubectl -n ${NAMESPACE} logs seccomp-ls-test
  5. Let's try to modify the container command; this time let's run 'ls -l /':

    cat <<EOF | kubectl -n ${NAMESPACE} create -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: seccomp-lsl-test
    spec:
      containers:
      - image: registry.centos.org/centos:8
        name: seccomp-lsl-test
        command: ["ls", "-l", "/"]
        securityContext:
          seccompProfile:
            type: Localhost
            localhostProfile: centos8-ls.json
      dnsPolicy: ClusterFirst
      restartPolicy: Never
    status: {}
    EOF
  6. This time the pod failed, since the seccomp profile doesn't allow the syscalls required for ls -l / to run:

    kubectl -n ${NAMESPACE} logs seccomp-lsl-test