Dentrax/containerd-gc-analysis.md

## containerd-gc-analysis.md

      
containerd has a built-in garbage collector, documented at https://github.com/containerd/containerd/blob/master/docs/garbage-collection.md. In the cleanup phase, only objects that are no longer associated with anything (i.e. have no image reference) are removed; those marked as "dirty" are kept. This collector can be used to clean up unused images and running/stopped containers.
The tool at https://github.com/Azure/eraser could be used to achieve this, although it is not yet production-ready and may be difficult and complex to run on all nodes. Descheduler cannot solve this problem because it does not run as a DaemonSet. Kubelet garbage collection can be used instead, after checking that it is enabled in the current configuration: https://kubernetes.io/docs/concepts/architecture/garbage-collection/#containers-images.
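
The kubelet's image garbage collection is driven by two disk-usage thresholds. As a rough sketch of that decision (not the kubelet's actual code), the following assumes the documented `imageGCHighThresholdPercent` and `imageGCLowThresholdPercent` settings with their documented defaults of 85 and 80. The function name and the numbers in the usage note are hypothetical.

```python
def bytes_to_free(capacity: int, available: int,
                  high_threshold_percent: int = 85,
                  low_threshold_percent: int = 80) -> int:
    """Return how many bytes image GC would try to reclaim (0 if none).

    Illustrative model only: GC starts once disk usage reaches the high
    threshold and aims to bring usage back down to the low threshold.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    usage_percent = 100 - (available * 100 // capacity)
    if usage_percent < high_threshold_percent:
        return 0  # below the high threshold: nothing to do
    # Free space that corresponds to the low threshold.
    target_available = capacity * (100 - low_threshold_percent) // 100
    return max(0, target_available - available)
```

For example, on a hypothetical 1000-byte disk with 100 bytes free, usage is 90%, so the sketch asks for 100 bytes to be reclaimed to get back to 80% usage. With 300 bytes free it returns 0.
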
It seems that containerd itself does not support log rotation. I found a solution that relies on the kubelet (as described in a comment on containerd/containerd#3351; see also PR kubernetes/kubernetes#59898). The settings can be checked in the /etc/kubernetes/kubelet-config.yaml file on the relevant node. In a newly installed cluster, the values for containerLogMaxFiles and containerLogMaxSize are 5 and 10Mi, respectively.
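
To check those two values on a node, something like the following could be used. It is a minimal sketch that assumes the settings appear as simple top-level `key: value` lines in the kubelet config file; a real script would use a proper YAML parser. The function names are hypothetical, and the fallback values are the defaults mentioned above.

```python
import re

# Defaults for a newly installed cluster, as noted above.
DEFAULTS = {"containerLogMaxFiles": "5", "containerLogMaxSize": "10Mi"}


def log_rotation_settings(config_text: str) -> dict:
    """Extract the log-rotation keys from kubelet config text.

    Only simple top-level `key: value` lines are recognised; keys that
    are absent fall back to DEFAULTS.
    """
    settings = dict(DEFAULTS)
    for key in DEFAULTS:
        match = re.search(rf"^{key}:\s*(\S+)", config_text, flags=re.MULTILINE)
        if match:
            settings[key] = match.group(1).strip("\"'")
    return settings


def read_node_settings(path: str = "/etc/kubernetes/kubelet-config.yaml") -> dict:
    """Read the kubelet config file from `path` and report its settings."""
    with open(path) as f:
        return log_rotation_settings(f.read())
```

Calling `read_node_settings()` on the node returns a dictionary with both keys, which makes it easy to compare nodes against each other.
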
nerdctl recently added `nerdctl system prune --all` (https://github.com/containerd/nerdctl/pull/1264/files), which is roughly equivalent to `crictl rmi --prune`.
After talking to the containerd team, it seems that the garbage collection system is based on cleaning up unreferenced objects: “once the container is removed, the snapshot will be GC’ed. Not right after stopping though, so if you needed to get data out of the ephemeral storage, you could.” The GC cursor (in the bbolt database, go.etcd.io/bbolt) walks through all object types in order, including containers, and checks for references to snapshots. A snapshot with no references is marked for deletion in the next cycle using tri-color marking. “This is why Kubelet can enforce ephemeral storage policies, it will end up killing the container. It is the standard way to set these limits”. Long-lived pods or containers can be killed using the Descheduler PodLifeTime policy or the ephemeral storage policies enforced by the kubelet.
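
The reference walk described above can be illustrated with a small tri-color mark-and-sweep sketch. This is a simplified model, not containerd's implementation: the object keys, the `collect_garbage` function, and the idea of passing references as a plain dictionary are all assumptions made for illustration.

```python
from collections import deque


def collect_garbage(objects: dict[str, list[str]], roots: list[str]) -> set[str]:
    """Return the keys of objects that are unreachable from the roots.

    `objects` maps each object key to the keys it references.
    """
    white = set(objects)  # not yet seen; candidates for deletion
    gray = deque()        # seen, but references not yet scanned
    black = set()         # fully scanned; kept

    for root in roots:
        if root in white:
            white.remove(root)
            gray.append(root)

    while gray:
        current = gray.popleft()
        for ref in objects.get(current, []):
            if ref in white:
                white.remove(ref)
                gray.append(ref)
        black.add(current)

    # Whatever is still white was never reached and can be swept.
    return white


# Hypothetical state: container "a" was removed, container "b" still exists.
objects = {
    "container/b": ["snapshot/b"],
    "snapshot/a": [],                 # left behind by the removed container
    "snapshot/b": ["snapshot/base"],
    "snapshot/base": [],
}
print(sorted(collect_garbage(objects, roots=["container/b"])))  # ['snapshot/a']
```

In this model a stopped container still exists as an object, so its snapshot stays reachable. Only removing the container makes the snapshot collectable, which matches the behaviour quoted above.
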
Regarding containerd not exposing container filesystem metrics for cAdvisor to use, a KEP has been written: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2371-cri-pod-container-stats/README.md.