Source: Nestybox Blog

Secure (and easy) Kubernetes-in-Docker

Contents

- Intro
- Why Privileged Containers?
- Why Complex Docker Images?
- K8s-in-Docker with Sysbox
- Automating things a bit with Kindbox
- The K8s Node Container Image
- K8s Node Inner Image Preloading
- Wrapping Up

Intro

Recently, Docker containers have been used as a way to deploy Kubernetes (K8s) clusters. In this setup, each Docker container acts as a K8s node, and the K8s cluster is made up of a number of these containers (some acting as control-plane nodes, others as worker nodes) connected to each other via a container network.

Such a setup is ideal for development, local testing, and CI/CD, where the efficiency and ease of use of containers make it particularly attractive (e.g., it avoids the need to deploy heavier virtual machines for the same purpose). Tools like K8s.io KinD, initially developed to test K8s itself, are now being used to create such K8s-in-Docker clusters for other purposes.

While this tool works well for local developer testing, it has important limitations that make it less suitable for enterprise or shared environments, such as:

- Using insecure privileged containers.
- Requiring complex container images.
- Supporting only a limited set of cluster configurations.

This article describes the reasons for these problems and how to overcome them using the Sysbox runtime to deploy the K8s-in-Docker cluster with strong isolation and simple docker run commands. Cloud developers, QA engineers, DevOps engineers, sysadmins, or anyone who wishes to deploy isolated K8s environments will find this info very useful.

Note that this article is specific to containers on Linux.

Why Privileged Containers?

As mentioned above, existing tools for deploying K8s-in-Docker use very insecure privileged containers. The reason is that the K8s components running inside the Docker containers interact deeply with the Linux kernel: they mount filesystems, write to /proc, chroot, etc. These operations are not allowed in a regular Docker container, but are allowed in a privileged container.

However, privileged containers in many ways break isolation between the container and the underlying host. For example, it's trivial for software running inside the container to modify system-wide kernel parameters, and even to reboot the host, by writing to /proc. Furthermore, since the container acting as a K8s node is privileged, an inner K8s pod deployed with a privileged security policy also has root access to the underlying host. This forces you to trust not only the software running in the K8s nodes, but also the software in the pods within.

And this lack of isolation is not just a security problem; it can also be a functional one. For example, a privileged container has write access to /proc/sys/kernel, where several non-namespaced kernel controls reside. There is nothing preventing two or more privileged containers from writing conflicting values to these kernel controls, potentially breaking functionality in subtle ways that would be hard to debug. A quick way to see this isolation gap is shown below.

If you are an individual developer deploying K8s-in-Docker on your laptop, using privileged containers may not be a big deal (though it's risky). But if you want to use this in testing or CI/CD frameworks, it's much more problematic.
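As a minimal sketch of that gap, compare writing a non-namespaced kernel control from a regular container versus a privileged one (the alpine image and the kernel.panic sysctl are arbitrary choices for illustration):

# In a regular container, Docker mounts /proc/sys read-only, so this fails:
$ docker run --rm alpine sh -c "echo 0 > /proc/sys/kernel/panic"

# In a privileged container, the write succeeds, and it modifies the
# host's kernel.panic setting (0 is the usual default, so this is benign):
$ docker run --rm --privileged alpine sh -c "echo 0 > /proc/sys/kernel/panic"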
Why Complex Docker Images?

Another limitation of existing K8s-in-Docker tools is that they use complex container configurations. That is, they require specialized container images with custom (and tricky) container entrypoints, as well as complex container run commands (with several host volume mounts, etc.). For example, the K8s.io KinD base image entrypoint performs many clever but tricky configurations to set up the container's environment when it starts. This ends up restricting the choices for end users, who must now rely on complex base container images developed specifically for these tools, and who are constrained to the K8s cluster configurations the tools support.

The reason for this complexity is that existing K8s-in-Docker tools rely on the software stack made up of Docker / containerd and the OCI runc to create the container. This stack was originally designed with the goal of using containers as application packaging & deployment bundles (for which it does an excellent job), but it falls short of setting up the container properly to run software such as K8s inside. As a result, it's up to the K8s-in-Docker tools to overcome these limitations, which results in complex Docker images and Docker run commands, which in turn create restrictions for end users.

K8s-in-Docker with Sysbox

Wouldn't it be great if a simple docker run could spawn a container inside of which Kubernetes could run seamlessly and with proper container isolation? The Sysbox container runtime makes this possible (for the first time). It does so by setting up the container with strong isolation (via the Linux user namespace) and in such a way that K8s finds all the kernel resources it needs to run properly inside the container. That is, it fixes the problem at the level of the container runtime (where the container abstraction is created). This significantly simplifies the Docker images and commands required to deploy the containers that make up the cluster, which in turn removes complexity and enhances flexibility for the end user.

For example, with Docker + Sysbox, deploying a K8s control-plane node can be done with these simple commands:

1) Deploy the container that acts as the K8s control plane:

$ docker run --runtime=sysbox-runc -d --name=k8s-control-plane --hostname=k8s-control-plane nestybox/k8s-node:v1.18.2

2) Ask the kubeadm tool inside the container to initialize it as a K8s master node:

$ docker exec k8s-control-plane sh -c "kubeadm init --kubernetes-version=v1.18.2 --pod-network-cidr=10.244.0.0/16"

3) Configure kubectl on your host to control the cluster (assumes you've installed kubectl already):

$ docker cp k8s-control-plane:/etc/kubernetes/admin.conf $HOME/.kube/config

4) Use kubectl to configure the desired K8s container network plugin:

$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

That's it! With these steps you'll have a K8s master configured in less than 1 minute.

Deploying a K8s worker node is even easier:

5) Deploy the container that acts as a worker node:

$ docker run --runtime=sysbox-runc -d --name=k8s-worker --hostname=k8s-worker nestybox/k8s-node:v1.18.2

6) In order for the worker to join the cluster, we need a token from the control plane:

$ join_cmd=$(docker exec k8s-control-plane sh -c "kubeadm token create --print-join-command 2> /dev/null")

7) Ask the worker to join the cluster:

$ docker exec k8s-worker sh -c "$join_cmd"

That's it! It takes < 15 seconds to add a worker node. To add more workers, simply repeat steps 5 to 7.

Notice the simplicity of the entire operation.
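At this point you can sanity-check the cluster from the host with standard kubectl commands (a minimal check, assuming the kubeconfig copied in step 3; the nginx deployment name is just an example):

# List the nodes; both should report Ready once the flannel plugin is up:
$ kubectl get nodes

# Quick smoke test: deploy nginx and scale it across the cluster:
$ kubectl create deployment nginx --image=nginx
$ kubectl scale deployment nginx --replicas=4
$ kubectl get pods -o wide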
The Docker commands to deploy the nodes are all very simple, and the kubeadm tool makes it a breeze to set up K8s inside the containers (with certificates and everything). It's also fast and efficient: it takes less than 2 minutes to deploy a 10-node cluster on a laptop, with only 1GB of storage overhead (compared to 10GB if you deploy this same cluster with the K8s.io KinD tool).

Moreover, the containers are strongly secured via the Linux user namespace. You can confirm this with:

$ docker exec k8s-control-plane sh -c "cat /proc/self/uid_map"
0 165536 65536

which means that the root user (id 0) in the K8s control-plane container maps to unprivileged user 165536 on the host. No more privileged containers!

And because you are using Docker commands to deploy the K8s cluster, you have full control of the cluster's configuration, allowing you to:

- Choose the cluster's topology to match your needs.
- Choose any container image for the K8s nodes (one which you fully control).
- Choose different images for different nodes if you want.
- Place the containers on the Docker network of your choice (see the sketch below).
- Deploy the cluster on a single host or across hosts using Docker overlay networks.
- Resize the cluster easily.
- Mount host volumes per your needs, etc.
- In the near future, constrain the system resources assigned to each K8s node.

In short, you are only limited by Docker's capabilities. All of this simplicity comes by virtue of having the underlying container runtime (Sysbox) take care of setting up the container properly to support running K8s (as well as other system-level workloads) inside.
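As an example of that flexibility, placing the node containers on a dedicated Docker network is just ordinary Docker networking (a sketch; the network name k8s-net is arbitrary):

# Create a user-defined bridge network for the cluster:
$ docker network create k8s-net

# Deploy the nodes on that network; Docker's embedded DNS lets them
# resolve each other by container name:
$ docker run --runtime=sysbox-runc -d --net=k8s-net --name=k8s-control-plane --hostname=k8s-control-plane nestybox/k8s-node:v1.18.2
$ docker run --runtime=sysbox-runc -d --net=k8s-net --name=k8s-worker --hostname=k8s-worker nestybox/k8s-node:v1.18.2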
Automating things a bit with Kindbox

Though Sysbox enables you to deploy a K8s cluster with simple Docker commands, it's easier to have a higher-level tool do these steps for you. There is one such tool here: https://github.com/nestybox/kindbox

It's called Kindbox (i.e., Kubernetes-in-Docker + Sysbox), and it's basically a simple bash wrapper around Docker commands similar to those shown in the prior section. Kindbox supports commands such as creating a K8s-in-Docker cluster with a configurable number of nodes, resizing the cluster, and deleting it. You can deploy multiple clusters on the same host, knowing that they will be strongly isolated by the Sysbox runtime. Check out this video to see how it works.

You may be asking: what's the point of Kindbox if similar tools such as K8s.io KinD exist already? The answer is that while at a high level they do the same thing (i.e., manage creation of a K8s-in-Docker cluster), the way they go about it is very different: K8s.io KinD requires complex Docker configurations and insecure privileged containers due to limitations of the OCI runc. On the other hand, Kindbox uses simple Docker configurations and strongly isolated containers, by leveraging the capabilities of the Sysbox runtime.
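For a feel of the workflow, the operations described above map to invocations along these lines (the subcommand and flag names here are illustrative assumptions; check the Kindbox README for the exact CLI):

# Create a cluster with a control plane and three workers:
$ kindbox create mycluster --num-workers=3

# Grow it to five workers:
$ kindbox resize mycluster --num-workers=5

# Tear it down:
$ kindbox destroy mycluster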