Source: Nestybox Blog

Secure Docker-in-Kubernetes

Contents

Intro
Motivation
Setup
Why is Sysbox Useful Here?
Kubernetes Cluster Creation
Defining the Pods (with Docker inside)
Persistent Docker Cache
Deploying the Pods
Verify the Pods are Working
Exposing the Pods IP to the outside
Connecting Remotely to the Pods
Shared Docker Images across Docker Engines
Resource Limits
Scaling Pod Instances
Persistent Volume Removal
Docker Build Context
Conclusion
Resources

Intro

This post shows you how to run Docker inside a secure (rootless) Kubernetes pod. That is, you create one or more Kubernetes pods and inside each of them you run Docker. While running Docker inside pods is not new, what’s different here is that the pod will not be an insecure “privileged” pod. Instead, it will be a fully unprivileged (rootless) pod launched with Kubernetes and the Sysbox runtime, which means you can use this setup in enterprise settings where security is very important. We will show you how to set this up quickly and easily with examples, and afterwards you can adjust the setup per your needs.

Motivation

There are several use cases for running Docker inside a Kubernetes pod; a couple of useful ones are:

- Creating a pool of Docker engines on the cloud. Each user is assigned one such engine and connects remotely to it via the Docker CLI. Each Docker engine runs inside a Kubernetes pod (instead of a VM), so operators can leverage the power of Kubernetes to manage the pool’s resources.

- Running Docker inside Kubernetes-native CI jobs. Each job is deployed inside a pod, and the job uses the Docker engine running inside the pod to build container images (e.g., with BuildKit), push them to some repo, run them, etc.

In this blog post we focus on the first use case. A future blog post will focus on the second use case.

Setup

The setup we will create is as follows:

- Kubernetes will deploy the pods with the Sysbox runtime.
- Each pod will run a Docker engine and an SSH server in it.
- Each Docker engine will be assigned to a user (say a developer working from home with a laptop).
- The user will connect remotely to her assigned Docker engine using the Docker CLI.

Why is Sysbox Useful Here?

Prior to Sysbox, the setup shown above required insecure “privileged” containers or VM-based alternatives such as KubeVirt. But privileged containers are too insecure, and VMs are slower, heavier, and harder to set up (e.g., KubeVirt requires nested virtualization on the cloud). With Sysbox, you can do this more easily and efficiently, using secure (rootless) containers without resorting to VMs.

Kubernetes Cluster Creation

Ok, let’s get to it. First, you need a Kubernetes cluster with Sysbox installed in it. It’s pretty easy to set this up, as Sysbox works on EKS, GKE, AKS, on-prem Kubernetes, etc. See these instructions to install Sysbox on your cluster.

For this example, I am using a 3-node Kubernetes cluster on GKE, and I’ve installed Sysbox on it with this single command:

$ kubectl apply -f https://raw.githubusercontent.com/nestybox/sysbox/master/sysbox-k8s-manifests/sysbox-install.yaml

Defining the Pods (with Docker inside)

Once Sysbox is installed on your cluster, the next step is to define the pods that carry the Docker engine in them.

We need a container image that carries the Docker engine. In this example, we use an image called nestybox/alpine-supervisord-docker:latest that carries Alpine + Supervisord + sshd + Docker. The Dockerfile is here.

NOTE: You can use another image if you would like. Just make sure that the image is configured to start Docker and SSH inside the container automatically.
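If you have a local Docker host with Sysbox installed, a quick way to sanity-check a candidate image before deploying it to the cluster is to run it as an unprivileged container with the sysbox-runc runtime and confirm that the inner Docker engine and sshd come up. This is just a minimal sketch under that assumption; the container name dind-test is arbitrary:

# Launch the image as an unprivileged system container via Sysbox
$ docker run --runtime=sysbox-runc -d --name dind-test nestybox/alpine-supervisord-docker:latest

# Give supervisord a few seconds to start dockerd and sshd, then check them
$ docker exec dind-test docker version
$ docker exec dind-test ps | grep -E 'dockerd|sshd'

# Clean up
$ docker rm -f dind-test

If the inner "docker version" reports both a client and a server, the image is suitable for the setup described here.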
Next, let’s create a Kubernetes StatefulSet that will provision 6 pod instances (e.g., 2 per node). Each pod will allow remote access to the Docker engine via SSH. Here is the associated yaml file:

$ cat dockerd-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dockerd-statefulset
spec:
  selector:
    matchLabels:
      app: dockerd
  serviceName: "dockerd"
  replicas: 6
  template:
    metadata:
      labels:
        app: dockerd
      annotations:
        io.kubernetes.cri-o.userns-mode: "auto:size=65536"
    spec:
      runtimeClassName: sysbox-runc
      terminationGracePeriodSeconds: 20
      containers:
      - name: alpine-docker
        image: nestybox/alpine-supervisord-docker:latest
        ports:
        - containerPort: 22
          name: ssh
        volumeMounts:
        - name: docker-cache
          mountPath: /var/lib/docker
  volumeClaimTemplates:
  - metadata:
      name: docker-cache
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "gce-pd"
      resources:
        requests:
          storage: 2Gi
  podManagementPolicy: Parallel

Before we apply this yaml, let’s analyze a few things about it.

First, we chose a StatefulSet (instead of a Deployment) because we want each pod to have unique and persistent network and storage resources across its life cycle. This way, if a pod goes down, we can recreate it and it will have the same IP address and the same persistent storage assigned to it.

Second, note the following about the StatefulSet spec:

- It creates 6 pods in parallel (see replicas and podManagementPolicy).
- The pods are rootless by virtue of using Sysbox (see the cri-o annotation and the sysbox-runc runtimeClassName).
- Each pod exposes port 22 (ssh).
- Each pod has a persistent volume mounted onto the pod’s /var/lib/docker directory (see next section).

Persistent Docker Cache

In the StatefulSet yaml shown above, we mounted a persistent volume on each pod’s /var/lib/docker directory. Doing this is optional, but it enables us to preserve the state of the Docker engine (aka “the Docker cache”) across the pod’s life cycle. This state includes pulled images, Docker volumes and networks, and more. Without this, the Docker state will be lost when the pod stops.

Note that each pod must have a dedicated volume for this. Multiple pods can’t share the same volume because each Docker engine must have a dedicated cache (it’s a Docker requirement).

Also, note that the persistent storage is provisioned dynamically (at pod creation time, one volume per pod). This is done via the volumeClaimTemplates directive, which claims a 2GiB volume of a storage class named “gce-pd”. For this example, 2GiB is sufficient; for a production scenario, you’ll likely need much more storage since Docker storage can add up over time when pulling multiple images.

What is “gce-pd”? It’s a storage class that uses the Google Compute Engine (GCE) storage provisioner. The resource definition is below:

$ cat gce-pd.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gce-pd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  fstype: ext4
  replication-type: none
volumeBindingMode: WaitForFirstConsumer

Since my cluster is on GKE, using the GCE storage provisioner makes sense. Depending on your scenario, you can use any other provisioner supported by Kubernetes (e.g., AWS EBS, Azure Disk, etc.).
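As an illustration only, if your cluster runs on AWS instead of GKE, you could define an analogous storage class backed by EBS volumes. The sketch below is not part of the original setup and assumes the legacy in-tree EBS provisioner (kubernetes.io/aws-ebs) is available in your cluster; newer clusters typically use the EBS CSI driver (provisioner ebs.csi.aws.com) instead. The class name aws-ebs is arbitrary and must match the storageClassName in the volumeClaimTemplates:

$ cat aws-ebs.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: aws-ebs
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  fsType: ext4
volumeBindingMode: WaitForFirstConsumer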
In addition, whenever we use volumeClaimTemplates, we must also define a dummy local-storage class (as otherwise Kubernetes will fail to deploy the pod):

$ cat local-storage.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

Deploying the Pods

With this in place, we can now apply the yamls shown in the prior sections:

$ kubectl apply -f gce-pd.yaml
$ kubectl apply -f local-storage.yaml
$ kubectl apply -f dockerd-statefulset.yaml

If all goes well, you should see the StatefulSet pods deployed within 10 to 20 seconds, as shown below:

$ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
dockerd-statefulset-0   1/1     Running   0          9m51s
dockerd-statefulset-1   1/1     Running   0          9m51s
dockerd-statefulset-2   1/1     Running   0          9m51s
dockerd-statefulset-3   1/1     Running   0          9m51s
dockerd-statefulset-4   1/1     Running   0          9m51s
dockerd-statefulset-5   1/1     Running   0          9m51s

You should also see the persistent volumes that Kubernetes dynamically allocated to the pods:

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                        STORAGECLASS   REASON   AGE
pvc-377c35d8-4075-4d40-9d26-7e4acd42cbea   2Gi        RWO            Delete           Bound    default/docker-cache-dockerd-statefulset-1   gce-pd                  14m
pvc-5937a358-5111-4b91-9cce-87a8efabbb62   2Gi        RWO            Delete           Bound    default/docker-cache-dockerd-statefulset-3   gce-pd                  14m
pvc-5ca2f6ba-627c-4b19-8cf0-775395868821   2Gi        RWO            Delete           Bound    default/docker-cache-dockerd-statefulset-4   gce-pd                  14m
pvc-9812e3df-6d7e-439a-9702-03925af098a5   2Gi        RWO            Delete           Bound    default/docker-cache-dockerd-statefulset-0   gce-pd                  14m
pvc-afd183ab-1621-44a1-aaf0-da0ccf9f96a8   2Gi        RWO            Delete           Bound    default/docker-cache-dockerd-statefulset-5   gce-pd                  14m
pvc-e3f65dea-4f97-4c4b-a902-97bf67ed698b   2Gi        RWO            Delete           Bound    default/docker-cache-dockerd-statefulset-2   gce-pd                  14m

Verify the Pods are Working

Let’s exec into one of the pods to verify all is good:

$ kubectl exec dockerd-statefulset-0 -- ps
PID   USER     TIME  COMMAND
    1 root      0:00 {supervisord} /usr/bin/python3 /usr/bin/supervisord -n
   14 root      0:00 /usr/bin/dockerd
   15 root      0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
   45 root      0:02 containerd --config /var/run/docker/cont
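To take the verification one step further, you can drive the Docker engine inside the pod directly through kubectl exec. The commands below are just an illustrative check; the nginx:alpine image and the container name web are arbitrary choices, not part of the original setup:

# Run a container with the Docker engine inside the pod
$ kubectl exec dockerd-statefulset-0 -- docker run -d --name web nginx:alpine

# The inner engine lists it, and the pulled image lands in the persistent
# /var/lib/docker volume, so it survives pod restarts
$ kubectl exec dockerd-statefulset-0 -- docker ps
$ kubectl exec dockerd-statefulset-0 -- docker images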
