Kubernetes containers: A comprehensive runtime comparison
Discover why runtimes are critical to container management and how they facilitate executing and deploying containers.
June 10, 2020
This article examines container runtime terminology and tools. By the end, you'll better understand container runtime, how the container landscape has evolved and how we got to where we are today.
But first, what is a container in technology?
Before dissecting container runtimes, let's quickly recap containers.
Containers are not first-class objects in the Linux kernel. Containers are fundamentally composed of several underlying kernel primitives: namespaces (who you are allowed to talk to), cgroups (the amount of resources you are allowed to use), and LSMs (Linux Security Modules—what you are allowed to do). Together, these kernel primitives allow us to set up secure, isolated, and metered execution environments for our processes. This is great, but doing all of this manually each time we want to create a new isolated process would be tiresome.
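To make these primitives a little more concrete, here is a minimal Python sketch that lists the namespaces and cgroups the current process belongs to by reading /proc. The function names are hypothetical helpers invented for this illustration, and the sketch assumes a Linux host; on other systems it returns empty results.

```python
import os
from pathlib import Path


def parse_cgroup_file(text):
    """Parse /proc/<pid>/cgroup content into (hierarchy_id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        # On cgroups v2 the controller list is empty, e.g. "0::/user.slice".
        hierarchy_id, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else []
        entries.append((int(hierarchy_id), names, path))
    return entries


def read_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, or {} if unavailable."""
    ns_dir = Path("/proc") / str(pid) / "ns"
    result = {}
    try:
        for name in sorted(os.listdir(ns_dir)):
            # Each entry is a symlink such as "pid:[4026531836]".
            result[name] = os.readlink(ns_dir / name)
    except OSError:
        # Not on Linux, or not permitted to inspect this process.
        return {}
    return result


if __name__ == "__main__":
    for name, ident in read_namespaces().items():
        print(f"namespace {name}: {ident}")
    try:
        cgroup_text = Path("/proc/self/cgroup").read_text()
    except OSError:
        cgroup_text = ""
    for entry in parse_cgroup_file(cgroup_text):
        print("cgroup", entry)
```

Two processes that print the same namespace identifiers share that namespace; a containerized process will typically show different identifiers from the host shell.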
Simplifying process isolation: containers and container runtimes
Instead of unshare-ing, cgcreat-ing, and semodul-ing custom namespaces, cgroups, and SELinux policies every time we want to create a new isolated process, these components have been bundled together in a concept called a "container." Tools we call "container runtimes" make it easy to compose these pieces into an isolated, secure execution environment that we can deploy in a repeatable manner.
For more information about containers themselves, check out our other post What is a container? Definition, benefits and use cases.
What is a container runtime?
Let’s take a step back from containers for a moment. You may have heard the term “runtime” referring to the lifecycle phase of a program or the usage of a specific language to execute a program. A container runtime functions similarly to the latter—it’s software that runs and manages the components required to run containers. As mentioned above, these tools make it easier to securely execute and efficiently deploy containers and are a key component of container management. As containers themselves have evolved and changed, so have their runtimes.
How container runtimes have evolved
After cgroups were added to the Linux kernel in 2007, several projects emerged that took advantage of them by creating containerization processes:
Linux Containers (LXC)
LXC, Linux Containers, was introduced shortly after cgroups and was designed for "full-system" containers. Systemd also gained similar container support: systemd-nspawn could run namespaced processes and systemd itself could control cgroups. Neither LXC nor systemd-nspawn really caught on with end users, but they did see some use in other systems. For example, Canonical's Juju and Docker (briefly) were notable tools built on top of LXC.
Docker (at the time, "dotCloud") began building tooling around LXC to make containers more developer- and user-friendly. Before long, Docker dropped LXC, created the "Open Container Initiative" to establish container standards (more on this later), and open sourced some of their container components as the libcontainer project.
Google also open sourced a version of their internal container stack, LMCTFY, but abandoned it as Docker gained popularity. Most of the functionality was gradually replicated in Docker's libcontainer by the LMCTFY developers.
CoreOS, after initially exclusively using Docker in their Container Linux product, created an alternative to Docker called rkt. rkt had features ahead of its time that differentiated it from Docker and the other early runtimes. Notably, it did not need to run everything as root, was daemonless and CLI driven, and had amenities like cryptographic verification and full Docker image compatibility.
CoreOS also published a container standard called appc before Docker created the OCI. However, as Docker gained popularity, CoreOS pivoted to cofound and support the OCI. This helped broaden its scope and eventually the OCI also encompassed parts of appc. rkt and appc were eventually abandoned.
Over time, experiences with these early and diverse approaches to containers helped bring a level of standardization to the OCI specs. Various implementations of the spec were released, which make up what I'll call the "modern" container runtime landscape.
Container runtime comparison
In this section, we will review different types of container runtimes. Generally, they fall into two main categories: Open Container Initiative (OCI) runtimes and Container Runtime Interface (CRI) implementations.
Open Container Initiative (OCI) Runtimes
The Open Container Initiative (OCI) maintains a set of industry standards, including a runtime spec, that give containers a common runtime environment and ensure portability and interoperability across different container platforms and orchestrators. Runtimes that implement the spec let developers build, share and run containers with ease while fostering collaboration and innovation in the container ecosystem. The sections below cover native OCI runtimes, sandboxed and virtualized runtimes, and the Container Runtime Interface.
Sometimes referred to as "low-level" runtimes, implementations of the OCI runtime spec are focused on managing the container lifecycle—abstracting the Linux primitives—and are not required to do much else. Low level runtimes create and run “the container.”
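As a rough illustration of what "managing the container lifecycle" involves, the Python sketch below builds a small subset of an OCI bundle's config.json and the sequence of commands a caller might issue to a runC-style runtime. The helper names, the hostname, and the example paths are assumptions made up for this sketch; the config covers only a few fields of the runtime spec, and the commands are returned as lists rather than executed.

```python
import json


def minimal_oci_config(args, rootfs="rootfs", memory_limit_bytes=None):
    """Build a small config.json-style dict (illustrative subset of the spec)."""
    config = {
        "ociVersion": "1.0.2",
        "process": {
            "terminal": False,
            "user": {"uid": 0, "gid": 0},
            "args": list(args),
            "env": ["PATH=/usr/sbin:/usr/bin:/sbin:/bin"],
            "cwd": "/",
        },
        "root": {"path": rootfs, "readonly": True},
        "hostname": "sketch",  # hypothetical hostname
        "linux": {
            # Namespaces: which kernel views the process is isolated in.
            "namespaces": [
                {"type": t} for t in ("pid", "network", "ipc", "uts", "mount")
            ],
        },
    }
    if memory_limit_bytes is not None:
        # cgroups: how much of a resource the process may use.
        config["linux"]["resources"] = {"memory": {"limit": memory_limit_bytes}}
    return config


def lifecycle_commands(container_id, bundle_dir, runtime="runc"):
    """Return the create/start/kill/delete sequence as argument lists."""
    return [
        [runtime, "create", "--bundle", bundle_dir, container_id],
        [runtime, "start", container_id],
        [runtime, "kill", container_id, "KILL"],
        [runtime, "delete", container_id],
    ]


if __name__ == "__main__":
    cfg = minimal_oci_config(["sleep", "60"], memory_limit_bytes=64 * 1024 * 1024)
    print(json.dumps(cfg, indent=2))
    for cmd in lifecycle_commands("demo", "/tmp/demo-bundle"):
        print(" ".join(cmd))
```

A real bundle also needs a populated root filesystem and usually mounts and capabilities, all of which are left out here for brevity.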
Native low-level runtimes include:
runC is the result of all of Docker's work on libcontainer and the OCI. It is the de facto standard low-level container runtime. It is written in Go and maintained under the Open Container Initiative's opencontainers project.
Railcar was an OCI runtime implementation created by Oracle. In contrast to runC's Go codebase, it was written in Rust, which can be an excellent language for a component like a container runtime that performs low-level interactions with the kernel. Unfortunately, Railcar has been abandoned.
crun is a Red Hat-led OCI implementation that is part of the broader containers project and a sibling to libpod (more on that later). It is developed in C, is performant and lightweight, and was one of the first runtimes to support cgroups v2.
rkt is not an OCI runtime implementation, but it is a similar low-level container runtime. It supports running Docker and OCI images in addition to appc bundles, but is not interoperable with higher level components that use OCI runtimes.
It should be noted that, as we can see in this Open Source Summit presentation, the performance of the low-level runtime is only significant during container creation or deletion. Once the process is running, the container runtime is out of the picture.
Sandboxed and virtualized runtimes
In addition to native runtimes, which run the containerized process on the same host kernel, there are some sandboxed and virtualized implementers of the OCI spec:
gVisor and Nabla
gVisor and Nabla are sandboxed runtimes, which provide further isolation of the host from the containerized process. Instead of sharing the host kernel, the containerized process runs on a unikernel or kernel proxy layer, which then interacts with the host kernel on the container's behalf. Because of this increased isolation, these runtimes have a reduced attack surface and make it less likely that a containerized process can harm the host.
runV, Clear and Kata
runV, Clear and Kata are virtualized runtimes. They are implementations of the OCI runtime spec that are backed by a virtual machine interface rather than the host kernel. runV and Clear have been deprecated and their feature sets absorbed by Kata. They all can run standard OCI container images, although they do it with stronger host isolation. They start a lightweight virtual machine with a standard Linux kernel image and run the "containerized" process in that virtual machine.
In contrast to native runtimes, sandboxed and virtualized runtimes have performance impacts through the entire life of a containerized process. In sandboxed containers, there is an extra layer of abstraction: the process runs on the sandbox unikernel/proxy, which relays instructions to the host kernel. In virtualized containers, there is a layer of virtualization: the process runs entirely in a virtual machine, which is inherently slower than running natively. Using VM technology like the performance-focused AWS Firecracker as the backing virtual machine type for VM containers can help minimize this impact.
The Container Runtime Interface (CRI)
When the Kubernetes container orchestrator was introduced, the Docker runtime was hardcoded into its machine daemon, the kubelet. However, as Kubernetes rapidly became popular, the community began to need alternative runtime support.
rkt support was added by customizing the kubelet code for rkt: the "rktlet." However, this per-runtime custom build process would not scale and exposed the need for an abstract runtime model in Kubernetes. To solve this, Hyper, CoreOS, Google and other Kubernetes sponsors collaborated on a high-level spec describing a container runtime from a container-orchestration perspective: the Container Runtime Interface. Integrating with the CRI instead of a specific runtime allows the kubelet to support multiple container runtimes without requiring custom kubelets to be compiled for each runtime.
The CRI has additional concerns over an OCI runtime including image management and distribution, storage, snapshotting, networking (distinct from the CNI) and more. A CRI has the functionality required to leverage containers in dynamic cloud environments, unlike OCI runtimes which are tightly focused on creating containers on a machine. Further, CRIs usually delegate to an OCI runtime for the actual container execution. By introducing the CRI, the Kubernetes authors effectively decoupled the kubelet from the underlying container runtime in an extensible way.
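The layering described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the real CRI API: the class names and the bundle path are hypothetical, and the method names only loosely mirror CRI calls such as PullImage, CreateContainer, StartContainer and RemoveContainer. The point is that the high-level runtime owns images and bookkeeping while handing actual execution to a low-level runtime.

```python
class FakeOCIRuntime:
    """Stand-in for a low-level runtime such as runC: lifecycle calls only."""

    def __init__(self):
        self.calls = []  # records every lifecycle call it receives

    def create(self, container_id, bundle):
        self.calls.append(("create", container_id))

    def start(self, container_id):
        self.calls.append(("start", container_id))

    def delete(self, container_id):
        self.calls.append(("delete", container_id))


class SketchCRI:
    """Toy high-level runtime: manages images, delegates container execution."""

    def __init__(self, oci_runtime):
        self.oci = oci_runtime
        self.images = set()
        self.containers = {}  # container_id -> state
        self._next_id = 0

    def pull_image(self, image):
        # A real implementation would fetch and unpack layers here.
        self.images.add(image)
        return image

    def create_container(self, name, image):
        if image not in self.images:
            raise LookupError(f"image not present: {image}")
        container_id = f"{name}-{self._next_id}"
        self._next_id += 1
        bundle = f"/run/sketch/{container_id}"  # hypothetical bundle location
        self.oci.create(container_id, bundle)   # delegate to the OCI runtime
        self.containers[container_id] = "created"
        return container_id

    def start_container(self, container_id):
        self.oci.start(container_id)
        self.containers[container_id] = "running"

    def remove_container(self, container_id):
        self.oci.delete(container_id)
        del self.containers[container_id]


if __name__ == "__main__":
    oci = FakeOCIRuntime()
    cri = SketchCRI(oci)
    cri.pull_image("busybox")
    cid = cri.create_container("web", "busybox")
    cri.start_container(cid)
    cri.remove_container(cid)
    print(oci.calls)
```

Because the kubelet in this picture would only ever talk to something shaped like SketchCRI, the low-level runtime underneath can be swapped without touching the caller.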
The first CRI implementation was the dockershim, which provided the agreed-upon layer of abstraction in front of the Docker engine. As containerd and runC were split out from the core of Docker, though, the dockershim has become less relevant. containerd currently provides a full CRI implementation.
There is also a VM CRI, frakti (v1), which was the first non-Docker CRI implementation. It was created for, and is designed to be used with, runV and offers the same functionality as the native OCI-backed CRIs but with VMs. As Kata has absorbed the feature set of Clear Containers and runV, frakti is less relevant: containerd+kata is the modern frakti+runV.
There are two main players in the CRI space at present:
containerd is Docker's high-level runtime, managed and developed out in the open under the Moby project. By default it uses runC under the hood. Like the rest of the container tools that originated from Docker, it is the current de facto standard CRI. It provides all the core functionality of a CRI and more and is the CRI that we use in Critical Stack, our container orchestration platform built on Kubernetes. containerd has a plugin design: cri-containerd implements the CRI, and various shims exist to integrate containerd with low-level runtimes such as Kata.
cri-o is a slim CRI implementation led by Red Hat, designed specifically for Kubernetes. It is intended to serve as a lightweight bridge between the CRI and a backing OCI runtime. It has fewer peripheral features compared to containerd and delegates to components from libpod and the “Container Tools” project for image management and storage. By default, cri-o uses runC as its OCI runtime, but on recent Fedora installations (with cgroups v2) it will use crun. Since it has full OCI compatibility, cri-o works out of the box with low-level runtimes such as Kata without any additional pieces and with minimal configuration.
Both of these CRIs support interop with all of the OCI runtimes above via either native interop or plugins/shims, including the sandboxed and virtualized implementations.
You may notice in reading the above that Docker is not a CRI or OCI implementation but uses both (via containerd and runC). In fact, it has additional features like image building and signing that are out of scope of either CRI or OCI specs. So where does this fit in?
Docker calls their product the “Docker Engine”, and generically these full container tool suites may be referred to as Container Engines. No one except Docker provides such a full-featured single executable, but we can piece a comparable suite of tools together from the Container Tools project.
The Container Tools project follows the UNIX philosophy of small tools which do one thing well:
In fact, these are alternatives to the standalone Docker stack, and including the cri-o project as a CRI replacement provides the last missing piece.
Wrapping up Kubernetes container runtimes
And with that, we've surveyed the history and current state of container runtimes! Containers are powerful tools for packaging, isolation, and security, and when used properly they make it easy to deliver software reliably and consistently. The container runtime is a small but critically important piece of this ecosystem, and it's important to understand the history and intent behind the various runtimes as you evaluate them for your use cases. Hopefully the information here helps provide context as you decide on components for your local development, CI/CD, and Kubernetes needs.
There’s quite a bit to unpack when it comes to container runtimes, and even more so when it comes to container management in general. If you don’t have the bandwidth to make all of these implementation decisions yourself, consider a container orchestration solution like Critical Stack. Built by Capital One to help eliminate configuration challenges that bottleneck deployment and maintenance of containerized applications, Critical Stack helps enterprises transition to containers safely and effectively.