Policy Enabled Kubernetes with Open Policy Agent

Addressing Common Concerns with Cloud Computing and DevOps

Organizations move to the public cloud, in large part, to address the common “infrastructure provisioning” concerns shared by all application teams. The cloud greatly reduces the “undifferentiated heavy lifting” — standing up servers, networking, and security — needed just to deliver applications and features. Moving to containers and Kubernetes can be seen as the next evolution in allowing development teams to focus on their work, not on infrastructure.

Tackling these common needs is also a large part of a continued DevOps journey. In fact, DevOps is all about reducing variability and human error, increasing repeatability, and implementing practices underpinned by policies to deliver applications and features reliably and efficiently. This is all important in Kubernetes. And we know that addressing common concerns for feature teams, through cloud computing and DevOps, is what enables application teams to deliver faster, enabling businesses to move faster.

Common Application Concerns

As we embrace modern approaches to provisioning infrastructure, we also use patterns to address common concerns in building applications and delivering services and APIs. Much the same way Aspect-Oriented Programming (AOP) addressed “cross-cutting concerns” a decade or so ago, we are addressing modern application design and construction by adopting patterns such as the Twelve-Factor App methodology. Policy enablement is another common concern we can leverage to better manage applications and their associated environments.

What is Policy?

As seen on the Open Policy Agent documentation site:

“All organizations have policies. Policies are essential to the long-term success of organizations because they encode important knowledge about how to comply with legal requirements, work within technical constraints, avoid repeating mistakes, and so on.

In their simplest form, policies can be applied manually based on rules that are written down or conventions that are unspoken but permeate an organization’s culture. Policies may also be enforced with application logic or statically configured at deploy time.”

Simply put, policies are the boundaries in which we deliver applications and infrastructure. These boundaries drive acceptance criteria for our deliverables, and our definition of done. We are measured, in part, by how well we meet the requirements of these policies, and how effectively we enable our customers to stay within policy when they use our solutions.

Automated Policies to Satisfy Common Concerns

Part of a successful DevOps formula is making sure that we follow internal policies and procedures when pushing changes to computing environments. Not all policy enforcement is done via automated DevOps pipelines. For example, tools like Cloud Custodian — the open source rules engine — are used to automate the implementation of policies to maintain a well-managed and secure cloud. These policies are meant to place guardrails around cloud usage, without adversely affecting cloud users.

The Case for General Purpose Policy Enablement

The type of automated policy enforcement implemented by Cloud Custodian and similar tools should be considered for other application settings as well. Executing within prescribed policies is a common concern of cloud-native applications, and Open Policy Agent (OPA) offers a general-purpose approach to policy enablement.

According to the docs, Open Policy Agent (OPA) is:

“…a lightweight general-purpose policy engine that can be co-located with your service. You can integrate OPA as a sidecar, host-level daemon, or library.

Services offload policy decisions to OPA by executing queries. OPA evaluates policies and data to produce query results (which are sent back to the client). Policies are written in a high-level declarative language and can be loaded into OPA via the filesystem or well-defined APIs.”


OPA handles the heavy lifting of policy decisions, and removes the need for custom programming in each application/service. Image Source: openpolicyagent.org

Implementing Admission Control Policies in Kubernetes

Automated policy enforcement is also needed as we move to containers and container orchestration platforms. As the Tech Lead on our Enterprise Kubernetes Platform Team, I have been researching and developing patterns for managing policies in our clusters.

One of the policy control points within Kubernetes is admission control. With Kubernetes admission controllers, we can intercept requests to the Kubernetes API server before the objects they describe, which capture the intent for desired cluster state, are persisted to the etcd key/value store.
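Note that webhook-based admission requires the ValidatingAdmissionWebhook admission plugin to be enabled on the API server. Most recent distributions enable it by default; for a self-managed control plane, a hedged example (the flag name and default plugin list vary by Kubernetes version):

# Sketch: enable webhook admission on a self-managed kube-apiserver
kube-apiserver --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,ValidatingAdmissionWebhook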

The pattern that I have investigated can be found here, with its companion GitHub repos here and here.

Implementing a Kubernetes Deployment Admission Controller

The OPA use case I will focus on is controlling where container images are sourced from, as part of Kubernetes Deployment manifests.

As part of a sound governance and compliance stance, it is important to understand, direct, and even control the image sources for your Kubernetes workloads. With OPA and Kubernetes Validating Admission Controllers, event-driven, dynamically-configured, policy-based decisions can prevent unwanted images from being deployed into your clusters. In this solution, an opa service is connected to a Kubernetes ValidatingAdmissionWebhook, and listens for Deployment CREATE and UPDATE events sourced by the Kubernetes API server.

The solution involves creating a Kubernetes object graph consisting of the following objects:


OPA Solution Kubernetes Object Graph

In general operation, the OPA server is a RESTful server that exposes services to produce and consume both event data and policies. Since OPA is domain agnostic, any data can be sent to the OPA server, to be evaluated by any policy, as long as the policy matches the event data passed in.
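For example, with the opa service reachable locally, policies and data can be pushed and queried over OPA's HTTP API. A minimal sketch; the policy ID and input below are hypothetical:

# Publish a policy document to the OPA server (PUT /v1/policies/<id>)
curl -X PUT --data-binary @example.rego \
  http://localhost:8181/v1/policies/example

# Ask OPA for a policy decision by POSTing input data (POST /v1/data/<path>)
curl -X POST --data '{"input": {"user": "alice"}}' \
  http://localhost:8181/v1/data/example/allow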

In my example solution, OPA policies are stored in the opa namespace as Kubernetes ConfigMap resources. The policies are loaded into the opa container by a sidecar workload known as kube-mgmt. kube-mgmt reads ConfigMaps as they are applied to the opa namespace, and compiles them to verify proper syntax.

Upon successful compilation, the policy is stored in the opa container by kube-mgmt. Additionally, kube-mgmt is configured to periodically pull resource metadata that might be needed by the opa service to correctly evaluate API server events, in the case that the event does not contain all the needed data for the logic defined in the evaluation policy. The resources that kube-mgmt scans are configured with container spec arguments in a Kubernetes deployment manifest.
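The replicated resources land under OPA's data.kubernetes document, which is why the policies shown later can import them. A quick way to inspect what kube-mgmt has replicated (a sketch, assuming a local port-forward to the opa pod's insecure port):

# Forward the insecure OPA port locally, then view replicated namespaces
kubectl -n opa port-forward deployment/opa 8181:8181 &
curl http://localhost:8181/v1/data/kubernetes/namespaces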


opa and kube-mgmt workloads in opa pod

Preparing the OPA Artifacts — Step by Step

First, we apply the namespace and auth resources for the solution, as seen below. The opa ServiceAccount uses the opa ClusterRoleBinding to bind to the opa ClusterRole for access to the permissions contained therein.

apiVersion: v1
kind: Namespace
metadata:
  name: opa
  labels:
    app: opa
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: opa
  namespace: opa
  labels:
    app: opa
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: opa
  labels:
    app: opa
rules:
  - apiGroups: [""]
    resources:
      - namespaces
    verbs:
      - get
      - list
      - watch
  - apiGroups: ["extensions"]
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups: ["apps"]
    resources:
      - deployments
    verbs:
      - get
      - list
      - watch
  - apiGroups: [""]
    resources:
      - configmaps
    verbs:
      - get
      - list
      - patch
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: opa
  labels:
    app: opa
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: opa
subjects:
- kind: ServiceAccount
  name: opa
  namespace: opa

Next, we build the OPA certificate authority and server certificate config, and then create the opa-server secret in the opa namespace. The server.conf contents below follow the OPA tutorial conventions:

openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -days 100000 -out ca.crt -subj "/CN=admission_ca"
cat >server.conf <<EOF
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = clientAuth, serverAuth
EOF
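With the CA in place, we generate the server key and certificate, and store them in the opa-server TLS secret that the deployment mounts at /certs. This is a sketch following the OPA tutorial conventions; the CN matches the in-cluster DNS name of the opa service (opa.opa.svc):

# Create the server key and a CSR for the opa service DNS name
openssl genrsa -out server.key 2048
openssl req -new -key server.key -out server.csr -subj "/CN=opa.opa.svc" -config server.conf

# Sign the server certificate with the CA created above
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out server.crt -days 100000 -extensions v3_req -extfile server.conf

# Store the key pair as the opa-server secret referenced by the deployment
kubectl -n opa create secret tls opa-server --cert=server.crt --key=server.key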

Next, we deploy the OPA solution containers, services, and default policy ConfigMap with the following command: kubectl apply -f admission-controller.yaml

The YAML can be seen below:

kind: Service
apiVersion: v1
metadata:
  name: opa
  namespace: opa
  labels:
    app: opa
spec:
  selector:
    app: opa
  ports:
  - name: https
    protocol: TCP
    port: 443
    targetPort: 443
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: opa
  name: opa
  namespace: opa
spec:
  selector:
    matchLabels:
      app: opa
  replicas: 1
  template:
    metadata:
      labels:
        app: opa
      name: opa
    spec:
      serviceAccountName: opa
      containers:
        - name: opa
          image: openpolicyagent/opa:0.9.1
          resources:
            limits:
              cpu: 500m
              memory: 512Mi
            requests:
              cpu: 500m
              memory: 512Mi
          args:
            - "run"
            - "--server"
            - "--tls-cert-file=/certs/tls.crt"
            - "--tls-private-key-file=/certs/tls.key"
            - "--addr=0.0.0.0:443"
            - "--insecure-addr=127.0.0.1:8181"
          volumeMounts:
            - readOnly: true
              mountPath: /certs
              name: opa-server
        - name: kube-mgmt
          image: openpolicyagent/kube-mgmt:0.7
          resources:
            limits:
              cpu: 500m
              memory: 512Mi
            requests:
              cpu: 500m
              memory: 512Mi
          args:
            - "--replicate-cluster=v1/namespaces"
            - "--replicate=extensions/v1beta1/ingresses"
            - "--replicate=apps/v1/deployments"
      volumes:
        - name: opa-server
          secret:
            secretName: opa-server
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: opa-default-system-main
  namespace: opa
  labels:
    app: opa
data:
  main: |
    package system

    import data.kubernetes.admission

    main = {
      "apiVersion": "admission.k8s.io/v1beta1",
      "kind": "AdmissionReview",
      "response": response,
    }

    default response = {"allowed": true}

    response = {
      "allowed": false,
      "status": {
        "reason": reason,
      },
    } {
      reason = concat(", ", admission.deny)
      reason != ""
    }

The last part of this YAML applies a ConfigMap that contains the main OPA policy and default response. This policy is the entry point for policy evaluations, and returns allowed: true if no policies match the inbound data.

The Admission Controller Webhook

The OPA Admission Controller is a Validating Admission Controller, and works as a webhook server. When a request is made to the Kubernetes API server to create an object that is under policy admission control — such as a Deployment resource — the webhook fires, and the opa and kube-mgmt containers work together to evaluate the API server event and resource data against policies to perform admission review.

The YAML, seen below, sets up the webhook to listen for Kubernetes API server CREATE and UPDATE events for the included list of resources, regardless of API group or API version. The namespaceSelector at the bottom of the YAML file allows us to exclude certain sensitive namespaces from this validation solution.

We base64 encode the ca.crt file from the previous OpenSSL operations, and add it to webhook-configuration.yaml. This allows the webhook to securely communicate with the opa service.
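One hedged way to perform that substitution and apply the result (this assumes the ${base64 ca.crt} placeholder seen in the YAML below, and a base64 that supports -w0, such as GNU coreutils):

# Replace the CA bundle placeholder with the encoded cert, then apply
sed -e "s|\${base64 ca.crt}|$(base64 -w0 ca.crt)|" webhook-configuration.yaml \
  | kubectl apply -f -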

Note: Unlike most centrally managed Kubernetes secrets, the OPA secrets are used only between the webhook and the opa service. As such, they are decoupled from the cluster's secrets and CA, and can be regenerated as needed to reconfigure the OPA solution.

kind: ValidatingWebhookConfiguration
apiVersion: admissionregistration.k8s.io/v1beta1
metadata:
  name: opa-validating-webhook
  namespace: opa
  labels:
    app: opa
webhooks:
  - name: validating-webhook.openpolicyagent.org
    rules:
      - operations: ["CREATE", "UPDATE"]
        apiGroups: ["*"]
        apiVersions: ["*"]
        resources:
          - pods
          - services
          - replicasets
          - deployments
          - daemonsets
          - cronjobs
          - jobs
          - ingresses
          - roles
          - statefulsets
          - podtemplates
          - configmaps
          - secrets
    clientConfig:
      caBundle: ${base64 ca.crt}
      service:
        namespace: opa
        name: opa
    namespaceSelector:
      matchExpressions:
      - {key: opa-webhook, operator: NotIn, values: [ignore]}

In the clientConfig section, the opa namespace and service are referenced. CREATE and UPDATE operations will cause this webhook to fire.
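To exempt a sensitive namespace from admission review, label it to match the exclusion in the namespaceSelector. A sketch; kube-system is an example choice:

# Exclude kube-system from the OPA validating webhook
kubectl label namespace kube-system opa-webhook=ignore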

Rego: The OPA Policy Language

Rego is OPA’s native query language. It is similar to Datalog, but also supports structured documents, like YAML and JSON. Policies that OPA uses to review resources are written in Rego, and usually saved as *.rego files.

Below is deployment_create_whitelist.rego, which creates a whitelist of acceptable registries that may appear in the image property of a Kubernetes Deployment spec; the whitelist value in the example is a placeholder for your own comma-delimited registry list. The deny[msg] block is the entry point into this policy, and it pulls data from the API server event.

package kubernetes.admission

import data.kubernetes.namespaces

deny[msg] {
    input.request.kind.kind = "Deployment"
    input.request.operation = "CREATE"
    registry = input.request.object.spec.template.spec.containers[_].image
    name = input.request.object.metadata.name
    namespace = input.request.object.metadata.namespace
    not reg_matches_any(registry, valid_deployment_registries)
    msg = sprintf("invalid deployment, namespace=%q, name=%q, registry=%q", [namespace, name, registry])
}

valid_deployment_registries = {registry |
    # Placeholder: supply your comma-delimited list of approved registries
    whitelist = "trusted-registry.example.com"
    registries = split(whitelist, ",")
    registry = registries[_]
}

reg_matches_any(str, patterns) {
    reg_matches(str, patterns[_])
}

reg_matches(str, pattern) {
    contains(str, pattern)
}

After data are collected from the event, those data are compared to the whitelist via a call to the reg_matches_any(…) block. The call stack uses the reg_matches(…) block to check if the registry variable value (from the container image property) contains a value from the registry whitelist. If a whitelisted value is not found, the policy evaluation responds with a deny and returns the reason, constructed in the msg variable.
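For reference, here is a trimmed sketch of the AdmissionReview input the policy walks; the field values are hypothetical:

{
  "request": {
    "kind": {"kind": "Deployment"},
    "operation": "CREATE",
    "object": {
      "metadata": {"name": "app-name", "namespace": "app-ns"},
      "spec": {
        "template": {
          "spec": {
            "containers": [
              {"name": "app", "image": "untrusted.example.com/app:1.0"}
            ]
          }
        }
      }
    }
  }
}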

Note: Even though the webhook fires for CREATE and UPDATE API server events, the policy above only evaluates the JSON payloads of Deployment CREATE events. As a reminder, if an API server event is sent to OPA for evaluation and no matching policy is found, OPA will respond with a status of allowed: true. This tells the API server to continue with the write operation to etcd.

Policies as ConfigMaps

The Rego files are stored in the Kubernetes cluster as ConfigMap resources. When a ConfigMap resource is created in the opa namespace, the kube-mgmt sidecar container reads the ConfigMap and compiles the policy. Upon successful compilation, the kube-mgmt sidecar annotates the ConfigMap with an ok status, as seen below.

[Screenshot: ConfigMap details, with the openpolicyagent.org/policy-status annotation showing an ok status]
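A hedged example of creating such a policy ConfigMap (the key name main matches the namespace/configmap-name/key policy path format seen in the status annotations):

# Load the Rego policy into the opa namespace as a ConfigMap;
# kube-mgmt will detect, compile, and load it into the opa container
kubectl -n opa create configmap deployment-create-whitelist \
  --from-file=main=deployment_create_whitelist.rego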

Next, the kube-mgmt sidecar loads the policy contents from the ConfigMap into the opa container as a policy. Once the policy is installed in the opa container, it is evaluated against Deployment resources, looking for a whitelisted registry in the image spec. If the registry in the image spec is not in the policy's whitelist, then the deployment will fail, as seen below.

Error from server (invalid deployment, namespace="app-ns", name="app-name", registry="app-registry"): error when creating "app-deployment.yaml": admission webhook "validating-webhook.openpolicyagent.org" denied the request...

If the kube-mgmt sidecar cannot successfully compile the Rego policy file, then it will stamp the ConfigMap with a failure status annotation and not load the policy into the opa container.

openpolicyagent.org/policy-status: {"status":"error","error":{"code":"invalid_parameter","message":"error(s) occurred while compiling module(s)","errors":[{"code":"rego_parse_error","message":"no match found","location":{"file":"opa/deployment-create-whitelist/main","row":5,"col":10},"details":{}}]}}

Additional Use Cases

Open Policy Agent can be used to evaluate the JSON payload of many API server events, and multiple policies can be used to evaluate the same API event. One of the core features of Kubernetes is how it selects resources, driven by labels. Furthermore, governance and compliance within a cluster can be driven by properly labeling resources. It makes perfect sense to use Open Policy Agent to evaluate API server event payloads to ensure that new and reconfigured objects are properly labeled. This ensures that no workloads are introduced into the cluster without the correct labeling scheme.
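As an illustration, here is a hedged Rego sketch that would deny Deployments missing a required label; the label key and message are hypothetical:

package kubernetes.admission

# Sketch: require an "owner" label on incoming Deployments
deny[msg] {
    input.request.kind.kind = "Deployment"
    not input.request.object.metadata.labels.owner
    name = input.request.object.metadata.name
    msg = sprintf("deployment %q is missing required label: owner", [name])
}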

Under the covers, Open Policy Agent is a RESTful server that takes in data and policies, and evaluates that data against those policies. Given its domain-agnostic nature, Open Policy Agent could be deployed into a Kubernetes cluster to provide services to other workloads that need data validation, beyond the use case of validating Kubernetes resources.

Tips

While working with Open Policy Agent, and Kubernetes Validating Admission Controllers, we discovered a few potential issues of which readers should be aware:

  1. Because of how Kubernetes resource deletions fire the UPDATE event, policies need to be carefully crafted to account for unwanted behavior in non-deletion UPDATE events.
  2. When configuring corporate proxy connections on nodes, .svc may need to be added to the NO_PROXY environment export to prevent erroneously routing opa.opa.svc calls to outside of the cluster.

Conclusion

Along with moving to Cloud Computing, moving to Kubernetes requires thoughtful design to ensure that governance, compliance, and security controls are included. Using policies to apply rules-based control of resources is a dynamic approach to managing Kubernetes configuration. Policy Enablement is a common concern among teams looking to automate the management of Kubernetes.

The domain-agnostic nature of Open Policy Agent makes it well-suited for policy management and evaluation. In conjunction with Kubernetes Validating Admission Controllers, Open Policy Agent can reduce the opportunity for unwanted resource configurations to enter Kubernetes clusters. And, as a RESTful server, Open Policy Agent could take on other data validation roles in the enterprise.


Critical Stack enables enterprises to run efficient and cost-effective containerized infrastructure while reducing risk.


Jimmy Ray, Lead Software Engineer

Jimmy is a Cloud Engineer with Capital One, and technical lead on the Enterprise Kubernetes Platform Team. The majority of his 20+ years in IT has been spent developing software and architecting enterprise solutions. Jimmy is a leader in the Richmond, VA tech community, and has spoken at user groups and conferences in the U.S. and Europe, including Jenkins World 2016. He is passionate about delivering containerized cloud solutions.
