EMK - In-cluster registry cache

This page describes how to set up and configure an in-cluster registry cache for your Kubernetes cluster.

Introduction

When a pod is deployed, the Kubernetes scheduler assigns it to an available node. The containerd process on that node pulls every container image defined in the pod manifest from a remote registry, unless the image is already present on the node from a recent pull. When the pod is rescheduled to a new node, for example because of auto-scaling (scale-up), a rolling update, or the replacement of an unhealthy node, all container images must be pulled again from the remote registry. Pulling an image from a remote registry takes time and network bandwidth, and it may fail when the remote registry enforces rate limits. To avoid this, you can use the registry-cache extension to run a container registry as a pull-through cache, which reduces pulls from remote container registries.

Solution

The registry-cache extension deploys and manages a registry in the Kubernetes cluster that runs as a pull-through cache. The registry implementation used is distribution/distribution.

How does it work?

When the extension is enabled, a registry cache is deployed to the Kubernetes cluster for each configured upstream. In addition, the containerd daemon on the cluster nodes is configured to use the Service IP address of the deployed registry cache as a mirror. For example, if a registry cache for the upstream docker.io is requested via the Kubernetes specification, containerd is configured to pull images from the deployed cache in the cluster first. If that pull fails, containerd falls back to the upstream itself (docker.io in this case).

The first time an image is requested from the pull-through cache, it pulls the image from the configured upstream registry and stores it locally, before handing it back to the client. On subsequent requests, the pull-through cache is able to serve the image from its own storage.

Note

The registry implementation used (distribution/distribution) supports mirroring only one upstream registry, which is why a separate registry cache is deployed for each configured upstream.

The following diagram shows a rough outline of what an image pull looks like in a Kubernetes cluster with a registry cache:

[Diagram: image pull flow in a Kubernetes cluster with a registry cache]

Source: Gardener documentation

Configure in-cluster registry cache

Set the following configuration in your Kubernetes specification:

kind: Shoot
spec:
  extensions:
  - type: registry-cache
    providerConfig:
      apiVersion: "registry.extensions.gardener.cloud/v1alpha3"
      kind: RegistryConfig
      caches:
        - upstream: docker.io
          volume:
            size: 10Gi
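
Because a registry cache is deployed per configured upstream, caching images from several registries means listing one entry per upstream under caches. The following is a minimal sketch under that assumption; ghcr.io as a second upstream and the volume sizes are illustrative choices, not values taken from this page.

```yaml
kind: Shoot
spec:
  extensions:
  - type: registry-cache
    providerConfig:
      apiVersion: "registry.extensions.gardener.cloud/v1alpha3"
      kind: RegistryConfig
      caches:
        # One cache (and one volume) per upstream registry.
        - upstream: docker.io
          volume:
            size: 10Gi
        # Hypothetical second upstream, shown for illustration only.
        - upstream: ghcr.io
          volume:
            size: 20Gi
```

Each entry results in its own registry cache with its own volume, so the sizes can be chosen independently per upstream.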

Configurations

The following options are available:

  • The providerConfig.caches[].volume field contains settings for the registry cache volume. The registry-cache extension deploys a StatefulSet with a volume claim template. A PersistentVolumeClaim is created with the configured size and StorageClass name.
  • The providerConfig.caches[].volume.storageClassName field is the name of the StorageClass used by the registry cache volume. This field is immutable. If the field is not specified, then the default StorageClass will be used.
  • The providerConfig.caches[].garbageCollection.ttl field is the time to live of a blob in the cache. Setting it to 0s disables garbage collection. Defaults to 168h (7 days).
  • The providerConfig.caches[].secretReferenceName field is the name of the resource reference for the Secret that contains the upstream registry credentials. To cache images from a private registry, supply credentials for the upstream registry.
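
The options above can be combined in a single cache entry. The following sketch is an illustration, not a recommended configuration: the StorageClass name premium, the 20Gi size, and the 336h TTL are assumed values, while the resource reference and Secret names follow the Secret configuration section of this page.

```yaml
kind: Shoot
spec:
  extensions:
  - type: registry-cache
    providerConfig:
      apiVersion: "registry.extensions.gardener.cloud/v1alpha3"
      kind: RegistryConfig
      caches:
        - upstream: docker.io
          volume:
            size: 20Gi                  # illustrative size
            storageClassName: premium   # hypothetical StorageClass name; immutable once set
          garbageCollection:
            ttl: 336h                   # 14 days instead of the default 168h
          secretReferenceName: docker-secret   # must match an entry under spec.resources
  resources:
  - name: docker-secret
    resourceRef:
      apiVersion: v1
      kind: Secret
      name: ro-docker-secret-v1
```

Omitting storageClassName falls back to the default StorageClass, and omitting garbageCollection keeps the default TTL of 168h.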

Monitoring

After the registry cache is deployed in your Kubernetes cluster, a new Registry Caches dashboard becomes available in Plutono, where you can review statistics about the registry cache.

Screenshot of the Registry Caches dashboard

Increasing the cache disk size

When there is no available disk space, the registry cache continues to respond to requests. However, it cannot store the remotely fetched images locally. In that case, it simply acts as a proxy and does not cache images in its local store. The disk must be resized so that the registry cache can cache images again.

There are two alternatives to enlarge the cache’s disk size:

  • Resize the PVC

    Note

    After the PVC is resized, the cache size in the Shoot spec (providerConfig.caches[].volume.size) no longer matches the actual size of the PVC.

    Find the PVC name to resize for the desired upstream.

    kubectl -n kube-system get pvc -l upstream-host=docker.io
    
    Patch the PVC to the desired size. Set $PVC_NAME to the name returned by the previous command and replace 10Gi with the new size.

    kubectl -n kube-system patch pvc "$PVC_NAME" --type merge -p '{"spec":{"resources":{"requests": {"storage": "10Gi"}}}}'
    

  • Remove and add the cache

    Note

    Already cached images are lost and the cache starts with an empty disk.
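
The second alternative can be sketched as two successive versions of the Shoot spec. This is an assumption-laden illustration: it presumes a second cache for ghcr.io (hypothetical) so that the caches list is never empty, and it uses 20Gi as an example of the new size. Apply the first version, wait for the cluster to reconcile, and then apply the second.

```yaml
# Step 1: remove the docker.io cache entry; its cached images are lost.
# The ghcr.io cache is a hypothetical second entry that stays untouched.
kind: Shoot
spec:
  extensions:
  - type: registry-cache
    providerConfig:
      apiVersion: "registry.extensions.gardener.cloud/v1alpha3"
      kind: RegistryConfig
      caches:
        - upstream: ghcr.io
          volume:
            size: 10Gi
---
# Step 2: add the docker.io cache again with the larger volume size.
kind: Shoot
spec:
  extensions:
  - type: registry-cache
    providerConfig:
      apiVersion: "registry.extensions.gardener.cloud/v1alpha3"
      kind: RegistryConfig
      caches:
        - upstream: ghcr.io
          volume:
            size: 10Gi
        - upstream: docker.io
          volume:
            size: 20Gi   # illustrative new size
```

Unlike resizing the PVC, this keeps the size in the Shoot spec and the actual volume size in sync, at the cost of an empty cache after the change.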

Secret configuration

Credentials can be provided for a private upstream registry so that private images can be pulled through the registry cache. Only one set of credentials can be configured for a given pull-through cache instance.

Create a Secret with the upstream registry credentials in your project namespace.

kubectl create -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
    name: ro-docker-secret-v1
    namespace: garden-dev
type: Opaque
immutable: true
data:
    username: $(echo -n "$USERNAME" | base64 -w0)
    password: $(echo -n "$PASSWORD" | base64 -w0)
EOF
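
As an alternative sketch, the same Secret can be written as a plain manifest that uses the standard Kubernetes stringData field, in which case the API server performs the base64 encoding and stores the values under data. The username and password values below are placeholders (assumptions), and the manifest would be applied with kubectl create -f like the one above.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ro-docker-secret-v1
  namespace: garden-dev
type: Opaque
immutable: true
stringData:
  # Placeholder credentials; replace with the real upstream registry credentials.
  username: my-registry-user
  password: my-registry-password
```

This form keeps plain-text credentials in the manifest file, so it is better suited to manifests generated on the fly than to files kept in version control.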

Add the newly created Secret as a reference to the Shoot spec and to the registry-cache extension configuration.

In the registry-cache configuration, set the secretReferenceName field. It must match the name of a resource reference under spec.resources. The resource reference in turn points to the Secret in the project namespace.

kind: Shoot
spec:
  extensions:
  - type: registry-cache
    providerConfig:
      apiVersion: "registry.extensions.gardener.cloud/v1alpha3"
      kind: RegistryConfig
      caches:
        - upstream: docker.io
          secretReferenceName: docker-secret
  resources:
  - name: docker-secret
    resourceRef:
      apiVersion: v1
      kind: Secret
      name: ro-docker-secret-v1

Danger

Do not delete the referenced Secret while a Shoot is still using it.

More configuration information can be found in the extension documentation.