How To Recover Persistent Volume Snapshots

Estimated time to read: 5 minutes

In this tutorial you will learn how to recover a PostgreSQL database from a previously created persistent volume snapshot.

First we will deploy a PostgreSQL database in Kubernetes, storing the database on a new persistent volume. After populating the database, we will snapshot the persistent volume containing the database to act as a backup, then use that snapshot to demonstrate the recovery of lost data.

Prerequisites:

In this tutorial we use the following tools:

kubectl (https://kubernetes.io/docs/tasks/tools/)

kubectl must be installed before you begin.

The tutorial will be split into five parts:

Deploy PostgreSQL Database
Create VolumeSnapshotClass
Create Volume Snapshot
Recover Volume from Snapshot
Cleanup

Deploy PostgreSQL Database

1. Create file postgres-configmap.yaml containing the database configuration, updating the values as needed:

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  labels:
    app: postgres
data:
  POSTGRES_DB: mydb
  POSTGRES_USER: myuser
  POSTGRES_PASSWORD: mypassword

2. Create file postgres-deployment.yaml containing a PostgreSQL Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      initContainers:
        - name: init-volume
          image: 'postgres:17'
          command: ['sh', '-c', "mkdir -p /data/postgres"]
          volumeMounts:
            - mountPath: /data
              name: data
      containers:
        - name: postgres
          image: 'postgres:17'
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 5432
          envFrom:
            - configMapRef:
                name: postgres-config
          volumeMounts:
            - mountPath: /var/lib/postgresql/data
              name: data
              subPath: postgres
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: postgres-pvc

3. Create file postgres-pvc.yaml containing a PersistentVolumeClaim for storing the data:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: default

4. Deploy all the resources:

kubectl apply -f \
   postgres-configmap.yaml \
   postgres-pvc.yaml \
   postgres-deployment.yaml

5. Confirm the postgres deployment is ready:

kubectl get deployment -o wide
NAME       READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES        SELECTOR
postgres   1/1     1            1           86m   postgres     postgres:17   app=postgres
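
Instead of checking the READY column by hand, the rollout can be waited on. The following is a minimal sketch, assuming the Deployment name used in this tutorial; the function name wait_for_deployment and the default timeout are hypothetical choices made for illustration.

```shell
# Block until the given Deployment has finished rolling out,
# or fail once the timeout (default 120s, an arbitrary choice) expires.
wait_for_deployment() {
  kubectl rollout status "deployment/$1" --timeout="${2:-120s}"
}
```

Calling wait_for_deployment postgres returns once the postgres Deployment reports all replicas as ready, and exits with a non-zero status if that does not happen within the timeout.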

Populating the database is out of scope for this tutorial, but there are several ways to access it from your workstation so that you can insert some dummy data.

When accessing the database, use the credentials set in the database configuration created earlier.

The first method is to forward the database connection to your workstation, making the database accessible at address 127.0.0.1:5432:

kubectl port-forward deployment/postgres 5432:5432

Another method is to open a psql prompt in the running postgres deployment:

kubectl exec -ti deployment/postgres -c postgres -- sh -c 'PGPASSWORD=mypassword psql -U myuser mydb'

Example queries to populate the database with some dummy data:

-- Create a table
CREATE TABLE IF NOT EXISTS films (
    id SERIAL PRIMARY KEY,
    title VARCHAR(100) NOT NULL
);

-- Insert rows
INSERT INTO films (title)
VALUES ('Inside Out'), ('Toy Story'), ('Monsters Inc.'), ('Finding Nemo');

-- List current rows
SELECT id, title
FROM films;

The output of the last query should list all inserted film titles:

 id |     title
----+---------------
  1 | Inside Out
  2 | Toy Story
  3 | Monsters Inc.
  4 | Finding Nemo
(4 rows)
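
The queries above can also be run without opening an interactive prompt. The following is a minimal sketch, assuming the Deployment, container name, and credentials from this tutorial's configuration; the helper name run_sql is hypothetical.

```shell
# Run a single SQL statement inside the running postgres container.
# PGPASSWORD is the environment variable psql reads the password from;
# the credentials are the example values from postgres-configmap.yaml.
run_sql() {
  kubectl exec deployment/postgres -c postgres -- \
    env PGPASSWORD=mypassword psql -U myuser -d mydb -c "$1"
}
```

For example, run_sql 'SELECT id, title FROM films;' prints the current rows of the films table.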

In the next step we will create a VolumeSnapshotClass.

Create VolumeSnapshotClass

Creating volume snapshots requires a VolumeSnapshotClass, which is similar to a StorageClass but specific to volume snapshots. If your cluster already has a VolumeSnapshotClass, you may skip this part and use the existing one instead.

1. Create file volumesnapshotclass.yaml containing the VolumeSnapshotClass:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
driver: cinder.csi.openstack.org
deletionPolicy: Delete
metadata: 
  name: default

2. Deploy the VolumeSnapshotClass:

% kubectl apply -f volumesnapshotclass.yaml

In the next step we will create a snapshot of the database volume.

Create Volume Snapshot

To keep databases and filesystems fast, writes are cached in volatile memory and periodically flushed to persistent storage. This ubiquitous optimization makes it unsafe to snapshot a volume that is in use, so we need to temporarily stop the database and detach the volume. Stopping the database flushes all pending writes to persistent storage, which prevents loss of data.

Info

If your use-case is not affected by unflushed writes you can forcefully create snapshots from in-use volumes by updating the target VolumeSnapshotClass with an additional parameter, e.g.:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
driver: cinder.csi.openstack.org
deletionPolicy: Delete
metadata: 
  name: default
parameters:
  force-create: "true"

1. Scale down the deployment to stop the database:

% kubectl scale --replicas 0 deployment postgres
deployment.apps/postgres scaled

2. Create file postgres-snapshot.yaml containing the reference for the new volume snapshot:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-pvc
spec:
  volumeSnapshotClassName: default
  source:
    persistentVolumeClaimName: postgres-pvc

3. Create the snapshot:

kubectl apply -f postgres-snapshot.yaml

4. Confirm the snapshot is created:

% kubectl get VolumeSnapshot
NAME          READYTOUSE   SOURCEPVC     RESTORESIZE  SNAPSHOTCLASS  SNAPSHOTCONTENT                                 
postgres-pvc  true         postgres-pvc  1Gi          default        snapcontent-be0019e0-2209-43e5-9a54-485f973911d5

5. Restart the database by scaling the deployment back up, then confirm it is ready:

% kubectl scale --replicas 1 deployment postgres
deployment.apps/postgres scaled

% kubectl get deployment postgres
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
postgres   1/1     1            1           131m
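
The five steps above can be chained into a single function that waits for each stage to finish instead of relying on manual checks. This is a minimal sketch, assuming the resource and file names used in this tutorial; the function name snapshot_postgres and the timeouts are hypothetical, and depending on the kubectl version the wait for Pod deletion may report an error if the Pod is already gone.

```shell
# Stop the database, snapshot its volume, and start the database again.
# Each step must succeed before the next one runs.
snapshot_postgres() {
  # Stop the database so pending writes are flushed to the volume.
  kubectl scale --replicas 0 deployment postgres || return 1
  # Wait until the Pod is actually gone before taking the snapshot.
  kubectl wait --for=delete pod --selector=app=postgres --timeout=120s || return 1
  # Create the VolumeSnapshot and wait until it reports readyToUse=true.
  kubectl apply -f postgres-snapshot.yaml || return 1
  kubectl wait --for=jsonpath='{.status.readyToUse}'=true \
    volumesnapshot/postgres-pvc --timeout=300s || return 1
  # Start the database again and wait for it to become ready.
  kubectl scale --replicas 1 deployment postgres || return 1
  kubectl rollout status deployment/postgres --timeout=120s
}
```

Because every step returns early on failure, the database is not scaled back up automatically if the snapshot fails; in that case run the scale command from step 5 manually.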

In the next step we will recover the database using the volume snapshot.

Recover Volume from Snapshot

First, let's simulate some data loss by deleting the first row in the database:

DELETE FROM films
WHERE id=1;

The first row was deleted:

 id |     title
----+---------------
  2 | Toy Story
  3 | Monsters Inc.
  4 | Finding Nemo
(3 rows)

The recovery process in a nutshell:

Create a new persistent volume based on the volume snapshot created before the data loss occurred
Update the postgres deployment with the new persistent volume

1. Create file postgres-restored-pvc.yaml containing a reference to the volume snapshot created earlier:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-restored-pvc
spec:
  storageClassName: default
  dataSource:
    name: postgres-pvc
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

2. Apply the manifest to create the new PVC:

kubectl apply -f postgres-restored-pvc.yaml

3. Confirm the PVC is created. Depending on the StorageClass, its status will be Pending (when the volume is only provisioned once a Pod uses the claim) or Bound:

kubectl get pvc
NAME                   STATUS   VOLUME                                         CAPACITY   ACCESS MODES   STORAGECLASS       
postgres-restored-pvc  Pending  pv-shoot-cc978d5b-351e-4245-b801-86d2e4e13bcc  1Gi        RWO            default

4. Change the postgres deployment in file postgres-deployment.yaml to use the new PVC:

...
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: postgres-restored-pvc

5. Apply the changes:

kubectl apply -f postgres-deployment.yaml

Info

Kubernetes will notice the change and detach the current PVC, stop and delete the current Pod, attach the new PVC and deploy a new Pod.

6. Confirm the database was recovered by listing the rows in the database table:

-- List current rows
SELECT id, title
FROM films;

Output:

 id |     title
----+---------------
  1 | Inside Out
  2 | Toy Story
  3 | Monsters Inc.
  4 | Finding Nemo
(4 rows)
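
The recovery steps above can likewise be run in one go. This is a minimal sketch, assuming postgres-deployment.yaml has already been edited to reference postgres-restored-pvc and that the tutorial's example credentials are in use; the function name restore_postgres and the timeout are hypothetical.

```shell
# Create the PVC from the snapshot, switch the Deployment over to it,
# and list the table contents to verify the recovery.
restore_postgres() {
  kubectl apply -f postgres-restored-pvc.yaml || return 1
  kubectl apply -f postgres-deployment.yaml || return 1
  # Wait for the new Pod (using the restored volume) to become ready.
  kubectl rollout status deployment/postgres --timeout=120s || return 1
  kubectl exec deployment/postgres -c postgres -- \
    env PGPASSWORD=mypassword psql -U myuser -d mydb \
    -c 'SELECT id, title FROM films;'
}
```

If the recovery succeeded, the final query lists the deleted row again.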

Cleanup

To wrap it up, delete all previously created resources:

kubectl delete -f \
   postgres-deployment.yaml \
   postgres-configmap.yaml \
   postgres-restored-pvc.yaml \
   postgres-snapshot.yaml \
   postgres-pvc.yaml \
   volumesnapshotclass.yaml