Restore Mechanics

Restore is the harder problem: replaying Kubernetes resources requires careful ordering, dependency resolution, and handling of state that can't simply be replayed.

Restore Pipeline

1. Download and decompress

The RestoreController downloads the backup tarball from the BSL (BackupStorageLocation). All subsequent processing happens in memory or on a temp filesystem inside the velero server pod.

2. Restore priority ordering

Resources are restored in a hardcoded priority order:

  • CRDs and namespaces first,
  • then cluster-scoped resources,
  • then namespaced resources.

Within each tier, dependencies are respected.

// pkg/restore/restore.go — resourcePriorities (abridged)
var resourcePriorities = []string{
    "customresourcedefinitions",
    "namespaces",
    "storageclasses",
    "volumesnapshotclasses.snapshot.storage.k8s.io",
    "volumesnapshotcontents.snapshot.storage.k8s.io",
    "volumesnapshots.snapshot.storage.k8s.io",
    "persistentvolumes",
    "persistentvolumeclaims",
    "secrets",
    "configmaps",
    "serviceaccounts",
    "limitranges",
    "pods",
    "replicasets.apps",
    "clusterrolebindings.rbac.authorization.k8s.io",
    "clusterroles.rbac.authorization.k8s.io",
    "roles.rbac.authorization.k8s.io",
    "rolebindings.rbac.authorization.k8s.io",
}

Resources not in this list are restored after all prioritized resources, in undefined order.
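The effect of such a priority list can be sketched as a stable sort keyed on each resource's index, with unlisted resources pushed to the end (a simplified stand-in for the actual logic in pkg/restore/restore.go):

```go
package main

import (
	"fmt"
	"sort"
)

// resourcePriorities mirrors (an abridged form of) the list above.
var resourcePriorities = []string{
	"customresourcedefinitions",
	"namespaces",
	"persistentvolumes",
	"persistentvolumeclaims",
	"pods",
}

// priorityIndex returns a resource's position in the priority list, or
// len(resourcePriorities) for unlisted resources so they sort after all
// prioritized ones.
func priorityIndex(resource string) int {
	for i, r := range resourcePriorities {
		if r == resource {
			return i
		}
	}
	return len(resourcePriorities)
}

// sortByPriority orders resources by priority index; the stable sort keeps
// unlisted resources in their original relative order (hence "undefined"
// from the user's point of view: it depends on iteration order upstream).
func sortByPriority(resources []string) {
	sort.SliceStable(resources, func(a, b int) bool {
		return priorityIndex(resources[a]) < priorityIndex(resources[b])
	})
}

func main() {
	rs := []string{"deployments.apps", "pods", "namespaces", "persistentvolumes"}
	sortByPriority(rs)
	fmt.Println(rs) // [namespaces persistentvolumes pods deployments.apps]
}
```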

3. RestoreItemAction plugins

Same plugin pattern as backup: for each item, registered RestoreItemAction plugins run.

Built-in examples:

  • job-action: resets Job.spec.completions to allow re-execution
  • service-action: strips spec.clusterIP and spec.clusterIPs to allow re-allocation
  • serviceaccount-action: strips generated token secrets (regenerated by the cluster)
  • csi-volumesnapshot-restore-action: handles CSI VolumeSnapshot object restoration
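The plugin contract can be sketched as follows. This is a simplified stand-in, not Velero's actual RestoreItemAction interface (which operates on unstructured objects and richer input/output types); the mutation shown mirrors what the built-in service-action does:

```go
package main

import "fmt"

// item is a stand-in for an unstructured Kubernetes object.
type item map[string]interface{}

// restoreItemAction is a simplified version of the plugin contract:
// each action inspects an item and returns a (possibly mutated) copy.
type restoreItemAction interface {
	Execute(obj item) (item, error)
}

// serviceAction strips spec.clusterIP / spec.clusterIPs so the receiving
// cluster can allocate fresh ones.
type serviceAction struct{}

func (serviceAction) Execute(obj item) (item, error) {
	if spec, ok := obj["spec"].(item); ok {
		delete(spec, "clusterIP")
		delete(spec, "clusterIPs")
	}
	return obj, nil
}

func main() {
	svc := item{"kind": "Service", "spec": item{"clusterIP": "10.0.0.12", "ports": []int{80}}}
	var action restoreItemAction = serviceAction{}
	out, _ := action.Execute(svc)
	fmt.Println(out["spec"]) // map[ports:[80]]
}
```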

4. API field cleanup

Before re-applying, Velero strips fields that should not be replayed:

  • metadata.resourceVersion
  • metadata.uid
  • metadata.creationTimestamp
  • status (entire block)
  • Controller-injected annotations (e.g. kubectl.kubernetes.io/last-applied-configuration)

This prevents conflicts with the receiving cluster's state.
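This cleanup can be sketched as a function over an unstructured object (a simplified illustration; the real logic lives in pkg/restore/restore.go):

```go
package main

import "fmt"

// stripNonReplayableFields removes server-generated state that must not be
// replayed into the receiving cluster.
func stripNonReplayableFields(obj map[string]interface{}) {
	delete(obj, "status") // entire status block
	if meta, ok := obj["metadata"].(map[string]interface{}); ok {
		delete(meta, "resourceVersion")
		delete(meta, "uid")
		delete(meta, "creationTimestamp")
		if ann, ok := meta["annotations"].(map[string]interface{}); ok {
			delete(ann, "kubectl.kubernetes.io/last-applied-configuration")
		}
	}
}

func main() {
	obj := map[string]interface{}{
		"metadata": map[string]interface{}{
			"name":            "web",
			"uid":             "example-uid",
			"resourceVersion": "12345",
		},
		"status": map[string]interface{}{"phase": "Running"},
	}
	stripNonReplayableFields(obj)
	fmt.Println(obj) // only metadata.name survives
}
```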

5. PV/PVC restore path decision

For each PVC in the backup, Velero checks for associated volume data and takes one of three paths.

CSI snapshot path:

  1. Creates a VolumeSnapshotContent (with deletionPolicy: Retain) pointing at the original snapshot handle.
  2. Creates a VolumeSnapshot referencing it.
  3. Creates the PVC with dataSource pointing at the VolumeSnapshot.
  4. The CSI driver's CreateVolume RPC (with a snapshot content source) is triggered by the PVC binding.

Filesystem (Kopia) path:

  1. Restores the PVC and PV objects normally.
  2. Waits for a pod to be scheduled that mounts the PVC.
  3. Creates a DataDownload CR targeting the node where the pod is scheduled.
  4. node-agent's DataDownloadController runs Kopia to write data into the mounted PVC.
  5. The pod's init container (injected by Velero) blocks until the DataDownload completes.

Native snapshot path: calls the VolumeSnapshotter plugin to restore the volume from the cloud snapshot ID stored in the backup metadata.
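The CSI path materializes as a chain of objects like the following (names, namespace, driver, and snapshot handle are all illustrative):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: restored-snapcontent
spec:
  deletionPolicy: Retain              # don't delete the cloud snapshot on cleanup
  driver: ebs.csi.aws.com             # example CSI driver
  source:
    snapshotHandle: snap-0123456789abcdef0   # original handle from the backup
  volumeSnapshotRef:
    name: restored-snap
    namespace: app
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: restored-snap
  namespace: app
spec:
  source:
    volumeSnapshotContentName: restored-snapcontent
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
  namespace: app
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  dataSource:                          # triggers provisioning from the snapshot
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: restored-snap
```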

6. Resource Modifiers

Before creating resources in the cluster, Velero applies resource modifiers — declarative JSON patches, JSON merge patches, or strategic merge patches loaded from a ConfigMap referenced in spec.resourceModifier. These enable restore-time transformations without writing a plugin:

# Example: change storage class during restore
version: v1
resourceModifierRules:
- conditions:
    groupResource: persistentvolumeclaims
    namespaces: ["*"]
  mergePatches:
  - patchData: |
      {"spec": {"storageClassName": "new-storage-class"}}

Conditions can match by namespace, group/resource, name regex, labels, and field values. Patches are applied in sequence, after RestoreItemAction plugins run.

7. Restore hooks (init containers)

Post-restore hooks are injected as init containers into pod specs before the pod is created. They run after volume data is written, before the app container starts. See Hooks.
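User-defined init hooks are declared on the Restore object; Velero prepends the listed containers to matching pods before creating them. A sketch (restore name, namespace selector, image, and command are illustrative):

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: my-restore
  namespace: velero
spec:
  backupName: my-backup
  hooks:
    resources:
    - name: warm-cache
      includedNamespaces: ["app"]
      postHooks:
      - init:
          initContainers:
          - name: warm-cache                 # runs before the app container starts
            image: busybox:1.36
            command: ["sh", "-c", "echo warming cache"]
```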

Conflict handling

Scenario                                 | existingResourcePolicy: none | existingResourcePolicy: update
-----------------------------------------|------------------------------|-------------------------------
Resource doesn't exist                   | Create it                    | Create it
Resource exists, same spec               | Skip (warning logged)        | No-op patch
Resource exists, different spec          | Skip (warning logged)        | Strategic merge patch
Resource exists, immutable field changed | Skip                         | Error (log + continue)
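The policy is set per restore on the Restore spec (restore and backup names are illustrative):

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: my-restore
  namespace: velero
spec:
  backupName: my-backup
  existingResourcePolicy: update   # default behaves like "none"
```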

Known pitfalls

Service clusterIP conflict

Services with spec.clusterIP set will fail to create if the same IP is already allocated. The built-in service-action RestoreItemAction strips clusterIP to allow re-allocation, but only for standard ClusterIP services. Headless services (clusterIP: None, i.e. services that don't allocate a virtual IP and are used for direct pod-to-pod DNS discovery) and some LoadBalancer configurations may still be affected.

PVC in Bound state

If you restore a PVC and the cluster already has a PV with the same name from the previous deployment, the PVC may bind to the wrong PV. Always ensure old PVs are removed (or use namespace remapping) for full restores.
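Namespace remapping is expressed on the Restore spec via namespaceMapping (equivalently, velero restore create --namespace-mappings old:new on the CLI); names here are illustrative:

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: remapped-restore
  namespace: velero
spec:
  backupName: my-backup
  namespaceMapping:
    app: app-restored   # objects from namespace "app" are created in "app-restored"
```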

CRD restore race

CRDs are restored first, but the custom resource validation webhook may not be ready yet. Velero retries resource creation on failure, but complex webhooks can cause false failures in the first reconcile pass.

Debugging restore failures

# Describe the restore for summary
velero restore describe my-restore --details

# Get the restore log
velero restore logs my-restore

# Check the restore results file (warnings + errors per item):
# it is stored in the BSL alongside the restore, as restore-my-restore-results.gz
velero backup download my-backup --output /tmp/backup.tar.gz   # backup contents, for comparison

# Watch DataDownload objects (Kopia path)
kubectl get datadownload -n velero -w

Key file: pkg/restore/restore.go

Next Up

Hooks System