# Restore Mechanics

Restore is the harder problem: replaying Kubernetes resources requires careful ordering, dependency resolution, and handling of state that cannot simply be replayed.
## Restore Pipeline

### 1. Download and decompress

The RestoreController downloads the backup tarball from the BSL. All subsequent processing happens in memory or on a temp filesystem on the velero-server pod.
### 2. Restore priority ordering

Resources are restored in a hardcoded priority order:

- CRDs and namespaces first,
- then cluster-scoped resources,
- then namespaced resources.

Within each tier, dependencies are respected.
```go
// pkg/restore/restore.go — resourcePriorities (abridged)
var resourcePriorities = []string{
	"customresourcedefinitions",
	"namespaces",
	"storageclasses",
	"volumesnapshotclasses.snapshot.storage.k8s.io",
	"volumesnapshotcontents.snapshot.storage.k8s.io",
	"volumesnapshots.snapshot.storage.k8s.io",
	"persistentvolumes",
	"persistentvolumeclaims",
	"secrets",
	"configmaps",
	"serviceaccounts",
	"limitranges",
	"pods",
	"replicasets.apps",
	"clusterrolebindings.rbac.authorization.k8s.io",
	"clusterroles.rbac.authorization.k8s.io",
	"roles.rbac.authorization.k8s.io",
	"rolebindings.rbac.authorization.k8s.io",
}
```
Resources not in this list are restored after all prioritized resources, in undefined order.
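The effect of the priority list can be sketched as a stable sort keyed by position in the list, with unlisted resources sorting after all prioritized ones. Helper names here are illustrative, not Velero's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// priorityIndex returns the position of a resource in the priority list,
// or len(priorities) if it is not listed, so unlisted resources sort last.
func priorityIndex(priorities []string, resource string) int {
	for i, p := range priorities {
		if p == resource {
			return i
		}
	}
	return len(priorities)
}

// sortByPriority orders resources by the priority list; a stable sort keeps
// the relative order of unlisted resources unchanged.
func sortByPriority(priorities, resources []string) []string {
	out := append([]string(nil), resources...)
	sort.SliceStable(out, func(i, j int) bool {
		return priorityIndex(priorities, out[i]) < priorityIndex(priorities, out[j])
	})
	return out
}

func main() {
	priorities := []string{"customresourcedefinitions", "namespaces", "persistentvolumes", "pods"}
	got := sortByPriority(priorities, []string{"pods", "deployments.apps", "namespaces"})
	fmt.Println(got) // [namespaces pods deployments.apps]
}
```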
### 3. RestoreItemAction plugins

Same plugin pattern as backup: for each item, registered RestoreItemAction plugins run.

Built-in examples:

- `job-action`: resets `Job.spec.completions` to allow re-execution
- `service-action`: strips `spec.clusterIP` and `spec.clusterIPs` to allow re-allocation
- `serviceaccount-action`: strips generated token secrets (regenerated by the cluster)
- `csi-volumesnapshot-restore-action`: handles CSI VolumeSnapshot object restoration
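The plugin contract can be sketched with a deliberately simplified interface. Velero's real RestoreItemAction interface pairs an AppliesTo resource selector with an Execute method over richer input/output types; everything below is an illustrative reduction, shown here mimicking the service-action behavior.

```go
package main

import "fmt"

// Item is a simplified stand-in for an unstructured Kubernetes object.
type Item = map[string]interface{}

// RestoreItemAction is a simplified version of Velero's plugin interface:
// AppliesTo selects resources, Execute mutates the item before creation.
type RestoreItemAction interface {
	AppliesTo() []string // resource names this action handles
	Execute(item Item) (Item, error)
}

// serviceAction mimics the built-in service-action: it strips clusterIP and
// clusterIPs so the receiving cluster can re-allocate them, but leaves
// headless services (clusterIP: None) untouched.
type serviceAction struct{}

func (serviceAction) AppliesTo() []string { return []string{"services"} }

func (serviceAction) Execute(item Item) (Item, error) {
	spec, ok := item["spec"].(map[string]interface{})
	if !ok {
		return item, nil
	}
	if ip, _ := spec["clusterIP"].(string); ip != "" && ip != "None" {
		delete(spec, "clusterIP")
		delete(spec, "clusterIPs")
	}
	return item, nil
}

func main() {
	svc := Item{"spec": map[string]interface{}{"clusterIP": "10.0.0.42"}}
	out, _ := serviceAction{}.Execute(svc)
	fmt.Println(out["spec"]) // map[]
}
```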
### 4. API field cleanup

Before re-applying, Velero strips fields that should not be replayed:

- `metadata.resourceVersion`
- `metadata.uid`
- `metadata.creationTimestamp`
- `status` (entire block)
- controller-injected annotations (e.g. `kubectl.kubernetes.io/last-applied-configuration`)

This prevents conflicts with the receiving cluster's state.
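The cleanup step can be sketched over a plain object map. Velero performs this on unstructured objects; the helper name here is illustrative.

```go
package main

import "fmt"

// stripRestoreFields removes server-populated fields that must not be
// replayed into a new cluster: the status block, server-assigned metadata,
// and the last-applied-configuration annotation.
func stripRestoreFields(obj map[string]interface{}) {
	delete(obj, "status") // entire status block
	md, ok := obj["metadata"].(map[string]interface{})
	if !ok {
		return
	}
	for _, f := range []string{"resourceVersion", "uid", "creationTimestamp"} {
		delete(md, f)
	}
	if ann, ok := md["annotations"].(map[string]interface{}); ok {
		delete(ann, "kubectl.kubernetes.io/last-applied-configuration")
	}
}

func main() {
	pod := map[string]interface{}{
		"metadata": map[string]interface{}{
			"name":            "web-0",
			"uid":             "d9607e19",
			"resourceVersion": "12345",
		},
		"status": map[string]interface{}{"phase": "Running"},
	}
	stripRestoreFields(pod)
	fmt.Println(pod)
}
```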
### 5. PV/PVC restore path decision

For each PVC in the backup, Velero checks for associated volume data and picks one of three restore paths.

**CSI snapshot path:**

- Creates a VolumeSnapshotContent (with `deletionPolicy: Retain`) pointing at the original snapshot handle.
- Creates a VolumeSnapshot referencing it.
- Creates the PVC with `dataSource` pointing at the VolumeSnapshot.
- The PVC binding triggers the CSI driver to provision the volume from the snapshot (a CreateVolume call with a snapshot content source).

**Filesystem backup path (Kopia):**

- Restores the PVC and PV objects normally.
- Waits for a pod to be scheduled that mounts the PVC.
- Creates a DataDownload CR targeting the node where the pod is scheduled.
- node-agent's DataDownloadController runs Kopia to write data into the mounted PVC.
- The pod's init container (injected by Velero) blocks until the DataDownload completes.

**Native snapshot path:**

- Calls the VolumeSnapshotter plugin to restore the volume from the cloud snapshot ID stored in the backup metadata.
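The choice between the three paths can be sketched as a dispatch on what the backup recorded for each PVC. The struct and field names below are assumptions for illustration, not Velero's actual metadata schema.

```go
package main

import "fmt"

// VolumeBackupInfo captures what the backup recorded for one PVC.
// Field names are illustrative, not Velero's backup metadata schema.
type VolumeBackupInfo struct {
	CSISnapshotHandle string // set when a CSI snapshot was taken
	KopiaSnapshotID   string // set when filesystem backup (Kopia) was used
	CloudSnapshotID   string // set when a VolumeSnapshotter plugin was used
}

// restorePath picks which of the three restore paths applies to a PVC.
func restorePath(info VolumeBackupInfo) string {
	switch {
	case info.CSISnapshotHandle != "":
		return "csi" // recreate VolumeSnapshotContent/VolumeSnapshot, PVC dataSource
	case info.KopiaSnapshotID != "":
		return "filesystem" // DataDownload CR + node-agent Kopia restore
	case info.CloudSnapshotID != "":
		return "native" // VolumeSnapshotter plugin restores from cloud snapshot
	default:
		return "none" // PVC restored as an object only; no data to move
	}
}

func main() {
	fmt.Println(restorePath(VolumeBackupInfo{CSISnapshotHandle: "snap-123"})) // csi
	fmt.Println(restorePath(VolumeBackupInfo{KopiaSnapshotID: "k9a2"}))      // filesystem
	fmt.Println(restorePath(VolumeBackupInfo{}))                             // none
}
```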
### 6. Resource Modifiers
Before creating resources in the cluster, Velero applies resource
modifiers — declarative JSON, merge, or strategic merge patches loaded
from a ConfigMap referenced in spec.resourceModifier. These enable
restore-time transformations without writing a plugin:
```yaml
# Example: change storage class during restore
version: v1
resourceModifierRules:
- conditions:
    groupResource: persistentvolumeclaims
    namespaces: ["*"]
  mergePatches:
  - patchData: |
      {"spec": {"storageClassName": "new-storage-class"}}
```
Conditions can match by namespace, group/resource, name regex, labels, and field values. Patches are applied in sequence, after RestoreItemAction plugins run.
### 7. Restore hooks (init containers)

Post-restore hooks are injected as init containers into pod specs before the pod is created. They run after volume data is written, before the app container starts. See Hooks.
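For reference, init-container restore hooks can be declared on the Restore spec itself. The fragment below is modeled on Velero's restore-hooks API; the restore name, namespace, image, and command are purely illustrative.

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: my-restore
  namespace: velero
spec:
  backupName: my-backup
  hooks:
    resources:
    - name: wait-for-data
      includedNamespaces:
      - myapp
      postHooks:
      - init:
          initContainers:
          - name: restore-wait
            image: busybox
            command: ["sh", "-c", "echo waiting for restored data"]
```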
## Conflict handling

| Scenario | existingResourcePolicy: none | existingResourcePolicy: update |
|---|---|---|
| Resource doesn't exist | Create it | Create it |
| Resource exists, same spec | Skip (warning logged) | No-op patch |
| Resource exists, different spec | Skip (warning logged) | Strategic merge patch |
| Resource exists, immutable field changed | Skip | Error (log + continue) |
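The policy is chosen per restore on the Restore spec (or via the `--existing-resource-policy` CLI flag). A minimal fragment, with the restore and backup names illustrative:

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: my-restore
  namespace: velero
spec:
  backupName: my-backup
  existingResourcePolicy: update
```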
## Known pitfalls
**Service clusterIP conflict**
Services with spec.clusterIP set will fail to create if the same IP is
already allocated. The built-in service-action RestoreItemAction strips
clusterIP to allow re-allocation, but only for standard ClusterIP services.
Headless services (clusterIP: None — services that don't allocate a
virtual IP, used for direct pod-to-pod DNS discovery) and some
LoadBalancer configurations may still be affected.
**PVC in Bound state**
If you restore a PVC and the cluster already has a PV with the same name from the previous deployment, the PVC may bind to the wrong PV. Always ensure old PVs are removed (or use namespace remapping) for full restores.
**CRD restore race**
CRDs are restored first, but the custom resource validation webhook may not
be ready yet. Velero retries resource creation on failure, but complex webhooks can cause false failures in the first reconcile pass.
## Debugging restore failures
```shell
# Describe the restore for summary
velero restore describe my-restore --details

# Get the restore log
velero restore logs my-restore

# Check restore results file (warnings + errors per item)
velero backup download my-backup --output /tmp/backup.tar.gz
# then inspect restore-my-restore-results.gz from the BSL

# Watch DataDownload objects (Kopia path)
kubectl get datadownload -n velero -w
```
Key file: pkg/restore/restore.go