Backup Mechanics

The backup pipeline is where most of Velero's complexity lives.

Understanding this path is essential for debugging, performance tuning, and writing plugins.

Backup Lifecycle State Machine

The full state machine involves four controllers that act in sequence:

                    BackupQueueController
New ──► Queued ──► ReadyToStart
                        │
                   BackupController
                   ReadyToStart ──► InProgress
                                        │
                              BackupOperationsController
                              WaitingForPluginOperations
                              WaitingForPluginOperationsPartiallyFailed
                                        │
                              BackupFinalizerController
                              Finalizing / FinalizingPartiallyFailed
                                        │
                              Completed / PartiallyFailed / Failed
                                        │
                                   ──► Deleting (via DeleteBackupRequest)

The BackupQueueController enforces concurrency limits and detects namespace conflicts (two backups covering the same namespaces cannot run simultaneously). Once a Backup reaches ReadyToStart, the BackupController takes over and drives execution.
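The admission check described above can be sketched roughly as follows. This is a minimal illustration, not Velero's implementation: the `queuedBackup` type, the `canStart` helper, and the rule that an empty namespace list means "all namespaces" are assumptions made for this example.

```go
package main

import "fmt"

// queuedBackup is a hypothetical, simplified view of a Backup: its name and
// the namespaces it covers. In this sketch an empty list means "all namespaces".
type queuedBackup struct {
	Name       string
	Namespaces []string
}

// conflicts reports whether two backups cover at least one common namespace.
func conflicts(a, b queuedBackup) bool {
	if len(a.Namespaces) == 0 || len(b.Namespaces) == 0 {
		return true // one of them covers every namespace
	}
	seen := make(map[string]bool, len(a.Namespaces))
	for _, ns := range a.Namespaces {
		seen[ns] = true
	}
	for _, ns := range b.Namespaces {
		if seen[ns] {
			return true
		}
	}
	return false
}

// canStart decides whether a queued backup may move to ReadyToStart, given
// the backups that are already running and a concurrency limit.
func canStart(candidate queuedBackup, running []queuedBackup, maxConcurrent int) bool {
	if len(running) >= maxConcurrent {
		return false
	}
	for _, r := range running {
		if conflicts(candidate, r) {
			return false
		}
	}
	return true
}

func main() {
	running := []queuedBackup{{Name: "nightly-apps", Namespaces: []string{"shop", "billing"}}}
	fmt.Println(canStart(queuedBackup{Name: "adhoc-billing", Namespaces: []string{"billing"}}, running, 2))
	fmt.Println(canStart(queuedBackup{Name: "adhoc-infra", Namespaces: []string{"monitoring"}}, running, 2))
}
```

A backup that fails this check simply stays in Queued until a running backup finishes and the check is evaluated again.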

Step-by-step Backup Execution

1. BackupController Picks Up a New Backup

A controller-runtime informer watches for Backup objects reaching the ReadyToStart phase. The controller then sets status.phase = InProgress and records status.startTimestamp.

In HA deployments, controller-runtime's leader election (via Kubernetes leases) ensures only one replica runs reconcilers at a time, preventing concurrent execution of the same backup.
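The phase change in this step can be pictured as a guarded transition. The sketch below uses a hypothetical `backupStatus` struct and `markInProgress` helper rather than Velero's real types. It only illustrates why checking the current phase first makes a repeated reconcile harmless.

```go
package main

import (
	"fmt"
	"time"
)

// backupStatus is a hypothetical, trimmed-down stand-in for a Backup's status.
type backupStatus struct {
	Phase          string
	StartTimestamp time.Time
}

// markInProgress performs the ReadyToStart -> InProgress transition and
// records the start time. Any other starting phase is rejected, so a second
// reconcile of the same object does not restart a running backup.
func markInProgress(s *backupStatus) error {
	if s.Phase != "ReadyToStart" {
		return fmt.Errorf("backup is in phase %q, not ReadyToStart", s.Phase)
	}
	s.Phase = "InProgress"
	s.StartTimestamp = time.Now()
	return nil
}

func main() {
	s := &backupStatus{Phase: "ReadyToStart"}

	err := markInProgress(s)
	fmt.Println(s.Phase, err)

	// A second attempt is rejected because the phase has already advanced.
	err = markInProgress(s)
	fmt.Println(s.Phase, err)
}
```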

2. Resource Discovery and Collection

Velero uses the API server's discovery API to enumerate all resource types. For each resource type matching the include/exclude filters, it lists objects via the dynamic client (an untyped Kubernetes client that works with any resource type without compiled Go structs, which is essential for backing up CRDs whose types Velero doesn't know at compile time).

Discovery is done concurrently with goroutines per resource group.

Key file: pkg/backup/item_collector.go
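The include/exclude filtering applied to discovered resource types can be sketched as below. The `filter` type and its helpers are hypothetical names for this example. The semantics shown (excludes win, and an empty include list or `*` means everything) are an assumption about how such filters are commonly evaluated, not a copy of Velero's code.

```go
package main

import "fmt"

// filter is a hypothetical include/exclude filter over resource type names.
type filter struct {
	includes map[string]bool
	excludes map[string]bool
}

func newFilter(includes, excludes []string) filter {
	f := filter{includes: map[string]bool{}, excludes: map[string]bool{}}
	for _, i := range includes {
		f.includes[i] = true
	}
	for _, e := range excludes {
		f.excludes[e] = true
	}
	return f
}

// shouldInclude applies the assumed rules: an exclude always wins, and an
// empty include list or the wildcard "*" admits every remaining resource.
func (f filter) shouldInclude(resource string) bool {
	if f.excludes[resource] {
		return false
	}
	return len(f.includes) == 0 || f.includes["*"] || f.includes[resource]
}

// collect returns the discovered resource types that pass the filter,
// preserving discovery order.
func collect(discovered []string, f filter) []string {
	var out []string
	for _, r := range discovered {
		if f.shouldInclude(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	discovered := []string{"pods", "secrets", "deployments.apps", "events"}
	f := newFilter([]string{"*"}, []string{"events", "secrets"})
	fmt.Println(collect(discovered, f))
}
```

In the real pipeline, each surviving resource type would then be listed through the dynamic client. That part is omitted here because it needs a live cluster.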

3. BackupItemAction Plugins

For each collected item, Velero runs all registered BackupItemAction plugins whose AppliesTo() matches the resource type. These can:

Mutate the item (e.g. strip sensitive annotations)
Add additional items to the backup graph (e.g. the built-in pod-action adds the PVC when a Pod is backed up, ensuring PVC/PV pairs are consistent)
Set skip flags to exclude an item

This is where most custom business logic lives. See Plugin System.
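To make the plugin flow concrete, here is a deliberately simplified sketch. The `action` interface, the `item` and `resourceID` types, and the `example.com/session-token` annotation are all hypothetical. Velero's real BackupItemAction interface works on unstructured Kubernetes objects and has a different signature. The sketch shows two of the three capabilities listed above (mutating an item and adding related items); skip flags are not modeled.

```go
package main

import "fmt"

// item is a hypothetical flattened view of a Kubernetes object.
type item struct {
	Resource    string
	Namespace   string
	Name        string
	Annotations map[string]string
	ClaimNames  []string // PVCs referenced by a pod's volumes
}

// resourceID identifies an additional item to pull into the backup.
type resourceID struct {
	Resource, Namespace, Name string
}

// action is a simplified stand-in for a BackupItemAction plugin.
type action interface {
	AppliesTo() []string
	Execute(in item) (item, []resourceID, error)
}

// podAction strips a sensitive annotation and adds the pod's PVCs.
type podAction struct{}

func (podAction) AppliesTo() []string { return []string{"pods"} }

func (podAction) Execute(in item) (item, []resourceID, error) {
	out := in
	// Copy the annotations so the caller's object is not mutated.
	out.Annotations = make(map[string]string, len(in.Annotations))
	for k, v := range in.Annotations {
		if k != "example.com/session-token" {
			out.Annotations[k] = v
		}
	}
	var extra []resourceID
	for _, claim := range in.ClaimNames {
		extra = append(extra, resourceID{
			Resource:  "persistentvolumeclaims",
			Namespace: in.Namespace,
			Name:      claim,
		})
	}
	return out, extra, nil
}

func applies(a action, resource string) bool {
	for _, r := range a.AppliesTo() {
		if r == resource {
			return true
		}
	}
	return false
}

// runActions feeds the item through every matching action in order and
// accumulates the additional items they request.
func runActions(in item, actions []action) (item, []resourceID, error) {
	var extra []resourceID
	for _, a := range actions {
		if !applies(a, in.Resource) {
			continue
		}
		out, more, err := a.Execute(in)
		if err != nil {
			return in, nil, err
		}
		in = out
		extra = append(extra, more...)
	}
	return in, extra, nil
}

func main() {
	pod := item{
		Resource:    "pods",
		Namespace:   "default",
		Name:        "db-0",
		Annotations: map[string]string{"example.com/session-token": "abc", "team": "data"},
		ClaimNames:  []string{"db-data"},
	}
	out, extra, err := runActions(pod, []action{podAction{}})
	fmt.Println(out.Annotations, extra, err)
}
```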

4. PVC → Volume Backup Decision

For each PVC, Velero decides the volume backup method. Volume policies (ConfigMap-based, referenced via spec.resourcePolicy) take highest precedence, followed by pod annotations, then defaults:

Volume policy match (if configured): a ResourcePolicies ConfigMap can match by storage class, CSI driver, capacity, NFS config, PVC labels, or PVC phase. The first matching rule's action wins: skip, fs-backup, or snapshot.
Skip: if snapshotVolumes: false or if the pod mounting the PVC lists the volume in the opt-out annotation backup.velero.io/backup-volumes-excludes
CSI VolumeSnapshot: if the CSI plugin is enabled and a matching VolumeSnapshotClass exists
Cloud provider snapshot: if a VolumeSnapshotter plugin is registered for the storage class
Kopia file-level copy: if defaultVolumesToFsBackup: true or if the pod mounting the PVC lists the volume in the opt-in annotation backup.velero.io/backup-volumes

Volume policies are the modern approach

Volume policies replace the annotation-based per-pod opt-in/out model. They're centrally managed, support rich matching conditions, and apply cluster-wide without modifying application manifests.
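The "first matching rule wins" behavior of volume policies can be sketched as follows. The `policyRule` and `volumeInfo` types are hypothetical and cover only two of the conditions mentioned above (storage class and capacity). They are not the schema of the ResourcePolicies ConfigMap.

```go
package main

import "fmt"

// volumeInfo is a hypothetical summary of the PVC/PV being evaluated.
type volumeInfo struct {
	StorageClass string
	CapacityGiB  int
}

// policyRule is a hypothetical, reduced volume policy rule.
type policyRule struct {
	StorageClasses []string // empty matches any storage class
	MaxCapacityGiB int      // 0 means no upper bound
	Action         string   // "skip", "fs-backup", or "snapshot"
}

// matches reports whether every condition set on the rule holds for v.
func (r policyRule) matches(v volumeInfo) bool {
	if len(r.StorageClasses) > 0 {
		found := false
		for _, sc := range r.StorageClasses {
			if sc == v.StorageClass {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	if r.MaxCapacityGiB > 0 && v.CapacityGiB > r.MaxCapacityGiB {
		return false
	}
	return true
}

// decide walks the rules in order and returns the first match. The boolean
// is false when no rule matched, in which case the caller would fall back
// to pod annotations and defaults (not modeled here).
func decide(v volumeInfo, rules []policyRule) (string, bool) {
	for _, r := range rules {
		if r.matches(v) {
			return r.Action, true
		}
	}
	return "", false
}

func main() {
	rules := []policyRule{
		{StorageClasses: []string{"nfs-scratch"}, Action: "skip"},
		{MaxCapacityGiB: 100, Action: "fs-backup"},
		{Action: "snapshot"},
	}
	fmt.Println(decide(volumeInfo{StorageClass: "nfs-scratch", CapacityGiB: 10}, rules))
	fmt.Println(decide(volumeInfo{StorageClass: "gp3", CapacityGiB: 500}, rules))
}
```

Because evaluation stops at the first match, rule order matters: the small NFS scratch volume above is skipped even though it would also satisfy the capacity rule.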

5. Pre-Backup Hooks

Before capturing a pod's volume data, Velero executes pre-backup hooks by exec-ing into the pod's containers. These are used to quiesce databases, flush caches, and sync filesystems. See Hooks.

6. Volume Snapshot / Data Upload

CSI / Kopia path

Velero creates DataUpload custom resources. The DataUploadController in the node-agent picks these up and runs Kopia to upload data directly from the PVC mount on the node. The Velero server polls DataUpload.status for completion.

Cloud snapshot path

Velero calls the VolumeSnapshotter plugin synchronously. The plugin calls the cloud provider API and returns a snapshot ID that Velero stores in the backup metadata.

7. Post-Backup Hooks

After volume data is captured, runs post-backup hooks to un-quiesce (e.g. UNLOCK TABLES). Velero guarantees post hooks run even if pre hooks fail (unless onError: Fail caused the backup to abort).
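The ordering guarantee described here can be sketched as a small wrapper around the capture step. Everything in this example is hypothetical (the `hook` type, `captureWithHooks`, and the step labels). Running the post hook even when the capture itself fails is an assumption of the sketch, not something the text above states.

```go
package main

import (
	"errors"
	"fmt"
)

// errHookFailed simulates a hook command that exited with an error.
var errHookFailed = errors.New("hook command exited non-zero")

// hook is a hypothetical pre- or post-backup hook.
type hook struct {
	Name    string
	OnError string // "Continue" or "Fail"
	Run     func() error
}

// captureWithHooks runs pre, then capture, then post, and returns the steps
// that were attempted. A failing pre hook aborts only when OnError is "Fail";
// otherwise capture proceeds and the post hook still runs.
func captureWithHooks(pre, post hook, capture func() error) ([]string, error) {
	steps := []string{"pre:" + pre.Name}
	if err := pre.Run(); err != nil {
		if pre.OnError == "Fail" {
			return steps, fmt.Errorf("pre hook %q failed: %w", pre.Name, err)
		}
		steps = append(steps, "pre-failed-continuing")
	}

	steps = append(steps, "capture")
	captureErr := capture()

	// Assumption: un-quiesce even if the capture failed.
	steps = append(steps, "post:"+post.Name)
	if err := post.Run(); err != nil && captureErr == nil {
		return steps, fmt.Errorf("post hook %q failed: %w", post.Name, err)
	}
	return steps, captureErr
}

func main() {
	ok := func() error { return nil }
	fail := func() error { return errHookFailed }
	post := hook{Name: "unlock", OnError: "Continue", Run: ok}

	steps, err := captureWithHooks(hook{Name: "lock", OnError: "Continue", Run: fail}, post, ok)
	fmt.Println(steps, err)

	steps, err = captureWithHooks(hook{Name: "lock", OnError: "Fail", Run: fail}, post, ok)
	fmt.Println(steps, err)
}
```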

8. Serialization and Upload

All collected, plugin-processed items are serialized to JSON and written into a tarball ({backup-name}.tar.gz). A {backup-name}-results.gz file captures warnings and errors per item. Both are streamed to the object store via the ObjectStore plugin.

Key file: pkg/backup/backup.go
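The serialization step amounts to writing one JSON file per item into a gzip-compressed tar stream. The sketch below does this in memory with the Go standard library. The `entry` type and both helper functions are hypothetical names, and a real implementation would stream to the object store rather than buffer the whole archive.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

// entry pairs an archive path with the object to serialize (hypothetical type).
type entry struct {
	Path   string
	Object map[string]any
}

// writeArchive serializes each entry to JSON and writes it into a tar.gz.
func writeArchive(entries []entry) ([]byte, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for _, e := range entries {
		data, err := json.Marshal(e.Object)
		if err != nil {
			return nil, err
		}
		hdr := &tar.Header{
			Name:     e.Path,
			Typeflag: tar.TypeReg,
			Mode:     0o644,
			Size:     int64(len(data)),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(data); err != nil {
			return nil, err
		}
	}
	// Close the tar writer before the gzip writer so both trailers are flushed.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// listArchive returns the file names stored in a tar.gz, in archive order.
func listArchive(data []byte) ([]string, error) {
	gz, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer gz.Close()
	tr := tar.NewReader(gz)
	var names []string
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
	return names, nil
}

func main() {
	data, err := writeArchive([]entry{
		{Path: "resources/pods/namespaces/default/db-0.json", Object: map[string]any{"kind": "Pod"}},
	})
	if err != nil {
		panic(err)
	}
	names, err := listArchive(data)
	fmt.Println(names, err)
}
```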

9. Metadata Upload

A velero-backup.json metadata file is written to the BackupStorageLocation (BSL). This is what the BackupSyncController reads to reconstruct Backup objects in a new cluster, enabling cross-cluster restores without re-creating Backup objects manually.

Object Store Layout

{bucket}/{prefix}/
  backups/
    {backup-name}/
      velero-backup.json                       # Backup object spec + status
      {backup-name}.tar.gz                     # All K8s resources (JSON per item)
      {backup-name}-logs.gz                    # Velero server logs during backup
      {backup-name}-results.gz                 # Warnings and errors per item
      {backup-name}-csi-volumesnapshots.json.gz  # CSI snapshot metadata
      {backup-name}-volumesnapshots.json.gz    # Legacy VSL snapshot metadata
  restores/
    {restore-name}/
      restore-{restore-name}-logs.gz
      restore-{restore-name}-results.gz
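When scripting against a bucket, it can help to derive these keys programmatically. The helper below is a hypothetical sketch that follows the backups/ part of the layout shown above. The function name and the sample prefix are made up for illustration.

```go
package main

import (
	"fmt"
	"path"
)

// backupObjectKeys returns the object keys that the layout above implies for
// one backup. An empty prefix is allowed and is simply omitted from the keys.
func backupObjectKeys(prefix, backup string) []string {
	dir := path.Join(prefix, "backups", backup)
	return []string{
		path.Join(dir, "velero-backup.json"),
		path.Join(dir, backup+".tar.gz"),
		path.Join(dir, backup+"-logs.gz"),
		path.Join(dir, backup+"-results.gz"),
		path.Join(dir, backup+"-csi-volumesnapshots.json.gz"),
		path.Join(dir, backup+"-volumesnapshots.json.gz"),
	}
}

func main() {
	for _, k := range backupObjectKeys("prod-cluster", "nightly") {
		fmt.Println(k)
	}
}
```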

Tar Archive Structure

resources/
  deployments/
    namespaces/
      default/
        my-deployment.json
  persistentvolumeclaims/
    namespaces/
      default/
        my-pvc.json
  persistentvolumes/
    cluster/                    # cluster-scoped resources live here
      pvc-abc123.json
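The path convention above can be captured in a few lines. This is a sketch with a hypothetical `archivePath` helper. It treats an empty namespace as the marker for a cluster-scoped resource, which is an assumption of the example.

```go
package main

import (
	"fmt"
	"path"
)

// archivePath builds the location of an item inside the backup tarball,
// following the structure shown above. An empty namespace is treated as
// "cluster-scoped" in this sketch.
func archivePath(resource, namespace, name string) string {
	if namespace == "" {
		return path.Join("resources", resource, "cluster", name+".json")
	}
	return path.Join("resources", resource, "namespaces", namespace, name+".json")
}

func main() {
	fmt.Println(archivePath("deployments", "default", "my-deployment"))
	fmt.Println(archivePath("persistentvolumes", "", "pvc-abc123"))
}
```

Knowing the convention makes it easy to pull a single object out of a downloaded archive instead of unpacking the whole thing.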

Useful Debug Techniques

# Watch backup progress
kubectl get backup my-backup -n velero -o yaml -w

# Stream velero server logs during a backup
kubectl logs -n velero deployment/velero -f --since=5m

# Inspect what's in the tar archive
velero backup download my-backup --output /tmp/my-backup.tar.gz
tar -tzf /tmp/my-backup.tar.gz | head -50

# See per-item warnings/errors
velero backup describe my-backup --details

Performance Note

Backup speed is bound by API server list throughput and object store upload bandwidth. For large clusters (10k+ objects), the list phase dominates.

Resources are listed type by type rather than from a single cluster-wide snapshot, so items created after their resource type has been listed may be missing from the backup.

Next Up

Restore Mechanics