Skip to content

Controller Deep Dive

Velero has 19 controllers reconciling ~10 CRDs. The non-obvious part: some CRDs are reconciled by multiple controllers at different lifecycle phases. Understanding which controller owns which phase transition is essential for debugging and contributing.

Backup Controller Chain

A single Backup CRD is processed by 4 controllers in sequence:

BackupQueueController         BackupController
    │                              │
    │ New → Queued → ReadyToStart  │ ReadyToStart → InProgress
    │ (checks concurrency,         │ (runs backup.Backup:
    │  namespace conflicts)        │  collect items, run plugins,
    │                              │  write tar, upload to BSL)
    │                              │
    ▼                              ▼
BackupOperationsController     BackupFinalizerController
    │                              │
    │ InProgress →                 │ Finalizing → Completed
    │ WaitingForPluginOperations   │ (finalize item actions,
    │ (polls async plugin ops      │  upload final metadata)
    │  every 10s)                  │

BackupQueueController

  • Trigger: New Backup objects + periodic re-evaluation (1m)
  • Responsibility: Queue ordering, concurrent backup limit enforcement, namespace conflict detection
  • Transitions: New → Queued → ReadyToStart
  • Key logic: Two backups covering overlapping namespaces cannot run simultaneously. Uses set intersection on included namespaces.

BackupController

  • Trigger: Backup reaches ReadyToStart
  • Responsibility: Execute the backup via backup.Backup
  • Transitions: ReadyToStart → InProgress
  • Key logic: Calls BackupWithResolvers() which runs item collection, BackupItemAction plugins, volume snapshots, and tar archiving.

BackupOperationsController

  • Trigger: Periodic (10s)
  • Responsibility: Poll async plugin operations (v2 BIA operations)
  • Transitions: InProgress → WaitingForPluginOperations → Finalizing

BackupFinalizerController

  • Trigger: Backup reaches Finalizing
  • Responsibility: Run finalization hooks, upload final metadata to BSL
  • Transitions: Finalizing → Completed / PartiallyFailed / Failed

Restore Controller Chain

Similar pattern with 3 controllers:

RestoreController              RestoreOperationsController
    │                              │
    │ New → InProgress             │ InProgress →
    │ (unpack tar, restore         │ WaitingForPluginOperations
    │  CRDs first, then all        │ (polls async plugin ops)
    │  resources in priority       │
    │  order)                      ▼
    │                          RestoreFinalizerController
    │                              │
    │                              │ Finalizing → Completed
    │                              │ (finalization hooks,
    │                              │  upload results)

Other Controllers

Controller Watches Trigger Action
ScheduleController Schedule CRD Spec + 1m Cron evaluation, skip/pause logic, creates Backup objects
GCController Backup CRD 60m TTL-based expiration, creates DeleteBackupRequests
BackupSyncController BSL Periodic Reads BSL to sync Backup objects into cluster
BackupDeletionController DeleteBackupRequest Create/update Deletes backup from storage, cleans CSI artifacts
BSLController BSL 10s Validates storage connectivity, scrubs error messages
BackupRepoController BackupRepository CRD Spec + 5m Establishes repos, triggers Kopia maintenance
DataUploadController DataUpload CRD Create/update Manages backup data mover pods (CSI path)
DataDownloadController DataDownload CRD Create/update Manages restore data mover pods (CSI path)
PodVolumeBackupController PodVolumeBackup CRD Create/update Legacy FS-based volume backup via node-agent
PodVolumeRestoreController PodVolumeRestore CRD Create/update Legacy FS-based volume restore via node-agent
DownloadRequestController DownloadRequest CRD Create/update Generates signed URLs for backup/restore artifacts
ServerStatusRequestController ServerStatusRequest CRD Create/update Returns server version and installed plugins

Key Files

File What it does
pkg/controller/backup_controller.go State machine for Backup CRD from ReadyToStart onward
pkg/controller/backup_queue_controller.go Queue management, concurrency, namespace conflict detection
pkg/controller/restore_controller.go State machine for Restore CRD lifecycle
pkg/controller/schedule_controller.go Cron trigger logic, creates Backup objects
pkg/controller/gc_controller.go Expired backup detection and deletion
pkg/controller/backup_sync_controller.go Reads BSL to sync Backup objects into cluster
pkg/controller/data_upload_controller.go Data mover pod lifecycle + VGDP concurrency + cancel/finalizer handling
pkg/controller/data_download_controller.go Mirror of data upload for restore path

Patterns

Finalizer-based Cleanup

Restore, DataUpload, DataDownload, and PodVolumeBackup all use finalizers to ensure resource cleanup before deletion. The controller adds a finalizer when it starts processing and removes it only after cleanup is complete.

Stuck in Terminating

If a controller crashes between adding a finalizer and completing cleanup, the resource gets stuck in Terminating. Check for abandoned finalizers with: kubectl get <resource> -n velero -o jsonpath='{.items[*].metadata.finalizers}'

Phase State Machines

All CRDs follow a similar pattern: New → InProgress → WaitingForPluginOperations → Finalizing → Completed/Failed

Partial failure is tracked separately — PartiallyFailed and FinalizingPartiallyFailed carry errors from individual items without failing the entire operation.

Concurrency Control

  • Backup queue: Configurable concurrent backup limit
  • VGDP counter: Limits concurrent data movement jobs per node
  • Progress throttle: Updates throttled to 1s to avoid API server pressure

Next Up

Hooks System