Codebase Tour
This section covers the places to look when you are debugging, adding features, or reviewing PRs.
The Velero codebase is well-structured but has some non-obvious conventions.
Repository layout¶
velero/
cmd/
velero/ # main() — CLI + server command dispatch
pkg/
apis/velero/v1/ # CRD types (Backup, Restore, BSL, etc.)
backup/ # Core backup logic — item collector, tar writer
restore/ # Core restore logic — item restorer, priority ordering
controller/ # All reconcilers (controller-runtime)
plugin/ # Plugin framework: interfaces, gRPC impl, client mgmt
plugin/velero/ # Public plugin interfaces (import in your plugin)
plugin/framework/ # go-plugin server/client plumbing, gRPC proto impl
plugin/clientmgmt/ # Plugin process lifecycle management
datamover/ # Data mover microservices (run inside mover pods)
datapath/ # Async backup/restore abstraction (FileSystemBR)
exposer/ # Volume exposure strategies (CSI snapshot, pod volume)
podvolume/ # Legacy pod-volume backup/restore (node-agent based)
uploader/ # Kopia uploader interface + implementation
repository/ # Kopia repository management (init, connect, maintain)
persistence/ # Object store operations for backup metadata
client/ # Velero client factory, kubebuilder client setup
discovery/ # API server resource discovery + grouping
archive/ # Tar archive reading/writing
nodeagent/ # Node-agent DaemonSet interaction
install/ # Installation resource generation
features/ # Feature flag system
constant/ # Shared constants (controller names, plugin names)
util/ # Shared utilities (kube, csi, logging, etc.)
metrics/ # Prometheus metrics (~40 metrics)
builder/ # Test object builders (fluent constructors)
test/ # Shared test helpers and mocks
internal/
hook/ # Pre/post hook execution on pods
resourcemodifiers/ # JSON/merge/strategic patches during restore
resourcepolicies/ # Volume policy engine (skip/fs-backup/snapshot)
volume/ # Volume metadata and snapshot location utilities
volumehelper/ # Volume decision helpers
credentials/ # Credential loading from files and K8s secrets
delete/ # Backup deletion + CSI cleanup
storage/ # BSL validation helpers
design/ # Design documents for major features
changelogs/ # Per-PR changelog fragments
hack/ # Build scripts, code gen, e2e setup
config/ # CRD manifests (generated)
Key files by subsystem¶
Backup pipeline¶
| File | What's in it |
|---|---|
pkg/backup/item_collector.go |
Resource discovery and collection. Start here to understand what gets backed up and why. |
pkg/backup/backup.go |
Top-level backup orchestrator: calls item collector, runs BIAs, writes tar. |
pkg/backup/item_backupper.go |
Per-item processing: runs BackupItemActions, handles volume decisions. |
pkg/backup/pod_action.go |
Built-in BIA: when a pod is backed up, adds its PVCs. |
pkg/backup/pvc_action.go |
Built-in BIA: adds the PV and triggers volume backup decision. |
Restore pipeline¶
| File | What's in it |
|---|---|
pkg/restore/restore.go |
Restore orchestrator: priority ordering, item restorer calls, PV handling. |
pkg/restore/item_restorer.go |
Per-item processing: runs RestoreItemActions, handles conflict policy. |
pkg/restore/service_action.go |
Built-in RIA: strips clusterIP from Services. |
Controllers¶
| File | What's in it |
|---|---|
pkg/controller/backup_controller.go |
State machine for Backup CRD lifecycle. Entry point for BackupController. |
pkg/controller/restore_controller.go |
State machine for Restore CRD lifecycle. |
pkg/controller/schedule_controller.go |
Cron trigger logic, creates Backup objects. |
pkg/controller/gc_controller.go |
Expired backup detection and deletion. |
pkg/controller/backup_sync_controller.go |
Reads BSL to sync Backup objects into cluster. |
pkg/controller/data_upload_controller.go |
node-agent side: reconciles DataUpload CRDs. |
pkg/controller/data_download_controller.go |
node-agent side: reconciles DataDownload CRDs. |
Plugin system¶
| File | What's in it |
|---|---|
pkg/plugin/velero/ |
All public plugin interfaces. Read these before writing a plugin. |
pkg/plugin/framework/ |
gRPC wiring between velero and plugin processes. Useful when debugging plugin communication issues. |
pkg/plugin/clientmgmt/ |
Plugin process lifecycle (start, stop, restart on crash). |
Volume / Kopia¶
| File | What's in it |
|---|---|
pkg/podvolume/backupper.go |
Kopia/Restic PVC backup orchestration: creates DataUpload CRDs, tracks status. |
pkg/podvolume/restorer.go |
PVC restore orchestration: creates DataDownload CRDs. |
pkg/uploader/kopia/ |
Kopia library integration: snapshot creation, progress tracking. |
pkg/repository/ |
Kopia repository management (init, connect, maintenance). |
CRD types¶
| File | What's in it |
|---|---|
pkg/apis/velero/v1/types.go |
All CRD type definitions. Source of truth for field semantics. |
pkg/apis/velero/v1/register.go |
Scheme registration: add new CRDs here. |
Testing conventions¶
Unit tests¶
Live adjacent to the code (backup_test.go next to backup.go). Use testify/assert and testify/require.
func TestBackupWithHooks(t *testing.T) {
require.NoError(t, err)
assert.Equal(t, velerov1api.BackupPhaseCompleted, backup.Status.Phase)
}
Builder pattern for test objects¶
// Use pkg/builder/ — don't construct literal structs in tests
backup := builder.ForBackup(velerov1api.DefaultNamespace, "test-backup").
IncludedNamespaces("app", "db").
StorageLocation("default").
TTL(metav1.Duration{Duration: 72 * time.Hour}).
Result()
restore := builder.ForRestore(velerov1api.DefaultNamespace, "test-restore").
Backup("test-backup").
Result()
E2E tests¶
Located in pkg/test/e2e/. Use ginkgo. Require a running cluster.
# Run E2E tests
make test-e2e
# Run a specific E2E test
cd pkg/test/e2e && go test ./... -run TestE2E/backup_restore_with_csi
Fake client¶
Controller tests use controller-runtime's envtest or a fake client:
Generated Code¶
CRD deepcopy functions and CRD manifests are generated. Do not edit them manually:
make generate # Runs controller-gen for deepcopy, CRD manifests
make verify-generate # CI check that generated code is up-to-date
make update-generated-crd-code # Alias used in some PR workflows
Prometheus Metrics¶
All metrics are defined in pkg/metrics/metrics.go. The velero server exposes :8085/metrics by default.
| Metric | Type | Description |
|---|---|---|
velero_backup_total |
Counter | Total backups, by schedule and phase |
velero_backup_duration_seconds |
Histogram | Backup duration distribution |
velero_backup_last_successful_timestamp |
Gauge | Last successful backup per BSL/schedule — use for alerting |
velero_restore_total |
Counter | Total restores by phase |
velero_volume_snapshot_success_total |
Counter | Successful volume snapshots |
velero_volume_snapshot_failure_total |
Counter | Failed volume snapshots |
velero_backup_items_total |
Gauge | Items in last backup, by resource type |
# Example alert: no successful backup in 25 hours
- alert: VeleroBackupMissed
expr: time() - velero_backup_last_successful_timestamp > 90000
for: 5m
labels:
severity: warning
annotations:
summary: "Velero backup missed"
Build and Run Locally¶
# Prerequisites: Go 1.21+, Docker, kind
git clone https://github.com/vmware-tanzu/velero
cd velero
make build # Build velero binary to _output/bin/
make test # Unit tests
make lint # golangci-lint (matches CI config)
make container # Build velero container image
# Run velero server locally (fast iteration without building containers)
./_output/bin/linux/amd64/velero server \
--log-level debug \
--kubeconfig ~/.kube/config
Design Documents¶
Before contributing to any area, read the relevant design doc in design/:
design/plugin-refactoring.md: plugin v2 architecture rationaledesign/pvc-backup-restore.md: the PVC/volume backup decision treedesign/csi-volumesnapshots.md: CSI integration designdesign/kopia-integration.md: why Kopia was chosen over Resticdesign/item-snapshotter.md: ItemSnapshotter v2alpha interface