API Machinery

The API machinery is the foundational layer of Kubernetes extensibility. Understanding GVK/GVR, the informer pipeline, and patch semantics is essential for writing controllers and debugging API interactions.

GVK & GVR

Every Kubernetes object is identified by two related concepts:

GVK (Group / Version / Kind) — identifies a type schema:

Component	Meaning	Examples
Group	Logical family	`""` (core), `apps`, `batch`, `networking.k8s.io`, `rbac.authorization.k8s.io`
Version	Schema version	`v1`, `v1beta1`, `v1alpha1`
Kind	Type name	`Deployment`, `Pod`, `CustomResourceDefinition`

GVR (Group / Version / Resource): identifies a REST endpoint. Group and Version mean the same as in the GVK; Resource takes the place of Kind:

Component	Meaning	Examples
Resource	Plural lowercase noun	`deployments`, `pods`, `customresourcedefinitions`

The mapping between them:

GVK: apps/v1, Kind=Deployment  →  GVR: apps/v1, Resource=deployments
GVK: v1, Kind=Pod              →  GVR: v1, Resource=pods        (core group "")

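This mapping lives in the RESTMapper (interface in k8s.io/apimachinery/pkg/api/meta; client-go's restmapper package builds one from discovery data). To go from GVK to GVR: mapping, err := mapper.RESTMapping(gvk.GroupKind(), gvk.Version), then use mapping.Resource. (mapper.ResourceFor expects a partially specified GVR, not a GVK; mapper.KindFor maps GVR → GVK.) In Go code, the Scheme maps GVK ↔ Go struct type.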
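Conceptually, the RESTMapper is a lookup table from GVK to GVR. The sketch below assumes a hand-filled table; `GVK`, `GVR`, `toyMapper` and `resourceForKind` are hypothetical names local to this example, not the apimachinery types. A real mapper fills the table from discovery, because plural resource names can't be derived reliably from the Kind.

```go
package main

import "fmt"

// GVK and GVR are local stand-ins for apimachinery's schema types.
type GVK struct{ Group, Version, Kind string }
type GVR struct{ Group, Version, Resource string }

// groupVersion renders "v1" for the core group ("") and "apps/v1" otherwise.
func groupVersion(group, version string) string {
	if group == "" {
		return version
	}
	return group + "/" + version
}

// toyMapper is a hand-filled GVK -> GVR table (hypothetical; a real
// RESTMapper builds this from the API server's discovery documents).
type toyMapper map[GVK]GVR

// resourceForKind looks up the REST resource that serves a given kind.
func (m toyMapper) resourceForKind(gvk GVK) (GVR, error) {
	gvr, ok := m[gvk]
	if !ok {
		return GVR{}, fmt.Errorf("no matches for kind %q in version %q",
			gvk.Kind, groupVersion(gvk.Group, gvk.Version))
	}
	return gvr, nil
}

func main() {
	mapper := toyMapper{
		{"apps", "v1", "Deployment"}: {"apps", "v1", "deployments"},
		{"", "v1", "Pod"}:            {"", "v1", "pods"},
	}
	for _, gvk := range []GVK{
		{"apps", "v1", "Deployment"},
		{"", "v1", "Pod"},
		{"custom.io", "v1alpha1", "Foo"}, // not in the table: mimics a CRD the mapper doesn't know yet
	} {
		gvr, err := mapper.resourceForKind(gvk)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s, Kind=%s -> %s, Resource=%s\n",
			groupVersion(gvk.Group, gvk.Version), gvk.Kind,
			groupVersion(gvr.Group, gvr.Version), gvr.Resource)
	}
}
```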

URL structure

# Core group (group = "")
/api/v1/namespaces/{ns}/pods/{name}
/api/v1/nodes/{name}

# Named groups
/apis/{group}/{version}/namespaces/{ns}/{resource}/{name}
/apis/apps/v1/namespaces/default/deployments/nginx
/apis/batch/v1/namespaces/default/jobs
/apis/custom.io/v1alpha1/namespaces/myns/foos/myfoo

# Sub-resources
/apis/apps/v1/namespaces/default/deployments/nginx/scale
/api/v1/namespaces/default/pods/mypod/log
/api/v1/namespaces/default/pods/mypod/exec
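The path rules above are mechanical enough to capture in a small helper. `resourcePath` is hypothetical (client-go's REST client assembles these paths internally); it assumes an empty namespace means a cluster-scoped resource or a cluster-wide list, and an empty name means the collection itself.

```go
package main

import (
	"fmt"
	"strings"
)

// resourcePath builds an API server path from its parts. The core group ("")
// lives under /api/{version}; every named group under /apis/{group}/{version}.
func resourcePath(group, version, namespace, resource, name, subresource string) string {
	parts := []string{"/apis", group, version}
	if group == "" {
		parts = []string{"/api", version}
	}
	if namespace != "" {
		parts = append(parts, "namespaces", namespace)
	}
	parts = append(parts, resource)
	if name != "" {
		parts = append(parts, name)
		if subresource != "" { // sub-resources hang off a named object
			parts = append(parts, subresource)
		}
	}
	return strings.Join(parts, "/")
}

func main() {
	fmt.Println(resourcePath("", "v1", "default", "pods", "mypod", "log"))
	fmt.Println(resourcePath("", "v1", "", "nodes", "node-1", ""))
	fmt.Println(resourcePath("apps", "v1", "default", "deployments", "nginx", "scale"))
	fmt.Println(resourcePath("batch", "v1", "default", "jobs", "", ""))
}
```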

API discovery

Clients discover available APIs before making requests, starting from two root endpoints:

GET /api        → core group versions
GET /apis       → all named groups

GET /api/v1                      → resource list for core v1
GET /apis/apps/v1                → resource list for apps/v1
GET /apis/custom.io/v1alpha1     → resource list for your CRD group
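Each discovery response is a plain JSON document. As a minimal sketch, the program below decodes a trimmed, hand-written example of the unaggregated `GET /apis` response (kind `APIGroupList`) and prints the follow-up request for each group's preferred version. The Go type and function names are hypothetical; only the JSON keys follow the real document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal subset of the APIGroupList document served at GET /apis.
type groupVersionForDiscovery struct {
	GroupVersion string `json:"groupVersion"`
	Version      string `json:"version"`
}

type apiGroup struct {
	Name             string                     `json:"name"`
	Versions         []groupVersionForDiscovery `json:"versions"`
	PreferredVersion groupVersionForDiscovery   `json:"preferredVersion"`
}

type apiGroupList struct {
	Kind   string     `json:"kind"`
	Groups []apiGroup `json:"groups"`
}

// preferredGroupVersions returns the preferred "group/version" of every group.
func preferredGroupVersions(doc []byte) ([]string, error) {
	var list apiGroupList
	if err := json.Unmarshal(doc, &list); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(list.Groups))
	for _, g := range list.Groups {
		out = append(out, g.PreferredVersion.GroupVersion)
	}
	return out, nil
}

// sampleAPIs is a trimmed, hand-written example of a /apis response.
const sampleAPIs = `{
  "kind": "APIGroupList",
  "apiVersion": "v1",
  "groups": [
    {"name": "apps",
     "versions": [{"groupVersion": "apps/v1", "version": "v1"}],
     "preferredVersion": {"groupVersion": "apps/v1", "version": "v1"}},
    {"name": "custom.io",
     "versions": [{"groupVersion": "custom.io/v1alpha1", "version": "v1alpha1"}],
     "preferredVersion": {"groupVersion": "custom.io/v1alpha1", "version": "v1alpha1"}}
  ]
}`

func main() {
	gvs, err := preferredGroupVersions([]byte(sampleAPIs))
	if err != nil {
		panic(err)
	}
	for _, gv := range gvs {
		fmt.Println("GET /apis/" + gv) // next step: fetch that group/version's resource list
	}
}
```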

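Aggregated discovery (GA in 1.30) returns all groups and their resources in two requests. It is selected with an Accept header rather than a query parameter:

GET /api     Accept: application/json;v=v2;g=apidiscovery.k8s.io;as=APIGroupDiscoveryList
GET /apis    Accept: application/json;v=v2;g=apidiscovery.k8s.io;as=APIGroupDiscoveryList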

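kubectl caches discovery documents in ~/.kube/cache/discovery/. A stale cache can cause "no matches for kind" errors, for example right after a CRD is installed. Refresh it with kubectl api-resources (which re-runs discovery unless --cached is passed) or by deleting ~/.kube/cache/.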

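resourceVersion & watches

Every object has a resourceVersion field. It is an opaque string, currently derived from the etcd revision of the object's last write; clients should compare it only for equality and never parse or order it: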

metadata:
  resourceVersion: "42891"

Semantics:

Changes on every write (spec, status, metadata, labels — anything)
Used for optimistic concurrency: include it in writes; stale writes get 409 Conflict
Passed to LIST/WATCH to resume from a known point
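Optimistic concurrency behaves like a compare-and-swap on resourceVersion. The sketch below assumes a hypothetical in-memory `fakeStore` in place of the API server: it stamps a new revision on every successful write and rejects writes whose resourceVersion no longer matches. Real clients usually wrap the read-modify-write in client-go's `retry.RetryOnConflict`.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

var errConflict = errors.New("409 Conflict: the object has been modified")

type object struct {
	Name            string
	ResourceVersion string
	Replicas        int
}

// fakeStore is a hypothetical stand-in for the API server. Like etcd, it keeps
// one global revision counter and stamps it on every successful write.
type fakeStore struct {
	rev     int
	objects map[string]object
}

func (s *fakeStore) Get(name string) object { return s.objects[name] }

// Update is a compare-and-swap: it succeeds only if obj carries the stored RV.
func (s *fakeStore) Update(obj object) (object, error) {
	cur, ok := s.objects[obj.Name]
	if !ok {
		return object{}, errors.New("404 Not Found")
	}
	if obj.ResourceVersion != cur.ResourceVersion { // equality only; RVs are opaque
		return object{}, errConflict
	}
	s.rev++
	obj.ResourceVersion = strconv.Itoa(s.rev)
	s.objects[obj.Name] = obj
	return obj, nil
}

func main() {
	s := &fakeStore{rev: 42891, objects: map[string]object{
		"nginx": {Name: "nginx", ResourceVersion: "42891", Replicas: 1},
	}}

	a := s.Get("nginx") // two clients read the same version
	b := s.Get("nginx")

	a.Replicas = 3
	if updated, err := s.Update(a); err == nil {
		fmt.Println("A wrote RV", updated.ResourceVersion)
	}

	b.Replicas = 5
	if _, err := s.Update(b); err != nil {
		fmt.Println("B:", err) // stale RV: re-read, re-apply the change, retry
	}
}
```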

Watch protocol

A watch is a long-lived streaming HTTP request (chunked transfer encoding on HTTP/1.1, or a stream on an HTTP/2 connection). The API server streams newline-delimited JSON events:

GET /apis/apps/v1/deployments?watch=1&resourceVersion=42891

{"type":"MODIFIED","object":{...}}
{"type":"DELETED","object":{...}}
{"type":"ADDED","object":{...}}

Error cases:

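Condition	Server behavior	Client action
`resourceVersion` too old (compacted)	`410 Gone`, either as the HTTP status or as an `ERROR` event carrying a 410 Status	Re-list, then restart the watch from the list's RV
Network timeout / server closes the stream	Connection closed	Resume the watch from the last RV seen
Watch with `resourceVersion` unset (`""`)	Starts at the most recent state: one synthetic `ADDED` event per existing object, then live changes	Only for a fresh start; resuming this way re-delivers every object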

The informer library handles all this automatically. Don't implement raw watches in application code.
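To make the table concrete, here is the loop a reflector runs, shown only to explain what the informer does for you. It is a simplified sketch assuming a hypothetical `source` interface for LIST and WATCH and a scripted `fakeSource` that reports a 410 on its second watch; client-go's Reflector adds backoff, bookmarks and store replacement on top of this shape.

```go
package main

import (
	"errors"
	"fmt"
)

// errGone stands in for "410 Gone: too old resource version".
var errGone = errors.New("410 Gone: too old resource version")

// source is a hypothetical abstraction over the LIST and WATCH calls.
type source interface {
	List() (items []string, rv string, err error)
	// Watch delivers events newer than rv and returns the last RV it saw.
	Watch(rv string, handle func(event string)) (lastRV string, err error)
}

// runReflector lists once, then watches; it resumes from the last RV after a
// normal disconnect and re-lists after 410 Gone. maxRounds bounds the demo.
func runReflector(src source, maxRounds int, handle func(string)) error {
	rv, needList := "", true
	for i := 0; i < maxRounds; i++ {
		if needList {
			items, listRV, err := src.List()
			if err != nil {
				return err
			}
			for _, it := range items {
				handle("SYNC " + it) // a real reflector replaces its whole store here
			}
			rv, needList = listRV, false
		}
		lastRV, err := src.Watch(rv, handle)
		switch {
		case errors.Is(err, errGone):
			needList = true // history compacted: events were lost, so re-list
		case err != nil:
			return err
		default:
			rv = lastRV // clean disconnect: nothing lost, resume from here
		}
	}
	return nil
}

// fakeSource is scripted: watch #2 fails with 410, the others succeed.
type fakeSource struct {
	lists    int
	watchRVs []string // RV passed to each Watch call, for inspection
}

func (f *fakeSource) List() ([]string, string, error) {
	f.lists++
	if f.lists == 1 {
		return []string{"a", "b"}, "100", nil
	}
	return []string{"a", "b", "c"}, "200", nil
}

func (f *fakeSource) Watch(rv string, handle func(string)) (string, error) {
	f.watchRVs = append(f.watchRVs, rv)
	switch len(f.watchRVs) {
	case 1:
		handle("MODIFIED a")
		return "101", nil
	case 2:
		return "", errGone
	default:
		handle("ADDED d")
		return "201", nil
	}
}

func main() {
	src := &fakeSource{}
	if err := runReflector(src, 3, func(e string) { fmt.Println(e) }); err != nil {
		panic(err)
	}
	fmt.Println("lists:", src.lists, "watch RVs:", src.watchRVs)
}
```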

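resourceVersion semantics on LIST

`resourceVersion` value	Meaning
`""` (unset)	Most recent state (consistent read). Historically served from etcd; newer releases can serve it from the API server's watch cache once the cache has caught up.
`"0"`	Any state: served from the API server's watch cache and possibly stale. Cheapest option.
Specific value	State at least as new as that RV (`resourceVersionMatch: NotOlderThan`, the default when a non-zero RV is given).
Specific value + `resourceVersionMatch: Exact`	State at exactly that revision (typically read from etcd; `410 Gone` if the revision has been compacted).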
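These combinations travel as query parameters on the LIST request. A minimal sketch of building them, with `listURL` as a hypothetical helper; the two checks it enforces (no `resourceVersionMatch` without a `resourceVersion`, and no `Exact` with `"0"`) mirror the API server's own request validation.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// listURL appends resourceVersion / resourceVersionMatch to a collection path.
// match is "" (unset), "NotOlderThan" or "Exact".
func listURL(path, rv, match string) (string, error) {
	if match != "" && rv == "" {
		return "", errors.New("resourceVersionMatch requires resourceVersion")
	}
	if match == "Exact" && rv == "0" {
		return "", errors.New(`resourceVersionMatch "Exact" is not allowed with resourceVersion "0"`)
	}
	q := url.Values{}
	if rv != "" {
		q.Set("resourceVersion", rv)
	}
	if match != "" {
		q.Set("resourceVersionMatch", match)
	}
	if len(q) == 0 {
		return path, nil // unset: most recent, consistent read
	}
	return path + "?" + q.Encode(), nil
}

func main() {
	for _, c := range []struct{ rv, match string }{
		{"", ""},           // most recent
		{"0", ""},          // any (watch cache)
		{"42891", ""},      // not older than 42891
		{"42891", "Exact"}, // exactly 42891
		{"0", "Exact"},     // rejected
	} {
		u, err := listURL("/apis/apps/v1/deployments", c.rv, c.match)
		if err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Println(u)
	}
}
```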

generation & observedGeneration

Two separate revision counters:

metadata.generation: incremented by the API server on every spec change. Status updates don't increment it. For CRDs this holds only when the status subresource is enabled; without it, any change outside metadata increments generation.
status.observedGeneration: set by the controller to the generation value it last successfully processed.

Use this pattern to detect pending reconciliation:

if obj.Generation != obj.Status.ObservedGeneration {
    // controller hasn't caught up to latest spec yet
}

Always set observedGeneration in your status update:

obj.Status.ObservedGeneration = obj.Generation
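The two counters can be simulated in a few lines. A sketch under simplifying assumptions: `updateSpec` and `updateStatus` are hypothetical stand-ins for how the API server handles main-resource and status-subresource writes, and `reconcile` records the generation it read at the start rather than re-reading it after doing the work.

```go
package main

import "fmt"

type spec struct {
	Replicas int
	Image    string
}

type status struct {
	ObservedGeneration int64
	ReadyReplicas      int
}

type object struct {
	Generation int64
	Spec       spec
	Status     status
}

// updateSpec mimics the API server: generation moves only if the spec changed.
func updateSpec(o *object, s spec) {
	if o.Spec != s {
		o.Spec = s
		o.Generation++
	}
}

// updateStatus mimics a status-subresource write: generation is untouched.
func updateStatus(o *object, st status) { o.Status = st }

// needsReconcile reports whether the controller is behind the latest spec.
func needsReconcile(o *object) bool { return o.Generation != o.Status.ObservedGeneration }

// reconcile captures the generation it acts on, converges towards that spec,
// then reports exactly that generation as observed.
func reconcile(o *object) {
	gen, want := o.Generation, o.Spec
	// ... drive the real world towards want here ...
	updateStatus(o, status{ObservedGeneration: gen, ReadyReplicas: want.Replicas})
}

func main() {
	o := &object{Generation: 1, Spec: spec{Replicas: 1, Image: "myapp:1.0"}}
	fmt.Println("pending:", needsReconcile(o)) // true: nothing observed yet

	reconcile(o)
	fmt.Println("pending:", needsReconcile(o)) // false

	updateSpec(o, o.Spec) // no-op write: generation unchanged
	updateSpec(o, spec{Replicas: 3, Image: "myapp:1.0"})
	fmt.Println("generation:", o.Generation, "pending:", needsReconcile(o)) // 2, true
}
```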

Patch strategies

Four patch types, each with different semantics:

JSON Patch (RFC 6902)

Array of operations. Positional — fragile for lists.

[
  {"op": "replace", "path": "/spec/replicas", "value": 3},
  {"op": "add",     "path": "/metadata/labels/env", "value": "prod"},
  {"op": "remove",  "path": "/metadata/annotations/old-key"}
]

kubectl patch deploy/myapp --type=json \
  -p='[{"op":"replace","path":"/spec/replicas","value":3}]'
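To see why JSON Patch is strict, here is a deliberately small applier. It is a sketch, not a complete RFC 6902 implementation: `applyJSONPatch` is a hypothetical helper that supports only `add`, `replace` and `remove` on object members (no array indices, `move`, `copy` or `test`). It does follow RFC 6901 pointer unescaping, which is why a label key such as `app.kubernetes.io/name` must be written as `app.kubernetes.io~1name` in a path.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value any    `json:"value"`
}

// splitPointer turns "/metadata/labels/env" into its reference tokens,
// unescaping "~1" to "/" and then "~0" to "~" (RFC 6901 order).
func splitPointer(p string) ([]string, error) {
	if !strings.HasPrefix(p, "/") {
		return nil, fmt.Errorf("unsupported pointer %q", p)
	}
	tokens := strings.Split(p[1:], "/")
	for i, t := range tokens {
		tokens[i] = strings.ReplaceAll(strings.ReplaceAll(t, "~1", "/"), "~0", "~")
	}
	return tokens, nil
}

// applyJSONPatch applies ops in order to a decoded copy of doc. On any error
// the input bytes are untouched, matching the all-or-nothing rule of RFC 6902.
func applyJSONPatch(doc, patch []byte) ([]byte, error) {
	var obj map[string]any
	if err := json.Unmarshal(doc, &obj); err != nil {
		return nil, err
	}
	if obj == nil {
		return nil, fmt.Errorf("document must be a JSON object")
	}
	var ops []patchOp
	if err := json.Unmarshal(patch, &ops); err != nil {
		return nil, err
	}
	for _, op := range ops {
		tokens, err := splitPointer(op.Path)
		if err != nil {
			return nil, err
		}
		parent := obj
		for _, t := range tokens[:len(tokens)-1] {
			next, ok := parent[t].(map[string]any)
			if !ok { // e.g. "add" to /metadata/labels/x when labels is absent
				return nil, fmt.Errorf("%s %s: parent path does not exist", op.Op, op.Path)
			}
			parent = next
		}
		key := tokens[len(tokens)-1]
		_, exists := parent[key]
		switch {
		case op.Op == "add":
			parent[key] = op.Value // creates or overwrites the member
		case op.Op == "replace" && exists:
			parent[key] = op.Value
		case op.Op == "remove" && exists:
			delete(parent, key)
		default:
			return nil, fmt.Errorf("%s %s: unsupported op or missing target", op.Op, op.Path)
		}
	}
	return json.Marshal(obj)
}

func main() {
	doc := `{"metadata":{"labels":{"app":"web"},"annotations":{"old-key":"x"}},"spec":{"replicas":1}}`
	patch := `[
	  {"op": "replace", "path": "/spec/replicas", "value": 3},
	  {"op": "add",     "path": "/metadata/labels/env", "value": "prod"},
	  {"op": "remove",  "path": "/metadata/annotations/old-key"}
	]`
	out, err := applyJSONPatch([]byte(doc), []byte(patch))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```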

Merge Patch (RFC 7396)

Partial object. Null = delete field. Lists are replaced entirely — danger for containers[].

kubectl patch deploy/myapp --type=merge \
  -p='{"spec":{"replicas":3}}'
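RFC 7396 specifies merge patch as a short recursive algorithm, and transcribing it into Go makes the list behavior obvious: anything that isn't a JSON object, arrays included, simply replaces the target. `mergePatch` and `applyMergePatch` are hypothetical helpers written for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatch follows the MergePatch pseudocode in RFC 7396.
func mergePatch(target, patch any) any {
	p, ok := patch.(map[string]any)
	if !ok {
		return patch // scalars and arrays replace the target wholesale
	}
	t, ok := target.(map[string]any)
	if !ok {
		t = map[string]any{} // a non-object target is discarded
	}
	for k, v := range p {
		if v == nil {
			delete(t, k) // null means "remove this field"
			continue
		}
		t[k] = mergePatch(t[k], v)
	}
	return t
}

func applyMergePatch(doc, patch []byte) ([]byte, error) {
	var d, p any
	if err := json.Unmarshal(doc, &d); err != nil {
		return nil, err
	}
	if err := json.Unmarshal(patch, &p); err != nil {
		return nil, err
	}
	return json.Marshal(mergePatch(d, p))
}

func main() {
	doc := `{"spec":{"replicas":1,"template":{"spec":{"containers":[
	  {"name":"app","image":"myapp:1.0"},{"name":"sidecar","image":"proxy:1.0"}]}}}}`
	// Intended as "bump app to 2.0", but the containers list is replaced,
	// so the sidecar disappears from the result.
	patch := `{"spec":{"replicas":3,"template":{"spec":{"containers":[{"name":"app","image":"myapp:2.0"}]}}}}`
	out, err := applyMergePatch([]byte(doc), []byte(patch))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```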

Strategic Merge Patch

Kubernetes-specific extension to merge patch. List fields have merge keys defined in the Go struct tags:

// containers merges by "name"
Containers []Container `json:"containers" patchStrategy:"merge" patchMergeKey:"name"`

Patching one container by name doesn't replace others:

kubectl patch deploy/myapp --type=strategic \
  -p='{"spec":{"template":{"spec":{"containers":[{"name":"app","image":"myapp:2.0"}]}}}}'
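The merge-key rule for lists can be sketched with a typed slice. This models only the list rule for a list whose merge key is `name`; real strategic merge patch also recurses into each item, honors directives such as `$patch: delete`, and has its own ordering rules. `Container` and `mergeContainersByName` are hypothetical, and an empty `Image` is assumed to mean "not present in the patch".

```go
package main

import "fmt"

// Container keeps only the fields this sketch needs.
type Container struct {
	Name  string
	Image string
}

// mergeContainersByName applies the "merge by key" rule for patchMergeKey:"name":
// entries with a matching name are merged, new names are appended, and
// containers absent from the patch are left alone.
func mergeContainersByName(current, patch []Container) []Container {
	out := append([]Container(nil), current...) // never mutate the caller's slice
	pos := make(map[string]int, len(out))
	for i, c := range out {
		pos[c.Name] = i
	}
	for _, p := range patch {
		i, ok := pos[p.Name]
		if !ok {
			pos[p.Name] = len(out)
			out = append(out, p)
			continue
		}
		if p.Image != "" { // only fields present in the patch change
			out[i].Image = p.Image
		}
	}
	return out
}

func main() {
	current := []Container{{"app", "myapp:1.0"}, {"sidecar", "proxy:1.0"}}
	patched := mergeContainersByName(current, []Container{{Name: "app", Image: "myapp:2.0"}})
	fmt.Println(patched) // the sidecar survives, unlike with a JSON merge patch
}
```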

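Strategic merge patch is not supported for CRDs

SMP merge keys are defined in Go struct tags in the Kubernetes source. Custom resources have no such tags, so the API server rejects strategic merge patches for them (415 Unsupported Media Type), and client-side kubectl apply falls back to JSON merge patch, which replaces lists wholesale. Use Server-Side Apply for CRDs: it merges list items by key when the CRD schema declares x-kubernetes-list-type: map with x-kubernetes-list-map-keys.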

Server-Side Apply (GA 1.22)

Field-level ownership tracking. The API server tracks which manager owns which field.

kubectl apply --server-side --field-manager=my-tool -f myapp.yaml

Conflict: when an apply sets a field that another manager owns to a different value, the server returns 409 Conflict listing the contested fields and their managers. Force-take ownership:

kubectl apply --server-side --force-conflicts --field-manager=my-tool -f myapp.yaml

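In Go with controller-runtime (obj must have apiVersion and kind set, and should contain only the fields this controller intends to own):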

err := r.Patch(ctx, obj, client.Apply, client.FieldOwner("my-controller"), client.ForceOwnership)

SSA is the correct approach for controllers that manage partial objects — they own only the fields they care about, and other managers (user, Helm, etc.) can coexist.
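Field ownership can be modelled as a map from field path to the set of managers that applied it. A toy sketch, assuming flat field paths such as `spec.replicas` and string values (the real server records ownership in `managedFields` as structured field sets); `server` and `apply` are hypothetical. It reproduces three rules: applying the same value makes managers co-owners, changing another manager's field is a conflict unless forced, and a field a manager stops applying is removed if nobody else owns it.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// server is a toy object with per-field ownership.
type server struct {
	values map[string]string          // field path -> value
	owners map[string]map[string]bool // field path -> set of managers
}

func newServer() *server {
	return &server{values: map[string]string{}, owners: map[string]map[string]bool{}}
}

type conflictError struct{ fields []string }

func (e *conflictError) Error() string {
	return "409 Conflict on " + strings.Join(e.fields, ", ")
}

// apply records manager's full desired set of fields (its apply configuration).
func (s *server) apply(manager string, cfg map[string]string, force bool) error {
	var conflicts []string
	for f, v := range cfg {
		if cur, set := s.values[f]; !set || cur == v {
			continue // new field, or same value: no conflict
		}
		for m := range s.owners[f] {
			if m != manager {
				conflicts = append(conflicts, f)
				break
			}
		}
	}
	if len(conflicts) > 0 && !force {
		sort.Strings(conflicts)
		return &conflictError{conflicts} // nothing is changed on conflict
	}
	// Fields this manager applied before but omits now are released.
	for f, ms := range s.owners {
		if _, still := cfg[f]; ms[manager] && !still {
			delete(ms, manager)
			if len(ms) == 0 { // nobody owns it any more: remove the field
				delete(s.owners, f)
				delete(s.values, f)
			}
		}
	}
	for f, v := range cfg {
		if cur, set := s.values[f]; set && cur != v {
			s.owners[f] = map[string]bool{} // value changed: previous owners lose it
		}
		if s.owners[f] == nil {
			s.owners[f] = map[string]bool{}
		}
		s.owners[f][manager] = true
		s.values[f] = v
	}
	return nil
}

func main() {
	s := newServer()
	fmt.Println(s.apply("kubectl", map[string]string{"spec.replicas": "3", "spec.image": "v1"}, false))
	fmt.Println(s.apply("hpa", map[string]string{"spec.replicas": "5"}, false)) // conflict
	fmt.Println(s.apply("helm", map[string]string{"spec.image": "v1"}, false))  // same value: co-owner
	fmt.Println(s.apply("hpa", map[string]string{"spec.replicas": "5"}, true))  // forced: takes ownership
	fmt.Println(s.values, s.owners)
}
```

One consequence of these rules: after the forced apply, a manifest applied by `kubectl` that still sets `spec.replicas: 3` conflicts again, so such a field is usually dropped from the user's manifest once a controller owns it.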

Informer architecture

The informer is the standard pattern for watching resources without hammering the API server.

API server (ListWatch)
    ↓
Reflector  →  ListWatch implementation; handles 410 Gone (re-list + re-watch)
    ↓
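DeltaFIFO  →  queue of per-object delta lists (Added/Updated/Deleted/Replaced/Sync); coalesces by object key
    ↓
Indexer    →  thread-safe in-memory store keyed by namespace/name; optional secondary indexes (e.g. by namespace); listers filter by label selector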
    ↓
Event handlers (AddFunc, UpdateFunc, DeleteFunc)
    ↓
WorkQueue  →  rate-limited, deduplicating; items are NamespacedName keys
    ↓
Reconciler goroutines (n workers)

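Key property: reads in the reconcile loop should come from the Indexer (local cache), not from the API server. Reconciling then adds no read load to the API server however often it runs; the standing cost is one LIST plus one long-lived WATCH per informer. Writes (create, update, patch, status updates) still go to the API server.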
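The WorkQueue's deduplication is what keeps one key from being processed by two workers at once. A single-threaded sketch of the bookkeeping client-go's queue uses (a "dirty" set and a "processing" set), assuming string keys in `namespace/name` form; real queues add locking, a blocking `Get`, rate limiting and shutdown. `queue` and its methods are hypothetical names for this example.

```go
package main

import "fmt"

// queue mirrors the bookkeeping of a deduplicating work queue.
type queue struct {
	order      []string        // keys waiting to be handed out, FIFO
	dirty      map[string]bool // keys that need processing
	processing map[string]bool // keys currently held by a worker
}

func newQueue() *queue {
	return &queue{dirty: map[string]bool{}, processing: map[string]bool{}}
}

// Add marks key as needing work. Repeated adds before a worker picks the key
// up collapse into one entry.
func (q *queue) Add(key string) {
	if q.dirty[key] {
		return
	}
	q.dirty[key] = true
	if q.processing[key] {
		return // a worker holds it; Done will requeue it
	}
	q.order = append(q.order, key)
}

// Get hands the oldest key to a worker.
func (q *queue) Get() (string, bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key := q.order[0]
	q.order = q.order[1:]
	q.processing[key] = true
	delete(q.dirty, key)
	return key, true
}

// Done releases key; if it changed meanwhile, it goes back on the queue.
func (q *queue) Done(key string) {
	delete(q.processing, key)
	if q.dirty[key] {
		q.order = append(q.order, key)
	}
}

func (q *queue) Len() int { return len(q.order) }

func main() {
	q := newQueue()
	for i := 0; i < 3; i++ {
		q.Add("default/nginx") // three events for one object
	}
	fmt.Println("queued:", q.Len()) // 1

	key, _ := q.Get()
	q.Add(key)                                  // object changes while being reconciled
	fmt.Println("queued during work:", q.Len()) // 0: never handed to a second worker
	q.Done(key)
	fmt.Println("queued after Done:", q.Len()) // 1: reconciled again with fresh state
}
```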

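SharedInformerFactory

Controllers in the same process that need the same resource type share a single informer (one LIST+WATCH) through the factory:

factory := informers.NewSharedInformerFactory(client, 30*time.Second) // 30s resync: replays cached objects to handlers, no re-list

// Informer() / Lister() register an informer with the factory; call them before Start.
deployInformer := factory.Apps().V1().Deployments().Informer()
podLister      := factory.Core().V1().Pods().Lister()

factory.Start(stopCh)             // starts every informer registered so far
factory.WaitForCacheSync(stopCh)  // block until each initial list is complete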

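The factory keeps one informer per object type (keyed by Go type in the typed factory, by GVR in the dynamic one), so two controllers that get their Deployment informer from the same factory share one watch stream to the API server.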
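The sharing amounts to a get-or-create map. A minimal sketch keyed by GVR, as the dynamic factory is; `factory`, `informer`, `informerFor` and `start` are hypothetical, and `start` only counts the LIST+WATCH streams it would open. It also shows why informers must be requested before `Start`: each call starts only the informers registered by then.

```go
package main

import "fmt"

type GVR struct{ Group, Version, Resource string }

// informer would own one LIST+WATCH and fan events out to its handlers.
type informer struct {
	gvr      GVR
	handlers []string
	started  bool
}

type factory struct {
	informers map[GVR]*informer
}

func newFactory() *factory { return &factory{informers: map[GVR]*informer{}} }

// informerFor returns the existing informer for gvr or registers a new one.
func (f *factory) informerFor(gvr GVR) *informer {
	if inf, ok := f.informers[gvr]; ok {
		return inf
	}
	inf := &informer{gvr: gvr}
	f.informers[gvr] = inf
	return inf
}

// start launches every registered informer that isn't running yet and
// returns how many watch streams it opened.
func (f *factory) start() int {
	opened := 0
	for _, inf := range f.informers {
		if !inf.started {
			inf.started = true
			opened++
		}
	}
	return opened
}

func main() {
	deployments := GVR{"apps", "v1", "deployments"}
	f := newFactory()

	a := f.informerFor(deployments) // controller A
	a.handlers = append(a.handlers, "controller-a")
	b := f.informerFor(deployments) // controller B gets the same instance
	b.handlers = append(b.handlers, "controller-b")
	f.informerFor(GVR{"", "v1", "pods"})

	fmt.Println("same informer:", a == b, "handlers:", a.handlers)
	fmt.Println("watch streams opened:", f.start()) // 2
}
```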

Always wait for cache sync

if !cache.WaitForCacheSync(stopCh, r.deploymentsSynced, r.podsSynced) {
    return fmt.Errorf("timed out waiting for caches to sync")
}

Starting reconcilers before the cache is synced causes false "not found" errors for objects that exist but haven't populated the local cache yet.

Object metadata deep dive¶

metadata:
  uid: 550e8400-e29b-41d4-a716-446655440000
  # Immutable. Unique for the lifetime of the cluster: deleting and recreating an
  # object with the same name yields a new uid. etcd keys objects by resource,
  # namespace and name; uid identifies one specific incarnation of that object.

  resourceVersion: "42891"
  # etcd revision. String — compare only for equality, never parse as integer.
  # Changes on every write to any field (spec, status, labels, annotations, finalizers).

  generation: 3
  # Incremented only on spec changes. Status and metadata changes don't increment.
  # Use generation/observedGeneration to detect pending reconciliation.

  creationTimestamp: "2024-01-15T10:00:00Z"

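  deletionTimestamp: "2024-01-16T10:00:00Z"
  # Set when a DELETE is accepted but the object can't be removed immediately:
  # finalizers are present, or graceful deletion is in progress (e.g. pods).
  # The object persists until finalizers[] is empty (for pods, also until the
  # kubelet has finished terminating the containers).
  deletionGracePeriodSeconds: 30
  # Seconds allowed for graceful termination; set only together with deletionTimestamp.
  # For pods this is the SIGTERM-to-SIGKILL window (the pod's
  # terminationGracePeriodSeconds unless the DELETE request overrides it).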

  managedFields:
  # Server-Side Apply field ownership map: one entry per field manager (and operation).
  # Hidden by kubectl get -o yaml/json by default; add --show-managed-fields to include it.