Status

Recorded current state

GitOps
spoke-dr-cluster-config recorded Synced/Healthy.
Storage
ODF desired/live spec uses compact LVMS topology and StorageCluster is ready.
OADP
lab-dpa reconciled, BSL available, and the latest scheduled daily backup completed in the recorded run. One historical, partially failed backup remains in the backup history.
Local app middleware
Demo middleware was retired from spoke desired state. Non-core app middleware is not tracked as part of the OpenShift core operations scope.
AI platform
RHOAI operator installed, but no DataScienceCluster or user AI workloads exist.
User workload metrics
Enabled; user workload Prometheus, Thanos Ruler, and the Prometheus Operator are expected to run on this cluster.
Vault / ESO
SecretStore/rke2-vault is Ready=True and ExternalSecret/eso-vault-smoke is synced through the kubernetes-spoke-dr Vault auth mount.

OSSM 3

Mesh platform state

Decision recorded 2026-05-07

Standby semantics: platform standby

spoke-dr remains platform standby — operators, OSSM 3, ESO, OADP, and storage stay ready, but no application workloads run in normal operation. lab-workloads/clusters/spoke-dr/kustomization.yaml is intentionally resources: []; the workload ApplicationSet matches spoke-dc only. Full rationale and alternatives in ADR-0001.

This avoids the cost of cross-cluster session replication, dual image promotion, and split-brain handling that hot-standby would imply. The trade-off is non-trivial activation time during a regional spoke-dc loss; that's accepted in exchange for operational simplicity.
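
As a rough illustration of the standby state described above, the two pieces might look like the sketch below. Only the file path, the empty resources list, and the spoke-dc-only match come from this record. The ApplicationSet name, namespace, label key, repository URL, branch, and project are assumptions made for the example, not the lab's actual values.

```yaml
# lab-workloads/clusters/spoke-dr/kustomization.yaml (standby state)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources: []            # intentionally empty: no application workloads on spoke-dr
---
# Sketch of a workload ApplicationSet on the hub that matches spoke-dc only.
# All names and labels below are hypothetical placeholders.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: workloads                      # assumed name
  namespace: openshift-gitops          # assumed namespace
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            lab.example/cluster: spoke-dc   # hypothetical label on the cluster secret
  template:
    metadata:
      name: '{{name}}-workloads'       # would yield spoke-dc-workloads
    spec:
      project: default                 # assumed project
      source:
        repoURL: https://git.example.invalid/lab-workloads.git   # placeholder URL
        targetRevision: main           # assumed branch
        path: 'clusters/{{name}}'
      destination:
        name: '{{name}}'
```

With a selector like this, the cluster generator produces no Application for spoke-dr. Even if it did, the empty resources list would render nothing.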

DR activation procedure (runbook-driven, owner-approved):
  1. Confirm spoke-dc is unrecoverable (or accepted-loss for the drill).
  2. Owner opens a PR on lab-workloads populating clusters/spoke-dr/kustomization.yaml with the workload references currently on spoke-dc.
  3. Merge the PR, then widen the workload ApplicationSet selector (or create a one-time manual Application per workload) so the hub Argo CD generates spoke-dr-workloads; the Argo CD Agent on spoke-dr reconciles it. Image pull warmth comes from the pre-pull controller already on the cluster.
  4. Owner flips the public DNS / LB to spoke-dr's ingress.
  5. Capture evidence (Argo sync time, route reachability, smoke tests) for the drill record.
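
Steps 2 and 3 could be sketched as the following changes. This is a hedged example only: the resource paths and the label key are hypothetical, and the real PR should list whatever spoke-dc actually references at activation time.

```yaml
# Step 2: lab-workloads/clusters/spoke-dr/kustomization.yaml after the owner's PR.
# The entries under resources are hypothetical placeholders.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../apps/example-frontend        # hypothetical workload reference
  - ../../apps/example-api             # hypothetical workload reference
---
# Step 3: fragment of the workload ApplicationSet with the selector widened
# from spoke-dc only to both spokes. The label key is an assumption.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: workloads                      # assumed name
spec:
  generators:
    - clusters:
        selector:
          matchExpressions:
            - key: lab.example/cluster   # hypothetical label on the cluster secret
              operator: In
              values:
                - spoke-dc
                - spoke-dr
  # template: unchanged from the existing ApplicationSet (omitted in this fragment)
```

Keeping both changes in Git means the activation is reviewable and can be reverted after a drill by restoring resources: [] and the narrower selector.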