spoke-dr
spoke-dr is the workload standby cluster. Decision 2026-05-07: platform standby — platform services are kept ready, application workloads activate only via the DR drill runbook. Not a hot mirror.
Status
Recorded current state
- GitOps: spoke-dr-cluster-config recorded Synced/Healthy.
- Storage: ODF desired/live spec uses compact LVMS topology and StorageCluster is ready.
- OADP: lab-dpa reconciled, BSL available, latest scheduled daily completed in the recorded run. A historical partially failed backup remains in history.
- Local app middleware: Demo middleware was retired from spoke desired state. Non-core app middleware is not tracked as part of the OpenShift core operations scope.
- AI platform: RHOAI operator installed, but no DataScienceCluster or user AI workloads exist.
- User workload metrics: Enabled; user workload Prometheus, Thanos ruler, and the Prometheus operator should run here.
- Vault / ESO: SecretStore/rke2-vault is Ready=True and ExternalSecret/eso-vault-smoke is synced through the kubernetes-spoke-dr Vault auth mount.
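Two of the recorded items correspond to well-known resource shapes, sketched here for orientation only. The ConfigMap is the standard OpenShift switch for user workload monitoring; this page does not say how it is managed on spoke-dr. In the SecretStore, only the name rke2-vault and the auth mount kubernetes-spoke-dr come from the status above; the namespace, Vault URL, KV path, role, service account, and API version are hypothetical placeholders.

```yaml
# Standard OpenShift toggle that starts user workload Prometheus,
# Thanos ruler, and the Prometheus operator for user projects.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
---
# Sketch of a SecretStore using Vault Kubernetes auth.
# API version may be v1 or v1beta1 depending on the installed ESO release.
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: rke2-vault                # from the recorded state
  namespace: eso-smoke            # hypothetical
spec:
  provider:
    vault:
      server: https://vault.example.invalid:8200   # hypothetical
      path: secret                # hypothetical KV mount
      version: v2
      auth:
        kubernetes:
          mountPath: kubernetes-spoke-dr   # from the recorded state
          role: eso-smoke         # hypothetical Vault role
          serviceAccountRef:
            name: eso-vault       # hypothetical service account
```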
OSSM 3
Mesh platform state
- servicemeshoperator3.v3.3.2 and kiali-operator.v2.22.2 CSVs recorded Succeeded.
- Istio/default, IstioCNI/default, ZTunnel/default, Kiali, OSSM Console, CNI pods, and ingress gateway are recorded running.
- Ambient components are pinned to OSSM 3.3 version v1.28.5.
- No application namespace has opted into ambient yet; use istio.io/dataplane-mode=ambient for app onboarding.
- RHOAI ServiceMesh capability is recorded False/MissingOperator because it expects the old OSSM v2 operator; this does not block OSSM 3 itself.
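As a minimal sketch of the ambient onboarding label mentioned above, a namespace could be opted in declaratively as follows. The namespace name demo-app is hypothetical; only the label key and value come from this page.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo-app   # hypothetical application namespace
  labels:
    # Opts every pod in the namespace into the ambient data plane.
    istio.io/dataplane-mode: ambient
```

Because spoke-dr is platform standby, a manifest like this would only reach the cluster as part of the workload references added during activation.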
Decision recorded 2026-05-07
Standby semantics: platform standby
spoke-dr remains platform standby — operators, OSSM 3, ESO, OADP, and storage stay ready, but no application workloads run in normal operation. lab-workloads/clusters/spoke-dr/kustomization.yaml is intentionally resources: []; the workload ApplicationSet matches spoke-dc only. Full rationale and alternatives in ADR-0001.
This avoids the cost of cross-cluster session replication, dual image promotion, and split-brain handling that hot-standby would imply. The trade-off is non-trivial activation time during a regional spoke-dc loss; that's accepted in exchange for operational simplicity.
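For reference, the intentionally empty standby kustomization could look like the sketch below. Only resources: [] is stated on this page; the apiVersion and kind lines are the usual Kustomize header and are assumed here.

```yaml
# lab-workloads/clusters/spoke-dr/kustomization.yaml (standby state, sketch)
apiVersion: kustomize.config.k8s.io/v1beta1   # assumed header
kind: Kustomization
# Intentionally empty: no application workloads render for spoke-dr.
resources: []
```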
- Confirm spoke-dc is unrecoverable (or accepted-loss for the drill).
- Owner opens a PR on lab-workloads populating clusters/spoke-dr/kustomization.yaml with the workload references currently on spoke-dc.
- Merge → the workload ApplicationSet selector is widened (or a one-time manual Application per workload) so the hub Argo CD generates spoke-dr-workloads; the Argo CD Agent on spoke-dr reconciles. Image pull warmth comes from the pre-pull controller already on the cluster.
- Owner flips the public DNS / LB to spoke-dr's ingress.
- Capture evidence (Argo sync time, route reachability, smoke tests) for the drill record.
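The PR and selector-widening steps above might look roughly like the following. This is a sketch under several assumptions, not the repository's actual content: the workload path, the cluster label key, the ApplicationSet name and namespace, the repo URL, and the use of the cluster generator are all hypothetical, and an Argo CD Agent setup may require a different destination shape. The two documents are separate files shown in one listing.

```yaml
# File 1 (sketch): clusters/spoke-dr/kustomization.yaml after the activation PR.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../apps/example-app        # hypothetical workload reference copied from spoke-dc
---
# File 2 (sketch): workload ApplicationSet with the selector widened to include spoke-dr.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: workloads                 # hypothetical
  namespace: openshift-gitops     # hypothetical
spec:
  generators:
    - clusters:
        selector:
          matchExpressions:
            - key: lab.example/cluster   # hypothetical cluster label
              operator: In
              values:
                - spoke-dc
                - spoke-dr        # added for activation; absent in standby
  template:
    metadata:
      name: '{{name}}-workloads'  # renders spoke-dr-workloads for a cluster named spoke-dr
    spec:
      project: default
      source:
        repoURL: https://git.example.invalid/lab-workloads.git   # hypothetical
        targetRevision: main
        path: 'clusters/{{name}}'
      destination:
        name: '{{name}}'
      syncPolicy:
        automated: {}
```

Reverting the selector and restoring resources: [] after a drill would return the cluster to standby, assuming pruning is handled as part of the drill cleanup.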