Scoped export token
External Vault export now uses a scoped periodic token, and scripts renew it at job start.
The authoritative resume queue remains in TODO.md. This page gives the project-readable version of the same operational sequence.
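The renew-at-job-start pattern can be sketched as follows. This is a minimal illustration, not the project's actual script: it assumes the job already holds the periodic token and can reach Vault over HTTPS, and it calls Vault's `auth/token/renew-self` HTTP endpoint. The function names, the example address, and the timeout are hypothetical.

```python
import json
import urllib.request


def build_renew_request(vault_addr: str, token: str) -> urllib.request.Request:
    """Build a POST to Vault's token renew-self endpoint (no network I/O here)."""
    url = vault_addr.rstrip("/") + "/v1/auth/token/renew-self"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),  # empty body: keep the token's own period
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


def renew_at_job_start(vault_addr: str, token: str, timeout: float = 10.0) -> int:
    """Renew the token and return the lease duration (seconds) that Vault reports.

    Raises urllib.error.HTTPError if Vault rejects the renewal, so the job can
    stop before it starts exporting with a token that is about to expire.
    """
    request = build_renew_request(vault_addr, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return int(body["auth"]["lease_duration"])
```

Renewing before any export work means a periodic token that has lapsed causes a failure at the first step, not partway through the job.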
Completed gate
External Vault export now uses a scoped periodic token, and scripts renew it at job start.
Vault has one Kubernetes auth mount per OpenShift cluster for ESO login boundaries.
SecretStore/rke2-vault and ExternalSecret/eso-vault-smoke are Ready on all four clusters.
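A check of the "Ready on all four clusters" claim could look like the sketch below. It assumes each resource has already been fetched as JSON (for example with `oc get ... -o json`) and parsed into a dict, and it reads the `Ready` condition that External Secrets Operator writes to `status.conditions`. The function names are hypothetical, and the cluster names in the usage example are taken from elsewhere on this page and only assumed to be the four in question.

```python
from typing import Dict, List


def is_ready(resource: dict) -> bool:
    """Return True when the resource carries a Ready condition with status "True"."""
    for condition in resource.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            return condition.get("status") == "True"
    # No Ready condition at all is treated as not ready.
    return False


def not_ready_clusters(resources_by_cluster: Dict[str, dict]) -> List[str]:
    """List, in sorted order, the clusters whose resource is not Ready."""
    return sorted(
        name for name, resource in resources_by_cluster.items() if not is_ready(resource)
    )


if __name__ == "__main__":
    ready = {"status": {"conditions": [{"type": "Ready", "status": "True"}]}}
    fleet = {"hub-dc": ready, "hub-dr": ready, "spoke-dc": ready, "spoke-dr": ready}
    print(not_ready_clusters(fleet))  # an empty list means the gate holds
```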
Readiness framework
| Track | Next step | Done signal |
|---|---|---|
| Gate source | Use the production readiness checklist as the local analysis page. | Every gate has an evidence source, owner, remediation path, and accepted risk state. |
| Hub DR | Deferred for the POC per ADR-0005. The POC ships on hub-dc and spoke-dc only. The original gate (backup, image, observability, lifecycle) applies again once DR work is back in scope after the POC. | Resumption signal: an explicit decision to re-open DR work, with ADR-0005 revisited. |
| Regional DR | Deferred for the POC per ADR-0005. spoke-dr stays platform standby (ADR-0001) with no active DR drill in POC scope. The drill captures (image pre-pull warmth, Vault role on spoke-dr, runbook activation steps, RTO) apply again when DR work resumes. | Resumption signal: an explicit decision to re-open DR work, with ADR-0005 revisited. |
Wait for the hub image pre-pull DaemonSet to finish or expose a clear pull failure.
Build the DR-reachable image mirror/IDMS path for hub recovery images.
Prove backup freshness, the hub-dr passive state, and dry-run restore manifests, then restate the abort criteria.
Run hub-dc to hub-dr activation only after gates pass and ownership risk is accepted.
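When DR work resumes, the rule that activation runs only after gates pass and ownership risk is accepted could be encoded as below. This is a sketch under assumptions: the `Gate` record mirrors the four attributes named in the Gate source row (evidence source, owner, remediation path, risk state), while the class, the field names, and the `passed` flag are hypothetical and not an existing project schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gate:
    name: str
    evidence_source: str
    owner: str
    remediation_path: str
    risk_state: str  # free text here, e.g. "open" or "accepted" (hypothetical values)
    passed: bool = False

    def missing_fields(self) -> List[str]:
        """Names of the required attributes that are still blank."""
        required = ("evidence_source", "owner", "remediation_path", "risk_state")
        return [field for field in required if not getattr(self, field).strip()]


def activation_allowed(gates: List[Gate], ownership_risk_accepted: bool) -> bool:
    """Allow activation only if every gate is fully documented and passed,
    and the ownership risk has been explicitly accepted."""
    return (
        bool(gates)  # an empty gate list never authorizes activation
        and ownership_risk_accepted
        and all(gate.passed and not gate.missing_fields() for gate in gates)
    )
```

Treating an empty gate list as a refusal keeps a missing checklist from being read as a passing one.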
Parallel work
| Track | Next step | Done signal |
|---|---|---|
| ACM Observability | Choose resilient storage for Thanos stateful PVCs. | MultiClusterObservability Ready on both hubs and PVCs no longer depend on lab-local LVMS for production claims. |
| Backup health alerts | Validate the new ACM/OADP alert rules through an alerting cycle. | Failed or stale backups produce visible alert signals before DR drills. |
| Image mirror / pre-pull | Monitor current pre-pull, then implement durable mirror/IDMS. | First-start hub-dr pulls no longer depend on public registry latency. |
| Governance PolicySets | Group baseline ACM policies into explicit PolicySets. | Compliance posture is visible by purpose rather than loose individual policies. |
| Vault / external secrets | Use the completed smoke path as the template for app-specific policies and roles. | Each real app gets a scoped Vault policy and no static app secrets are committed to Git. |
| External dependency policy | Document which external services may appear in the OCP wiki, and allow them only when they affect OpenShift core operations. | Wiki references stay limited to Vault, backup/object storage, GitOps source, identity, ingress, and approved app onboarding touchpoints. |
| Java/JBoss app | Create workload base, image build, runtime config, and OCP overlay. | App runs on selected spoke with health checks, secrets through ESO, and documented OpenShift routing/mesh posture. |
| OSSM 3 onboarding | Select workload namespace and routing model. | Kiali shows app traffic and routes behave as intended. |
| spoke-dr semantics | Decided 2026-05-07: platform standby. Documented in spoke-dr and topology. | Done. Follow-up work is the Regional DR drill row above and the activation runbook in Backup and DR. |
| Wiki maintenance | Update pages after meaningful fleet changes. | Wiki matches CURRENT_STATE.md, ASSESSMENT.md, and TODO.md. |
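The "failed or stale backups" signal in the Backup health alerts row can be illustrated with a small classifier. It is a sketch, not the ACM/OADP alert rules themselves: it assumes a Velero `Backup` object already parsed from JSON, reads its `status.phase` and `status.completionTimestamp` fields, and takes the freshness window as a caller-supplied assumption. The function name and the four return labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FAILED_PHASES = {"Failed", "PartiallyFailed", "FailedValidation"}


def backup_state(backup: dict, now: datetime, max_age: timedelta) -> str:
    """Classify a Velero Backup as "failed", "pending", "stale", or "fresh"."""
    status = backup.get("status", {})
    phase = status.get("phase")
    if phase in FAILED_PHASES:
        return "failed"
    if phase != "Completed":
        return "pending"  # e.g. New or InProgress
    completed = status.get("completionTimestamp")
    if not completed:
        return "stale"  # completed but undated: treat conservatively
    finished = datetime.strptime(completed, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return "stale" if now - finished > max_age else "fresh"


if __name__ == "__main__":
    sample = {"status": {"phase": "Completed",
                         "completionTimestamp": "2026-05-07T02:00:00Z"}}
    # The 24-hour window is an assumed threshold, not a documented project value.
    print(backup_state(sample, datetime.now(timezone.utc), timedelta(hours=24)))
```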
Work tracking