Readiness gates
Production readiness checklist
Red Hat does not publish one universal production-readiness checklist for every OpenShift environment. This page turns the relevant Red Hat and OpenShift guidance into local gates we will analyze, validate, and implement for the BRAC POC fleet.
Source set
Official guidance to track
| Source | Use in this wiki |
|---|---|
| OCP 4.20 installation overview | Production clusters must be prepared with persistent storage, identity provider integration, and core platform monitoring before users consume the cluster. |
| OCP 4.20 scalability and performance | Control plane, infrastructure, object limits, latency, capacity, and scale practices. |
| OCP 4.20 etcd | etcd health, backup, restore, performance, defragmentation, and multi-site caution areas. |
| OCP 4.20 monitoring | Supported monitoring customization, alert receivers, and persistent storage for monitoring high availability. |
| Red Hat OpenShift 4 upgrade pre-checks | Practical operator, update path, removed API, etcd backup, must-gather, ODF, and virtualization checks before major operational changes. |
| Red Hat multi-site OpenShift guidance | Subscription-gated guidance for OpenShift clusters or designs spanning sites, regions, data centers, or non-ideal network conditions. |
| ODF 4.20 DR introduction | RPO/RTO framing and distinctions between Metro-DR, Regional-DR, and stretch cluster approaches. |
| ODF Regional-DR solution | Regional-DR workflow using RHACM hub plus two managed OpenShift clusters for failover and failback. |
Gate model
Local readiness gates to analyze
- Core cluster health: ClusterVersion healthy, ClusterOperators healthy, all expected nodes Ready, MachineConfigPools stable, no persistent critical alerts, and no unresolved ODF/RHACS/OLM blockers.
- Production integrations: persistent storage, OAuth/identity provider, supported monitoring configuration, alert receivers, default ingress, DNS, certificates, and route exposure are explicit.
- etcd and platform backup: etcd backup process is current, restorable, stored off-cluster, and matched to the OpenShift version; backup and restore commands are documented.
- OADP and ACM backup: DPA/BSL objects are available, backup schedules meet agreed RPO, backup health alerts fire, and restore dry-runs are part of drill entry criteria.
- Upgrade and lifecycle readiness: update path, operator compatibility, removed APIs, must-gather, support case boundaries, and maintenance windows are checked before upgrades or disruptive drills.
- Image and disconnected readiness: DR paths do not depend on slow public pulls; mirror, IDMS/ITMS, CatalogSource, registry CA, and pull-secret handling are in desired state or explicitly accepted as a lab limitation.
- Security and secrets: admin access, RBAC, kubeadmin retirement path, Vault/ESO boundaries, RHACS posture, no static app secrets in Git, and audit evidence are tracked.
- Observability and operations: platform monitoring storage, Alertmanager routing, logging, tracing, ACM Observability, runbooks, session evidence, and project tracking are maintained.
- Regional-DR readiness: ODF Advanced entitlement, ACM hub, two managed clusters, network reachability, Submariner/CIDR decisions, DRPolicy, protected app, planned relocate, unplanned failover, failback, and RPO/RTO evidence are proven.
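As one illustration of the core cluster health gate, the sketch below flags ClusterOperators that are not Available or are Degraded. It assumes the input is JSON shaped like the output of `oc get clusteroperators -o json`. The sample data and the function name `unhealthy_operators` are hypothetical, and ignoring the Progressing condition is a simplification, not an agreed gate rule.

```python
import json

# Hypothetical sample shaped like `oc get clusteroperators -o json` output.
# Operator names and condition values are illustrative, not live evidence.
SAMPLE_JSON = """
{
  "items": [
    {
      "metadata": {"name": "authentication"},
      "status": {"conditions": [
        {"type": "Available", "status": "True"},
        {"type": "Progressing", "status": "False"},
        {"type": "Degraded", "status": "False"}
      ]}
    },
    {
      "metadata": {"name": "monitoring"},
      "status": {"conditions": [
        {"type": "Available", "status": "True"},
        {"type": "Progressing", "status": "False"},
        {"type": "Degraded", "status": "True"}
      ]}
    }
  ]
}
"""


def unhealthy_operators(cluster_operators: dict) -> list[str]:
    """Return names of operators that are not Available or are Degraded."""
    failing = []
    for item in cluster_operators.get("items", []):
        # Map condition type to its status string ("True", "False", "Unknown").
        conditions = {
            c["type"]: c["status"]
            for c in item.get("status", {}).get("conditions", [])
        }
        available = conditions.get("Available") == "True"
        degraded = conditions.get("Degraded") == "True"
        if not available or degraded:
            failing.append(item["metadata"]["name"])
    return failing
```

Calling `unhealthy_operators(json.loads(SAMPLE_JSON))` on the sample returns only the degraded operator. An operator with no reported conditions is treated as failing, which keeps the check conservative.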
Execution tracker
Milestone and phase issues
Implementation is tracked in the Production Readiness Gates milestone and the BRAC POC OpenShift Operations project board.
| Phase | Issue | Gate |
|---|---|---|
| 0 | #18 Gate scope, scoring, and evidence matrix | Define how gates pass, block, or carry accepted risk. |
| 1 | #19 Core cluster health | ClusterVersion, ClusterOperators, nodes, MachineConfigPools, and live exceptions. |
| 2 | #20 Production integrations | Storage, identity, ingress, DNS, certificates, alerts, and Vault/ESO boundaries. |
| 3 | #21 etcd and platform backup | Cluster backup process, restore path, retention, and off-cluster survivability. |
| 4 | #22 OADP and ACM backup | DPA/BSL health, backup freshness, alert behavior, and restore dry-run readiness. |
| 5 | #23 Upgrade and lifecycle readiness | Update path, operator compatibility, removed APIs, must-gather, and rollback planning. |
| 6 | #24 Image and disconnected readiness | Durable mirror, IDMS/ITMS, catalogs, registry trust, and pull-secret handling. |
| 7 | #25 Security and secrets | RBAC, kubeadmin posture, Vault/ESO boundaries, RHACS, and no static secrets in Git. |
| 8 | #26 Observability and operations | Monitoring, alerts, ACM Observability, logging/tracing, runbooks, and evidence hygiene. |
| 9 | #27 Regional-DR readiness | ODF Advanced Regional-DR, Submariner/CIDR, DRPolicy, protected app, failover, and failback. |
| 10 | #28 Final evidence certification | Publish pass, blocked, or accepted-risk status for every gate. |
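Phases 0 and 10 imply that every gate ends with exactly one published status. The sketch below is one possible shape for that record, assuming the three statuses named in the table. The names `GateStatus`, `GATES`, and `certification_report` are hypothetical, and the real scoring rules remain whatever issue #18 defines.

```python
from enum import Enum


class GateStatus(Enum):
    PASS = "pass"
    BLOCKED = "blocked"
    ACCEPTED_RISK = "accepted-risk"


# Gate names taken from phases 1 through 9 of the tracker table.
GATES = [
    "Core cluster health",
    "Production integrations",
    "etcd and platform backup",
    "OADP and ACM backup",
    "Upgrade and lifecycle readiness",
    "Image and disconnected readiness",
    "Security and secrets",
    "Observability and operations",
    "Regional-DR readiness",
]


def certification_report(statuses: dict[str, GateStatus]) -> list[str]:
    """Return one 'gate: status' line per gate, refusing to publish if any gate lacks a status."""
    missing = [gate for gate in GATES if gate not in statuses]
    if missing:
        raise ValueError(f"gates without a status: {missing}")
    return [f"{gate}: {statuses[gate].value}" for gate in GATES]
```

Raising on a missing gate, rather than defaulting it to a status, keeps the final certification from silently skipping a gate.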
Implementation queue
How these gates become work
- Analyze: compare each gate against live evidence and current desired state before creating remediation changes.
- Track: use GitHub Issues and milestones for bounded work, but keep desired state in lab-gitops-full/orlab-workloads/.
- Implement: make platform changes through GitOps unless a live break-glass fix is explicitly approved.
- Validate: capture read-only command evidence, alert evidence, dry-run output, and drill outcomes in local session reports.
- Promote: only mark a gate passing when the live check, desired-state check, and runbook evidence agree.
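The promotion rule above can be sketched as a three-way agreement check. This is a minimal illustration, assuming each evidence source has already been reduced to a boolean. The names `GateEvidence`, `can_promote`, and `blocking_reasons` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateEvidence:
    live_check_ok: bool        # read-only live check agrees
    desired_state_ok: bool     # GitOps desired state agrees
    runbook_evidence_ok: bool  # runbook or session evidence agrees


def blocking_reasons(evidence: GateEvidence) -> list[str]:
    """List which evidence sources currently prevent promotion."""
    reasons = []
    if not evidence.live_check_ok:
        reasons.append("live check")
    if not evidence.desired_state_ok:
        reasons.append("desired-state check")
    if not evidence.runbook_evidence_ok:
        reasons.append("runbook evidence")
    return reasons


def can_promote(evidence: GateEvidence) -> bool:
    """A gate is promoted only when all three evidence sources agree."""
    return not blocking_reasons(evidence)
```

Returning the list of blocking reasons alongside the boolean makes it easier to record in a session report why a gate was not promoted.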
Current fit
Known BRAC POC mapping
| Gate | Current local direction |
|---|---|
| Core cluster health | Recent focused check found all clusters on OCP 4.20.18 with healthy ClusterVersion and Ready nodes, but live exceptions must be rechecked. |
| Production integrations | Vault/ESO smoke is complete; ACM Observability is Ready; production storage and alert receiver decisions still need hardening. |
| Backup and restore | General OADP daily backups are completing; ACM backup freshness still needs a strict RPO gate before hub drills. |
| Image readiness | Hub pre-pull is a useful bridge; durable mirror/IDMS remains an active gate for hub and spoke recovery images. |
| Regional DR | ODF Regional-DR POC milestone now phases the work from scope through hot-standby certification for spoke-dc to spoke-dr. |
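The backup freshness gate in the table above amounts to comparing the age of the newest completed backup with the agreed RPO. The sketch below assumes the completion time is available as an RFC 3339 UTC string. The function names are hypothetical, and the 24-hour RPO in the usage note is an illustrative value, not an agreed target.

```python
from datetime import datetime, timedelta, timezone


def parse_rfc3339(timestamp: str) -> datetime:
    """Parse a UTC timestamp such as 2025-01-01T02:00:00Z into an aware datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def backup_is_fresh(last_completed: str, rpo: timedelta, now: datetime) -> bool:
    """True when the newest completed backup is no older than the RPO."""
    age = now - parse_rfc3339(last_completed)
    # A negative age means the timestamp is in the future, which is treated as a failure.
    return timedelta(0) <= age <= rpo
```

For example, with `now = datetime(2025, 1, 2, 1, 0, tzinfo=timezone.utc)` and `rpo = timedelta(hours=24)`, a backup completed at `2025-01-01T02:00:00Z` is 23 hours old and passes, while the same backup checked two hours later does not. Passing `now` explicitly keeps the check deterministic for drill evidence.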