Cloud disaster recovery sits at the junction of architecture, policy, and operational readiness, and most Australian organisations are weaker than they think on at least one of those three fronts. The question is rarely whether a DR plan exists. It is whether the plan has been tested, whether the recovery time objective (RTO) and recovery point objective (RPO) targets are realistic, and whether the infrastructure actually supports them. Getting that right requires more than a checkbox exercise before an audit.
Why cloud DR is harder than it looks
Public cloud promises elasticity and redundancy, but those properties do not translate automatically into resilience. A workload running in a single AWS Sydney availability zone can still go dark during a zonal outage. Data stored in a single region can still be inaccessible during a regional event. Australian organisations that rely on a single cloud provider's native backup tooling, without a tested failover path, often discover the gaps at exactly the wrong moment.
The ACSC's guidance on business continuity and resilience is explicit on this point: backup is not the same as recovery. A backup confirms that data was preserved at a point in time. A recovery test confirms that the data can be retrieved, that systems can restart in the correct order, and that dependent services reconnect as expected. Without that confirmation, an RTO of four hours is a wish, not a commitment.
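As a rough illustration of the difference between having a backup and proving a recovery, the sketch below derives the achieved RPO and RTO from three timestamps recorded during a rehearsal and compares them with the committed targets. The `RecoveryTest` record, its field names, and the sample timestamps are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTest:
    last_good_backup: datetime  # newest backup that was actually restored
    outage_start: datetime      # when the simulated outage began
    service_restored: datetime  # when the service passed health checks again


def measured_rpo(test: RecoveryTest) -> timedelta:
    """Data-loss window: changes made after the last restorable backup are lost."""
    return test.outage_start - test.last_good_backup


def measured_rto(test: RecoveryTest) -> timedelta:
    """Downtime: from the start of the outage until the service is healthy again."""
    return test.service_restored - test.outage_start


def meets_targets(test: RecoveryTest, rto_target: timedelta, rpo_target: timedelta) -> bool:
    """True only if both measured values are within their targets."""
    return measured_rto(test) <= rto_target and measured_rpo(test) <= rpo_target


# Hypothetical rehearsal: backup at 02:00, outage at 03:30, service back at 08:15.
rehearsal = RecoveryTest(
    last_good_backup=datetime(2025, 3, 1, 2, 0),
    outage_start=datetime(2025, 3, 1, 3, 30),
    service_restored=datetime(2025, 3, 1, 8, 15),
)
print(measured_rto(rehearsal))  # 4:45:00
print(meets_targets(rehearsal, timedelta(hours=4), timedelta(hours=2)))  # False
```

In this made-up rehearsal the backup existed and restored cleanly, yet the four-hour RTO was missed by 45 minutes, which is exactly the kind of gap that only a recovery test exposes.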
Australian organisations also face a geographic constraint that their US and European counterparts do not. With fewer domestic cloud regions to choose from, local failover options are more limited. AWS operates regions in Sydney and Melbourne; Azure offers Australia East (New South Wales), Australia Southeast (Victoria), and Australia Central (Canberra); and GCP operates regions in Sydney and Melbourne. That is a workable set of options, but it means the choice of cloud platform has direct consequences for DR. For a detailed look at how each provider stacks up for local workloads, the comparison of AWS, Azure, and GCP for Australian workloads is worth reading before committing to a DR architecture.
The key DR strategies and when to use each
There are four broad approaches to cloud DR, each with different cost and complexity trade-offs. Choosing correctly depends on how much downtime your business can absorb and how much data loss is acceptable.
- Backup and restore. The lowest-cost option. Data is backed up to a secondary region or cloud storage tier and restored from scratch when needed. RTOs are long, often measured in hours or days. Suitable for non-critical workloads where cost is the primary constraint.
- Pilot light. A minimal set of core services runs continuously in the DR environment, with the rest of the infrastructure ready to be provisioned and scaled up when triggered. RTOs drop to tens of minutes or a few hours. Good for workloads where cost-efficiency matters but complete cold restoration is too slow.
- Warm standby. A scaled-down but fully functional version of the production environment runs continuously. Failover involves scaling up rather than spinning up. RTOs of minutes are achievable. More expensive than pilot light, but recovery is significantly more reliable.
- Multi-site active-active. Production runs simultaneously across two or more sites or regions, with traffic distributed between them. Failover is near-instant. This is the most expensive architecture but delivers the highest resilience for mission-critical systems.
Most Australian enterprises land on a mix of these approaches across their workload portfolio. Core financial and customer-facing systems typically warrant warm standby at minimum. Internal tools and development environments can often tolerate backup and restore. The mistake is applying a single DR tier to all workloads, which either overspends on low-priority systems or under-protects critical ones.
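One way to avoid applying a single tier everywhere is to derive a suggested strategy from each workload's RTO. The sketch below does that for a small portfolio. The thresholds and workload names are illustrative assumptions, not figures from any standard; real cut-offs should come from a business impact analysis and would normally weigh RPO and cost as well.

```python
from datetime import timedelta
from enum import Enum


class Strategy(Enum):
    BACKUP_AND_RESTORE = "backup and restore"
    PILOT_LIGHT = "pilot light"
    WARM_STANDBY = "warm standby"
    ACTIVE_ACTIVE = "multi-site active-active"


def suggest_strategy(rto: timedelta) -> Strategy:
    """Map an RTO target to a DR tier using assumed, illustrative thresholds."""
    if rto <= timedelta(minutes=1):
        return Strategy.ACTIVE_ACTIVE
    if rto <= timedelta(minutes=30):
        return Strategy.WARM_STANDBY
    if rto <= timedelta(hours=4):
        return Strategy.PILOT_LIGHT
    return Strategy.BACKUP_AND_RESTORE


# Hypothetical workload portfolio with per-workload RTO targets.
portfolio = {
    "payments-api": timedelta(minutes=10),
    "reporting": timedelta(hours=2),
    "internal-wiki": timedelta(days=2),
}

for name, rto in portfolio.items():
    print(f"{name}: {suggest_strategy(rto).value}")
```

Even a crude mapping like this makes the tiering decision explicit and reviewable, rather than leaving every workload on whatever default the platform team chose first.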
Data residency and sovereignty in DR planning
Australian data residency rules create a constraint that every DR plan must account for. When a DR environment replicates data to a secondary region, that region must still meet the data residency requirements applicable to the workload. For most federal government systems, that means keeping data onshore at all times, including during a DR event. For regulated industries such as finance and healthcare, contractual and legal obligations may be equally strict.
This rules out using offshore DR regions as a cost-saving measure for sensitive workloads. It also creates complexity in multicloud DR designs where one provider's Australian footprint may be better positioned for a particular workload. The complete guide to Australian data residency covers the current rules in detail, including the Privacy Act reform implications that are reshaping obligations in 2026.
Sovereign cloud offerings add another layer to this picture. Providers including Microsoft, AWS, and several local operators now offer environments designed specifically for Australian government and regulated industry workloads, with tighter controls on data access and staff vetting. DR planning for agencies and regulated entities should account for whether the DR target environment maintains the same sovereignty controls as production.
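The residency constraint lends itself to an automated guard in the deployment pipeline. The following sketch checks each replication target against an allow-list of onshore regions. The region identifiers are the providers' codes for their Australian regions as understood at the time of writing and should be verified against current provider documentation; the workload names and the tuple layout are hypothetical. A check like this covers location only, not the sovereignty controls of the target environment.

```python
# Assumed allow-list of onshore region codes per provider; verify before use.
ONSHORE_REGIONS = {
    "aws": {"ap-southeast-2", "ap-southeast-4"},
    "azure": {"australiaeast", "australiasoutheast", "australiacentral", "australiacentral2"},
    "gcp": {"australia-southeast1", "australia-southeast2"},
}


def is_onshore(provider: str, region: str) -> bool:
    """True if the region is on the onshore allow-list for that provider."""
    return region.lower() in ONSHORE_REGIONS.get(provider.lower(), set())


def residency_violations(targets):
    """Return the (workload, provider, region) targets that replicate offshore."""
    return [t for t in targets if not is_onshore(t[1], t[2])]


# Hypothetical replication plan: the second entry points at Singapore.
replication_plan = [
    ("customer-db", "aws", "ap-southeast-4"),
    ("claims-archive", "aws", "ap-southeast-1"),
    ("patient-records", "azure", "australiasoutheast"),
]

for workload, provider, region in residency_violations(replication_plan):
    print(f"{workload}: DR region {provider}/{region} is not onshore")
```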
Testing: the part most organisations skip
A DR plan that has not been tested in the last twelve months is, in practical terms, untested. Systems change. Dependencies are added. Configuration drift happens. The script that triggered a clean failover during last year's test may fail silently because a service account password rotated and nobody updated the runbook.
Effective DR testing for cloud environments involves more than confirming that backups exist. A full rehearsal should verify that data restores within the committed RPO, that applications start in the correct dependency order, that network routing resolves to the DR environment, and that monitoring and alerting are active in the recovery site. Tabletop exercises are useful for communication and governance, but they do not substitute for a live failover test.
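The dependency-order check in particular is easy to automate. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) to derive a valid start-up order and to confirm that the order observed during a rehearsal respected every dependency. The service names and the dependency map are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each service maps to the set of services it depends on (hypothetical names).
DEPENDS_ON = {
    "database": set(),
    "cache": set(),
    "app-server": {"database", "cache"},
    "web-frontend": {"app-server"},
    "monitoring": {"web-frontend"},
}


def startup_order(depends_on):
    """Return one valid start-up order in which dependencies come first."""
    return list(TopologicalSorter(depends_on).static_order())


def started_in_valid_order(observed, depends_on) -> bool:
    """Check that every service started, and only after all of its dependencies."""
    position = {svc: i for i, svc in enumerate(observed)}
    all_services = set(depends_on).union(*depends_on.values())
    if not all_services <= set(position):
        return False  # at least one service never started
    return all(
        position[dep] < position[svc]
        for svc, deps in depends_on.items()
        for dep in deps
    )


print(startup_order(DEPENDS_ON))
# An order where the frontend came up before its app server is rejected.
observed = ["web-frontend", "database", "cache", "app-server", "monitoring"]
print(started_in_valid_order(observed, DEPENDS_ON))  # False
```

Keeping the dependency map in version control alongside the runbook means the same data drives both the automated start-up sequence and the post-rehearsal verification.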
The frequency question depends on risk appetite and regulatory requirements. APRA-regulated entities operating under CPS 230 are expected to test business continuity arrangements regularly, with documentation of results. Federal government agencies working to the ACSC's Essential Eight carry similar expectations, because its backup strategy expects restoration to be tested rather than assumed. For most other organisations, an annual full rehearsal supplemented by quarterly component tests is a reasonable baseline.
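A baseline cadence is only useful if something flags when it lapses. As a minimal sketch, the function below reports which tests are overdue under the annual and quarterly baseline described above. The 365-day and 92-day limits and the sample dates are assumptions for illustration; regulated entities should substitute whatever their obligations require.

```python
from datetime import date, timedelta

# Assumed limits matching an annual full rehearsal and quarterly component tests.
FULL_REHEARSAL_MAX_AGE = timedelta(days=365)
COMPONENT_TEST_MAX_AGE = timedelta(days=92)


def overdue_tests(last_full: date, last_component: date, today: date) -> list[str]:
    """Return the names of the test types that are past their maximum age."""
    overdue = []
    if today - last_full > FULL_REHEARSAL_MAX_AGE:
        overdue.append("full rehearsal")
    if today - last_component > COMPONENT_TEST_MAX_AGE:
        overdue.append("component test")
    return overdue


# Hypothetical dates: the last full rehearsal was more than a year ago.
print(overdue_tests(date(2024, 1, 15), date(2025, 1, 10), today=date(2025, 3, 1)))
# ['full rehearsal']
```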
Cost management in cloud DR
One of the practical advantages of cloud-based DR over legacy on-premises approaches is that standby infrastructure does not need to run at full capacity continuously. Pilot light and warm standby architectures both take advantage of the cloud's ability to scale on demand, holding only a skeleton environment in the DR site until it is needed.
The risk is that optimising for cost introduces latency into the recovery process. An environment that scales from near-zero to production capacity in fifteen minutes looks fine on paper but may miss the RTO in practice if the scaling triggers fire slowly, if AMIs or container images need to be pulled from a distant registry, or if database promotion steps are manual rather than automated.
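A simple recovery-time budget makes this risk visible before an incident does. The sketch below sums the duration of each recovery phase and compares the total with the RTO. The phase names and durations are invented for illustration, and the phases are assumed to run strictly one after another; in practice the figures should come from timings measured during rehearsals.

```python
from datetime import timedelta


def estimated_recovery_time(phases: dict[str, timedelta]) -> timedelta:
    """Total recovery time, assuming the phases run sequentially."""
    return sum(phases.values(), timedelta())


# Hypothetical phase timings for a pilot-light environment.
phases = {
    "detect and declare": timedelta(minutes=10),
    "scale out compute": timedelta(minutes=15),
    "pull images from registry": timedelta(minutes=8),
    "manual database promotion": timedelta(minutes=25),
    "DNS cutover": timedelta(minutes=5),
}
rto = timedelta(minutes=45)

total = estimated_recovery_time(phases)
slowest = max(phases, key=phases.get)

print(total, total <= rto)  # 1:03:00 False
print(slowest)              # manual database promotion
```

In this example the fifteen-minute scale-out is not the problem; the manual database promotion is, which points at automation rather than more standby capacity as the fix.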
Cloud cost pressure also creates a temptation to reduce the frequency of DR tests or to maintain incomplete environments in the secondary region. Both reduce confidence in the plan. The better approach is to automate the DR environment as infrastructure-as-code so that it can be spun up fully, tested, and torn down quickly, keeping test costs low without sacrificing fidelity. This connects directly to good cloud cost governance more broadly: unexamined idle resources in DR environments are one of the more common sources of bill blowout. A structured cloud cost optimisation review often surfaces DR-related waste alongside the more obvious compute and storage inefficiencies.
Getting the foundations right
Cloud DR that actually holds up in a real incident is built on a few non-negotiable foundations: clear ownership of the DR plan, with named individuals responsible for each recovery phase; documented and tested runbooks, reviewed after every significant infrastructure change; RPO and RTO targets that are grounded in business impact analysis rather than guesswork; and a testing cadence that treats recovery rehearsal as a normal operational activity, not an occasional compliance tick.
For Australian organisations navigating the combination of local data residency obligations, APRA and ACSC compliance requirements, and the practical realities of a limited domestic cloud footprint, getting cloud DR right is more demanding than the vendor brochures suggest. But the organisations that do it well treat DR not as a separate project but as a continuous property of their cloud architecture. That mindset shift is what separates resilient operations from ones that discover their weaknesses at the worst possible time.
