High availability and cross-region replication are essential for mission-critical applications that use Azure DocumentDB. This document outlines best practices for configuring and managing high availability (HA) and cross-region replication. Follow the guidance in this document to achieve optimal performance, resilience, and disaster recovery capabilities in Azure DocumentDB.
High availability (HA) best practices
Use HA for production clusters
Enabling high availability (HA) is crucial for production clusters and any clusters that are sensitive to downtime. In a production environment, unexpected node failures can cause significant disruptions. HA ensures that your cluster remains available and operational with zero data loss even when one of its physical shards (nodes) becomes unavailable.
Use HA to achieve 99.99% SLA
Azure DocumentDB offers a 99.99% monthly availability SLA for clusters with high availability enabled. To meet this SLA, ensure that HA is activated for all critical workloads that require continuous uptime.
Enable HA for automatic failover
Clusters with high availability enabled automatically recover from physical shard failures without manual intervention. When a node failure occurs, the system promotes a standby physical shard to replace the failed primary node. The automatic failover process retains the same connection string, so that the failover process is seamless and transparent to applications. This feature is critical for applications that require continuous uptime and consistent data access.
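Because the connection string stays the same, an application usually only needs to tolerate a brief interruption while the standby is promoted. The following minimal sketch shows one way to do that with a retry wrapper and exponential backoff. It assumes that operations in flight during a failover can fail with a transient error; `TransientConnectionError`, `with_retry`, and `make_flaky_operation` are hypothetical names for illustration, not part of any Azure DocumentDB driver or SDK.

```python
import time


class TransientConnectionError(Exception):
    """Hypothetical stand-in for a driver's transient network error."""


def with_retry(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # waits 0.5 s, 1 s, 2 s, ...


def make_flaky_operation(failures):
    """Build a demo operation that fails `failures` times, then succeeds."""
    state = {"remaining": failures}

    def operation():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TransientConnectionError("connection reset during failover")
        return "ok"

    return operation


# Demo: record the backoff delays instead of actually sleeping.
delays = []
result = with_retry(make_flaky_operation(2), sleep=delays.append)
print(result, delays)
```

In a real application, the `except` clause would catch whatever transient error types your driver raises, and the retried operation should be safe to repeat.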
Disable HA for non-production clusters
For non-production clusters or those clusters that aren't sensitive to downtime, high availability can be disabled to reduce costs. These environments may tolerate occasional downtime without impacting business operations. Carefully assess the risk and cost trade-offs before disabling HA on any cluster.
Use HA with availability zones
In regions where availability zones are supported, enabling HA ensures that each primary-standby physical shard pair is provisioned in different availability zones. Zone redundancy provides extra resilience by protecting your cluster from data center-level failures within a region.
Cross-region replication best practices
Use cross-region replication for disaster recovery
Use cross-region replication when a copy of cluster data needs to be stored in another Azure region for disaster recovery (DR) purposes. Cross-region replication ensures that your data is available even in the event of a regional outage. Azure DocumentDB supports an active-passive replication configuration to facilitate cross-region disaster recovery. Active-passive replication keeps one cluster as the primary in read-write mode and maintains a read-only replica cluster in another Azure region.
In the rare event of a regional outage, the replica cluster can be promoted to become the new read-write cluster with minimal interruption. This capability ensures that your data remains safe and accessible even if an entire region experiences an outage.
Configure replication with minimal impact on performance
When configuring cross-region replication, consider how network latency and write latency affect your applications. Choose regions for the primary read-write cluster and the replica cluster that are geographically close to your users, and ensure that applications reading from the replica cluster are designed for eventual consistency.
Read scaling
Use cross-region replication to offload massive read operations from the primary cluster to a replica cluster. Offloading read operations to a replica cluster prevents overloading the primary cluster and ensures that the system can handle high read volumes efficiently.
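As an illustration of read offloading, the sketch below routes writes to the primary cluster and reads that tolerate eventual consistency to the replica cluster. All names here (`ClusterEndpoints`, `choose_endpoint`, the operation labels, and the placeholder connection strings) are hypothetical; how your application actually holds and selects connection strings depends on your driver and configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterEndpoints:
    primary: str  # connection string of the read-write cluster
    replica: str  # connection string of the read-only replica cluster


# Operation labels are illustrative, not a driver API.
WRITE_OPERATIONS = frozenset({"insert", "update", "delete"})


def choose_endpoint(endpoints, operation, allow_stale_reads=True):
    """Return the connection string that should serve the operation."""
    # Writes always go to the primary; so do reads that must see the latest data.
    if operation in WRITE_OPERATIONS or not allow_stale_reads:
        return endpoints.primary
    # Everything else can be served by the replica to offload the primary.
    return endpoints.replica


endpoints = ClusterEndpoints(
    primary="primary.example.invalid",  # placeholder, not a real endpoint
    replica="replica.example.invalid",  # placeholder, not a real endpoint
)
print(choose_endpoint(endpoints, "find"))
print(choose_endpoint(endpoints, "insert"))
print(choose_endpoint(endpoints, "find", allow_stale_reads=False))
```

Keeping the decision in one function makes it easy to send a specific read back to the primary when it must reflect the most recent writes.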
Combined HA and DR strategy
Combine high availability (HA) for in-region availability with cross-region replication for disaster recovery (DR) and global read scalability. The combination of the two provides a 99.995% SLA. This approach delivers the best balance between local resilience and global redundancy, ensuring continuous availability and optimal performance for your applications.
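To put the 99.99% and 99.995% figures in perspective, the following sketch converts an availability percentage into a monthly downtime budget. It assumes a 30-day month and treats the SLA as a simple uptime fraction; the official SLA terms define how availability is actually measured. The function name `monthly_downtime_minutes` is illustrative only.

```python
def monthly_downtime_minutes(sla_percent: float, days_in_month: int = 30) -> float:
    """Maximum downtime per month, in minutes, allowed by an availability target."""
    total_minutes = days_in_month * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1 - sla_percent / 100)


for sla in (99.99, 99.995):
    budget = monthly_downtime_minutes(sla)
    print(f"{sla}% -> {budget:.2f} minutes per 30-day month")
```

Under these assumptions, 99.99% allows about 4.32 minutes of downtime per month, and 99.995% allows about 2.16 minutes.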
Failover mode best practices
Azure DocumentDB supports three cross-region failover modes: service-managed failover, graceful promotion, and forced promotion. Choose the mode that best matches your recovery objectives.
Enable service-managed failover for mission-critical workloads
For workloads that need automatic recovery from regional outages, enable service-managed failover on the primary cluster. The service detects regional outages and promotes the replica without operator intervention. Because the failover is unplanned, it might lose any writes that hadn't replicated to the secondary region when the outage began. Pair service-managed failover with in-region high availability to protect against both shard-level and region-level failures.
Use graceful promotion for planned region switches
When you can choose the timing—for example, during scheduled maintenance, a permanent region migration, or a disaster recovery drill—use graceful promotion. Graceful promotion waits for replication to drain before switching write roles, so the operation completes with zero data loss. Plan for a short write-availability pause while the replication queue drains.
Use forced promotion for full control
Use forced promotion when you need explicit control over the timing of an unplanned failover, such as when the primary region is unreachable and service-managed failover isn't enabled. Like service-managed failover, forced promotion might result in data loss because of replication lag.
Combine failover modes
Service-managed failover and graceful promotion aren't mutually exclusive. Enabling service-managed failover doesn't prevent you from triggering a graceful promotion for planned maintenance. Use service-managed failover as a safety net for outages and graceful promotion for everything you can schedule.
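The guidance in this section can be summarized as a small decision helper. This is only a sketch of the reasoning described above, not an Azure API: `FailoverMode`, `recommend_mode`, and `may_lose_data` are hypothetical names, and the two inputs are simplifications of a real operational decision.

```python
from enum import Enum


class FailoverMode(Enum):
    SERVICE_MANAGED = "service-managed failover"
    GRACEFUL_PROMOTION = "graceful promotion"
    FORCED_PROMOTION = "forced promotion"


def recommend_mode(planned: bool, service_managed_enabled: bool) -> FailoverMode:
    """Pick the failover mode that matches the situation."""
    if planned:
        # Maintenance, migrations, and DR drills: wait for replication to drain.
        return FailoverMode.GRACEFUL_PROMOTION
    if service_managed_enabled:
        # Unplanned regional outage: the service promotes the replica itself.
        return FailoverMode.SERVICE_MANAGED
    # Unplanned outage without service-managed failover: the operator decides.
    return FailoverMode.FORCED_PROMOTION


def may_lose_data(mode: FailoverMode) -> bool:
    """Only graceful promotion waits for replication, so only it avoids data loss."""
    return mode is not FailoverMode.GRACEFUL_PROMOTION


for planned, enabled in [(True, True), (False, True), (False, False)]:
    mode = recommend_mode(planned, enabled)
    print(f"planned={planned}, service-managed={enabled}: "
          f"{mode.value}, possible data loss={may_lose_data(mode)}")
```

The first case reflects that enabling service-managed failover doesn't rule out a graceful promotion for planned work.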
Summary of best practices
| Scenario | Recommendation |
|---|---|
| Production clusters | Enable high availability |
| Clusters requiring 99.99% SLA | Enable high availability |
| Clusters requiring 99.995% SLA | Enable high availability and create a replica cluster |
| Non-production clusters | Disable high availability to reduce costs |
| Automatic failover for shard failures | Enable high availability |
| Automatic failover for regional outages | Enable service-managed failover |
| Cross-region disaster recovery (DR) | Create a replica cluster |
| Planned region switch with zero data loss | Trigger a graceful promotion |
| Read scalability across multiple regions | Create a replica cluster |
By following these best practices, you can ensure that your Azure DocumentDB clusters remain highly available and resilient against failures and regional outages.