We have a route-based Azure VPN Gateway with NAT rules connected via IPsec/IKEv2 to a partner network (Ellkay). Traffic from partner → Azure works correctly. Traffic from Azure → partner fails for both TCP and ICMP, over the same IPsec SA, with our gateway-level packet capture showing ESP frames egressing to the partner peer but zero return ESP frames during the test window.
We have exhausted all client-side troubleshooting (NSG, route table, VM firewall, NAT rules, partner-side review) and would like Microsoft Networking to validate the VPN Gateway data plane and NAT-rule behavior in the egress direction.
Environment
| Field | Value |
| --- | --- |
| Subscription ID | PII removed |
| Resource Group | PII removed |
| VPN Gateway | PII removed (route-based, active-active) |
| Gateway public IPs | PII removed (PII removed) / PII removed (PII removed) |
| VPN Connection | PII removed |
| Local Network Gateway | PII removed, peer IP PII removed |
| Partner address space | |
| IPsec policy | IKEv2, AES256, SHA256, DH14, lifetime 28800 s |
| policyBasedTrafficSelectors | false (route-based, no TS restrictions on our side) |
| Connection state | Connected (verified az network vpn-connection show) |

VPN Gateway NAT rules (linked to the connection)
| Rule | Mode | External | Internal |
| --- | --- | --- | --- |
| | EgressSnat | xx.xx.xx.xx/32 | xx.xx.xx.xx/32 |
| nat-ingress-gpsl-orders-prod | IngressSnat | xx.xx.xx.xx/32 | xx.xx.xx.xx/32 |

Both rules are linked to vpnconn-ellkay-prod (verified via az network vpn-connection show → ingressNatRules / egressNatRules).
The 20.10.104.81 Public IP exists as a standalone Microsoft.Network/publicIPAddresses resource (PII removed, Standard SKU, unattached) — used purely as a NAT-mapping label.
Affected VM
| Field | Value |
| --- | --- |
| VM | PII removed |
| Private IP | xx.xx.xx.xx |
| NSG (NIC) | PII removed (verified: outbound permits TCP/ICMP to xx.xx.xx.xx/32) |
| Effective route for xx.xx.xx.xx/32 | next-hop type VirtualNetworkGateway, state Active |
| Windows Firewall | Allow-Ellkay-Results-Outbound enabled, no Block rules enabled |

Working flow — partner → Azure (Orders, TCP/8445)
- Partner xx.xx.xx.xx → xx.xx.xx.xx
- IngressSnat rewrites to xx.xx.xx.xx:8445
- TCP handshake succeeds, application traffic flows
- This proves the IPsec SA carries traffic for the selector pair xx.xx.xx.xx ↔ xx.xx.xx.xx
Failing flow — Azure → partner (Results, TCP/16731 and ICMP)
From xx.xx.xx.xx (server: PII removed):
ICMP ping xx.xx.xx.xx (4 packets, 2026-05-26 18:39 UTC)
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss)
TCP Test-NetConnection xx.xx.xx.xx -Port 16731 (×3)
18:39:42 UTC Open=False Elapsed=26.5s
18:40:08 UTC Open=False Elapsed=26.1s
18:40:34 UTC Open=False Elapsed=26.0s
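For anyone who wants to repeat the TCP test from a host without PowerShell, the following is a minimal Python sketch of a timed connect probe. It is a stand-in for Test-NetConnection, not what we ran; the function names `probe_tcp` and `format_result` and the `timeout_s` default are our own illustrative choices, and the elapsed time it reports depends on that timeout, so it is not directly comparable to the ~26 s figures above.

```python
import socket
import time
from datetime import datetime, timezone


def probe_tcp(host: str, port: int, timeout_s: float = 5.0) -> dict:
    """Attempt one TCP connect; report whether it opened and how long it took."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            is_open = True
    except OSError:
        # Covers timeouts, refusals, and unreachable errors alike.
        is_open = False
    return {
        "utc": datetime.now(timezone.utc).strftime("%H:%M:%S"),
        "open": is_open,
        "elapsed_s": round(time.monotonic() - started, 1),
    }


def format_result(result: dict) -> str:
    """Render a result in the same shape as the log lines above."""
    return (
        f'{result["utc"]} UTC Open={result["open"]} '
        f'Elapsed={result["elapsed_s"]}s'
    )
```

Calling `format_result(probe_tcp(host, 16731))` with the partner address as `host` would produce one line per attempt in the format used above.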
Gateway packet-capture findings (taken earlier today)
We ran a gateway-level packet capture on PII removed via:
az network vnet-gateway packet-capture start ...
az network vnet-gateway packet-capture stop ... (we used the ARM REST API directly due to a SAS-URL ampersand-parsing issue in the Azure CLI)
tshark protocol-hierarchy and conversation analysis of the captured .cap:
- ESP frames egress from gateway PIP to xx.xx.xx.xx: present for the test window.
- ESP frames ingress to gateway PIP from xx.xx.xx.xx during the same window: zero.
- IKE control plane: healthy, no rekey events during the test.
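The per-direction ESP counts above can be reproduced roughly as follows. This is a sketch, not the exact analysis we ran: it assumes the capture has first been exported to tab-separated source/destination pairs with something like `tshark -r capture.cap -Y esp -T fields -e ip.src -e ip.dst`, and the function name `count_esp_directions` and the addresses in the usage note are hypothetical placeholders.

```python
from collections import Counter


def count_esp_directions(lines, gateway_ips, peer_ip):
    """Count ESP frames per direction from 'src<TAB>dst' lines.

    egress  = gateway public IP -> partner peer
    ingress = partner peer -> gateway public IP
    other   = any ESP frame that matches neither pattern
    """
    counts = Counter(egress=0, ingress=0, other=0)
    gateways = set(gateway_ips)
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        src, dst = parts
        if src in gateways and dst == peer_ip:
            counts["egress"] += 1
        elif src == peer_ip and dst in gateways:
            counts["ingress"] += 1
        else:
            counts["other"] += 1
    return dict(counts)
```

Feeding it the exported lines together with both gateway PIPs and the partner peer IP gives a dictionary of counts; in our capture the equivalent of `ingress` was zero for the test window.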
Connection byte counters (from az network vpn-connection show):
- egressBytesTransferred increments steadily during outbound tests.
- ingressBytesTransferred increments only at keepalive/DPD cadence (no per-test correlated jump).
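The counter comparison can be made repeatable by saving the JSON output of az network vpn-connection show before and after a test and taking the difference. The sketch below assumes both counters appear as top-level keys in that JSON, which is how we read them; the function name `counter_deltas` is illustrative only.

```python
import json


def counter_deltas(before_json: str, after_json: str) -> dict:
    """Return the byte-counter change between two saved connection snapshots."""
    before = json.loads(before_json)
    after = json.loads(after_json)
    return {
        key: int(after[key]) - int(before[key])
        for key in ("egressBytesTransferred", "ingressBytesTransferred")
    }
```

A large egress delta with an ingress delta that stays near the keepalive baseline matches the pattern described above.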
What we have verified and ruled out
- ✅ NSG on the source NIC permits outbound TCP/ICMP to xx.xx.xx.xx
- ✅ Effective route table sends xx.xx.xx.xx/32 to VirtualNetworkGateway (Active)
- ✅ Windows Firewall on the VM has no Block rules; the Allow rule for outbound is enabled
- ✅ NAT rule mappings are symmetric and linked to the connection
- ✅ IPsec SA is up and carries inbound traffic for the same xx.xx.xx.xx ↔ xx.xx.xx.xx pair (Orders works)
- ✅ Partner has performed internal review and stated no policy is blocking and that they "see ingress traffic but not egress"
- ❌ Both ICMP and TCP fail outbound from Azure — i.e., not a port- or listener-specific issue
Could you please check the following on the Microsoft side?
- Egress data plane validation: Please verify on the gateway control and data planes that ESP frames generated for the SA with xx.xx.xx.xx are leaving the gateway PIPs with the correct inner source (post-SNAT xx.xx.xx.xx) and that no Azure-side egress filter is dropping them.
- Return path: Please confirm whether the gateway is receiving any ESP frames from xx.xx.xx.xx for this SA during a fresh test window (we will run one on request). If frames are received but not decapsulated or forwarded because of a NAT or SA mismatch, please surface the reason from the gateway logs.
- NAT-rule semantics for asymmetric flows: For a route-based gateway with paired ingress/egress SNAT rules, is there any known scenario in which the egress SNAT applies on outbound but the corresponding return-path translation fails to match, for example if the partner replies from a slightly different source IP, port, or selector than the egress rule expects?
- Active-active behavior: With two PIPs (xx.xx.xx.xx, xx.xx.xx.xx), can return ESP from the partner arrive on PIP A while the SA state lives on the instance behind PIP B? If so, is there mitigation guidance?
- Gateway health: Please confirm that the gateway SKU, build, and instance health are nominal and that there are no current incidents affecting NAT-rule processing in our region.
- Diagnostic logs: Please pull GatewayDiagnosticLog, TunnelDiagnosticLog, RouteDiagnosticLog, and IKEDiagnosticLog for xx.xx.xx.xx covering the last 24 hours, focusing on the SA to peer xx.xx.xx.xx and on NAT rule xx.xx.xx.xx.