
@andaryjo
Last active July 14, 2023 10:05
Azure Tales: Private Endpoints don't care about your feelings


Recently, the team and I encountered weird networking behavior on Azure which just baffled us. We are developing a platform based on Azure's Hub & Spoke Network Topology reference architecture and use the Azure Firewall as the central routing component to route traffic from spoke to spoke. A simplified architecture diagram looks roughly like this:

[Architecture diagram: hs-architecture]
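Routing from the spokes through the hub is done with user-defined routes. A minimal Terraform sketch of such a spoke route table (all names and addresses here are hypothetical, not from our actual setup) could look like this:

```hcl
# Hypothetical sketch: send all traffic from the spoke through the
# Azure Firewall in the hub via a user-defined default route.
resource "azurerm_route_table" "spoke_a" {
  name                = "rt-spoke-a"
  location            = "westeurope"
  resource_group_name = "rg-network"

  route {
    name                   = "default-via-firewall"
    address_prefix         = "0.0.0.0/0"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = "10.0.0.4" # private IP of the Azure Firewall
  }
}
```

The route table is then associated with the spoke's subnets so that inter-spoke traffic hairpins through the firewall.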

With this networking setup, we were able to establish connectivity...

  • from the VM in the Hub to the VM in Spoke A
  • from the VM in Spoke B to both the Private Endpoint and the VM in Spoke A
  • from on-premises to both the Private Endpoint and the VM in Spoke A
  • but somehow not from the VM in the Hub to the Private Endpoint

Weird, huh?

We checked Firewall rules, UDRs in the route tables and NSGs, but everything was set up as it was supposed to be. We could even see the traffic getting allowed in the Firewall logs. Still, we were not able to establish connectivity directly from the Hub to the Private Endpoint in the Spoke. At this point we already suspected that this had to be some special behavior of Private Endpoints we did not know about.

Private Endpoint Network Policies

And we were right. We discovered this article and, after some initial trouble understanding it, thought that this must be the problem. Let me break it down:

When creating a Private Endpoint, Azure will automatically create /32 routes for this endpoint in the endpoint's vnet and all directly peered vnets. These are the most specific routes and thereby override all user-defined routes. As a result, all traffic to the Private Endpoint goes straight to the Private Endpoint and bypasses any other appliances.
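For illustration (hypothetical addresses), the effective routes on a NIC in a directly peered vnet would then look roughly like this, with the system-created /32 route to the endpoint winning over the user-defined route that covers it:

```
Source    Address Prefix   Next Hop Type       Next Hop IP
User      10.1.0.0/16      VirtualAppliance    10.0.0.4
Default   10.1.1.5/32      InterfaceEndpoint   -
```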

Azure then introduced a feature called "Private Endpoint Network Policies" for subnets, which allows you to invalidate these system-generated /32 routes, meaning that traffic to Private Endpoints placed in subnets with this feature enabled will adhere to your UDRs again. This feature is disabled by default on new subnets.
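In Terraform, this maps to an attribute on the subnet resource. The exact attribute name depends on the azurerm provider version; the sketch below (hypothetical names) assumes a 3.x provider:

```hcl
resource "azurerm_subnet" "endpoints" {
  name                 = "snet-endpoints"
  resource_group_name  = "rg-network"
  virtual_network_name = "vnet-spoke-a"
  address_prefixes     = ["10.1.1.0/24"]

  # azurerm provider 3.x attribute; older provider versions used
  # enforce_private_link_endpoint_network_policies instead.
  private_endpoint_network_policies_enabled = true
}
```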

[Diagram: netpol-disabled]

But wait, that's only for traffic to the Private Endpoint, not for the traffic back from a Private Endpoint, so how does this affect us? And why can we even see the requests to the Private Endpoint in the Firewall logs when apparently traffic should not get routed over the Firewall? Oh, contrary to the Azure docs, Terraform enables Private Endpoint Network Policies by default, never mind then (azsh.it/89).

Private Endpoint response traffic

While these Private Endpoint Network Policies are important to ensure that traffic from directly peered vnets to Private Endpoints still goes over our Firewall, they were already enabled and therefore not the cause of our networking problem. It had to have something to do with how the response traffic of Private Endpoints gets routed. Finding helpful documentation on this was not easy, but here you go:

User-defined routes (UDR) traffic is bypassed from private endpoints. User-defined routes can be used to override traffic destined for the private endpoint.

This statement is a bit vague, but translates for me to: Private Endpoints don't care whatsoever about what you want (azsh.it/88). They will completely ignore any UDRs you defined in route tables, bypass any network virtual appliances (such as our Azure Firewall), and there is no way to change that behavior.

In our scenario, this means that traffic goes over the Firewall to the Private Endpoint, but the response traffic does not go back over the Firewall and instead returns directly to the source, resulting in asymmetric routing and the source dropping the response packets.

[Diagram: netpol-enabled]

But why does it work when we access the Private Endpoint from another spoke or from on-premises then? The short answer is: We don't know. The longer answer is: When the response traffic cannot directly reach the source, we suspect that the UDRs somehow get respected after all, but we are not sure of the reason and were not able to find any documentation on it. If you know more about this behavior, please let me know.

A mitigation for this could be to introduce SNAT on the Azure Firewall, which would replace the source IP with an IP of the Firewall and force response traffic to go back over the Firewall. But we did not test this and did not consider SNAT a viable solution due to its limits on concurrent connections.
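If you wanted to try this, the azurerm provider exposes the firewall's "do not SNAT" ranges as `private_ip_ranges`; per the Azure Firewall documentation, setting this to `255.255.255.255/32` forces SNAT for all traffic. A hedged sketch (names hypothetical; the referenced firewall subnet and public IP are assumed to exist elsewhere):

```hcl
resource "azurerm_firewall" "hub" {
  name                = "fw-hub"
  location            = "westeurope"
  resource_group_name = "rg-hub"
  sku_name            = "AZFW_VNet"
  sku_tier            = "Standard"

  # Default is the IANA RFC 1918 ranges (no SNAT for private traffic);
  # overriding with 255.255.255.255/32 makes the firewall SNAT every
  # connection, so response traffic must return through the firewall.
  private_ip_ranges = ["255.255.255.255/32"]

  ip_configuration {
    name                 = "config"
    subnet_id            = azurerm_subnet.firewall.id
    public_ip_address_id = azurerm_public_ip.firewall.id
  }
}
```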

What do we do now?

We now learned:

  1. Private Endpoint Network Policies must be enabled on a Private Endpoint's subnet to force traffic to the Private Endpoint from the same or directly peered vnets to respect UDRs.
  2. Private Endpoint response traffic to the same or a directly peered vnet will ignore any UDRs and always bypass any NVAs.

The simplest solution in our case was to move the resources from the Hub vnet to another, only indirectly peered, vnet.

We still wanted to ensure that Private Endpoint Network Policies are enabled for all subnets, so as not to overload our route tables with system-generated routes, and are using an Azure Policy for that (we do not actually know whether system-generated routes count against the route table route limit). Be careful when using deny-effect policies though: the Azure Portal does not support nested creation of subnets with Private Endpoint Network Policies enabled, and the policy might break some deployment flows in the Azure Portal (but that's an Azure Tale for a different time).
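Such a policy could target the `privateEndpointNetworkPolicies` alias on subnets. A hedged Terraform sketch (hypothetical names) with an audit effect; switch to deny only with the portal caveat above in mind:

```hcl
resource "azurerm_policy_definition" "pe_network_policies" {
  name         = "audit-pe-network-policies"
  policy_type  = "Custom"
  mode         = "All"
  display_name = "Subnets should have Private Endpoint Network Policies enabled"

  policy_rule = jsonencode({
    if = {
      allOf = [
        {
          field  = "type"
          equals = "Microsoft.Network/virtualNetworks/subnets"
        },
        {
          field     = "Microsoft.Network/virtualNetworks/subnets/privateEndpointNetworkPolicies"
          notEquals = "Enabled"
        }
      ]
    }
    then = { effect = "audit" } # change to "deny" to enforce
  })
}
```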
