Some notes on Azure serverless functions and related features/patterns/etc
- https://learn.microsoft.com/en-us/azure/azure-functions/security-concepts
-
Securing Azure Functions
-
- https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook-trigger#working-with-client-identities
-
Working with client identities
If your function app is using App Service Authentication / Authorization, you can view information about authenticated clients from your code. This information is available as request headers injected by the platform.
You can also read this information from binding data. This capability is only available to the Functions runtime in 2.x and higher. It is also currently only available for .NET languages.
The authenticated user is available via HTTP Headers.
- https://learn.microsoft.com/en-us/azure/app-service/overview-authentication-authorization
-
Authentication and authorization in Azure App Service and Azure Functions
-
Azure App Service provides built-in authentication and authorization capabilities (sometimes referred to as "Easy Auth"), so you can sign in users and access data by writing minimal or no code in your web app, RESTful API, and mobile back end, and also Azure Functions. This article describes how App Service helps simplify authentication and authorization for your app.
-
- https://learn.microsoft.com/en-us/azure/app-service/overview-authentication-authorization
-
Related docs/etc:
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/
-
Azure Durable Functions documentation
Durable Functions is an extension of Azure Functions that lets you write stateful functions in a serverless compute environment.
-
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview
-
What are Durable Functions?
-
Durable Functions is a feature of Azure Functions that lets you write stateful functions in a serverless compute environment. The extension lets you define stateful workflows by writing orchestrator functions and stateful entities by writing entity functions using the Azure Functions programming model. Behind the scenes, the extension manages state, checkpoints, and restarts for you, allowing you to focus on your business logic.
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview#monitoring
-
Pattern #4: Monitor
-
The monitor pattern refers to a flexible, recurring process in a workflow. An example is polling until specific conditions are met. You can use a regular timer trigger to address a basic scenario, such as a periodic cleanup job, but its interval is static and managing instance lifetimes becomes complex. You can use Durable Functions to create flexible recurrence intervals, manage task lifetimes, and create multiple monitor processes from a single orchestration.
- https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-timer
-
Timer trigger for Azure Functions
-
- https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-timer
-
The monitors can end execution when a condition is met, or another function can use the durable orchestration client to terminate the monitors. You can change a monitor's wait interval based on a specific condition (for example, exponential backoff).
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-bindings#orchestration-client
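The "flexible recurrence interval" mentioned above usually just means exponential backoff between polls. A minimal sketch of the interval calculation in plain Python (the function and parameter names here are made up for illustration; in a real orchestrator the computed delay would be passed to a durable timer between polls):

```python
# Illustrative only: computing a monitor's next wait interval with
# exponential backoff and a cap. In an orchestrator, the computed delay
# would be handed to a durable timer between polls. The names
# (base_seconds, max_seconds) are invented for this sketch.
def next_wait_interval(attempt: int, base_seconds: float = 5.0,
                       max_seconds: float = 300.0) -> float:
    """Return the delay before the next poll, doubling on each attempt."""
    return min(base_seconds * (2 ** attempt), max_seconds)

# Intervals grow 5, 10, 20, 40, ... and then stay capped at 300 seconds.
intervals = [next_wait_interval(n) for n in range(8)]
```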
-
Orchestration client
The orchestration client binding enables you to write functions that interact with orchestrator functions. These functions are often referred to as client functions. For example, you can act on orchestration instances in the following ways:
- Start them.
- Query their status.
- Terminate them.
- Send events to them while they're running.
- Purge instance history.
You can bind to the orchestration client by using the DurableClientAttribute attribute.
-
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-bindings#orchestration-client
-
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview#human
-
Pattern #5: Human interaction
-
Many automated processes involve some kind of human interaction. Involving humans in an automated process is tricky because people aren't as highly available and as responsive as cloud services. An automated process might allow for this interaction by using timeouts and compensation logic.
An approval process is an example of a business process that involves human interaction. Approval from a manager might be required for an expense report that exceeds a certain dollar amount. If the manager doesn't approve the expense report within 72 hours (perhaps the manager went on vacation), an escalation process kicks in to get the approval from someone else (perhaps the manager's manager).
-
You can implement the pattern in this example by using an orchestrator function. The orchestrator uses a durable timer to request approval. The orchestrator escalates if a timeout occurs. The orchestrator waits for an external event, such as a notification that's generated by a human interaction.
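The race between the durable timer and the external approval event boils down to a timeout decision. A minimal sketch of just that decision logic, modelled with plain datetimes rather than the durable SDK (the function name and return strings are invented for this sketch):

```python
from datetime import datetime, timedelta
from typing import Optional

# Illustrative sketch of the approval-with-timeout decision. In a real
# Durable Functions orchestrator this would be a durable timer raced
# against a wait-for-external-event call; here timing is modelled with
# plain datetimes so only the escalation logic is shown.
APPROVAL_WINDOW = timedelta(hours=72)

def resolve_approval(requested_at: datetime,
                     approved_at: Optional[datetime]) -> str:
    """Return how the request was resolved: manager approval or escalation."""
    if approved_at is not None and approved_at - requested_at <= APPROVAL_WINDOW:
        return "approved_by_manager"
    # No response within 72 hours: escalate, e.g. to the manager's manager.
    return "escalated"
```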
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-timers
-
Timers in Durable Functions (Azure Functions)
-
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-external-events
-
Handling external events in Durable Functions (Azure Functions)
-
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-timers
-
Note: There is no charge for time spent waiting for external events when running in the Consumption plan.
-
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview#aggregator
-
Pattern #6: Aggregator (stateful entities)
The sixth pattern is about aggregating event data over a period of time into a single, addressable entity. In this pattern, the data being aggregated might come from multiple sources, might be delivered in batches, or might be scattered over long periods of time. The aggregator might need to take action on event data as it arrives, and external clients might need to query the aggregated data.
-
You can use Durable entities to easily implement this pattern as a single function.
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-entities
-
Entity functions
Entity functions define operations for reading and updating small pieces of state, known as durable entities. Like orchestrator functions, entity functions are functions with a special trigger type, the entity trigger. Unlike orchestrator functions, entity functions manage the state of an entity explicitly, rather than implicitly representing state via control flow. Entities provide a means for scaling out applications by distributing the work across many entities, each with a modestly sized state.
-
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-entities
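The "operations on small pieces of state" idea is easy to model in-process. A toy sketch of the shape of an entity (this Counter class is invented for illustration; the real runtime persists the state durably and routes signals to it by entity ID):

```python
# Illustrative in-process model of a durable entity: a small piece of
# explicit state plus operations dispatched by name. The real runtime
# persists the state and delivers signals durably; this toy Counter
# only shows the shape of the programming model.
class Counter:
    def __init__(self) -> None:
        self.value = 0  # the entity's state

    def handle(self, operation: str, amount: int = 1) -> int:
        if operation == "add":
            self.value += amount
        elif operation == "reset":
            self.value = 0
        elif operation != "get":
            raise ValueError(f"unknown operation: {operation}")
        return self.value

counter = Counter()
counter.handle("add", 5)
counter.handle("add", 2)  # counter.value is now 7
```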
-
Clients can enqueue operations for (also known as "signaling") an entity function using the entity client binding.
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-bindings#entity-client
-
Entity client
The entity client binding enables you to asynchronously trigger entity functions. These functions are sometimes referred to as client functions.
-
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-bindings#entity-client
-
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview#the-technology
-
The technology
Behind the scenes, the Durable Functions extension is built on top of the Durable Task Framework, an open-source library on GitHub that's used to build workflows in code. Just as Azure Functions is the serverless evolution of Azure WebJobs, Durable Functions is the serverless evolution of the Durable Task Framework. Microsoft and other organizations use the Durable Task Framework extensively to automate mission-critical processes. It's a natural fit for the serverless Azure Functions environment.
- https://github.com/Azure/durabletask
-
Durable Task Framework
The Durable Task Framework (DTFx) is a library that allows users to write long-running persistent workflows (referred to as orchestrations) in C# using simple async/await coding constructs. It is used heavily within various teams at Microsoft to reliably orchestrate long-running provisioning, monitoring, and management operations. The orchestrations scale out linearly by simply adding more worker machines. This framework is also used to power the serverless Durable Functions extension of Azure Functions.
-
-
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview#code-constraints
-
Code constraints
In order to provide reliable and long-running execution guarantees, orchestrator functions have a set of coding rules that must be followed. For more information, see the Orchestrator function code constraints article.
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-code-constraints
-
Orchestrator function code constraints
-
Orchestrator functions use event sourcing to ensure reliable execution and to maintain local variable state. The replay behavior of orchestrator code creates constraints on the type of code that you can write in an orchestrator function. For example, orchestrator functions must be deterministic: an orchestrator function will be replayed multiple times, and it must produce the same result each time.
-
Orchestrator functions can call any API in their target languages. However, it's important that orchestrator functions call only deterministic APIs. A deterministic API is an API that always returns the same value given the same input, no matter when or how often it's called.
-
These restrictions apply only to orchestrator functions. Other function types don't have such restrictions.
-
An orchestrator function must not use any bindings, including even the orchestration client and entity client bindings. Always use input and output bindings from within a client or activity function. This is important because orchestrator functions may be replayed multiple times, causing nondeterministic and duplicate I/O with external systems.
-
Use activity functions to make outbound network calls. If you need to make an HTTP call from your orchestrator function, you also can use the durable HTTP APIs.
-
Blocking APIs like "sleep" can cause performance and scale problems for orchestrator functions and should be avoided. In the Azure Functions Consumption plan, they can even result in unnecessary execution time charges. Use alternatives to blocking APIs when they're available. For example, use Durable timers to create delays that are safe for replay and don't count towards the execution time of an orchestrator function.
-
Orchestrator code must never start any async operation except those defined by the orchestration trigger's context object. For example, never use Task.Run, Task.Delay, or HttpClient.SendAsync in .NET, or setTimeout and setInterval in JavaScript. An orchestrator function should only schedule async work using Durable SDK APIs, like scheduling activity functions. Any other type of async invocation should be done inside activity functions.
-
Always declare JavaScript orchestrator functions as synchronous generator functions. You must not declare JavaScript orchestrator functions as async because the Node.js runtime doesn't guarantee that asynchronous functions are deterministic.
-
You must not declare Python orchestrator functions as coroutines. In other words, never declare Python orchestrator functions with the async keyword because coroutine semantics do not align with the Durable Functions replay model. You must always declare Python orchestrator functions as generators, meaning that you should expect the context API to use yield instead of await.
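The generator requirement makes more sense with a toy driver showing how the framework resumes the orchestrator through yield. Everything below is an invented stand-in (the tuple protocol, the `run` driver, and the activity names are not real durable SDK APIs); it only illustrates why the generator shape replays deterministically:

```python
# Illustrative sketch of why Python orchestrators are generators: the
# framework drives the function by sending each awaited result back in
# through yield. The "call_activity" tuples and the run() driver are
# invented stand-ins, not the real durable SDK.
def orchestrator(context_input: str):
    # Each yield hands control back to the (toy) framework, which
    # resumes the generator with the activity's result.
    greeting = yield ("call_activity", "say_hello", context_input)
    shout = yield ("call_activity", "uppercase", greeting)
    return shout

def run(orch, activities, arg):
    """A toy, replay-free driver that executes yielded activity calls."""
    gen = orch(arg)
    try:
        action = next(gen)
        while True:
            _, name, payload = action
            action = gen.send(activities[name](payload))
    except StopIteration as stop:
        return stop.value

activities = {"say_hello": lambda n: f"hello {n}", "uppercase": str.upper}
result = run(orchestrator, activities, "world")  # "HELLO WORLD"
```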
-
A durable orchestration might run continuously for days, months, years, or even eternally. Any code updates made to Durable Functions apps that affect unfinished orchestrations might break the orchestrations' replay behavior. That's why it's important to plan carefully when making updates to code.
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-code-constraints?#durable-tasks
-
Durable tasks
-
Note: This section describes internal implementation details of the Durable Task Framework. You can use durable functions without knowing this information. It is intended only to help you understand the replay behavior.
-
Tasks that can safely wait in orchestrator functions are occasionally referred to as durable tasks. The Durable Task Framework creates and manages these tasks. Examples are the tasks returned by CallActivityAsync, WaitForExternalEvent, and CreateTimer in .NET orchestrator functions.
-
-
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-code-constraints
-
-
- https://learn.microsoft.com/en-us/azure/architecture/patterns/rate-limiting-pattern
-
Rate Limiting pattern
- I skimmed through this, and while it talks about the right sort of things at a high level, it doesn't go into a great deal of useful detail, and mostly seems to cover things that were already sort of known/obvious.
-
Implementations of this pattern are available in different programming languages:
- https://github.com/mspnp/go-batcher
- https://github.com/Azure-Samples/java-rate-limiting-pattern-sample
- https://github.com/Azure-Samples/java-rate-limiting-pattern-sample#rate-limiting-implementation
-
Rate Limiting Implementation
This is achieved by implementing a Distributed Lock using Redis and Spring Integration.
- 'Distributed Lock' sounds like a good keyword... I wonder what that gets us back in Azure / Durable Functions area of things? 🤔
- https://learn.microsoft.com/en-us/answers/questions/1182051/can-we-prevent-2-azure-functions-from-running-at-t
- From a quick skim, this seems to suggest using Durable Functions, specifically Orchestration functions; and a semaphore for signalling
- It also mentions the 'singleton' attribute that can be applied to a function, which uses a distributed lock mechanism
- https://stackoverflow.com/questions/53510916/does-azure-have-an-out-of-box-way-of-distributed-locking-with-a-key
- This mentions using Azure Storage lease blobs as a way of implementing a distributed locking mechanism
- https://medium.com/@fbeltrao/distributed-locking-in-azure-functions-bc4517c0306c
- This is another method that makes use of Azure Storage as a locking mechanism, using blob leases
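The key property of a blob lease is that it's a time-limited exclusive hold: it expires unless renewed, so a crashed holder can't block everyone forever. Azure Blob Storage enforces this server-side; a toy in-process model of just the semantics (this Lease class is invented for illustration, not the Azure SDK):

```python
import time
from typing import Optional

# Illustrative in-process model of blob-lease semantics: a lease is a
# time-limited exclusive hold that expires unless renewed. Azure Blob
# Storage enforces this server-side; this class only mimics the
# behaviour so the locking idea is easy to see.
class Lease:
    def __init__(self, duration_seconds: float) -> None:
        self.duration = duration_seconds
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def acquire(self, holder: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.holder is not None and now < self.expires_at:
            return False  # an unexpired lease is held by someone else
        self.holder = holder
        self.expires_at = now + self.duration
        return True

    def release(self, holder: str) -> None:
        if self.holder == holder:
            self.holder = None
```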
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-entities
-
Entity functions
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-entities#example-orchestration-signals-and-calls-an-entity
-
Example: Orchestration signals and calls an entity
Orchestrator functions can access entities by using APIs on the orchestration trigger binding.
-
Note: Calling an entity from an orchestrator function is similar to calling an activity function from an orchestrator function. The main difference is that entity functions are durable objects with an address, which is the entity ID. Entity functions support specifying an operation name. Activity functions, on the other hand, are stateless and don't have the concept of operations.
-
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-entities#entity-coordination
-
Entity coordination
There might be times when you need to coordinate operations across multiple entities. For example, in a banking application, you might have entities that represent individual bank accounts. When you transfer funds from one account to another, you must ensure that the source account has sufficient funds. You also must ensure that updates to both the source and destination accounts are done in a transactionally consistent way.
-
Coordinating entity updates requires using the LockAsync method to create a critical section in the orchestration.
-
The LockAsync method locked both the source and destination account entities. This locking ensured that no other client could query or modify the state of either account until the orchestration logic exited the critical section at the end of the using statement. This behavior prevents the possibility of overdrafting from the source account.
-
- https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-entities#critical-section-behavior
-
Critical section behavior
The LockAsync method creates a critical section in an orchestration. These critical sections prevent other orchestrations from making overlapping changes to a specified set of entities. Internally, the LockAsync API sends "lock" operations to the entities and returns when it receives a "lock acquired" response message from each of these same entities. Both lock and unlock are built-in operations supported by all entities.
No operations from other clients are allowed on an entity while it's in a locked state. This behavior ensures that only one orchestration instance can lock an entity at a time. If a caller tries to invoke an operation on an entity while it's locked by an orchestration, that operation is placed in a pending operation queue. No pending operations are processed until after the holding orchestration releases its lock.
-
Locks on entities are durable, so they persist even if the executing process is recycled. Locks are internally persisted as part of an entity's durable state.
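The lock-plus-pending-queue behaviour described above can be modelled in a few lines. A toy sketch (this class and its method names are invented; the real runtime persists the lock and queue as part of the entity's durable state):

```python
from collections import deque

# Illustrative model of the locked-entity behaviour: a lock holder plus
# a pending-operation queue that is drained only when the lock is
# released. The real runtime persists all of this durably; this class
# is an invented in-process stand-in.
class LockableEntity:
    def __init__(self) -> None:
        self.locked_by = None     # orchestration currently holding the lock
        self.pending = deque()    # operations queued while locked
        self.processed = []       # operations actually executed, in order

    def invoke(self, caller: str, operation: str) -> None:
        if self.locked_by is not None and caller != self.locked_by:
            self.pending.append((caller, operation))  # queued, not run
        else:
            self.processed.append((caller, operation))

    def lock(self, orchestration_id: str) -> None:
        self.locked_by = orchestration_id

    def unlock(self) -> None:
        self.locked_by = None
        while self.pending:  # drain pending operations in arrival order
            self.processed.append(self.pending.popleft())
```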
-
-
-
- https://github.com/Azure-Samples/java-rate-limiting-pattern-sample#rate-limiting-implementation
-
- https://learn.microsoft.com/en-us/azure/architecture/patterns/throttling
-
Throttling pattern
- As above; I skimmed through this, and while it talks about the right sort of things at a high level, it doesn't go into a great deal of useful detail
-
Queue-based Load Leveling pattern. Queue-based load leveling is a commonly used mechanism for implementing throttling. A queue can act as a buffer that helps to even out the rate at which requests sent by an application are delivered to a service.
- https://learn.microsoft.com/en-us/azure/architecture/patterns/queue-based-load-leveling
-
Queue-Based Load Leveling pattern
Use a queue that acts as a buffer between a task and a service it invokes in order to smooth intermittent heavy loads that can cause the service to fail or the task to time out. This can help to minimize the impact of peaks in demand on availability and responsiveness for both the task and the service.
-
- https://learn.microsoft.com/en-us/azure/architecture/patterns/queue-based-load-leveling
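The buffer idea is just a producer/consumer queue: bursts go into the queue, while a consumer drains it at its own pace so the downstream service never sees the burst directly. A minimal stdlib sketch (the sentinel-based shutdown is one common convention, not anything Azure-specific):

```python
import queue
import threading

# Illustrative sketch of queue-based load leveling: producers enqueue a
# burst of work, while a single consumer drains the queue at its own
# pace, so the downstream "service" never sees the burst directly.
work: "queue.Queue" = queue.Queue()
processed = []

def consumer() -> None:
    while True:
        item = work.get()
        if item is None:          # sentinel: no more work
            break
        processed.append(item)    # stand-in for calling the real service
        work.task_done()

worker = threading.Thread(target=consumer)
worker.start()
for i in range(10):               # a burst of requests arrives at once...
    work.put(i)
work.put(None)
worker.join()                     # ...but is processed one at a time
```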
-
- https://medium.com/microsoftazure/azure-functions-limiting-throughput-and-scalability-of-a-serverless-app-5b1c381491e3
-
Azure functions. Limiting throughput and scalability of a serverless app
-
TL;DR: You will get info on how to limit incoming HTTP traffic to a serverless API and message streams from Azure IoT Hub, and how to enforce restrictions on Azure Functions scale-out and why the singleton attribute might be needed.
- Skimmed through this; while some of the ideas are sort of OK, I didn't really like the sound of most of them. Probably not a good solution, but kept here as a reference just in case.
-
- https://learn.microsoft.com/en-us/azure/architecture/patterns/retry
-
Retry pattern
Enable an application to handle transient failures when it tries to connect to a service or network resource, by transparently retrying a failed operation. This can improve the stability of the application.
-
Retry after delay. If the fault is caused by one of the more commonplace connectivity or busy failures, the network or service might need a short period while the connectivity issues are corrected or the backlog of work is cleared. The application should wait for a suitable time before retrying the request.
-
If the request still fails, the application can wait and make another attempt. If necessary, this process can be repeated with increasing delays between retry attempts, until some maximum number of requests have been attempted. The delay can be increased incrementally or exponentially, depending on the type of failure and the probability that it'll be corrected during this time.
-
If a request still fails after a significant number of retries, it's better for the application to prevent further requests going to the same resource and simply report a failure immediately. When the period expires, the application can tentatively allow one or more requests through to see whether they're successful. For more details of this strategy, see the Circuit Breaker pattern.
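The "increasing delays between retry attempts" idea is a small helper in any language. A sketch with exponential backoff (the function and parameter names are invented; real implementations would also cap the delay and add jitter):

```python
import time

# Illustrative retry helper with exponentially increasing delays. The
# names (attempts, base_delay) are invented for this sketch; real apps
# would also cap the delay and add random jitter to avoid thundering
# herds.
def retry(operation, attempts: int = 4, base_delay: float = 0.01):
    """Call operation(); on failure wait base_delay * 2**n, then retry."""
    for n in range(attempts):
        try:
            return operation()
        except Exception:
            if n == attempts - 1:
                raise  # out of retries: surface the failure
            time.sleep(base_delay * (2 ** n))

calls = {"count": 0}
def flaky():
    """A stand-in operation that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient")
    return "ok"

result = retry(flaky)  # succeeds on the third attempt
```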
- https://learn.microsoft.com/en-us/azure/architecture/patterns/circuit-breaker
-
Circuit Breaker pattern
Handle faults that might take a variable amount of time to recover from, when connecting to a remote service or resource. This can improve the stability and resiliency of an application.
-
A circuit breaker acts as a proxy for operations that might fail. The proxy should monitor the number of recent failures that have occurred, and use this information to decide whether to allow the operation to proceed, or simply return an exception immediately.
-
The proxy can be implemented as a state machine with the following states that mimic the functionality of an electrical circuit breaker:
Closed: The request from the application is routed to the operation. The proxy maintains a count of the number of recent failures, and if the call to the operation is unsuccessful the proxy increments this count. If the number of recent failures exceeds a specified threshold within a given time period, the proxy is placed into the Open state. At this point the proxy starts a timeout timer, and when this timer expires the proxy is placed into the Half-Open state.
The purpose of the timeout timer is to give the system time to fix the problem that caused the failure before allowing the application to try to perform the operation again.
Open: The request from the application fails immediately and an exception is returned to the application.
Half-Open: A limited number of requests from the application are allowed to pass through and invoke the operation. If these requests are successful, it's assumed that the fault that was previously causing the failure has been fixed and the circuit breaker switches to the Closed state (the failure counter is reset). If any request fails, the circuit breaker assumes that the fault is still present so it reverts to the Open state and restarts the timeout timer to give the system a further period of time to recover from the failure.
-
- https://learn.microsoft.com/en-us/azure/architecture/patterns/circuit-breaker
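The three-state machine above is compact enough to sketch directly. A toy implementation with an injectable clock so the timeout behaviour is easy to demonstrate (thresholds and names are invented for this sketch; real breakers also track success counts in Half-Open, per-operation timeouts, etc.):

```python
# Illustrative circuit-breaker state machine (Closed / Open / Half-Open)
# with an injectable clock ('now') so the timeout is deterministic to
# demonstrate. Thresholds and names are invented for this sketch.
class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, timeout: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, operation, now: float):
        if self.state == "open":
            if now - self.opened_at < self.timeout:
                raise RuntimeError("circuit open: failing fast")
            self.state = "half-open"  # timeout expired: let a probe through
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"   # trip (or re-trip) the breaker
                self.opened_at = now
            raise
        self.state = "closed"         # success: reset the failure counter
        self.failures = 0
        return result
```

Usage follows the description above: repeated failures trip it Open, calls during the timeout fail fast, and the first successful probe after the timeout closes it again.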
-
- https://learn.microsoft.com/en-us/azure/architecture/patterns/competing-consumers
-
Competing Consumers pattern
Enable multiple concurrent consumers to process messages received on the same messaging channel. With multiple concurrent consumers, a system can process multiple messages concurrently to optimize throughput, to improve scalability and availability, and to balance the workload.
-
- https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-error-pages
-
Azure Functions error handling and retries
⚠️ TODO: go through these docs + add relevant snippets here; particularly with regards to how azure functions + queues handle error states, retries, retry delays, etc
-
Some other notes:
- https://github.com/mjpieters/aiolimiter
- https://aiolimiter.readthedocs.io/en/latest/
-
aiolimiter
-
An efficient implementation of a rate limiter for asyncio.
This project implements the Leaky bucket algorithm, giving you precise control over the rate a code section can be entered.
- https://en.wikipedia.org/wiki/Leaky_bucket
-
The leaky bucket is an algorithm based on an analogy of how a bucket with a constant leak will overflow if either the average rate at which water is poured in exceeds the rate at which the bucket leaks or if more water than the capacity of the bucket is poured in all at once. It can be used to determine whether some sequence of discrete events conforms to defined limits on their average and peak rates or frequencies, e.g. to limit the actions associated with these events to these rates or delay them until they do conform to the rates. It may also be used to check conformance or limit to an average rate alone, i.e. remove any variation from the average.
-
- https://en.wikipedia.org/wiki/Leaky_bucket
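The conformance check described above is short to implement: the bucket drains at a fixed rate, and an arrival conforms only if adding it wouldn't overflow the capacity. A deterministic sketch with explicit timestamps (class and parameter names invented for illustration; aiolimiter wraps the same idea in an asyncio context manager):

```python
# Illustrative leaky-bucket conformance check: the bucket drains at
# 'rate' units per second and holds at most 'capacity'; an arrival of
# one unit conforms only if it doesn't overflow. Timestamps are passed
# in explicitly so the behaviour is deterministic.
class LeakyBucket:
    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.level = 0.0
        self.last_time = 0.0

    def allow(self, now: float) -> bool:
        # Drain the bucket for the time elapsed since the last arrival.
        self.level = max(0.0, self.level - (now - self.last_time) * self.rate)
        self.last_time = now
        if self.level + 1.0 > self.capacity:
            return False  # bucket would overflow: non-conforming arrival
        self.level += 1.0
        return True
```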
-
- https://aiolimiter.readthedocs.io/en/latest/