kasobol-msft/Reactor4.md

## Reactor4.md

      
Introduction

Project Reactor is working on a new major version, tentatively planned to baseline on JDK17 (following Spring's and Netty's decisions) and to evolve its APIs with breaking changes (a new baseline calls for a new major version anyway). This document assesses the impact on Azure SDK for Java and proposes a couple of ways the SDK could handle this (and future) migration.
Reactor usage in Azure SDK for Java

Azure SDK for Java uses Reactor to

Implement the internals of asynchronous clients. Reactor helps build complex reactive streams that would otherwise require an equivalent in-house framework (which is not a trivial effort).
Express return types of public asynchronous APIs with Reactor types like Mono and Flux. There are also derived types like ContinuablePagedFlux or PollerFlux (not an exhaustive list).

Internal usage isn't much of a problem, assuming that tests catch potential runtime breaks so that we can fix them (these happen from time to time anyway with minor Reactor updates, given their deprecation strategy).
However, the presence of Reactor types on the public API surface means that we will inherently have to break APIs when we decide to fully move to Reactor 4. Sooner or later the Reactor 3 line is going to reach end of life.
Immediate future

Reactor has acknowledged that the new JDK baseline and major version are going to be disruptive for the community. They have therefore tentatively decided to mark one of the 3.x lines as LTS and maintain it for a couple of years (exactly how long remains unknown at this time). Additionally, an effort to facilitate side-by-side usage of 3 and 4 is being explored. For us it means that

We can remain on the Reactor 3 line until it reaches end of life, for both public and internal purposes.
As long as Reactor 4 implements Reactive Streams or JDK's Flow interfaces, absorbing Reactor 3 types into Reactor 4 chains is doable, i.e. users won't be blocked in interop scenarios. A request has been made to Reactor to consider creating an adapter package as well as adapting automatically for Spring 6 users; these efforts may improve the experience.
Spring 6 baselining on JDK17 isn't much of a problem. As long as the SDK can run on JDK17 we're good (except for Spring-related packages/integrations, which will have to fork earlier).
Netty setting a new baseline is a bigger concern than Spring if we choose to upgrade it. See here, here and here.
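
The interop point above rests on both Reactor lines speaking a shared interface. The following is a minimal sketch of that idea using only JDK types: a consumer written against `Flow.Publisher` does not care which library produced the publisher. `FlowInterop` and `collect` are hypothetical names used for illustration, and `SubmissionPublisher` merely stands in for a publisher coming from another library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

// Hypothetical helper, not part of Reactor or the Azure SDK.
final class FlowInterop {

    /** Collects everything a publisher emits; works for any Flow.Publisher implementation. */
    static <T> CompletableFuture<List<T>> collect(Flow.Publisher<T> source) {
        CompletableFuture<List<T>> result = new CompletableFuture<>();
        source.subscribe(new Flow.Subscriber<T>() {
            private final List<T> items = new ArrayList<>();

            @Override
            public void onSubscribe(Flow.Subscription subscription) {
                subscription.request(Long.MAX_VALUE); // unbounded demand keeps the sketch short
            }

            @Override
            public void onNext(T item) {
                items.add(item);
            }

            @Override
            public void onError(Throwable error) {
                result.completeExceptionally(error);
            }

            @Override
            public void onComplete() {
                result.complete(items);
            }
        });
        return result;
    }

    /** Feeds two items through a JDK publisher that stands in for a foreign library's type. */
    static List<String> demo() {
        SubmissionPublisher<String> publisher = new SubmissionPublisher<>();
        CompletableFuture<List<String>> collected = collect(publisher);
        publisher.submit("a");
        publisher.submit("b");
        publisher.close(); // signals onComplete once buffered items are delivered
        return collected.join();
    }
}
```

An adapter package of the kind requested from Reactor would presumably do the same job at the type level, wrapping one library's publisher so it can be used in the other library's chains.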

Long term strategies

In this section we list possible efforts Azure SDK for Java can make to ease the transition when Reactor 3 reaches end of life (or sooner). We also explore options to avoid running into a similar problem when Reactor 5 appears on the horizon, or at least to reduce the blast radius.
Internal usage

Assuming that we can always bridge between internal usage and types exposed on the public surface we can explore the following options.
Keep Reactor 3 as dependency until end of life

Do nothing immediately and wait until necessary.
Pros:

Low cost in immediate future.

Cons:

We won't be getting any performance upgrades from Reactor 4.
Scouting Reactor 4 implications for internal usage requires forking the codebase (otherwise we are sitting on a potential time bomb).

Allow both Reactor 3 and 4

Allow usage of both via plugin mechanism.
Pros:

Reactor 4 gets tested eagerly for internal purposes.
Customers get latest Reactor 4 perf updates (if they exist).
Customers can have one Reactor on classpath (should their policy require that).
Solves the JDK baselining problem by giving options.

Cons:

Cost. It requires building an abstraction layer on top of Reactor for internal purposes (on the positive side, it might someday be used to build a sync stack without code duplication).
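
One way to picture the plugin mechanism is a small internal service interface that SDK internals program against, with the concrete engine discovered at runtime. The sketch below uses the JDK's `ServiceLoader` for discovery. `AsyncEngine`, `JdkAsyncEngine` and `AsyncEngines` are hypothetical names, and the interface is deliberately tiny; a real abstraction layer would need to cover far more than running a single task.

```java
import java.util.Iterator;
import java.util.ServiceLoader;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical internal SPI: the only place SDK internals would touch a reactive library. */
interface AsyncEngine {
    String name();

    <T> CompletableFuture<T> run(Supplier<T> work);
}

/** Fallback used when no engine plugin (e.g. a Reactor 3 or Reactor 4 one) is registered. */
final class JdkAsyncEngine implements AsyncEngine {
    @Override
    public String name() {
        return "jdk";
    }

    @Override
    public <T> CompletableFuture<T> run(Supplier<T> work) {
        return CompletableFuture.supplyAsync(work);
    }
}

final class AsyncEngines {
    /** Picks the first engine registered via META-INF/services, else the JDK fallback. */
    static AsyncEngine load() {
        Iterator<AsyncEngine> engines = ServiceLoader.load(AsyncEngine.class).iterator();
        return engines.hasNext() ? engines.next() : new JdkAsyncEngine();
    }
}
```

Under this assumption, a customer would choose the Reactor line by choosing which plugin artifact to put on the classpath, which is also how a single-Reactor policy could be honored.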

Shade Reactor

Pros:

It makes Reactor disappear from the dependency graph.
We gain full control over SDK internals, i.e. we decouple from the Reactor version the customer is using.

Cons:

Maintaining a shaded version of Reactor is costly (e.g. when security patches are required).
This doesn't solve the new JDK baseline in the Reactor project (unless we combine this with the abstraction layer mentioned above and also shade Reactor 4, but that adds even more cost).

Build our own framework

Pros:

We control our destiny.

Cons:

Cost. Building reactive framework is hard (see requirements).
Reinventing the wheel (shading an existing framework seems to be a better option).

Public surface

We assume that changes discussed here are going to be part of upcoming major revision of the SDKs. The goal here is to explore what to do with reactive types present on the public surface. Any other changes that might be part of major revision are not in scope of this document.
Keep Reactor types

Pros:

Easy upgrade experience for Reactor users.
Users who match Reactor version may keep fusion benefits (not sure if Reactor 3 and 4 would be able to fuse, introduction of custom abstraction layers might render fusion not working).

Cons:

SDK major revision cycle remains coupled to Reactor.
SDK keeps inheriting Reactor's deprecations and breaking changes that might not be aligned with SDK's deprecation policy (e.g. Reactor deprecates and removes APIs between minor version upgrades).
SDK has to maintain a fork of the codebase for each Reactor release or find a way to conditionally compile the codebase.
SDK has to come up with a deprecation strategy for SDK lines based on older Reactor versions and go through that cycle every time a new Reactor version is released (so that we don't get overwhelmed by the number of active forks).
Testing infrastructure requires investment to allow multi-reactor testing.

Expose JDK or Reactive Streams types

We can attempt to use JDK types like CompletableFuture, Flow (JDK9+) or Reactive Streams (JDK8+) to express what our async APIs return.
We could either use Flow/Reactive Streams alone and rely on documentation to describe how many items an API can publish, or use CompletableFuture for single emissions and Flow/Reactive Streams for streams of items.
The following example shows the latter option (as it seems to be closer to the expressiveness we have today).
The existing
```java
public final class BlobContainerAsyncClient {
    // PagedFlux extends Flux
    public PagedFlux<BlobItem> listBlobs();
    public Mono<Boolean> exists();
}
```
becomes
```java
public final class BlobContainerAsyncClient {
    // PagedPublisher implements Flow.Publisher
    public PagedPublisher<BlobItem> listBlobs();
    public CompletableFuture<Boolean> exists();
}
```
Pros:

Public API surface doesn't depend on Reactor.
In best case we'd depend solely on JDK interfaces.
Synchronous customers won't be affected by this change.
Non-Reactor async users have to adapt Reactor types anyway, so their experience shouldn't change much.
Major reactive frameworks already have adapters to consume JDK and Reactive Streams async types.
Easy to expose internally used types, i.e. bridge from internally used Reactor version to these types.

Cons:

The experience for Reactor users degrades, i.e. they'll have to use adapters and some optimizations may not work (operator fusion).
JDK types don't provide a rich reactive experience (i.e. they have to be adapted to become useful, but adapters exist and are easy to use).
CompletableFuture is a close replacement for Mono. However, the API becomes "eager", i.e. the transaction starts the moment the API is called, not when it's subscribed to. Making it lazy again in a reactive chain requires a conscious choice of the adapter used.
Flow/Reactive Streams interfaces alone (if we don't use CompletableFuture) are not descriptive enough for the APIs we have.
Enriching return types isn't possible (i.e. we can't add extra functionality on top of the base interfaces without inventing our own types).
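
The eagerness concern can be illustrated with a small sketch that uses only JDK types. The `exists()` method here is a hypothetical stand-in for an SDK call, and the counter marks the moment the "request" would be issued. Calling the method starts the work immediately, while wrapping the call in a `Supplier` defers it until someone asks for the result.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration, not SDK code.
final class EagerVsLazy {
    /** Counts how many times the simulated request has been issued. */
    static final AtomicInteger calls = new AtomicInteger();

    /** Eager: the simulated request is issued as soon as this method is called. */
    static CompletableFuture<Boolean> exists() {
        calls.incrementAndGet();
        return CompletableFuture.completedFuture(true);
    }

    /** Lazy: nothing is issued until get() is invoked on the returned supplier. */
    static Supplier<CompletableFuture<Boolean>> lazyExists() {
        return EagerVsLazy::exists;
    }
}
```

This is the choice a Reactor user would face when adapting: to our knowledge Reactor 3 offers `Mono.fromFuture` overloads for both an already-created future and a future supplier, and only the supplier form (or wrapping the call in `Mono.defer`) postpones the call until subscription.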

Create our own reactive types

We can build our own types to express the result of an asynchronous operation. These types should build on top of Flow and/or Reactive Streams (depending on which JDK becomes the baseline). This approach is somewhat similar to using raw JDK types. However, it communicates the cardinality of the result without the compromises CompletableFuture makes, and it gives us an extension point for adding more APIs on top of the raw JDK types.
The existing
```java
public final class BlobContainerAsyncClient {
    // PagedFlux extends Flux
    public PagedFlux<BlobItem> listBlobs();
    public Mono<Boolean> exists();
}
```
becomes
```java
public final class BlobContainerAsyncClient {
    public PagedMultiPublisher<BlobItem> listBlobs();
    public SinglePublisher<Boolean> exists();
}
```
where
```java
public interface PagedMultiPublisher<T> extends MultiPublisher<T> {
    MultiPublisher<PagedResponse<T>> byPage();
}

public interface MultiPublisher<T> extends Flow.Publisher<T> {
    Stream<T> toStream();
    // and other extensions on top of Flow we think are worth it.
}

public interface SinglePublisher<T> extends Flow.Publisher<T> {
    T block();
    // and other extensions on top of Flow we think are worth it.
}
```
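
To give a feel for what sits behind such an interface, here is a minimal sketch of a lazy single-item publisher built directly on `Flow`. `DeferredSingle` is a hypothetical name, the work runs synchronously on the requesting thread to keep the example short, and the sketch does not aim for full Reactive Streams compliance.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Same shape as the SinglePublisher idea in the text: at most one item, then completion. */
interface SinglePublisher<T> extends Flow.Publisher<T> {
    T block();
}

/** Hypothetical implementation: runs the work only once a subscriber requests the item. */
final class DeferredSingle<T> implements SinglePublisher<T> {
    private final Supplier<T> work;

    DeferredSingle(Supplier<T> work) {
        this.work = work;
    }

    @Override
    public void subscribe(Flow.Subscriber<? super T> subscriber) {
        subscriber.onSubscribe(new Flow.Subscription() {
            private final AtomicBoolean done = new AtomicBoolean();

            @Override
            public void request(long n) {
                if (!done.compareAndSet(false, true)) {
                    return; // already emitted or cancelled
                }
                if (n <= 0) {
                    subscriber.onError(new IllegalArgumentException("demand must be positive"));
                    return;
                }
                T value;
                try {
                    value = work.get(); // lazy: executed per subscription, on demand
                } catch (RuntimeException e) {
                    subscriber.onError(e);
                    return;
                }
                if (value != null) {
                    subscriber.onNext(value);
                }
                subscriber.onComplete();
            }

            @Override
            public void cancel() {
                done.set(true);
            }
        });
    }

    @Override
    public T block() {
        CompletableFuture<T> result = new CompletableFuture<>();
        subscribe(new Flow.Subscriber<T>() {
            @Override public void onSubscribe(Flow.Subscription s) { s.request(1); }
            @Override public void onNext(T item) { result.complete(item); }
            @Override public void onError(Throwable t) { result.completeExceptionally(t); }
            @Override public void onComplete() { result.complete(null); } // no-op if an item arrived
        });
        return result.join();
    }
}
```

Because the type is still a `Flow.Publisher`, existing framework adapters can consume it, while `block()` shows the kind of extension the raw JDK interface cannot carry.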
Pros:

Public API surface doesn't depend on Reactor.
In best case we'd depend solely on JDK interfaces.
Synchronous customers won't be affected by this change.
Non-Reactor async users have to adapt Reactor types anyway, so their experience shouldn't change much.
Major reactive frameworks already have adapters to consume JDK and Reactive Streams async types.
Cardinality of the API results is described by coherent set of types.
APIs remain fully reactive and lazy (i.e. no need for CompletableFuture).

Cons:

Reactor users experience degrades, i.e. they'll have to use adapters, some optimizations may not work (operator fusion).
Necessity for good naming of the new abstractions (azure.Mono, AzureMono, SinglePublisher?) and potential confusion for users of major frameworks (unless we do a good job here).
Bridging between the abstraction and internal usage of Reactor becomes more complicated (depending on the amount of extra functionality that needs to be exposed).

Split SDKs into sync, async and reactive libraries

This option is a bit of a revolution. However, it also attempts to solve a few other problems that customers are facing, e.g.:

Sync-over-async is a constant source of pain for customers with high-throughput services. Such services are hard to debug and configure. E.g. this issue, this issue, OOMs stalling JVM. See also "No Such Thing as a Free Lunch" here.
Implementing more sophisticated synchronous patterns on top of the reactive stack is challenging and error-prone. See here or here.
Synchronous users get exposed to unnecessary dependencies, i.e. one does not need Reactor to make a synchronous REST call.
Depending on HttpClient implementation we end up doing sync-over-async-over-sync (if we use OkHttp).

Therefore, why don't we go one step further and reduce the blast radius of Reactor in such a way that:

We build a truly synchronous stack from the top down to the HttpClient layer.
We build asynchronous stack based on JDK's CompletableFuture for users who don't want to go reactive (this is less important).
We do sync-over-async or async-over-sync at the HttpClient layer (or transport layer in general for non-http protocols) depending on capabilities of the HttpClient.
We default to HttpClient (transport) implementation that's right for the relevant stack.
We establish a different versioning policy for the reactive package, e.g. 12.2-reactor-3.4 (similar to how Spock versions its Groovy support).

It could roughly look like this.
                      ┌─────────────┐
                      │Azure Storage│
          ┌──────────►│ Blobs Models◄─────────────┐
          │           └───────▲─────┘             │
          │                   │                   │
          │                   │                   │
          │                   │                   │
    ┌─────┴───────┐     ┌─────┴───────┐    ┌──────┴───────┐
    │Azure Storage│     │Azure Storage│    │Azure Storage │
    │   Blobs     │     │ Blobs Async │    │Blobs Reactive│
    └────┬────────┘     └─────┬───────┘    └────┬─────────┘
         │                    │                 │
         │                    │                 │
         │                    │                 │
    ┌────▼─────┐        ┌─────▼────┐       ┌────▼─────┐
    │Azure Core│        │Azure Core│       │Azure Core│
┌───┤   Sync   │        │ Async    │   ┌───┤ Reactive ├──┐
│   └────┬─────┘        └┬───────┬─┘   │   └────┬─────┘  │
│        │               │       │     │        │        │
│        │               │       │     │        │        │
│        │       ┌───────▼──┐    │     │        │        │
│        │       │Azure Core│    │     │   ┌────▼──┐     │
│        └───────►  Common  ◄────┼─────┘   │Reactor│     │
│                └──────────┘    │         └───────┘     │
│                                │                       │
│   ┌────────┐                  ┌▼────┐                  │
└───► OkHttp │                  │Netty◄──────────────────┘
    └────────┘                  └─────┘
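
The bridging at the transport layer mentioned above can be sketched with two small adapters. `SyncTransport`, `AsyncTransport` and `TransportAdapters` are hypothetical names, and requests and responses are plain strings to keep the example self-contained. The point is that the blocking or offloading happens in exactly one place, at the bottom of the stack, rather than throughout the client code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical blocking transport, e.g. backed by a synchronous HTTP client. */
interface SyncTransport {
    String send(String request);
}

/** Hypothetical non-blocking transport, e.g. backed by an asynchronous HTTP client. */
interface AsyncTransport {
    CompletableFuture<String> sendAsync(String request);
}

final class TransportAdapters {
    /** async-over-sync: offloads the blocking call to the given executor. */
    static AsyncTransport asyncOverSync(SyncTransport transport, Executor executor) {
        return request -> CompletableFuture.supplyAsync(() -> transport.send(request), executor);
    }

    /** sync-over-async: blocks the caller until the asynchronous call completes. */
    static SyncTransport syncOverAsync(AsyncTransport transport) {
        return request -> transport.sendAsync(request).join();
    }
}
```

With adapters like these, each stack could default to the transport that matches it natively and fall back to an adapter only when the configured client lacks the needed capability.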

Pros:

Decouples reactive users from the rest of the population.
Gives opportunity to address sync-over-async issues.
Gives opportunity to establish separate reactive versioning model.

Cons:

Costly.
Requires either duplicating functional code or inventing a framework that can abstract over the sync and async stacks.
Multiplies number of packages and disrupts existing naming.

Other ideas

Other ideas that are either crazy or require a very long timeline (i.e. Reactor 3 might be dead before they happen).

Assume the Reactor 3 maintainer role to keep it alive longer (costly, might not be in line with the Reactor community's plans).
Design and implement rich reactive interfaces for JDK (long process).

Plan

Updated 12/14/2021
During a group discussion we decided to do the following in the near term.

Stick to Reactor 3 both internally and externally.
Keep current package structure.
Explore ways to introduce synchronous stack.