RikkiGibson/interceptors.md

    Interceptors


 Proposed
 Prototype: Complete
 Implementation: Not Started
 Specification: Not Started

Summary

An interceptor is a method which can declaratively substitute a call to itself instead of a call to an interceptable method at compile time. This substitution occurs by having the interceptor declare the source locations of the calls that it intercepts.
using System;
using System.Runtime.CompilerServices;

var c = new C();
c.InterceptableMethod(1); // prints `interceptor 1`
c.InterceptableMethod(1); // prints `other interceptor 1`
c.InterceptableMethod(1); // prints `interceptable 1`

class C
{
    [Interceptable]
    public void InterceptableMethod(int param)
    {
        Console.WriteLine($"interceptable {param}");
    }
}

// generated code
static class D
{
    // 'position' is a number of characters from the start of the file
    [InterceptsLocation("Program.cs", position: 73)]
    public static void InterceptorMethod(this C c, int param)
    {
        Console.WriteLine($"interceptor {param}");
    }

    [InterceptsLocation("Program.cs", position: 125)]
    public static void OtherInterceptorMethod(this C c, int param)
    {
        Console.WriteLine($"other interceptor {param}");
    }
}
Motivation

Many general-purpose library methods work by reflecting on the incoming arguments. This presents problems in AOT (ahead-of-time) compilation scenarios, where runtime code generation is limited or impossible. With this proposal, libraries can instead ship source generators which add interceptors to the compilation. This permits each call site of the original symbol to be specialized based on any information known at compile time.
Some use cases for the feature include:

Regex, where calls like Regex.IsMatch(@"a+b+") can be intercepted to use a statically-generated matcher when the pattern is constant.
ASP.NET Minimal API, where calls like app.MapGet("/products", handler: (int? page, int? pageLength, MyDb db) => { ... }) can be intercepted to register a statically-generated thunk which calls the user's handler.
Vectorization, where user code oriented around constructs like foreach can be rewritten to check for and use relevant intrinsics at runtime, and fall back to user code if those intrinsics aren't available.
Dependency injection, where provider.Register<MyService>() can be intercepted to provide an implementation which resolves the dependency graph statically.
Query providers could translate expression trees to another language (e.g. SQL) at compile time.

What we essentially want is to provide a limited and traceable facility for source generators to modify existing code. We still think it's important that this works by adding source code to the compilation, and not by exposing source generator APIs that modify the object model of the compilation in any way. It is a goal that you can build a project with source generators, write all the generator outputs to disk, delete the generators, and then build again and have the same behavior.
Detailed design

InterceptableAttribute

A method must indicate that its calls can be intercepted by including [Interceptable] on its declaration.
namespace System.Runtime.CompilerServices
{
    [AttributeUsage(AttributeTargets.Method)]
    public sealed class InterceptableAttribute : Attribute { }
}
The point of this is to make it clear to users that calls they make to the method may be intercepted. We consider it risky to allow any calls to any method from anywhere to be intercepted, if the method author doesn't intend for the calls to be interceptable.
See Alternatives for further discussion.
InterceptsLocationAttribute

An interceptor indicates which calls it intercepts by including their source locations in [InterceptsLocation] attributes.
namespace System.Runtime.CompilerServices
{
    [AttributeUsage(AttributeTargets.Method, AllowMultiple = true)]
    public sealed class InterceptsLocationAttribute(string filePath, int position) : Attribute
    {
    }
}
The location of the call is the location of the name syntax which denotes the interceptable method. For example, in app.MapGet(...), the name syntax for MapGet would be considered the location of the call. If we allow intercepting calls to property accessors in the future (e.g. obj.Property), we would also be able to use the name syntax in this way.
/pathmap is respected when determining the location of the call. The purpose of /pathmap is to provide consistent build behavior in different environments. We want only mapped paths (i.e. normalized paths, paths which lack machine-specific base path information) to appear in source, in order to reduce the need for source code differences across build environments. File paths must exactly match the paths on the syntax nodes by ordinal comparison.
The goal of the above decisions is to make it so that when source generators are filling in [InterceptsLocation(...)], they simply need to read nameSyntax.SyntaxTree.FilePath and nameSyntax.GetLocation().SourceSpan.Start for the exact file path and position information they need to use.
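As a rough illustration of what position means here, the sketch below recomputes the two positions used in the Summary sample with a plain text search. It assumes the file uses "\n" line endings and that the method name followed by "(" appears only at call sites in the searched text. PositionSketch and PositionOfCall are hypothetical names that are not part of the proposal; a real generator would read nameSyntax.GetLocation().SourceSpan.Start rather than search text.

```csharp
using System;

// Hypothetical helper, for illustration only: a real source generator would take
// the position from the syntax tree rather than searching the text.
public static class PositionSketch
{
    // The top-level statements of the Summary sample, assuming "\n" line endings.
    public static readonly string ProgramText = string.Join("\n", new[]
    {
        "using System;",
        "using System.Runtime.CompilerServices;",
        "",
        "var c = new C();",
        "c.InterceptableMethod(1); // prints `interceptor 1`",
        "c.InterceptableMethod(1); // prints `other interceptor 1`",
        "c.InterceptableMethod(1); // prints `interceptable 1`",
    });

    // Returns the number of characters from the start of 'sourceText' to the name
    // of the 'occurrence'-th (zero-based) call of 'methodName', or -1 if not found.
    public static int PositionOfCall(string sourceText, string methodName, int occurrence)
    {
        string needle = methodName + "(";
        int index = -1;
        for (int i = 0; i <= occurrence; i++)
        {
            index = sourceText.IndexOf(needle, index + 1, StringComparison.Ordinal);
            if (index < 0)
            {
                return -1;
            }
        }
        return index;
    }
}
```

Under these assumptions, PositionOfCall(PositionSketch.ProgramText, "InterceptableMethod", 0) gives 73 and occurrence 1 gives 125, which are the values passed to [InterceptsLocation] in the Summary sample.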
See Alternatives for further discussion.
Interceptable and non-interceptable usages

Usages of a method other than calls, such as conversion to a delegate type or address-of, cannot be intercepted. We want to start with this limitation because in order to specialize, we generally need to know things like the arguments to a method. The user can't usefully specialize a usage which is simply a conversion to delegate.
Interception can only occur for calls to ordinary member methods, not constructors, delegates, properties, local functions, etc. Some of these restrictions could probably be relaxed in the future.
Signature matching

An interceptor method's signature is required to match the signature of the method it intercepts. This matching requirement resembles the requirement for delegate compatibility §19.4.
The goals of this matching requirement are:

Make interception a clean substitution as much as possible, minimizing ripple effects where argument side-effects or downstream code semantics can change deeply because of signature differences between an interceptable and interceptor method.
Enable intercepting calls where the receiver type is not owned by the current compilation. To do this, we want to let an extension method be a valid interceptor for an instance method.

The following sample roughly reflects the requirements we impose on the relationship between the signatures of the interceptor and intercepted methods, when the receiver is an expression (including an implicit this):
void Check(TReceiver receiver)
{
    // There must exist delegate types D1 and D2 where the below sequence of conversions are legal.
    D1 d1 = receiver.Interceptable;
    D2 d2 = receiver.Interceptor;
    d1 = new D1(d2);
}
When the interceptable method is static and not an extension being called in reduced form, a similar logic applies:
void Check()
{
    D1 d1 = InterceptableContainingType.Interceptable;
    D2 d2 = InterceptorContainingType.Interceptor;
    d1 = new D1(d2);
}
The above is provided in lieu of a formal specification for the time being to enable us to focus on intent regarding the signature requirements.
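To make the receiver-expression form of the check concrete, here is a minimal sketch with specific types filled in. Receiver, GeneratedInterceptors, and SignatureCheck are hypothetical names chosen for illustration, and D1 and D2 are just one pair of delegate types for which the conversions happen to be legal. The interceptor is an extension method whose parameter type is object where the interceptable method takes string, so the last conversion relies on an implicit reference conversion.

```csharp
// Hypothetical types for illustration; none of these names come from the proposal.
public class Receiver
{
    // Stands in for the interceptable instance method.
    public string Interceptable(string arg) => $"interceptable {arg}";
}

public static class GeneratedInterceptors
{
    // Stands in for the interceptor: an extension method whose parameter type
    // (object) is reachable from the interceptable method's parameter type
    // (string) by an implicit reference conversion.
    public static string Interceptor(this Receiver receiver, object arg) => $"interceptor {arg}";
}

public delegate string D1(string arg);
public delegate string D2(object arg);

public static class SignatureCheck
{
    public static string Check(Receiver receiver)
    {
        // The same sequence of conversions as in the proposal's Check method.
        D1 d1 = receiver.Interceptable;
        D2 d2 = receiver.Interceptor;
        d1 = new D1(d2);

        // Invoking d1 now runs the interceptor.
        return d1("1");
    }
}
```

Because all three conversions compile, this pair of signatures would pass the rough check described above. Calling SignatureCheck.Check(new Receiver()) returns "interceptor 1".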
There is no requirement that all [InterceptsLocation] attributes on an interceptor refer to the same interceptable method. In other words, a single interceptor method can intercept calls of more than one method. Because of this, the signature matching requirements are enforced at each interception site rather than at the declaration site of the interceptor.
Additionally, at an intercepted call site:

the language will preserve conversions of the arguments to the interceptable method's parameter types. Additional implicit reference conversions may be inserted to match the interceptor's parameter types.
the language will insert a conversion from the interceptor's return type to the interceptable's return type.

Source generator performance

Source generators are currently a sore spot for performance. We have introduced incremental generators to try and mitigate this, but we think there's more to be done before we're in a good place here. For example, if a source generator adds members to a compilation in response to an edit within a method body, every location which could access those members needs to be re-bound. This breaks certain optimizing assumptions, for example, that making an edit within a method body won't affect the meaning of things outside of it.
Much of this perf work is only tangentially related to this proposal. However, it's very important that we understand how the design of this feature interacts with our performance requirements, and to ensure we don't deliver a design which traps us in a corner where good performance becomes unattainable.
We think that a source generator API ForMethodInvocationWithName, along the lines of ForAttributeWithMetadataName, might be part of the solution for this.
ForMethodInvocationWithName takes a qualified method name. For example, for the method bool System.Text.RegularExpressions.Regex.IsMatch(string), we would use the name System.Text.RegularExpressions.Regex.IsMatch. The term qualified method name is intentionally not precisely specified here, and could match any overloads of IsMatch with any number of parameters or type parameters, as long as it is contained within the appropriate type.
It's a goal that the implementation of this helper can limit binding to invocations which match the suffix IsMatch, for example, instead of needing to fully bind all method bodies in the source generation compilation pass.
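The suffix-matching idea can be sketched without any compiler APIs. The code below is only a model of the filtering step, under the assumption that invocations are available as the text of the invoked expression (for example "Regex.IsMatch"). InvocationPrefilterSketch, SimpleName, and CandidatesToBind are hypothetical names and say nothing about how ForMethodInvocationWithName would actually be implemented.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a syntactic pre-filter; not a real generator API.
public static class InvocationPrefilterSketch
{
    // The simple name is the text after the last '.', e.g. "IsMatch".
    public static string SimpleName(string name)
        => name.Substring(name.LastIndexOf('.') + 1);

    // Keeps only the invoked expressions whose last identifier matches the
    // requested method's simple name. Only these candidates would need binding
    // to decide whether they really refer to the requested method.
    public static List<string> CandidatesToBind(
        IEnumerable<string> invokedExpressions, string qualifiedMethodName)
    {
        string simpleName = SimpleName(qualifiedMethodName);
        return invokedExpressions
            .Where(e => SimpleName(e) == simpleName)
            .ToList();
    }
}
```

A purely syntactic filter like this can still let through unrelated methods that happen to be named IsMatch. Binding the surviving candidates would be what rules those out.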
See Alternatives for further discussion.
Conflicting interceptors

If more than one interceptor applies to the same usage, it is a compile-time error.
We could imagine usages where more than one component wants to perform "independent" forms of specialization, but we're not prepared to define what usage of multiple interceptors means on a single call, for example. It would be interesting to consider if a third-party framework could be designed on top of this feature which does define what the equivalent of multiple interceptors means.
If an [InterceptsLocation] attribute is found in the compilation which does not refer to the location of an interceptable method call, a compile-time error occurs.
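The first rule amounts to a uniqueness check over the locations named by all [InterceptsLocation] attributes in the compilation. The sketch below models that check on plain data. InterceptionRequest and ConflictSketch are hypothetical names, and the real compiler would report diagnostics rather than return a list. File paths are compared ordinally, in line with the path-matching rule described earlier.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of one [InterceptsLocation] attribute on one interceptor.
public readonly record struct InterceptionRequest(string FilePath, int Position, string InterceptorName);

public static class ConflictSketch
{
    // Returns every (file path, position) pair that is claimed by more than one
    // interceptor. Tuple equality compares the path strings ordinally.
    public static IReadOnlyList<(string FilePath, int Position)> FindConflicts(
        IEnumerable<InterceptionRequest> requests)
    {
        return requests
            .GroupBy(r => (r.FilePath, r.Position))
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();
    }
}
```

Each location returned by FindConflicts corresponds to a call site where the compiler would report an error under this proposal.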
Editor experience and public API

It's very important that we maintain a high standard of design-time traceability for projects which adopt this feature. Therefore, we want to provide the following behaviors:

Go To Definition on an interceptable call should show both the interceptable and interceptor methods.
Find All References on an interceptor declaration should show all intercepted call sites in addition to any normal calls.
Find All References on an interceptable declaration should show all intercepted call sites as well as which interceptors are in use at each call site.
Signature Help should show the interceptable method's signature.
Any public API providing information about the call, such as IInvocationOperation, should be able to give both the interceptable and interceptor method symbols associated with the call.
We'd like to avoid situations where a breakpoint is placed in an interceptable method and missed because the call site of interest was actually intercepted. Ideally, some gesture to set a method breakpoint on the interceptable method could be provided in these cases, and that method breakpoint would get hit when either the interceptable method is being called, or if some interceptor is being called.

If an interceptor is reproducing part of the implementation of an interceptable method from source, the interceptor may also want to use #line directives which refer back to the source of the interceptable method.


Interceptor accessibility

Interceptors are ordinary methods. They will be directly usable and will appear in lookup, etc. if they are available at the given point per the language rules.
However, we think it will be common that interceptors are expected to not be called directly in user code. Therefore, an interceptor is allowed to intercept a call even if accessibility or file limitations would prohibit directly using the interceptor at the point of the call.
One technique source generators might use to hide their interceptors from the user would be to place them in a static file class.
Drawbacks

Requires source generators in practice

Although the feature doesn't literally require source generators to work, it is not viable for users to hand-write [InterceptsLocation] attributes which refer to locations in their source code. Code containing intercepted calls may be moved at any time, and the effects of doing so could range from a compile-time error to silently intercepting a different call than before. We haven't done a feature before which we would be inclined to discourage use of in the absence of source generators.
It's been suggested that alternative ways to specify the locations of calls that should be intercepted could make the feature viable in non-source-generator scenarios. See position ranges and semantic locations.
Limited ability to abstract usages

This feature is somewhat incompatible with indirection. If the information that we need to specialize over is not directly available at the call site, we might either have to fall back to runtime reflection or the source generator would need to emit an error.
void MakeRoutes(IEndpointRouteBuilder builder, string baseRoute, Delegate handler)
{
    // error: we don't know what kind of thunk to generate, because we don't know the exact parameters of 'handler'.
    builder.MapGet($"{baseRoute}/products", handler);
    builder.MapGet($"{baseRoute}/products-alias", handler);
}

void MakeRoute(IEndpointRouteBuilder builder)
{
    var handler = (int? page, int? pageSize, MyDb db) =>
    {
        return Response(...);
    };
    builder.MapGet("/products/", handler);
}
void RegisterAndDoSomething<T>(T instance)
{
    DependencyInjection.Register(instance);
    DoSomethingElse(instance);
}
However, it's been observed that reflection-based tools and macro systems also have similar tendencies--when certain indirections are introduced, information is lost.
Inability to capture ambient state

Some of the proposed use cases involve allowing the interceptor to take "additional" arguments based on variables in scope at the call site. A feature like implicit parameters could be introduced instead to address this across the language, rather than being specific to calls that are intercepted.
Alternatives

InterceptableAttribute alternatives

It has been suggested to include a mechanism for limiting which interceptors can work on which interceptable methods in a scheme comparable to IVTs. For example, if a generator adds interceptors to a compilation, we might require it to have a certain public key token in order to consider those interceptors "valid", based on some marker provided at the declaration of the interceptable method. We want to make sure this isn't a scheme where things only work if a generator does it. Manually-authored code may be able to work around these guardrails, but we think there is still value in having the guardrails.
It has been suggested to include a compiler flag to allow any calls to be intercepted, even if they are not calls to methods with [Interceptable].
InterceptsLocationAttribute alternatives

Syntax location of invocations

We could use the location of the ( token to denote the location of a method call, which is similar to how the implementation of CallerInfo attributes works. This might complicate things if we ever allow interception of invocations which don't use (, such as property accessors.
Line and column instead of position

We could denote locations via zero-indexed line and column numbers. This would enable us to respect #line directives when determining the location of a call. Generators would use Location.GetMappedLineSpan() to populate the attribute in this case. It's possible that this would enable a source generator to intercept a call in Razor code, for example: the InterceptsLocation would refer to a location in Razor code, and the line directive in the generated C# would tell us that the InterceptsLocation actually refers to some underlying C# code. This seems somewhat fragile. We would be depending very deeply on Razor having line directives which map precisely enough for the compiler to understand this relation, and we'd be depending on the source generator to understand Razor code precisely enough to refer to that relation.
Position ranges

We could let InterceptsLocation specify a range of positions where calls should be intercepted, and include some method of specifying which method we want to intercept calls to. For example, a containing type and name, though we might want something more precise. This would perhaps be another argument in favor of "bona-fide" syntax, rather than attributes, so that we can actually bind the method being targeted, distinguish it by its parameter types and so on.

[InterceptsLocation("file.cs", startPosition: 1, endPosition: 100, containingType: typeof(InterceptableType), methodName: nameof(InterceptableType.InterceptableMethod))]
[InterceptsLocation("file.cs", containingType: typeof(InterceptableType), methodName: nameof(InterceptableType.InterceptableMethod))]
void Interceptor(int) intercepts InterceptableType.InterceptableMethod(int) at "file.cs" 123 { /* body */ }

One potential drawback of this is: it becomes harder to error when the range is wrong. With an exact position we can assume that the attribute is supposed to intercept exactly one call, and give an error if that can't occur for some reason.
Semantic locations

It has been considered to identify methods to intercept based on some semantic criteria, instead of strictly on source location. For example:

Calls which occur anywhere within a particular type or namespace.
Calls which occur anywhere within a particular method (somehow denoted in the attribute).
Any calls of a particular interceptable method (somehow denoted in the attribute).
Calls of an interceptable method where specific constant arguments are used for certain parameters.

However, it seems like this doesn't provide a sufficiently useful framework. Any number of calls within a particular container may specialize differently, and may specialize based on constant or non-constant arguments, or based on any other contextual information reasonably discoverable by the source generator. In addition, it feels like this kind of semantic framework could be built on top of a purely location-based interception scheme.
Dedicated syntax instead of attributes

Attributes might not be the right fit for a feature with this kind of semantic impact. We might want to consider introducing an intercepts clause for method declarations. Consistency with #line directives might help us here.
static void InterceptorMethod() : "Program.cs" 73, "Utils.cs" 56, ...
{
}
Source generator performance alternatives

Instead of providing specialized source generator API to identify calls of particular methods with minimal binding, we could introduce a syntactic marker for calls which are interceptable, and allow source generators to subscribe to it. This would reduce the amount of work we need to do in the "initial" compilation pass.
void M()
{
    intercept InterceptableCall(); // a la 'await'
    [Intercept] InterceptableCall(); // if attributes on statements were added to the language
    #InterceptableCall(); // imply macro-like behavior by imitating the syntax of directives?
}
Macros

What interceptors are really doing is a little bit like what a macro system does. Many of our principles around source generators currently are geared around the feeling that source generators must not have any capability which plain old code does not have. A macro system could push code generation into the compilation process such that there isn't a risk of those generators not being able to run at a future point. If we pursued this, we would need to look deeply at existing modern macro systems in other languages and understand what tradeoffs can be made in terms of power versus toolability. However, this would probably end up being a very large undertaking, and it's not clear if there is a constructive/harmonious way for the existing source generators concept to coexist with a macro system in the platform.
Replace/original

Some of the use cases of interceptors could be addressed by a replace/original feature which relies entirely on adding additional source to the compilation. Instead of performing a substitution separately at each call site, we perform it once at the implementation.
// User.cs
partial class C
{
    [AddLogging]
    public replaceable partial void M()
    {
        DoSomething();
    }
}

// Generated.cs
public partial class C
{
    // Although both partials have implementations, this is the implementation which is emitted.
    public partial void M()
    {
        Log("about to do something");
        DoSomething();
        Log("did something");
    }
}
Interceptors are distinct from replace/original for a few reasons:

The interceptable method can come from a metadata reference rather than needing to be defined in source.
Each call can be intercepted by a different interceptor based on what is known about it at compile time, versus with replace where a single replacement must be provided for all callers.
Interception can't occur for calls which are not statically present in the current compilation, unlike with replace where we know the replacement will be used no matter how the method is called.

It seems possible to implement a version of replace/original using interceptors, where the original is given by a call to an interceptable method, and the replacement is the interceptor for that call.
class C
{
    public Output M(Input i)
    {
        return MImpl(i);
    }

    [Interceptable]
    private Output MImpl(Input i)
    {
        DoSomething();
        return new Output(i);
    }
}

class D
{
    [InterceptsLocation]
    public static Output MImpl_Interceptor(this C c, Input i)
    {
        // ...
    }
}
Unresolved questions

What parts of the design are still undecided?
Design meetings

Link to design notes that affect this proposal, and describe in one sentence for each what changes they led to.