This is a .NET 5 middleware component that extracts HTML elements matching a CSS-style selector passed in a querystring argument and returns them as the body of the response.
`/my-page?extract=article`
Extracts the outer HTML (the default) of all `article` elements and concatenates them.
`/my-page?extract=article&scope=inner`
The same, except it returns the inner HTML.
If the selector matches multiple elements, they are all concatenated. If you only need the first one, narrow your CSS selector so that it matches only that element.
It uses the AngleSharp library, installable via NuGet. The `extract` argument accepts any selector syntax that AngleSharp supports, which covers most CSS selectors.
https://github.com/AngleSharp/
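The extraction step itself can be sketched with AngleSharp's `HtmlParser` and `QuerySelectorAll`. This is a minimal illustration, not the component's actual code: the `FragmentExtractor` class, its `Extract` method, and the `inner` flag are hypothetical names, and the sketch assumes the page HTML is already available as a string.

```csharp
using System.Linq;
using AngleSharp.Html.Parser;

// Hypothetical helper illustrating the extraction step; not the original code.
public static class FragmentExtractor
{
    // Returns the outer HTML (default) or inner HTML of every element that
    // matches the CSS selector, concatenated in document order.
    public static string Extract(string html, string selector, bool inner = false)
    {
        var document = new HtmlParser().ParseDocument(html);
        var matches = document.QuerySelectorAll(selector);
        return string.Concat(matches.Select(e => inner ? e.InnerHtml : e.OuterHtml));
    }
}
```

Given `<article>One</article><p>x</p><article>Two</article>`, `Extract(html, "article")` returns both `article` elements back to back, and `Extract(html, "article", inner: true)` returns `OneTwo`. If a selector matches nothing, the result is an empty string.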
This needs to be configured as middleware in `Startup.cs`, like this:
```csharp
public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    app.UseMiddleware<ExtractFragmentMiddleware>();
}
```
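The component's source isn't included here, so the following is only a sketch of how such a middleware could work. Only the class name `ExtractFragmentMiddleware` comes from the registration code; everything inside it is hypothetical. It assumes the approach of buffering the downstream response by temporarily swapping `Response.Body` for a `MemoryStream`, assumes the buffered body is uncompressed UTF-8 HTML, and reads the `extract` and `scope` arguments described above.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using AngleSharp.Html.Parser;
using Microsoft.AspNetCore.Http;

// Hypothetical sketch; only the class name comes from the original.
public class ExtractFragmentMiddleware
{
    private readonly RequestDelegate _next;

    public ExtractFragmentMiddleware(RequestDelegate next) => _next = next;

    public async Task InvokeAsync(HttpContext context)
    {
        string selector = context.Request.Query["extract"];
        if (string.IsNullOrEmpty(selector))
        {
            // No "extract" argument: pass the request through untouched.
            await _next(context);
            return;
        }

        string scope = context.Request.Query["scope"];
        bool inner = scope == "inner"; // outer HTML is the default

        // Let later components write into memory instead of the real response stream.
        var originalBody = context.Response.Body;
        using var buffer = new MemoryStream();
        context.Response.Body = buffer;
        try
        {
            await _next(context);
        }
        finally
        {
            // Always put the real stream back, even if a later component throws.
            context.Response.Body = originalBody;
        }

        // Assumes the buffered body is uncompressed UTF-8 HTML.
        string html = Encoding.UTF8.GetString(buffer.ToArray());
        var document = new HtmlParser().ParseDocument(html);

        // Concatenate every match, using outer or inner HTML depending on "scope".
        string fragment = string.Concat(
            document.QuerySelectorAll(selector).Select(e => inner ? e.InnerHtml : e.OuterHtml));

        byte[] bytes = Encoding.UTF8.GetBytes(fragment);
        context.Response.ContentLength = bytes.Length;
        await context.Response.Body.WriteAsync(bytes, 0, bytes.Length);
    }
}
```

One consequence of this design is that the whole downstream page is held in memory before anything is sent, and `Content-Length` is rewritten to match the extracted fragment rather than the original page.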
Note: the middleware needs to be registered early in the pipeline. Honestly, I don't know exactly how early, but I registered it as the first middleware in the pipeline, and it worked consistently.
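In ASP.NET Core, middleware runs in the order it is registered, and each component wraps the components registered after it. Registering this one first means the responses produced by everything downstream pass through it. A sketch of that ordering follows; the `UseStaticFiles`, `UseRouting`, and `MapRazorPages` calls are assumptions about a typical Razor Pages app (which would also need `services.AddRazorPages()` in `ConfigureServices`), not part of the original setup.

```csharp
public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    // Registered first, so it wraps every component registered after it.
    app.UseMiddleware<ExtractFragmentMiddleware>();

    // Assumed downstream components for a typical Razor Pages app.
    app.UseStaticFiles();
    app.UseRouting();
    app.UseEndpoints(endpoints => endpoints.MapRazorPages());
}
```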