@lucasmeijer
Last active April 6, 2016 14:27
What is up and down in CoreFX, CoreCLR world.
I've been trying to understand how all the moving parts of CoreFX, CoreCLR & ReferenceSource fit together, especially those
related to mscorlib. These are my notes / conclusions. If you have more information, or see something that is wrong, please
let me know!
The CoreCLR repo has an embedded mscorlib inside of it. When diffing this against the referencesource mscorlib,
it looks like it forked at some point and has received some minor cleanups. The most frequently occurring changes:
- changed license header
- removed [ResourceExposure] and [ResourceConsumption] attributes.
- some modest improvements. (Task.cs, Thread.cs and AppDomain.cs are files with a relatively high number of changes)
- cleanup. If referencesource code had #if DOTNETCORE, that define has been removed in the coreclr one, as it is now always true.
- all filenames have been camelcased.
But by and large, these corlibs are pretty much the same.
It looks like the intention is to move many of the types that are in corlib today into separate assemblies.
You can see this intention in the CoreFX framework, where there actually is a System.Reflection.dll assembly, but today
it only contains typeforwarders to mscorlib. My interpretation is that this prepares for a future where these types
can move to System.Reflection.dll without breaking user code. (I cannot actually find these typeforwarders in the corefx
repo, but if you inspect a build of corefx and use ILSpy to look in the assembly, the typeforwarders are there. Maybe
some other tool generates them, not sure.)
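For reference, this is roughly what a typeforwarder looks like when written by hand in C#. It is only a sketch, not the corefx source (which, as said, does not seem to contain these declarations), and the two forwarded types are picked arbitrarily as examples. A facade assembly built from a file like this contains no code of its own; it just tells the runtime that the types live in whatever assembly currently defines them.

```csharp
// Hypothetical source for a facade such as System.Reflection.dll.
// The assembly contains no types of its own, only forwarding records.
using System.Reflection;
using System.Runtime.CompilerServices;

// Each attribute says: "a reference to this type in this assembly should be
// resolved against the assembly that really defines it" (mscorlib today).
[assembly: TypeForwardedTo(typeof(Assembly))]
[assembly: TypeForwardedTo(typeof(MethodInfo))]
```

If the types later move out of corlib into the real System.Reflection.dll, the forwarders can be dropped (or reversed) and user code compiled against System.Reflection.dll keeps working.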
There are also some parts of the BCL that CoreFX seems to implement itself, without typeforwarding to the "legacy" mscorlib.
An example of this is System.IO.File here: https://github.com/dotnet/corefx/blob/master/src/System.IO.FileSystem/src/System/IO/File.cs
This is a bit awkward, because System.IO.File is also in the legacy mscorlib, but it looks like that code is dead and could
probably be removed, or maybe it needs to be kept around to make syncing with some internal Microsoft full .NET source tree
easy or something. Apparently there is a tool called BclRewriter.exe (not (yet) open source), which the CoreCLR build
process downloads over nuget. On my build it doesn't run, but on kangaroo's build, it apparently makes internal all the
types inside mscorlib that you should no longer be using directly (like the System.Reflection ones, which you should be
using through the System.Reflection.dll type forwarders).
An important realization is that unlike the Mono BCL, the CoreFX BCL is not platform agnostic. CoreFX wants to stop using
icalls for everything and start using pinvoke. This leads to the implementation of File.Delete being done with a pinvoke
to a Windows native library:
https://github.com/dotnet/corefx/blob/master/src/Common/src/Interop/Windows/mincore/Interop.DeleteFile.cs (pinvokes into: api-ms-win-core-console-l1-1-0.dll.
Note to self: we should check what the Windows backward-compatibility story is for these native libraries. Do they work on Win7? WinXP?)
So System.IO.dll compiled for Windows always pinvokes into a Windows lib. If you want to run it on Mac, you need a
different System.IO.dll compiled for Mac that pinvokes into an OSX/POSIX lib. For System.IO.File, an OSX/Unix
implementation already lives in the corefx repo, but we need to realize that we need to ship separate BCLs (or at least
a subset) per platform. Also note that the work of the "cross os porting" of these libraries is shared between all
runtimes that decide to support CoreFX. As long as the runtime supports PInvoke, which Mono, IL2CPP and CoreCLR do,
System.IO.dll only needs to be ported to platformX once.
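To make the pinvoke idea concrete, here is a minimal sketch of a file delete that goes straight to the OS. It is not the corefx code: the names Win32Interop, UnixInterop and PortableFile are made up, kernel32.dll is used instead of an api-ms-win-* library, and the Unix side assumes the C library can be loaded under the name "libc". To keep it a single runnable file it also picks the platform at run time, whereas the setup described above picks it at build time by shipping a different System.IO.dll per platform.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

static class Win32Interop
{
    // Assumption: kernel32.dll is used here for simplicity; the corefx source
    // linked above binds to an api-ms-win-* library instead.
    [DllImport("kernel32.dll", EntryPoint = "DeleteFileW",
               CharSet = CharSet.Unicode, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    internal static extern bool DeleteFile(string path);
}

static class UnixInterop
{
    // Assumption: the C library is resolvable as "libc" on the target system.
    [DllImport("libc", EntryPoint = "unlink", SetLastError = true)]
    internal static extern int Unlink(string path);
}

public static class PortableFile
{
    // Deletes a file by calling the native OS API directly, with no runtime icall.
    public static void Delete(string path)
    {
        // P/Invoke targets are bound lazily, so the declaration for the
        // "other" platform is never loaded as long as it is never called.
        bool ok = RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
            ? Win32Interop.DeleteFile(path)
            : UnixInterop.Unlink(path) == 0;

        if (!ok)
        {
            int error = Marshal.GetLastWin32Error(); // errno on Unix
            throw new IOException($"Could not delete '{path}' (native error {error}).");
        }
    }
}
```

Nothing in here depends on which runtime executes it, only on the runtime supporting pinvoke, which is why the porting work can be shared.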
Does the CoreFX BCL strip/treeshake better than the Mono or referencesource BCL? I have been hoping that it would.
(int.ToString() pulling in ThaiBuddhistCalendar is not something I hear game devs request a lot :) ). I have not tried
yet, but since the CoreCLR corlib at least today seems almost identical to referencesource, I see no reason to believe that
a HelloWorld app gets smaller after stripping/treeshaking than it did on a referencesource BCL.
@terrajobst

Great article. Here are my two cents. Let me know if you have any questions.

Relationship between .NET Core and .NET Framework

The mscorlib that is part of CoreCLR is a fork of the mscorlib that is part of the .NET Framework. You can think of it as Silverlight's copy.

In general, we don't have (and don't want) automatic code flow between .NET Core and the .NET Framework. The reason is that there are significant implementation differences, as well as compatibility requirements with the 1.8 billion installs of the .NET Framework.

BCL rewriter

The BCL rewriter was created in the Silverlight days to make it easier for us to share the same code base for .NET Framework and Silverlight and yet get the footprint down to something that works for Silverlight.

We currently still use the rewriter because it's already there and thus was easier for us to use. Long term, since we don't share the implementations, we can physically refactor CoreCLR's mscorlib.

As far as implementation dependencies on higher level components such as globalization goes: that's a good point which we're (painfully) aware of.

I think there are two answers for this:

  1. Aggressive tree shaker. .NET Native's tree shaker allows us to strip code more aggressively and reliably than the BCL rewriter. The reason is that the BCL rewriter must be super conservative and has to assume that any public API might be called via reflection. .NET Native compiles a specific app, so it can remove public APIs if the app doesn't use them and the app author hasn't marked them as required for reflection.
  2. Multiple runtimes. While the .NET Native tree shaker does a good job, there are certainly limits. For example, there is no way the tree shaker can statically know that your app doesn't have to pick up the OS culture for localization. We could extend the tree shaker and expose these parts as configurations, but that might not work in all cases or might result in more complexity than we'd like. We already have two runtimes (CoreCLR and .NET Native), so I could totally see a world where we have even more runtimes that are more tailored to specific scenarios, e.g. ones that don't have any support for globalization.
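The two points above can be illustrated with a toy reachability pass. This is only a sketch of the general idea, not how the BCL rewriter or the .NET Native tree shaker is implemented; the class name, the string-based member names and the edges in SampleGraph are all invented for illustration.

```csharp
using System.Collections.Generic;

public static class TreeShakerSketch
{
    // Returns every member reachable from 'roots' by following reference edges.
    // Anything that is not in the returned set could be stripped.
    public static HashSet<string> Reachable(
        IReadOnlyDictionary<string, string[]> references,
        IEnumerable<string> roots)
    {
        var keep = new HashSet<string>();
        var pending = new Stack<string>(roots);
        while (pending.Count > 0)
        {
            string member = pending.Pop();
            if (!keep.Add(member))
                continue; // already visited
            if (references.TryGetValue(member, out var targets))
                foreach (string target in targets)
                    pending.Push(target);
        }
        return keep;
    }

    // Hypothetical reference graph; the edges are made up for this example.
    public static readonly Dictionary<string, string[]> SampleGraph =
        new Dictionary<string, string[]>
        {
            ["App.Main"] = new[] { "Int32.ToString" },
            ["Int32.ToString"] = new[] { "CultureInfo.CurrentCulture" },
            ["CultureInfo.CurrentCulture"] = new[] { "ThaiBuddhistCalendar" },
            ["Console.Beep"] = new string[0],
        };
}
```

A conservative tool has to pass every public API as a root, so in this graph nothing gets removed. An app-specific shaker can pass just App.Main (plus whatever the author marked for reflection), which drops Console.Beep. ThaiBuddhistCalendar still survives, because the edge from the culture lookup exists statically even if the app never needs it; that is the limit described in the second point.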

Code relationship between CoreCLR's mscorlib and CoreFX

The version of mscorlib that is part of CoreCLR is bigger than it needs to be. In a perfect world, it would only contain the code that is runtime specific and have no overlap with CoreFX. For example, String should live here but Console shouldn't.

There are two reasons for this duplication:

  1. We simply haven't done the engineering work to pull it out. Some code in mscorlib is simply dead.
  2. Some duplication is currently by-design. For example, mscorlib might need to have an implementation for, say, IList<T>. However, there is no reason why the implementation has to be List<T>. It's better if we can version the widely used type List<T> independently of the runtime itself. One way to do this is by having a simplified copy of List<T> that is private to mscorlib.

P/Invokes vs. runtime calls

Originally, the idea was that the managed pieces of the CLI are operating system agnostic and that the runtime provides the OS specific implementations.

However, we believe that this creates a factoring nightmare. First of all, runtime calls (QCalls, FCalls, etc.) are fairly complicated. Secondly, it would force all OS specific implementations into a single spot, which forces the runtime to version at the same rate as the fastest-moving component that needs OS specific logic. In other words, it doesn't scale.

We believe that the runtime should only provide, well, the runtime specific pieces, such as the GC, the JIT etc. We're even thinking about breaking the runtime into multiple pieces so that, for example, we could update the JIT independently of the GC.

For the libraries, we'd rather leverage NuGet to select the appropriate implementation. I think the implementation of System.IO would use source sharing, #if, and partial classes, so that we can isolate the OS specific pieces into a small set. We believe this makes us more agile. So yes, we believe that P/Invokes and source sharing is the way to go there.
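A rough sketch of what "source sharing, #if, and partial classes" can look like in practice, under stated assumptions: the class PathSketch and the build symbol TARGET_WINDOWS are hypothetical, and in a real source tree the platform parts would more likely sit in separate files, with the build including exactly one of them.

```csharp
// Shared part: platform-neutral logic that is compiled into every build.
public static partial class PathSketch
{
    public static string Combine(string left, string right)
    {
        return left.TrimEnd(Separator) + Separator + right.TrimStart(Separator);
    }
}

// Platform part: only one of these ends up in a given binary.
// TARGET_WINDOWS is a hypothetical symbol that the Windows build would define.
#if TARGET_WINDOWS
public static partial class PathSketch
{
    private const char Separator = '\\';
}
#else
public static partial class PathSketch
{
    private const char Separator = '/';
}
#endif
```

The public surface (Combine) is identical in every build, so callers compile against the same API no matter which implementation is deployed.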

Please note that because of NuGet, consumers don't have to know that. For them, it doesn't matter whether System.IO is a single DLL that runs on all operating systems or whether there are multiple implementations. In the end, you reference the same package and rely on the build to select the right implementation for deployment.

api-ms-*

Windows has done engineering work to improve the dependencies in Win32. The result is called API sets, which have funky names like api-ms-win-core-console-l1-1-0.dll.

My understanding is that there are differences between operating systems. AFAIK for CoreCLR we support Win7 and higher.

@terrajobst

Tagging a few more folks that might want to correct me or add to what I said :-)

@weshaggard @ellismg @jkotas @KrzysztofCwalina @stephentoub

@kangaroo

One minor note. You don't really need to ship multi-BCLs if you embrace nuget. Basically you'd only ship multiple mscorlib's, and then nuget all the packages that the build requires.

That said, you'll probably need to host private builds of the nuget pkgs which extend and add support for platforms that CoreFx does not support.

@joncham

joncham commented Jul 17, 2015

@terrajobst

For the libraries, we'd rather leverage NuGet to select the appropriate
implementation. I think the implementation of System.IO would use source
sharing, #if, and partial classes, so that we can isolate the OS specific
pieces into a small set. We believe this makes us more agile. So yes, we
believe that P/Invokes and source sharing is the way to go there.

Does that mean having platform specific (or OS API specific, i.e. POSIX/Win32) versions of each assembly needing native resources?

@terrajobst

You don't really need to ship multi-BCLs if you embrace nuget. Basically you'd only ship multiple mscorlib's, and then nuget all the packages that the build requires.

Correct. You do, however, need the runtime library (e.g. mscorlib) to support a certain set of contracts, which is done via type forwarding, i.e. something needs to type forward System.Runtime!System.String to mscorlib!System.String. Those type forwarders could be shipped by the runtime or in the derived packages (I'm not up to speed on where we currently ship those with NuGet v3).

@terrajobst

@joncham

Does that mean having platform specific (or OS API specific, i.e. POSIX/Win32) versions of each assembly needing native resources?

Pretty much. In the past, we only had to support multiple architectures (x86, x64, ARM). Now our native resources would be multiplied by platforms. A binary-based ecosystem using native code is much harder than MSIL, which is why we'll probably rethink some of our native dependencies. For example, System.IO.Compression uses a native implementation of deflate called zlib. We didn't do it for speed but mostly because zlib provides superior compression quality and porting it to managed code would have required more work than simply P/Invoking.
