The epitome of multi-platform libraries

Note: This post is in direct response to feedback from my previous post, so if you have not read that one, you should.

Daniel Plaisted (@dsplaisted) provided me with some great feedback on my previous post, and I'd like to address some of the points he commented on. The feedback in its entirety can be found over at GitHub; however, for simplicity and completeness' sake, I'm going to copy all of what's there as of today into this blog post.

Hi Alexandre,

Thanks for the great blog post on Portable Class Libraries. It is informative and well-written, and I enjoyed reading it. I do have some questions and some feedback, however. :-)

Overall, I think the future should include support for both shared source code (via MPLs, VS Shared Projects, linked files, etc.) as well as shared binaries (PCLs as they exist today or as they could exist in the future by having NuGet packages that express their contract dependencies).

It sounds like you are saying that MPLs are the future replacement for PCLs. However, at the end you talk about contract assembly dependencies. You mention them as a fallback mechanism. Are you saying that in the future NuGet packages should have duplicate assemblies for all the platforms they know about, and that the contract dependencies should only be used as a fallback? In that case, the NuGet package would have a different assembly for several different platforms, and those assemblies might or might not be identical. How would NuGet choose which of those assemblies to fall back to?

I don't think MPLs offer a good solution to allowing existing libraries to work on a new platform when it shows up. Requiring separate binaries for each platform even when the result will be exactly the same will also needlessly increase compile times and NuGet package sizes. This quickly adds up when you think about all the .NET platforms you might want to target:

  • .NET Framework
  • Windows Store apps (.NETCore)
  • Windows Phone (wpa)
  • Windows Phone Silverlight (wp)
  • CoreCLR (k10)
  • Xamarin.iOS (monotouch)
  • Xamarin.Android (monoandroid)
  • Xamarin.Mac (monomac)
  • Unity on a bunch of platforms (iOS, Android, PS4, PS Vita, XBox One, Wii U, Ouya, etc...)

You point out that creating a bait and switch library is actually more difficult than creating a non-portable cross-platform library. However, bait and switch is only needed to create reusable libraries that bridge differences between platforms. So most people shouldn't need to create bait and switch libraries, they can create simple PCLs that depend on bait and switch libraries that others have written.

The Async and HttpClient NuGet packages are basically bait and switch libraries, and now people can write "simple" PCLs using async/await and HttpClient that target Silverlight, Windows Phone 7.5, Windows Store, .NET Framework, and Xamarin. That's why I don't think you should discount these packages when thinking about the limitations of PCLs: when there are gaps like that in the APIs or features available to PCLs, we will try to fix them, and often that will be in the form of a NuGet package.

You also describe the problem that happens if you reference a bait and switch PCL package from a PCL but not from a platform-specific project that consumes it. This isn't a fundamental problem with bait and switch, just a tooling issue that needs to be fixed.

Like I said, I think there is a place for both shared source and shared binaries. In my opinion, portable binaries with bait and switch where necessary is more appropriate for reusable libraries, while shared source is best for writing an application for multiple platforms. We have been working on improving both scenarios (i.e. in VS 2013 Update 2 and in the ASP.NET vNext tooling), and I expect and hope that we will continue to do so in the future.

Daniel Plaisted (@dsplaisted)

First of all, I'd like to thank Daniel for the feedback. Getting feedback of this size is a first for me, and it's quite a joy, because it means that at least someone read most of my post :-). Then again, my name isn't Alexandre D:, it's Aleksander, or Alxandr for short (just to make things harder for you xD). With that out of the way, I'd like to start addressing some of the feedback. Now, as far as I understand it, @dsplaisted is part of the PCL team at Microsoft, so I'm probably a bit out of my depth here, but I'll do my best nonetheless.

Before I start addressing the feedback, though, I'd like to take a minute to talk about the end goal: the far-away future we should all strive towards. Note, though, that what I'm writing in this post is my own personal opinion, and anyone is entitled to a different one.

One assembly to rule them all

What's a good blog post without some LOTR references? Anyway: the obvious epitome of multi-platform libraries is one where all the platforms are identical. All of the CLRs have all the same types, and none of them differ in what they support. This, however, is quite unrealistic, and quite boring from a multi-platform point of view, so we're going to ignore it and focus on more realistic approaches to doing MPLs. The first one is feature-targeting assemblies.

Feature-targeting assemblies

Today, when you create an MPL, or a simple multi-platform nupkg, you target platforms. You create an assembly for net45, one for net40, or one for portable-net40+sl5. However, if we could target features instead of platforms, that would make multi-platform development a lot simpler. For instance, take the CatFinder example from the previous post and change it to something along the lines of this:

using System.Threading.Tasks;

namespace CatFinder
{
    public class Cat
    {
        public byte[] GetCatImage()
        {
            // Magic implementation
        }

        #if FEATURE_ASYNC
        public async Task<byte[]> GetCatImageAsync()
        {
            // Magic implementation
        }
        #endif

        #if FEATURE_IO
        public void DownloadCatImage(string filePath)
        {
            // Magic implementation
        }
        #endif

        #if FEATURE_IO && FEATURE_ASYNC
        public async Task DownloadCatImageAsync(string filePath)
        {
            // Magic implementation
        }
        #endif
    }
}

In theory (if we went really nuts), it would be possible (using the awesome power of Roslyn) to figure out that there are 4 possible ways to compile this assembly: platforms that support both async and IO, platforms that only support one of them, and platforms that support neither. We could then compile this into something like lib/0, which is the full library; lib/1 and lib/2, which are the versions missing support for either async or IO (the order doesn't matter); and lastly lib/3, which supports neither. When a project selects one of these, it starts from 0 and checks "do I support the contracts required for this version?" If it does, great; if it doesn't, it continues downwards until it reaches the last one. If that one isn't supported either, then the package as a whole isn't supported.
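To make the selection walk concrete, here's a minimal sketch in Python. The lib/0 through lib/3 folder names follow the scheme above, but the contract names and the selector function are assumptions for illustration, not a real NuGet/K API:

```python
# Hypothetical sketch of the "start at lib/0 and fall back" selection
# described above. Folder layout and contract names are assumptions.

# Each candidate assembly, from most to least feature-rich, with the
# contracts it requires.
CANDIDATES = [
    ("lib/0", {"FEATURE_ASYNC", "FEATURE_IO"}),  # full library
    ("lib/1", {"FEATURE_ASYNC"}),                # no IO support
    ("lib/2", {"FEATURE_IO"}),                   # no async support
    ("lib/3", set()),                            # neither
]

def select_assembly(platform_contracts):
    """Walk the candidates from richest to poorest and pick the first
    one whose required contracts the platform supports."""
    for path, required in CANDIDATES:
        if required <= platform_contracts:
            return path
    # No candidate matched: the package isn't supported at all.
    return None

print(select_assembly({"FEATURE_ASYNC", "FEATURE_IO"}))  # lib/0
print(select_assembly({"FEATURE_IO"}))                   # lib/2
print(select_assembly(set()))                            # lib/3
```

Note that because lib/3 requires nothing, this particular package always resolves; the `None` branch only fires for packages whose poorest variant still has requirements.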

However, as with having all CLRs be equal and support all of the same things, this is also not very viable. It has a few problems, the first of which is that compiling becomes a lot more complex. The compiler has to read all the FEATURE_* #if-defs and generate a matrix of how they could be combined. Also, some features depend on others (for instance, CRYPTO_MD5 would probably depend on CRYPTO). And you could end up with potentially hundreds of assemblies, some of which would never be used, because they would contain combinations of features that don't exist on any CLR. There are other problems too, but let's fix the ones mentioned here first.
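To see how fast that matrix grows, here's a rough sketch (the feature names and the single dependency rule are made up for illustration) of enumerating combinations while respecting a "CRYPTO_MD5 depends on CRYPTO" constraint:

```python
from itertools import combinations

# Illustrative only: feature names and the dependency rule are assumptions.
FEATURES = ["FEATURE_ASYNC", "FEATURE_IO", "CRYPTO", "CRYPTO_MD5"]
DEPENDS_ON = {"CRYPTO_MD5": {"CRYPTO"}}

def valid(combo):
    """A combination is valid only if every feature's dependencies
    are also present in the combination."""
    s = set(combo)
    return all(DEPENDS_ON.get(f, set()) <= s for f in s)

all_combos = [c for n in range(len(FEATURES) + 1)
              for c in combinations(FEATURES, n)]
valid_combos = [c for c in all_combos if valid(c)]

# Without the dependency rule the compiler faces 2^n combinations; the
# rule prunes some of them, but the growth is still exponential.
print(len(all_combos))    # 16 = 2^4
print(len(valid_combos))  # 12 (the 4 combos with MD5 but no CRYPTO are pruned)
```

With just four features that's already up to a dozen possible assemblies; real frameworks expose far more than four capability differences.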

Half-way manually configured feature-targeting assemblies

If we keep the code from the snippet above, but introduce the following project.json, we solve the problem of the exploding number of feature combinations (note that this obviously wouldn't work today, but with a few changes to K it could):

{
    "targets": [
        ["FEATURE_ASYNC", "FEATURE_IO"],
        ["FEATURE_IO"],
        []
    ]
}

Note that with this (imaginary) config, there wouldn't be an assembly that only supported async. Only if you support both async and IO would it be possible to use the async APIs. This solves the problem of an explosion of framework targets, but it still has some issues that need to be addressed.

The main problem with this approach is that you have to check whether the target framework supports the contracts. Using the old NuGet install model, this wouldn't present too much of a problem, as it's just a small one-time cost during installation of the nupkg. However, using K, which selects nupkg assemblies at runtime, this would probably not be acceptable. Even if it somehow always selected the first possible path, it'd still have to check whether said path was supported, and this could be quite expensive, especially if one were to load a lot of these packages.

Framework targeting assemblies with automatic feature-defines

So, now that we've invalidated this approach too, what's next? Well, if we keep the automatic generation of FEATURE_* defines based on what contracts any given framework supports, we can go back to generating assemblies on a per-framework basis without too much hassle. Let's say we keep the code as is, with no changes, but change the project.json to this:

{
    "frameworks": {
        "net45": {},
        "net40": {},
        "sl5": {}
    }
}

With this config we would get an assembly for net45 that supports all 4 methods, one for net40 that supports the two non-async methods, and one for sl5 that only supports the first method. Because we automatically generate defines matching each platform, this would "just work". And if we tried to run/install this on, say, Windows Phone, it would check the three versions that exist in the nupkg and select the most fitting one based on the contract assemblies (as explained in the previous post).

There are multiple ways this could be achieved. One is simply to take every dependency (contract assembly or not) and generate a define matching its name (upper-cased, replacing . with _), so you'd get something like #if SYSTEM_IO instead of #if FEATURE_IO. This would enable monkey-patching of features missing from a platform (like async is monkey-patched on top of net40 today) by simply adding a nupkg named System.IO. The obvious downside is that you get horribly long names for things like SYSTEM_SECURITY_CRYPTOGRAPHY.
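The mapping itself is trivial; a one-line sketch, assuming the upper-case/underscore convention described above:

```python
def dependency_to_define(name):
    """Map a dependency name like 'System.IO' to a preprocessor define
    like 'SYSTEM_IO': upper-case it and replace dots with underscores."""
    return name.upper().replace(".", "_")

print(dependency_to_define("System.IO"))                     # SYSTEM_IO
print(dependency_to_define("System.Security.Cryptography"))  # SYSTEM_SECURITY_CRYPTOGRAPHY
print(dependency_to_define("Microsoft.Bcl.Async"))           # MICROSOFT_BCL_ASYNC
```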

Another way to go would be to give dlls or nupkgs the ability to add defines to the project, so that simply installing the System.IO nupkg added the define FEATURE_IO (or whatever name they decide on). The disadvantage is that if this is added to nupkgs (in the nuspec, say), changes to NuGet are probably required. Then again, changes to NuGet are required anyway, so this might not be too big a downside.

Now, I've explained what I want and how I'd like things to work, which will help when answering the feedback I posted at the top of this post. So let's get back to answering @dsplaisted's questions.

Too many binaries

One of the problems with MPLs is that you can easily end up with a nupkg containing a lot of assemblies that are in practice identical. For instance, if I were to remove the async parts of the CatFinder example above and target net40 and net45, the two assemblies would be identical. This doesn't sound too bad until you introduce 20+ frameworks, where a lot of them are almost identical in what "contract assemblies" they support, meaning that many of them would have identical binaries. This sounds like a huge waste with regard to both bandwidth and storage.

First, let's look at the upside of having all these assemblies. The main upside is that picking one is really easy. It's simply a file lookup. That's it. It's like having a Dictionary<string, Assembly>. You trade storage space for ease of access and efficiency.

In newer versions of KPM (the K Package Manager, or NuGet for K if you like) they have implemented a central package cache. This means that no matter how many projects use a package, they all reference the same one, and there'll only be one copy of the nupkg on your hard drive. Let's take Newtonsoft.Json as an example. Newtonsoft.Json is a fairly large .NET assembly, at around 500 KB (I say around because it depends on which platform you look at). Newtonsoft.Json version 6.0.3 supports 7 different platforms, meaning there are 7 different Newtonsoft.Json assemblies in the nupkg. All in all, the space taken up by the Newtonsoft.Json.6.0.3 folder in my packages folder is about 7.5 MB. On my computer, though, I probably have over a hundred projects that all use Newtonsoft.Json and that are not K projects, meaning they use NuGet instead of KPM, so I probably have more than 700 copies of various versions of Newtonsoft.Json assemblies somewhere on my computer. With a central package cache, even if Newtonsoft.Json supported 50 different frameworks, it would still take up less space than today's solution.

Note: A central package cache obviously also benefits packages that use PCLs (such as today's Newtonsoft.Json), not just MPLs. The point I'm trying to make is that once we introduce a central cache, space shouldn't really be a problem anymore, even with potentially huge packages.
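A back-of-the-envelope calculation with the (approximate) numbers above makes the point. The project count and sizes are the rough estimates from this post, not measurements:

```python
# Rough estimates from the paragraph above, not exact measurements.
assembly_size_mb = 0.5   # Newtonsoft.Json is roughly 500 KB per platform
platforms = 7            # platform assemblies in the 6.0.3 nupkg

# Assemblies alone come to ~3.5 MB; the folder on disk is ~7.5 MB
# once the rest of the package contents are counted.
package_assemblies_mb = assembly_size_mb * platforms

# Today: ~100 non-K projects, each with its own copy of the package.
projects = 100
without_cache_mb = projects * package_assemblies_mb  # scattered copies

# With a central cache: exactly one copy, even at 50 platforms.
with_cache_mb = assembly_size_mb * 50

print(without_cache_mb)  # 350.0
print(with_cache_mb)     # 25.0
```

Even a deliberately bloated 50-platform package in a central cache beats today's per-project duplication by an order of magnitude.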

Now, some of you are probably going "but we don't want 50 copies of Newtonsoft.Json on a cellphone, where space is limited", right? But remember, when you pick a final target platform, like Windows Phone or Silverlight or whatever, you only get that one assembly. When you compile for a target platform, it copies the assemblies you need, and only those. The cost of having multiple copies of the assemblies only applies to development PCs.

Also, let's face it: it's 2014, and I have a laptop with a 500 GB SSD installed. That number probably isn't going to go down any time soon, so spending 20 MB on Newtonsoft.Json isn't really a problem, or at least that's my opinion.

Monkey patching the platform

I briefly mentioned this in the section on framework-targeting assemblies with automatic feature-defines, but being able to monkey-patch the platform you're working on is an important feature, so I'll talk a bit more about it here.

In my examples (thus far, and in the previous blog post) I've used async support as something that differs between versions of the framework, because older versions don't natively support async. @dsplaisted argued that I couldn't just ignore the fact that you can "fix" this with a bait and switch nupkg. However, the point of ignoring the ability to monkey-patch the platform like this was simply to make the problem itself easier to describe, because async is fairly new and a lot of platforms don't natively support it. It wasn't meant to say that you shouldn't be allowed to patch the platform like that.

In fact, the fact that you can install a nupkg and get async support in net40 is something I think is really awesome. It allows us to use the new paradigm while still being able to run on older .NET platforms, and that's something I really appreciate, because I've worked at several places where using the newest version of .NET was not an option. There is no problem using the same pattern with MPLs, except that you remove the PCLs from the nupkg. Today the Microsoft.Bcl.Async package contains 9 different frameworks, 3 of which are PCLs. If those 3 PCLs were removed, it would all still just work.

Now, in order for this to work with #if FEATURE_ASYNC as in the code above, some changes would have to be made. As specified earlier, one way would be to just use the contract assembly's file name, so if the contract assembly that specified async support were named Microsoft.Bcl.Async, we wouldn't have to make many changes except for changing our project.json to this:

{
    "frameworks": {
        "net45": {},
        "net40": {
            "dependencies": {
                "Microsoft.Bcl.Async": "1.1.8"
            }
        },
        "sl5": {
            "dependencies": {
                "Microsoft.Bcl.Async": "1.1.8"
            }
        }
    }
}

If, however, we allowed contracts and nupkgs to contribute #defines, some changes to NuGet and K would have to be made. The same update to project.json would still apply, though.

Contract Assemblies

One of the points in the feedback from @dsplaisted that doesn't sit well with me is the following:

The contract packages would be pretty much like they are today, except that in addition to the lib/contract and lib/k10 folders, they would have additional lib folders for each platform they support (so lib/net45, lib/win8, lib/wp8, lib/wpa81, etc.), which would contain either the implementation or the type forwarders for those platforms. These packages would need the "simple" version of the selection algorithm.

Daniel Plaisted (@dsplaisted)

Adding all the platforms to a "contract package" (as it effectively becomes at that point) in my opinion defeats most of the point of having contract assemblies in the first place. Requiring the contract packages to list all platforms known at the moment of their creation turns the question from "does my platform support System.IO?" into "does System.IO know about my platform?". This has two consequences. First, if I create my own platform, no nupkgs are going to install at all, because none of the contract packages know about my platform. Second, whenever an actual new platform is introduced, all of the contract packages need to be updated for that platform. That means hundreds of packages that need to stay in sync with the implementations of all the CLRs in the .NET community.

When I talk about contract assemblies, I don't mean nupkgs that simply type-forward into the platform; I mean actual contracts: empty metadata assemblies that do one thing, namely describe the capabilities of a CLR. They don't need to be there at runtime, because at runtime the appropriate assembly is going to be selected by the runtime anyway. Therefore, I argued in my previous post that contract assemblies shouldn't really be nupkgs at all. They should be a new metadata field in the nupkg, just like "frameworkReferences" are today.

Compiling

Note: This isn't an answer to @dsplaisted; rather, it answers other feedback I've gotten through other channels.

One of the things I've been asked about is compilation performance. With PCLs you generate a single assembly every time you compile; in contrast, with MPLs you will probably generate at least 3-4 assemblies. That's going to take 3-4 times as long, right?

Well, that's true, at least today. However, good stuff is coming. One example is compiling to different targets in parallel, which will potentially speed up builds a lot when producing multiple assemblies. Also, if we wanted to take things really far, using the power of Roslyn we could potentially read and parse the source once, generate multiple compilations from source trees that are only read once, and then simply get diagnostics without actually compiling. This would make getting errors and warnings in Visual Studio (or similar) extremely fast, even for a lot of target frameworks, and the only IO performed would be reading the source files once. This is just a guess, but I'm guessing that would be faster than compiling a PCL is today.
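The parse-once, analyze-per-target idea can be sketched conceptually. Everything here is a mock in plain Python: the "trees", the "diagnostics", and the feature sets are made up to show the shape of the approach, and Roslyn's real API looks nothing like this:

```python
from concurrent.futures import ThreadPoolExecutor

# Conceptual mock of "parse once, run diagnostics per target".

def parse_sources(files):
    """Pretend to read and parse every source file exactly once."""
    return {name: text.split() for name, text in files.items()}

def diagnostics_for_target(trees, defines):
    """Pretend to analyze the shared trees for one target, without
    emitting an assembly, e.g. flagging 'async' use when the target
    doesn't define FEATURE_ASYNC."""
    errors = []
    for name, tokens in trees.items():
        if "async" in tokens and "FEATURE_ASYNC" not in defines:
            errors.append((name, "async not supported on this target"))
    return errors

sources = {"Cat.cs": "public async Task GetCatImageAsync"}
trees = parse_sources(sources)  # single parse, shared by every target

targets = {
    "net45": {"FEATURE_ASYNC", "FEATURE_IO"},
    "net40": {"FEATURE_IO"},
    "sl5": set(),
}

# Analyze all targets in parallel over the same trees; the only IO was
# the single (pretend) read of the source files.
with ThreadPoolExecutor() as pool:
    results = dict(zip(targets, pool.map(
        lambda defines: diagnostics_for_target(trees, defines),
        targets.values())))

print({t: len(errs) for t, errs in results.items()})
# {'net45': 0, 'net40': 1, 'sl5': 1}
```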

Conclusion

I still think MPLs are the way to go, and that PCLs can slowly but surely be phased out. There is a lot of work to be done before things work optimally, sure, but I still think it's doable, and the way forward. Also, I appreciate the feedback, and please do provide more if there is something I did not answer :).