A provably safe parallel language extension for C#

4/12/2013 - Marco (updated on 11/13/2017)

This article originally appeared on earthli News and has been cross-posted here.

The paper Uniqueness and Reference Immutability for Safe Parallelism by Colin S. Gordon, Matthew J. Parkinson, Jared Parsons, Aleks Bromfield, Joe Duffy is quite long (26 pages), detailed and involved. To be frank, most of the notation was foreign to me -- to say nothing of making heads or tails of most of the proofs and lemmas -- but I found the higher-level discussions and conclusions quite interesting.

The abstract is concise and describes the project very well:

A key challenge for concurrent programming is that side-effects (memory operations) in one thread can affect the behavior of another thread. In this paper, we present a type system to restrict the updates to memory to prevent these unintended side-effects. We provide a novel combination of immutable and unique (isolated) types that ensures safe parallelism (race freedom and deterministic execution). The type system includes support for polymorphism over type qualifiers, and can easily create cycles of immutable objects. Key to the system's flexibility is the ability to recover immutable or externally unique references after violating uniqueness without any explicit alias tracking. Our type system models a prototype extension to C# that is in active use by a Microsoft team. We describe their experiences building large systems with this extension. We prove the soundness of the type system by an embedding into a program logic.

The project proposes a type-system extension with which developers can write provably safe parallel programs -- i.e. "race freedom and deterministic execution" -- with the amount of actual parallelism determined when the program is analyzed and compiled rather than decided by a programmer creating threads of execution.

Isolating objects for parallelism

The "isolation" part of this type system reminds me a bit of the way that SCOOP addresses concurrency. That system also allows programs to designate objects as "separate" from other objects while also releasing the program from the onus of actually creating and managing separate execution contexts. That is, the syntax of the language allows a program to be written in a provably correct way (at least as far as parallelism is concerned; see the "other provable-language projects" section below). In order to execute such a program, the runtime loads not just the program but also another file that specifies the available virtual processors (commonly mapped to threads). Sections of code marked as "separate" can be run in parallel, depending on the available number of virtual processors. Otherwise, the program runs serially.

In SCOOP, methods are used as a natural isolation barrier, with input parameters marked as "separate". See SCOOP: Concurrency for Eiffel and SCOOP (software) for more details. The paper also contains an entire section listing other projects -- many implemented on the the JVM -- that have attempted to make provably safe programming languages.

The system described in this paper goes much further, adding immutability as well as isolation (the same concept as "separate" in SCOOP). An interesting extension to the type system is that isolated object trees are free to have references to immutable objects (since those can't negatively impact parallelism). This allows for globally shared immutable state and reduces argument-passing significantly. Additionally, there are readable and writable references: the former can only be read but may be modified by other objects (otherwise it would be immutable); the latter can be read and written and is equivalent to a "normal" object in C# today. In fact, "[...] writable is the default annotation, so any single-threaded C# that does not access global state also compiles with the prototype."

Permission types

In this safe-parallel extension, a standard type system is extended so that every type can be assigned such a permission and there is "support for polymorphism over type qualifiers", which means that the extended type system includes the permission in the type, so that, given B => A, a reference to readable B can be passed to a method that expects an immutable A. In addition, covariance is also supported for generic parameter types.

When they say that the "[k]ey to the system's flexibility is the ability to recover immutable or externally unique references after violating uniqueness without any explicit alias tracking", they mean that the type system allows programs to specify sections that accept isolated references as input, lets them convert to writable references and then convert back to isolated objects -- all without losing provably safe parallelism. This is quite a feat since it allows programs to benefit from isolation, immutability and provably safe parallelism without significantly changing common programming practice. In essence, it suffices to decorate variables and method parameters with these permission extensions to modify the types and let the compiler guide you as to further changes that need to be made. That is, an input parameter for a method will be marked as immutable so that it won't be changed and subsequent misuse has to be corrected.

Even better, they found that, in practice, it is possible to use extension methods to allow parallel and standard implementations of collections (lists, maps, etc.) to share most code.

A fully polymorphic version of a map() method for a collection can coexist with a parallelized version pmap() specialized for immutable or readable collections. [...] Note that the parallelized version can still be used with writable collections through subtyping and framing as long as the mapped operation is pure; no duplication or creation of an additional collection just for concurrency is needed.

Real projects and performance impact

Much of the paper is naturally concerned with proving that their type system actually does what it says it does. As mentioned above, at least 2/3 of the paper is devoted to lemmas and large swaths of notation. For programmers, the more interesting part is the penultimate section that discusses the extension to C# and the experiences in using it for larger projects.

A source-level variant of this system, as an extension to C#, is in use by a large project at Microsoft, as their primary programming language. The group has written several million lines of code, including: core libraries (including collections with polymorphism over element permissions and data-parallel operations when safe), a webserver, a high level optimizing compiler, and an MPEG decoder.

Several million lines of code is, well, it's an enormous amount of code. I'm not sure how many programmers they have or how they're counting lines or how efficiently they write their code, but millions of lines of code suggests generated code of some kind. Still, taken with the next statement on performance, that much code more than proves that the type system is viable.

These and other applications written in the source language are performance-competitive with established implementations on standard benchmarks; we mention this not because our language design is focused on performance, but merely to point out that heavy use of reference immutability, including removing mutable static/global state, has not come at the cost of performance in the experience of the Microsoft team.

Not only is performance not impacted, but the nature of the typing extensions allows the compiler to know much more about which values and collections can be changed, which affects how aggressively this data can be cached or inlined.

In fact, the prototype compiler exploits reference immutability information for a number of otherwise-unavailable compiler optimizations. [...] Reference immutability enables some new optimizations in the compiler and runtime system. For example, the concurrent GC can use weaker read barriers for immutable data. The compiler can perform more code motion and caching, and an MSIL-to-native pass can freeze immutable data into the binary.

Incremental integration ("unstrict" blocks)

In the current implementation, there is an unstrict block that allows the team at Microsoft to temporarily turn off the new type system and to ignore safety checks. This is a pragmatic approach which allows the software to be run before it has been proven 100% parallel-safe. This is still better than having no provably safe blocks at all. Their goal is naturally to remove as many of these blocks as possible -- and, in fact, this requirement drives further refinement of the type system and library.

We continue to work on driving the number of unstrict blocks as low as possible without over-complicating the type systems use or implementation.

The project is still a work-in-progress but has seen quite a few iterations, which is promising. The paper was written in 2012; it would be very interesting to take it for a test drive in a CTP.

Other provable-language projects

A related project at Microsoft Research Spec# contributed a lot of basic knowledge about provable programs. The authors even state that the "[...] type system grew naturally from a series of efforts at safe parallelism. [...] The earliest version was simply copying Spec#s [Pure] method attribute, along with a set of carefully designed task-and data-parallelism libraries." Spec#, in turn, is a "[...] formal language for API contracts (influenced by JML, AsmL, and Eiffel), which extends C# with constructs for non-null types, preconditions, postconditions, and object invariants".

Though the implementation of this permissions-based type system may have started with Spec#, the primary focus of that project was more a valiant attempt to bring Design-by-Contract principles (examples and some discussion here) to the .NET world via C#. Though spec# has downloadable code, the project hasn't really been updated in years. This is a shame, as support for Eiffel[^1] in .NET, mentioned above as one of the key influences of spec#, was dropped by ISE Eiffel long ago.

Spec#, in turn, was mostly replaced by Microsoft Research's Contracts project (an older version of which was covered in depth in Microsoft Code Contracts: Not with a Ten-foot Pole). The Contracts project seems to be alive and well: the most recent release is from October, 2012. I have not checked it out since my initial thumbs-down review (linked above) but did note in passing that the implementation is still (A) library-only and (B) does not support Visual Studio 2012.

The library-only restriction is particularly galling, as such an implementation can lead to repeated code and unwieldy anti-patterns. As documented in the Contracts FAQ, the current implementation of the "tools take care of enforcing consistency and proper inheritance of contracts" but this is presumably accomplished with compiler errors that require the programmer to include contracts from base methods in overrides.

The seminal work Object-oriented Software Construction by Bertrand Meyer (vol. II in particular) goes into tremendous detail on a type system that incorporates contracts directly. The type system discussed in this article covers only parallel safety: null-safety and other contracts are not covered at all. If you're at all interested in these types of language extensions, the vol.2 of OOSC is a great read. The examples are all in Eiffel but should be relatively accessible. Though some features -- generics, notably but also tuples, once routines and agents -- have since made their way into C# and other more commonly used languages, many others -- such as contracts, anchored types (contravariance is far too constrained in C# to allow them), covariant return types, covariance everywhere, multiple inheritance, explicit feature removal, loop variants and invariants, etc. -- are still not available. Subsequent interesting work has also been done on extensions that allow creation of provably null-safe programs, something also addressed in part by Microsoft Research's Contracts project.

Sign up for our Newsletter