Should you return `null` or an empty list?

I've seen a bunch of articles addressing this topic of late, so I've decided to weigh in.

The reason we frown on returning null from a method that returns a list or sequence is that we want to be able to freely use these sequences or lists in a functional manner.

It seems to me that the proponents of "no nulls" are generally those who have a functional language at their disposal and the antagonists do not. In functional languages, we almost always return sequences instead of lists or arrays.

In C# and other languages with functional features, we want to be able to do this:

var names = GetOpenItems()
  .Where(i => i.OverdueByTwoWeeks)
  .SelectMany(i => i.GetHistoricalAssignees()
    .Select(a => new { a.FirstName, a.LastName })
  );

foreach (var name in names)
{
  Console.WriteLine("{1}, {0}", name.FirstName, name.LastName);
}

If either GetHistoricalAssignees() or GetOpenItems() might return null, then we'd have to write the code above as follows instead:

var openItems = GetOpenItems();
if (openItems != null)
{
  var names = openItems
  .Where(i => i.OverdueByTwoWeeks)
  .SelectMany(i => (i.GetHistoricalAssignees() ?? Enumerable.Empty<Person>())
    .Select(a => new { a.FirstName, a.LastName })
  );

  foreach (var name in names)
  {
    Console.WriteLine("{1}, {0}", name.FirstName, name.LastName);
  }
}

This seems like exactly the kind of code we'd like to avoid writing, if possible. It's also the kind of code that calling clients are unlikely to write, which will lead to crashes with NullReferenceExceptions. As we'll see below, there are people that seem to think that's perfectly OK. I am not one of those people, but I digress.

The post Is it Really Better to 'Return an Empty List Instead of null'? / Part 1 by Christian Neumanns serves as a good example of an article that has the appearance of providing information without actually doing so. He introduces his topic with the following vague observation:

If we read through related questions in Stackoverflow and other forums, we can see that not all people agree. There are many different, sometimes truly opposite opinions. For example, the top rated answer in the Stackoverflow question Should functions return null or an empty object? (related to objects in general, not specifically to lists) tells us exactly the opposite:

Returning null is usually the best idea ...

The statement "we can see that not all people agree" is a tautology. I would split the people into groups of those whose opinions we should care about and everyone else. The statement "There are many different, sometimes truly opposite opinions" is also tautological, given the nature of the matter under discussion -- namely, a question that can only be answered as "yes" or "no". Such questions generally result in two camps with diametrically opposed opinions.

As the extremely long-winded pair of articles points out: sometimes you can't be sure what an external API will return. That's correct. You have to protect against such return values with ugly, defensive code. But don't use that as an excuse to produce even more methods that may return null. Otherwise, you're just part of the problem.

The second article Is it Really Better to 'Return an Empty List Instead of null'? - Part 2 by Christian Neumanns includes many more examples.

I just don't know what to say about people who write things like "Bugs that cause NullPointerExceptions are usually easy to debug because the cause and effect are short-distanced in space (i.e. location in source code) and time." While this is somewhat true, it's even more true that you can't tell whether such an exception was caused by a savvy programmer who's using it to his advantage or by a non-savvy programmer whose code is buggy as hell.

He has a ton of examples that try to distinguish between a method that returns an empty sequence and a method that cannot properly answer a question at all. This is a concern and a very real distinction to make, but the answer is not to return null to indicate nonsensical input. The answer is to throw an exception.

The method providing the sequence should not be making decisions about whether an empty sequence is acceptable for the caller. For sequences that cannot logically be empty, the method should throw an exception instead of returning null to indicate "something went wrong".

A caller may impart semantic meaning to an empty result and also throw an exception (as in his example with a cycling team that has no members). If the display of such a sequence on a web page is incorrect, then that is the fault of the caller, not of the provider of the sequence.

  • If data is not yet available, but should be, throw an exception.
  • If data is not available but the provider isn't qualified to decide whether that's an error, return an empty sequence (sketched below).
  • If the caller receives an empty sequence and knows that it should not be empty, then it is responsible for indicating an error.
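To make the cases above concrete, here's a minimal sketch in the style of the earlier example. The Item, IsLoaded and History members are hypothetical, invented purely for illustration; they don't come from any of the cited articles.

public IEnumerable<Person> GetHistoricalAssignees(Item item)
{
  // Nonsensical input: the provider knows it can never answer this, so it throws.
  if (item == null) { throw new ArgumentNullException("item"); }

  // Data that should be available but isn't yet: also an exception, not null.
  if (!item.IsLoaded) { throw new InvalidOperationException("The item's history has not been loaded."); }

  // Otherwise, return whatever there is -- possibly empty, but never null.
  // Whether "empty" is an error is for the caller to decide.
  return item.History.Select(h => h.Assignee);
}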

That there exists calling code that makes incorrect assumptions about return values is no reason to start returning values that will make calling code crash with a NullPointerException.

All of his examples are similar: he tries to make the pure-data call to retrieve a sequence of elements simultaneously validate some business logic. That's not a good idea. If this is really necessary, then the validity check should go in another method.

The example he cites for getting the amount from a list of PriceComponents is exactly why most aggregation functions in .NET throw an exception when the input sequence is empty. But that's a much better way of handling it -- with a precise exception -- than by returning null to try to force an exception somewhere in the calling code.
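To illustrate (a quick example of my own, not one from his article): summing an empty sequence is well-defined, but averaging one is not, so .NET throws a precise exception right where the problem is.

var prices = Enumerable.Empty<decimal>();

var total = prices.Sum();       // 0 -- summing nothing is well-defined
var average = prices.Average(); // throws InvalidOperationException: "Sequence contains no elements"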

But the upshot for me is: I am not going to write code that, when I call it, forces me to litter other code with null-checks. That's just ridiculous.

Working with EF Migrations and branches

The version of EF Migrations discussed in this article is 5.0.20627. The version of Quino is less relevant: the features discussed have been supported for years. For those in a hurry, there is a tl;dr near the end of the article.

We use Microsoft Entity Framework (EF) Migrations in one of our projects where we are unable to use Quino. We were initially happy to be able to automate database-schema changes. After using it for a while, we have decidedly mixed feelings.

As developers of our own schema migration for the Quino ORM, we're always on the lookout for new and better ideas to improve our own product. If we can't use Quino, we try to optimize our development process in each project to cause as little pain as possible.

EF Migrations and branches

We ran into problems in integrating EF Migrations into a development process that uses feature branches. As long as a developer stays on a given branch, there are no problems and EF functions relatively smoothly.1

However, if a developer switches to a different branch -- with different migrations -- EF Migrations is decidedly less helpful. It is, in fact, quite cryptic and blocks progress until you figure out what's going on.

Assume the following not-uncommon situation:

  • The project is created in the master branch
  • The project has an initial migration BASE
  • Developers A and B migrate their databases to BASE
  • Developer A starts branch feature/A and includes migration A in her database
  • Developer B starts branch feature/B and includes migration B in his database

We now have the situation in which two branches have different code and each has its own database schema. Switching from one branch to another with Git quickly and easily addresses the code differences. The database is, unfortunately, a different story.

Let's assume that developer A switches to branch feature/B to continue working there. The natural thing for A to do is to call "update-database" from the Package Manager Console2. This yields the following message -- all-too-familiar to EF Migrations developers.

[Image: the error message shown in the Package Manager Console]

Unable to update database to match the current model because there are pending changes and automatic migration is disabled. Either write the pending changes to a code-based migration or enable automatic migration. [...]

This situation happens regularly when working with multiple branches. It's even possible to screw up a commit within a single branch, as illustrated in the following real-world example.

  • Add two fields to an existing class
  • Generate a migration with code that adds two fields
  • Migrate the database
  • Realize that you don't need one of the two fields
  • Remove the C# code from the migration for that field
  • Tests run green
  • Commit everything and push it

As far as you're concerned, you committed a single field to the model. When your co-worker runs that migration, it will be applied, but EF Migrations immediately thereafter complains that there are pending model changes to make. How can that be?

Out-of-sync migrations != outdated database

Just to stay focused: we're actually trying to get real work done here, not debug EF Migrations. We want to answer the following questions:

  1. Why is EF Migrations having a problem updating the schema?
  2. How do I quickly and reliably update my database to use the current schema if EF Migrations refuses to do it?

The underlying reason why EF Migrations has problems is that it does not actually know what the schema of the database is. It doesn't read the schema from the database itself, but relies instead on a copy of the EF model that it stored in the database when it last performed a successful migration.

That copy of the model is also stored in the resource file generated for the migration. EF Migrations does this so that the migration includes information about which changes it needs to apply and about the model to which the change can be applied.

If the model stored in the database does not match the model stored with the migration that you're trying to apply, EF Migrations will not update the database. This is probably for the best, but leads us to the second question above: what do we have to do to get the database updated?

Generate a migration for those "pending changes"

The answer has already been hinted at above: we need to fix the model stored in the database for the last migration.

Let's take a look at the situation above in which your colleague downloaded what you thought was a clean commit.

From the Package Manager Console, run add-migration foo to scaffold a migration for the so-called "pending changes" that EF Migrations detected. That's interesting: EF Migrations thinks that your colleague should generate a migration to drop the column that you'd only temporarily added but never checked in.

That is, the column isn't in his database, it's not in your database, but EF Migrations is convinced that it was once in the model and must be dropped.

How does EF Migrations even know about a column that you added to your own database but that you removed from the code before committing? What dark magic is this?

The answer is probably obvious: you did check in the change. The part that you can easily remove (the C# code) is only half of the migration. As mentioned above, the other part is a binary chunk stored in the resource file associated with each migration. These BLOBs are stored in the __MigrationHistory table in the database.

[Images: the __MigrationHistory table in the database]

How to fix this problem and get back to work

Here's the tl;dr: generate a "fake" migration, remove all of the C# code that would apply changes to the database (shown below) and execute update-database from the Package Manager Console.

[Image: the scaffolded "fake" migration with its Up() and Down() bodies removed]
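For reference, the stripped-down migration ends up looking something like this (the class name here is just a stand-in for whatever you passed to add-migration; only the empty Up() and Down() bodies matter):

public partial class Foo : DbMigration
{
  public override void Up()
  {
    // Intentionally empty: we only want EF Migrations to store the current
    // model snapshot in the __MigrationHistory table, not to change the schema.
  }

  public override void Down()
  {
    // Intentionally empty as well.
  }
}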

This may look like it does exactly nothing. What actually happens is that it includes the current state of the EF model in the binary data for the last migration applied to the database (because you just applied it).

Once you've applied the migration, delete the files and remove them from the project. This migration was only generated to fix your local database; do not commit it.

Everything's cool now, right?

Applying the fix above doesn't mean that you won't get database errors. If your database schema does not actually match the application model, EF will crash when it assumes fields or tables are available which do not exist in your database.

Sometimes, the only way to really clean up a damaged database -- especially if you don't have the code for the migrations that were applied there3 -- is to remove the misapplied migrations from your database, undo all of the changes to the schema (manually, of course) and then generate a new migration that starts from a known good schema.

Conclusions and comparison to Quino

The obvious answer to the complaint "it hurts when I do this" is "stop doing that". We would dearly love to avoid these EF Migrations-related issues, but developing without any schema-migration support at all is even less appealing.

We'd have to create upgrade scripts manually or maintain scripts that generate a working development database -- and that for each branch. When branches are merged, the database-upgrade scripts would have to be merged and tested as well. This would be a significant addition to our development process, would introduce maintainability and quality issues of its own and would probably slow us down even more.

And we're certainly not going to stop developing with branches, either.

We were hoping to avoid all of this pain by using EF Migrations. That EF Migrations makes us think of going back to manual schema migration is proof that it's not nearly as elegant a solution as our own Quino schema migration, which never gave us these problems.

Quino actually reads the schema in the database and compares that model directly against the current application model. The schema migrator generates a custom list of differences that map from the current schema to the desired schema and applies them. There is user intervention but it's hardly ever really required. This is an absolute godsend during development where we can freely switch between branches without any hassle.4

Quino doesn't recognize "upgrade" versus "downgrade" but instead applies "changes". This paradigm has proven to be a much better fit for our agile, multi-branch style of development and lets us focus on our actual work rather than fighting with tools and libraries.



  1. EF Migrations as we use it is tightly bound to SQL Server. Just as one example, the inability of SQL Server to resolve cyclic cascade dependencies is in no way shielded by EF Migrations. Though the drawback originates in SQL Server, EF Migrations simply propagates it to the developer, even though it purports to provide an abstraction layer. Quino, on the other hand, does the heavy lifting of managing triggers to circumvent this limitation.

  2. As an aside, this is a spectacularly misleading name for a program feature. It should just be called "Console".

  3. I haven't ever been able to use the Down() method that is generated with each migration, but perhaps someone with more experience could explain how to apply such a thing properly. If that doesn't work, the method outlined above is your only fallback.

  4. The aforementioned database-script maintenance or having only very discrete schema-update points or maintaining a database per branch and switching with configuration files or using database backups or any other schemes that end up distracting you from working.

Dealing with improper disposal in WCF clients

There's an old problem in generated WCF clients in which the Dispose() method calls Close() on the client irrespective of whether there was a fault. If there was a fault, then the method should call Abort() instead. Failure to do so causes another exception, which masks the original exception. Client code will see the subsequent fault rather than the original one. A developer running the code in debug mode will be misled as to what really happened.

See the article WCF Clients and the "Broken" IDisposable Implementation by David Barrett for a more in-depth analysis, but that's the gist of it.

This issue is still present in the ClientBase implementation in .NET 4.5.1. The linked article shows how you can add your own implementation of the Dispose() method in each generated client. An alternative is to use a generic adaptor if you don't feel like adding a custom dispose to every client you create.1

public class SafeClient<T> : IDisposable
  where T : ICommunicationObject, IDisposable
{
  public SafeClient(T client)
  {
    if (client == null) { throw new ArgumentNullException("client"); }

    Client = client;
  }

  public T Client { get; private set; }

  public void Dispose()
  {
    Dispose(true);
    GC.SuppressFinalize(this);
  }

  protected virtual void Dispose(bool disposing)
  {
    if (disposing)
    {
      if (Client != null)
      {
        if (Client.State == CommunicationState.Faulted)
        {
          // Close() would throw on a faulted channel and mask the original
          // exception, so abort instead.
          Client.Abort();
        }
        else
        {
          Client.Close();
        }

        Client = default(T);
      }
    }
  }
}

To use your WCF client safely, you wrap it in the class defined above, as shown below.

using (var safeClient = new SafeClient<SystemLoginServiceClient>(new SystemLoginServiceClient(...)))
{
  var client = safeClient.Client;
  // Work with "client"
}

If you can figure out how to initialize your clients without passing parameters to the constructor, you could slim it down by adding a "new" generic constraint to the parameter T in SafeClient and then using the SafeClient as follows:

using (var safeClient = new SafeClient<SystemLoginServiceClient>())
{
  var client = safeClient.Client;
  // Work with "client"
}
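For completeness, here's a sketch of what that slimmed-down variant might look like; the only differences from the class above are the added new() constraint and the parameterless constructor, and it assumes your client type can be constructed without arguments.

public class SafeClient<T> : IDisposable
  where T : ICommunicationObject, IDisposable, new()
{
  public SafeClient()
  {
    Client = new T();
  }

  public T Client { get; private set; }

  // The Dispose() implementation is identical to the version above.
}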


  1. The code included in this article is a sketch of a solution and has not been tested. It does compile, though.

Entity Framework Generated SQL

This article originally appeared on earthli News and has been cross-posted here.


Microsoft just recently released Visual Studio 2013, which includes Entity Framework 6 and introduces a lot of new features. It reminded me of the following query that EF generated for me, way, way back when it was still version 3.5. Here's hoping that they've taken care of this problem since then.

So, the other day EF (v3.5) seemed to be taking quite a while to execute a query on SQL Server. This was a pretty central query and involved a few joins and restrictions, but wasn't anything too wild. All of the restrictions and joins were on numeric fields backed by indexes.

In these cases, it's always best to just fire up the profiler and see what kind of SQL is being generated by EF. The SQL itself was a pretty scary thing (I've unfortunately lost it), but I did manage to take a screenshot of the query plan, shown below.

[Image: the query plan for the EF-generated query]

It doesn't look too bad until you notice that the inset on the bottom right (the black smudgy thing) is a representation of the entire query ... and that it just kept going on down the page.

.NET 4.5.1 and Visual Studio 2013 previews are available

The article Announcing the .NET Framework 4.5.1 Preview provides an incredible amount of detail about a relatively exciting list of improvements for .NET developers.

x64 Edit & Continue

First and foremost, the Edit-and-Continue feature is now available for x64 builds as well as x86 builds. Whereas an appropriate cynical reaction is that "it's about damn time they got that done", another appropriate reaction is to just be happy that they will finally support x64-debugging as a first-class feature in Visual Studio 2013.

Now that they have feature-parity for all build types, they can move on to other issues in the debugger (see the list of suggestions at the end).

Async-aware debugging

We haven't had much opportunity to experience the drawbacks of the current debugger vis-à-vis asynchronous debugging, but the experience outlined in the call-stack screenshot below is familiar to anyone who's done multi-threaded (or multi-fiber, etc.) programming.

[Image: the call stack as shown by the current debugger during an asynchronous operation]

Instead of showing the actual stack location in the thread within which the asynchronous operation is being executed, the new and improved version of the debugger shows a higher-level interpretation that places the current execution point within the context of the async operation. This is much more in keeping with the philosophy of the async/await feature in .NET 4.5, which lets developers write asynchronous code in what appears to be a serial fashion. That improved readability has now been carried over to the debugger as well.
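For context, this is the kind of code the async-aware call stack is meant to make debuggable again: a method that reads serially but whose continuation may complete on a different thread. (A trivial example of my own, not taken from the linked article.)

private static async Task<int> GetPageLengthAsync(string url)
{
  using (var client = new HttpClient())
  {
    // Execution is suspended here; the continuation may resume on another
    // thread, which is what used to make the raw call stack so hard to read.
    var content = await client.GetStringAsync(url);

    return content.Length;
  }
}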

[Image: the async-aware call stack in the Visual Studio 2013 debugger]

Return-value inspection

The VS2013 debugger can now show the "direct return values and the values of embedded methods (the arguments)" for the current line.2 Instead of manually selecting the text segment and using the Quick Watch window, you can now just see the chain of values in the "Autos" debugger pane.
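As a trivial, made-up example, stepping over the following line in VS2013 shows the return values of both GetOpenItems() and First() in the Autos pane, without introducing temporary variables just to inspect them.

// ProcessItem() is hypothetical; the point is the nested calls whose results
// would otherwise be invisible in the debugger.
ProcessItem(GetOpenItems().First(i => i.OverdueByTwoWeeks));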

[Image: return values shown in the Autos pane of the debugger]

NuGet Improvements

We are also releasing an update in Visual Studio 2013 Preview to provide better support for apps that indirectly depend on multiple versions of a single NuGet package. You can think of this as sane NuGet library versioning for desktop apps.

We've been bitten by the aforementioned issue and are hopeful that the solution in Visual Studio 2013 will fill the gaps in the current release. The article describes several other improvements to the NuGet services, including integration with Windows Update for large-scale deployment. They also mention "a curated list of Microsoft .NET Framework NuGet Packages to help you discover these releases, published in OData format on the NuGet site", but don't mention whether the NuGet UI in VS2013 has been improved. The current UI, while not as awful and slow as initial versions, is still not very good for discovery and is quite clumsy for installation and maintenance.

User Voice for Visual Studio/.NET

You're not limited to just waiting on the sidelines to see which feature Microsoft has decided to implement in the latest version of .NET/Visual Studio. You should head over to the User Voice for Visual Studio site to get an account and vote for the issues you'd like them to work on next.

Here's a list of the ones I found interesting, some of which I've voted on.1



  1. In a similar vein, I found the issue Bring back Classic Visual Basic, an improved version of VB6 to be interesting, simply because of the large number of votes for it (1712 at the time of writing). While it's understandable that VB6 developers don't understand the programming paradigm that came with the transition to .NET, the utterly reactionary desire to go back to VB6 is somewhat unfathomable. It's 2013, you can't put the dynamic/lambda/jitted genie back in the bottle. If you can't run with the big dogs, you'll have to stay on the porch...and stop being a developer. There isn't really any room for software written in a glorified batch language anymore.

  2. This feature has been available for the unmanaged-code debugger (read: C++) for a while now.

Deleting multiple objects in Entity Framework

Many improvements have been made to Microsoft's Entity Framework (EF) since we here at Encodo last used it in production code. In fact, we'd last used it waaaaaay back in 2008 and 2009 when EF had just been released. Instead of EF, we've been using the Quino ORM whenever we can.

However, we've recently started working on a project where EF5 is used (EF6 is in the late stages of release but is not yet generally available for production use). Though we'd been following the latest EF developments via the ADO.Net blog, we finally had a good excuse to get some hands-on experience with the latest version.

Our history with EF

Entity Framework: Be Prepared was the first article we wrote about working with EF. It's quite long and documents the pain of using a 1.0 product from Microsoft. That version supported only a database-first approach, the designer was slow and the ORM's SQL mapper was quite primitive. Most of the tips and advice in the linked article, while perhaps amusing, are no longer necessary (especially if you're using the Code-first approach, which is highly recommended).

Our next update, The Dark Side of Entity Framework: Mapping Enumerated Associations, discusses a very specific issue related to mapping enumerated types in an entity model (something that Quino does very well). This shortcoming in EF has also been addressed but we haven't had a chance to test it yet.

Our final article was on performance, Pre-generating Entity Framework (EF) Views, which, while still pertinent, no longer needs to be done manually (there's an Entity Framework Power Tools extension for that now).

So let's just assume that that was the old EF; what's the latest and greatest version like?

Well, as you may have suspected, you're not going to get an article about Code-first or database migrations.1 While a lot of things have been fixed and streamlined to be not only much more intuitive but also work much more smoothly, there are still a few operations that aren't so intuitive (or that aren't supported by EF yet).

Standard way to delete objects

One such operation is deleting multiple objects in the database. It's not that it's impossible, but that the only solution that immediately presents itself is to,

  • load the objects to delete into memory,
  • then remove these objects from the context
  • and finally save changes to the context, which will remove them from the database

The following code illustrates this pattern for a hypothetical list of users.

var users = context.Users.Where(u => u.Name == "John");

foreach (var u in users)
{
  context.Users.Remove(u);
}

context.SaveChanges();

This seems somewhat roundabout and quite inefficient.2

Support for batch deletes?

While the method above is fine for deleting a small number of objects -- and is quite useful when removing different types of objects from various collections -- it's not very useful for a large number of objects. Retrieving objects into memory only to delete them is neither intuitive nor logical.

The question is: is there a way to tell EF to delete objects based on a query from the database?

I found an example attached as an answer to the post Simple delete query using EF Code First. The gist of it is shown below.

context.Database.SqlQuery<User>(
  "DELETE FROM Users WHERE Name = @name",
  new [] { new SqlParameter("@name", "John") }
);

To be clear right from the start, using raw SQL strings is already sub-optimal because the identifiers are not statically checked. This query will cause a run-time error if the model changes so that the "Users" table no longer exists or the "Name" column no longer exists or is no longer a string.

Since I hadn't found anything else more promising, though, I continued with this approach, aware that it might not be usable as a pattern because of the compile-time trade-off.

Although the answer had four up-votes, it is not clear that either the author or any of his fans have actually tried to execute the code. The code above returns an IEnumerable<User> but doesn't actually do anything.

After I'd realized this, I went to MSDN for more information on the SqlQuery method. The documentation is not encouraging for our purposes (still trying to delete objects without first loading them), as it describes the method as follows (emphasis added),

Creates a raw SQL query that will return elements of the given generic type. The type can be any type that has properties that match the names of the columns returned from the query, or can be a simple primitive type.

This does not bode well for deleting objects using this method. Creating an enumerable does very little, though. In order to actually execute the query, you have to evaluate it.

Die Hoffnung stirbt zuletzt3 as we like to say on this side of the pond, so I tried evaluating the enumerable. A foreach should do the trick.

var users = context.Database.SqlQuery<User>(
  "DELETE FROM Users WHERE Name = @name", 
  new [] { new SqlParameter("@name", "John") }
);

foreach (var u in users)
{
  // NOP?
}

As indicated by the "NOP?" comment, it's unclear what one should actually do in this loop because the query already includes the command to delete the selected objects.

Our hopes are finally extinguished with the following error message:

System.Data.EntityCommandExecutionException : The data reader is incompatible with the specified 'Demo.User'. A member of the type, 'Id', does not have a corresponding column in the data reader with the same name.

That this approach does not work is actually a relief because it would have been far too obtuse and confusing to use in production.

It turns out that the SqlQuery only works with SELECT statements, as was strongly implied by the documentation.

var users = context.Database.SqlQuery<User>(
  "SELECT * FROM Users WHERE Name = @name",
  new [] { new SqlParameter("@name", "John") }
);

Once we've converted to this syntax, though, we can just use the much clearer and compile-time-checked version that we started with, repeated below.

var users = context.Users.Where(u => u.Name == "John");

foreach (var u in users)
{
  context.Users.Remove(u);
}

context.SaveChanges();

So we're back where we started, but perhaps a little wiser for having tried.

Deleting objects with Quino

As a final footnote, I just want to point out how you would perform multiple deletes with the Quino ORM. It's quite simple, really. Any query that you can use to select objects you can also use to delete objects4.

So, how would I execute the query above in Quino?

Session.Delete(Session.CreateQuery<User>().WhereEquals(User.MetaProperties.Name, "John").Query);

To make it a little clearer instead of showing off with a one-liner:

var query = Session.CreateQuery<User>();
query.WhereEquals(User.MetaProperties.Name, "John");
Session.Delete(query);

Quino doesn't support using LINQ to create queries, but its query API is still more statically checked than raw SQL. You can see how the query could easily be extended to restrict on much more complex conditions, even including fields on joined tables.




  1. As I wrote, we're using Code-first, which is much more comfortable than using the database-diagram editor of old. We're also using the nascent "Migrations" support, which has so far worked OK, though it's nowhere near as convenient as Quino's automated schema migration.

  2. Though it is inefficient, it's better than a lot of other examples out there, which almost unilaterally include the call to context.SaveChanges() inside the foreach-loop. Doing so is wasteful and does not give EF an opportunity to optimize the delete calls into a single SQL statement (see footnote below).

  3. Translates to: "Hope is the last (thing) to die."

  4. With the following caveats, which generally apply to all queries with any ORM:

    * Many databases use a different syntax and provide different support for `DELETE` vs. `SELECT` operations.
    * Therefore, it is more likely that more complex conditions are not supported for `DELETE` operations on some database back-ends.
    * Since the syntax often differs, it's more likely that a more complex query will fail to map properly in a `DELETE` operation than in a `SELECT` operation, simply because that particular combination has never come up before.
    * That said, Quino has quite good support for deleting objects with restrictions not only on the table from which to delete data but also from other, joined tables.

    Some combination of these reasons possibly accounts for EF's lack of support for batch deletes.
    

A provably safe parallel language extension for C#

This article originally appeared on earthli News and has been cross-posted here.


The paper Uniqueness and Reference Immutability for Safe Parallelism by Colin S. Gordon, Matthew J. Parkinson, Jared Parsons, Aleks Bromfield, Joe Duffy is quite long (26 pages), detailed and involved. To be frank, most of the notation was foreign to me -- to say nothing of making heads or tails of most of the proofs and lemmas -- but I found the higher-level discussions and conclusions quite interesting.

The abstract is concise and describes the project very well:

A key challenge for concurrent programming is that side-effects (memory operations) in one thread can affect the behavior of another thread. In this paper, we present a type system to restrict the updates to memory to prevent these unintended side-effects. We provide a novel combination of immutable and unique (isolated) types that ensures safe parallelism (race freedom and deterministic execution). The type system includes support for polymorphism over type qualifiers, and can easily create cycles of immutable objects. Key to the system's flexibility is the ability to recover immutable or externally unique references after violating uniqueness without any explicit alias tracking. Our type system models a prototype extension to C# that is in active use by a Microsoft team. We describe their experiences building large systems with this extension. We prove the soundness of the type system by an embedding into a program logic.

The project proposes a type-system extension with which developers can write provably safe parallel programs -- i.e. "race freedom and deterministic execution" -- with the amount of actual parallelism determined when the program is analyzed and compiled rather than decided by a programmer creating threads of execution.

Isolating objects for parallelism

The "isolation" part of this type system reminds me a bit of the way that SCOOP addresses concurrency. That system also allows programs to designate objects as "separate" from other objects while also releasing the program from the onus of actually creating and managing separate execution contexts. That is, the syntax of the language allows a program to be written in a provably correct way (at least as far as parallelism is concerned; see the "other provable-language projects" section below). In order to execute such a program, the runtime loads not just the program but also another file that specifies the available virtual processors (commonly mapped to threads). Sections of code marked as "separate" can be run in parallel, depending on the available number of virtual processors. Otherwise, the program runs serially.

In SCOOP, methods are used as a natural isolation barrier, with input parameters marked as "separate". See SCOOP: Concurrency for Eiffel and SCOOP (software) for more details. The paper also contains an entire section listing other projects -- many implemented on the JVM -- that have attempted to make provably safe programming languages.

The system described in this paper goes much further, adding immutability as well as isolation (the same concept as "separate" in SCOOP). An interesting extension to the type system is that isolated object trees are free to have references to immutable objects (since those can't negatively impact parallelism). This allows for globally shared immutable state and reduces argument-passing significantly. Additionally, there are readable and writable references: the former can only be read but may be modified by other objects (otherwise it would be immutable); the latter can be read and written and is equivalent to a "normal" object in C# today. In fact, "[...] writable is the default annotation, so any single-threaded C# that does not access global state also compiles with the prototype."

Permission types

In this safe-parallel extension, a standard type system is extended so that every type can be assigned such a permission and there is "support for polymorphism over type qualifiers", which means that the extended type system includes the permission in the type, so that, given B => A, a reference to an immutable B can be passed to a method that expects a readable A. In addition, covariance is also supported for generic parameter types.

When they say that the "[k]ey to the system's flexibility is the ability to recover immutable or externally unique references after violating uniqueness without any explicit alias tracking", they mean that the type system allows programs to specify sections that accept isolated references as input, lets them convert to writable references and then convert back to isolated objects -- all without losing provably safe parallelism. This is quite a feat since it allows programs to benefit from isolation, immutability and provably safe parallelism without significantly changing common programming practice. In essence, it suffices to decorate variables and method parameters with these permission extensions to modify the types and let the compiler guide you as to further changes that need to be made. That is, an input parameter for a method will be marked as immutable so that it won't be changed and subsequent misuse has to be corrected.

Even better, they found that, in practice, it is possible to use extension methods to allow parallel and standard implementations of collections (lists, maps, etc.) to share most code.

A fully polymorphic version of a map() method for a collection can coexist with a parallelized version pmap() specialized for immutable or readable collections. [...] Note that the parallelized version can still be used with writable collections through subtyping and framing as long as the mapped operation is pure; no duplication or creation of an additional collection just for concurrency is needed.
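A rough analogue in today's C# illustrates the map()/pmap() distinction (the names and the PLINQ-based implementation are mine; in the actual system, the permission qualifiers let the compiler verify the safety conditions instead of leaving them to a comment):

public static class SequenceExtensions
{
  public static IEnumerable<TResult> Map<T, TResult>(this IEnumerable<T> source, Func<T, TResult> map)
  {
    return source.Select(map);
  }

  public static IEnumerable<TResult> PMap<T, TResult>(this IEnumerable<T> source, Func<T, TResult> map)
  {
    // Only safe if "map" is pure and "source" is not being mutated concurrently --
    // exactly the conditions the paper's type system checks statically.
    return source.AsParallel().AsOrdered().Select(map);
  }
}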

Real projects and performance impact

Much of the paper is naturally concerned with proving that their type system actually does what it says it does. As mentioned above, at least 2/3 of the paper is devoted to lemmas and large swaths of notation. For programmers, the more interesting part is the penultimate section that discusses the extension to C# and the experiences in using it for larger projects.

A source-level variant of this system, as an extension to C#, is in use by a large project at Microsoft, as their primary programming language. The group has written several million lines of code, including: core libraries (including collections with polymorphism over element permissions and data-parallel operations when safe), a webserver, a high level optimizing compiler, and an MPEG decoder.

Several million lines of code is, well, it's an enormous amount of code. I'm not sure how many programmers they have or how they're counting lines or how efficiently they write their code, but millions of lines of code suggests generated code of some kind. Still, taken with the next statement on performance, that much code more than proves that the type system is viable.

These and other applications written in the source language are performance-competitive with established implementations on standard benchmarks; we mention this not because our language design is focused on performance, but merely to point out that heavy use of reference immutability, including removing mutable static/global state, has not come at the cost of performance in the experience of the Microsoft team.

Not only is performance not impacted, but the nature of the typing extensions allows the compiler to know much more about which values and collections can be changed, which affects how aggressively this data can be cached or inlined.

In fact, the prototype compiler exploits reference immutability information for a number of otherwise-unavailable compiler optimizations. [...] Reference immutability enables some new optimizations in the compiler and runtime system. For example, the concurrent GC can use weaker read barriers for immutable data. The compiler can perform more code motion and caching, and an MSIL-to-native pass can freeze immutable data into the binary.

Incremental integration ("unstrict" blocks)

In the current implementation, there is an unstrict block that allows the team at Microsoft to temporarily turn off the new type system and to ignore safety checks. This is a pragmatic approach which allows the software to be run before it has been proven 100% parallel-safe. This is still better than having no provably safe blocks at all. Their goal is naturally to remove as many of these blocks as possible -- and, in fact, this requirement drives further refinement of the type system and library.

We continue to work on driving the number of unstrict blocks as low as possible without over-complicating the type system's use or implementation.

The project is still a work-in-progress but has seen quite a few iterations, which is promising. The paper was written in 2012; it would be very interesting to take it for a test drive in a CTP.

Other provable-language projects

A related project at Microsoft Research, Spec#, contributed a lot of basic knowledge about provable programs. The authors even state that the "[...] type system grew naturally from a series of efforts at safe parallelism. [...] The earliest version was simply copying Spec#'s [Pure] method attribute, along with a set of carefully designed task- and data-parallelism libraries." Spec#, in turn, is a "[...] formal language for API contracts (influenced by JML, AsmL, and Eiffel), which extends C# with constructs for non-null types, preconditions, postconditions, and object invariants".

Though the implementation of this permissions-based type system may have started with Spec#, the primary focus of that project was more a valiant attempt to bring Design-by-Contract principles (examples and some discussion here) to the .NET world via C#. Though Spec# has downloadable code, the project hasn't really been updated in years. This is a shame, as support for Eiffel in .NET, mentioned above as one of the key influences of Spec#, was dropped by ISE Eiffel long ago.

Spec#, in turn, was mostly replaced by Microsoft Research's Contracts project (an older version of which was covered in depth in Microsoft Code Contracts: Not with a Ten-foot Pole). The Contracts project seems to be alive and well: the most recent release is from October, 2012. I have not checked it out since my initial thumbs-down review (linked above) but did note in passing that the implementation is still (A) library-only and (B) does not support Visual Studio 2012.

The library-only restriction is particularly galling, as such an implementation can lead to repeated code and unwieldy anti-patterns. As documented in the Contracts FAQ, the current implementation of the "tools take care of enforcing consistency and proper inheritance of contracts" but this is presumably accomplished with compiler errors that require the programmer to include contracts from base methods in overrides.
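To show what "library-only" means in practice, here is a small, made-up example: the contracts are ordinary method calls at the top of the method body rather than part of the method's signature, so they have to be repeated or re-derived wherever the tools can't do it for you.

// using System.Diagnostics.Contracts;
// "Balance" is a hypothetical property on the containing class.
public int Withdraw(int amount)
{
  Contract.Requires(amount > 0);
  Contract.Ensures(Contract.Result<int>() >= 0);

  Balance -= amount;

  return Balance;
}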

The seminal work Object-oriented Software Construction by Bertrand Meyer (vol. II in particular) goes into tremendous detail on a type system that incorporates contracts directly. The type system discussed in this article covers only parallel safety: null-safety and other contracts are not covered at all. If you're at all interested in these types of language extensions, volume II of OOSC is a great read. The examples are all in Eiffel but should be relatively accessible. Though some features -- generics, notably, but also tuples, once routines and agents -- have since made their way into C# and other more commonly used languages, many others -- such as contracts, anchored types (contravariance is far too constrained in C# to allow them), covariant return types, covariance everywhere, multiple inheritance, explicit feature removal, loop variants and invariants, etc. -- are still not available. Subsequent interesting work has also been done on extensions that allow creation of provably null-safe programs, something also addressed in part by Microsoft Research's Contracts project.

Visual Studio & Resharper Hotkey Reference (PDF)

It's always a good idea to familiarize yourself with the hotkeys (shortcuts, accelerators, etc.) for any piece of software that you use a lot. Using a keyboard shortcut is often much faster than reaching for the mouse. Applications with a lot of functionality -- like Word, IDEs and graphics tools -- have a lot of hotkeys, and they will help you become much more efficient.

[Image: preview of the Encodo Visual Studio & Resharper Hotkey Reference]

At Encodo, we do a lot of our programming in Visual Studio and Resharper. There are a lot of very useful hotkeys for this IDE combination, and we've listed the ones we use in a new document that you can download, called the Encodo Visual Studio & Resharper Hotkey Reference.

We managed to make all of the shortcuts fit on a single A4 page, so you can print it out and keep it on your desk until you've got them memorized. Almost all of the hotkeys are the standard ones for Visual Studio and for ReSharper (when using the Visual Studio key mapping), so you won't even have to change anything.

To get the few non-standard hotkeys in the Encodo standard settings, you can import this VS2012 settings file. In addition to the most important combinations listed in the hotkey reference, the file includes the following changes to the standard layout.

  • Ctrl+W: Close window
  • Alt+Shift+W: Close all but this window
  • Ctrl+Shift+W: Close all windows
  • Ctrl+Shift+E: Select current word

And, even if you have your own preferred hotkeys, you can still take a look at the chart to discover a feature of Visual Studio or ReSharper that you may have missed.


At the time of writing, we're using Visual Studio 2012 SP1 and Resharper 7.1.

ASP.Net MVC 3 on Mono/Linux

Lately, we at Encodo have been working more and more with Mono, since one of our projects runs on a heterogeneous mix of Windows and Linux hardware (ARM processor architecture). We're taking this as an opportunity to record the current state of Mono.

The project includes both C# services (console and service apps) and web applications. One of these web applications has to run on Windows (.Net 4) as well as on Open-Embedded Linux (Mono).

Since we're big fans of Microsoft's ASP.Net MVC framework for .Net web development, we wanted to be able to use this modern framework on all target platforms -- ideally, of course, the latest MVC version 3 (based on .Net 4 and ASP.Net 4), which also brings along the new view engine called "Razor". In short: the newest and coolest web framework Microsoft has to offer.

Mono versions and compatibility:

First, a short update on the more recent Mono versions from a web-development perspective:

Mono 2.6 (December 2009, the default on many Linux distributions, e.g. Ubuntu):

  • C# 3.5
  • Linq
  • ASP.Net 3.5
  • ASP.Net MVC 1 (partial)

Mono 2.8 (October 2010, stable):

  • C# 4
  • ASP.Net 4
  • ASP.Net MVC 1 + 2 (complete)

Mono 2.10 (end of 2011):

  • C# 4 + 5 (experimental)
  • ASP.Net MVC 3 (partial -- without Razor)

It quickly became clear that MVC 1 could be gotten running on existing systems relatively easily. MVC 2 was a bit more demanding, since it requires at least Mono 2.8 on the Linux side, which some vendors only include in their newest or upcoming Linux versions.

But we'd rather have MVC 3 with Razor anyway! So it was clear that we would have to compile Mono 2.10 ourselves for each target platform (since 2.10 has not yet been officially released).

More information about Mono is available on the Mono website or on Wikipedia.

Razor:

But what about Razor? Why is Razor explicitly not included?

The problem with the "Razor" view engine is that Microsoft -- in contrast to the rest of MVC -- has not released its source code. Great! Now what?

Since .Net assemblies (binaries, DLLs) are binary-compatible with Mono, they can in principle be executed on "all" platforms -- as long as they don't contain any platform-dependent code (similar to, e.g., Java packages).

So we copied the following assemblies directly from the Windows PC into the bin directory of the web application on the open-embedded Linux board:

System.Web.Helpers.dll
System.Web.Mvc.dll
System.Web.Razor.dll
System.Web.WebPages.Deployment.dll
System.Web.WebPages.dll
System.Web.WebPages.Razor.dll

The WebPages assemblies contain the code shared by ASP.Net WebForms and ASP.Net MVC; this was introduced with ASP.Net 4.

The rest of the MVC 3 web application we copied 1:1 as source code from Windows to the Linux board.

Licensing note: copying the Razor assemblies like this is not what Microsoft intended. Whether that is an option for a given project is something everyone has to decide for themselves; it can lead to licensing problems with Microsoft.

Web server:

There are several options for the web server that hosts a Mono application. Given the limited resources on the Linux boards (1GHz CPU, 512 MB disk, 200 MB memory), we decided on the XSP web server. If more traffic were expected, a web server like Apache or Nginx might be the better choice. In our case, however, the web application is only used as a web admin GUI by support staff.

The XSP web server is an open-source web server written in C#. There is one executable assembly per CLR version (XSP2: .Net 2, XSP3: .Net 3.5, XSP4: .Net 4). The web server is started in the root directory of the web application and configured with optional parameters (e.g. port number, etc.).

The web server then compiles the web application on the fly, just like Cassini (the DevServer) or IIS on Windows. The only difference: it doesn't notice when the source code changes while it's running and only recompiles on the next restart. That's something we can easily live with.

Mono problems:

With this setup, the web application that we had developed with Visual Studio 2010 and Cassini/IIS ran on Linux almost immediately. The following two problems still had to be worked around:

  1. Mono does not understand the new @: syntax, which is used to get automatic HTML encoding where needed. Instead, the old syntax has to be used: @= or @Html.Encode() -- depending on whether you want HTML encoding or not.
  2. The second problem occurred when compiling the Razor views. We had an HTML helper that used optional (and therefore named) C# parameters, which Mono did not recognize correctly inside the Razor view. Apparently this is a known Mono bug that will be fixed in a future Mono version. We replaced the optional parameters with old-school overloaded functions (a sketch follows below).
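The workaround looked roughly like this (an illustrative sketch, not the actual helper from the project):

public static class LabelExtensions
{
  // Before (broke on Mono): a single helper with an optional parameter, e.g.
  //   public static MvcHtmlString StatusLabel(this HtmlHelper html, string text, bool highlight = false)

  // After: plain old overloads, which the Mono Razor compiler handles fine.
  public static MvcHtmlString StatusLabel(this HtmlHelper html, string text)
  {
    return StatusLabel(html, text, false);
  }

  public static MvcHtmlString StatusLabel(this HtmlHelper html, string text, bool highlight)
  {
    var tag = new TagBuilder("span");
    if (highlight) { tag.AddCssClass("highlight"); }
    tag.SetInnerText(text);
    return MvcHtmlString.Create(tag.ToString());
  }
}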

Et voilà! The MVC 3 Razor web application runs on the embedded Linux boards and can be developed on Windows with Visual Studio 2010!

Metrics:

Since the embedded Linux devices can't keep up with conventional PCs in terms of resources, we were naturally very curious about the resource appetite of Mono, XSP and the web application. Here, too, we were pleasantly surprised.

The process hosting the Mono runtime, the XSP web server, MVC 3, Razor and our web application ended up using the following resources:

  • Started and idle, with a filled output cache: 66MB of reserved memory, of which 33% was actually used; 0% CPU
  • Under load: 66MB of reserved memory, of which 33% was actually used; 15% CPU

These numbers are without any optimizations whatsoever, i.e. with the standard ASP.Net output cache, the full Mono libraries and so on. There is still room for optimization in several places; the Mono website has documentation on how the Mono runtime can be slimmed down.

We haven't optimized any further so far because these numbers were sufficient for our applications. Whether that changes in the future remains to be seen.

Conclusion:

Once our Linux experts had gotten Mono 2.10 and XSP4 to compile on the open-embedded Linux devices, the actual port from Windows/.Net to Linux/Mono was surprisingly painless and done in about one to two hours. Besides the web application itself, we also copied the odd console application 1:1 from Windows to Linux/Mono and ran it there -- also without problems. We were all surprised at how easily and quickly it went and how well Mono runs out of the box. We had expected it to be considerably more laborious and problem-prone.

Saving & Loading Performance in Quino

The GenericObject in Quino had recently undergone a performance overhaul, as documented in the article, Improving performance in GenericObject...but we weren't finished yet.

I'm going to assume that you read the overview on "How Data Objects are Implemented" and understand what the GenericObject actually is. In the other article, we optimized performance when creating objects in-memory and when loading and setting values. Those optimizations were driven by an application that used Quino data in a highly specialized way. In this article, we address other performance issues that came up with another Quino application, this one a more classical client for a database.

To be more precise, the performance of the Quino application itself was satisfactory, but an importer for existing customer data was so slow as to be almost useless for testing -- because it took hours instead of minutes.

So out came the YourKit Profiler for .NET again. As mentioned in the other article, we ran a part of the tests below (the smallest dataset) with tracing enabled, had YourKit show us the "Hot Spots", fixed those. Rinse, lather, repeat.

Charts and Methodology

As to methodology, I'm just going to cite the other article:

The charts below indicate a relative improvement in speed and memory usage. The numbers are not meant to be compared in absolute terms to any other numbers. In fact, the application being tested was a simple console application we wrote that created a bunch of objects with a bunch of random data. Naturally we built the test to adequately approximate the behavior of the real-world application that was experiencing problems. This test application emitted the numbers you see below.

Note: The vertical axis for all graphs uses a logarithmic scale.

Even though the focus was not on optimizing performance of creating objects in memory, we managed to squeeze another 30% out of that operation as well. Creating objects in memory means creating the C# object and setting default values as required by the metadata.

[Chart: relative speed of creating objects in memory]

The "Saving New Objects to PostgreSql" test does not indicate how many objects can be saved per second with Quino. The data is based on a real-world model and includes some data on a timeline, the maintenance of which requires queries to be made after an object is saved in order to maintain the integrity of the timeline. So, the numbers below include a lot of time spent querying for data as well.

Still, you can see from the numbers below that saving operations got slower as the number of objects grew. Saving 150k objects in one large graph is now 20x faster than in previous versions.

[Chart: relative speed of saving new objects to PostgreSql]

This final number is relatively "clean" in that it really only includes time spent reading data from the database and creating objects in memory from it. That there are more objects in the resulting graph than were saved in the previous step is due to the way the data was loaded, not due to an error. The important thing was to load a lot of data, not to maintain internal consistency between tests.

[Chart: relative speed of loading objects from the database]

Again, though the focus was on optimizing save performance, loading 250k objects is now twice as fast as it was in previous versions.

These improvements are available to any application using Quino 1.6.2.1 and higher.