Cross MonoTouch off the list

This article originally appeared on earthli News and has been cross-posted here.


Apple presented the iPhone OS 4.0 late last week. The new version includes hundreds of new API calls for third-party developers, including long-sought-after support for multi-tasking. The changes extended to the licensing agreement for iPhone developers, with section 3.3.1 getting considerable modification, as documented in the article, Adobe man to Apple: 'Go screw yourself' by Cade Metz. That section now reads:

Applications must be originally written in Objective-C, C, C++, or JavaScript as executed by the iPhone OS WebKit engine, and only code written in C, C++, and Objective-C may compile and directly link against the Documented APIs (e.g., Applications that link to Documented APIs through an intermediary translation or compatibility layer or tool are prohibited).

That doesn't sound too good for Adobe, which had planned to allow direct compilation of iPhone applications from Flash in CS5. And it doesn't sound too good for MonoTouch either, which allows developers to write iPhone applications using the .Net framework and the C# language. The license for iPhone 3.2 prevented applications from using interpreters or virtual machines, but both CS5 and MonoTouch steered clear of those problems by compiling directly to iPhone OS machine code.

The new wording in section 3.3.1 seems to be Apple's attempt to exclude these technologies with about as much subtlety as a five-year-old making up new rules during a game he invented. The official response, MonoTouch and iPhone OS 4, is understandably upbeat: they've already invested way too much time and effort to give up now. Their optimism that "[a]pplications built with MonoTouch are native applications indistinguishable from native applications" (whatever that means) seems suspiciously desperate, since MonoTouch applications are written against the .NET framework in the C# language, which means that they are most certainly not "written in C, C++, and Objective-C".

Maybe the MonoTouch project will continue to be able to build iPhone applications that have a hope of being accepted by the iPhone App Store. But the rewording of section 3.3.1 puts the power to discontinue support wholly in Apple's hands. Developers would be silly to get on board with MonoTouch now without a far more explicit show of support from Apple. MonoTouch is putting on a brave face and promises that "[s]upport for iPhoneOS 4.0 on MonoTouch will be arriving soon."

A typically well-thought-out article, Why Apple Changed Section 3.3.1 by John Gruber, details what the new wording means for Apple. And the answer, as usual, is control. It "makes complete sense" from Apple's perspective of "ruthless competitiveness". Apple is using the popularity of its platform to force developers to spend time developing only for Apple's platform instead of for multiple platforms simultaneously.

Flash CS5 and MonoTouch aren't so much cross-platform as meta-platforms. Adobe's goal isn't to help developers write iPhone apps. Adobe's goal is to encourage developers to write Flash apps that run on the iPhone (and elsewhere) instead of writing iPhone-specific apps. Apple isn't just ambivalent about Adobe's goals in this regard -- it is in Apple's direct interest to thwart them.

There are aesthetic arguments to be made that cross-platform applications sully an operating system. There are very few of them that are truly well-integrated -- and those that are take a tremendous amount of time, patience and versions to get that far. On the OS X platform especially, it's incredibly easy to spot applications that were made exclusively for OS X and those that were ported from another operating system. It's truly like night and day. Preferring native applications, however, is a good deal different from banning non-native ones. As a C# developer with a large library of code I'd like to use, I can no longer assure clients that an iPhone application is easily achievable -- not without spending a lot of time and money learning Objective-C, the Xcode toolset and the Cocoa APIs. Jobs and Co. would argue that I have no business developing applications for a platform without an intimate knowledge of its APIs, but that's philosophical until they see the end-product.

Simply banning a procedure for building applications because the end-product may be unsatisfactory seems arbitrarily iron-fisted. Apple has always reserved the right to determine which Apps show up in the App Store and which do not. (As of this writing, Apple has been "evaluating" Opera Mini for the iPhone for almost 20 days.) That's why Gruber's analysis probably does get the real reason right: Apple's doing it because (A) they can and (B) they retain more control and (C) most of their users don't care one way or the other and (D) there are enough iPhone developers willing to follow Apple's rules and make mountains of money for Apple.

Backing up this impression is an actual, honest-to-God response from El Jobso, as documented in the post Steve Jobs response on Section 3.3.1 by Greg Slepak, where Jobs says that "Gruber's post is very insightful" and goes on to say that Apple prefers native applications because:

[...] intermediate layers between the platform and the developer ultimately produces sub-standard apps and hinders the progress of the platform.

As discussed above, though such layers may produce sub-standard apps -- and often do -- one does not necessarily follow from the other. That is, Jobs is merely hand-waving, arguing that a decision made for cut-throat business reasons was made in the interests of quality. There will always be developers writing bad software with Apple's tools and there would have been developers writing insanely great software using CS5 or MonoTouch.

Apple actually already had what could be considered a user-friendly and customer-oriented program in place: They were able to reject bad applications individually. Is Jobs arguing that cross-platform tools were creating so many bad applications that Apple was losing profits just from the time and effort involved in rejecting them? Or does Jobs fear the flood of Flash-to-iPhone applications descending on Cupertino with the advent of CS5?

Maybe Apple will bow to pressure and modify the section again -- it wouldn't be the first time a company tried to get away with something and had to backtrack. In the end, though, Apple can do what it wants with its platform -- and it plans to.

Building pseudo-DSLs with C# 3.5

This article originally appeared on earthli News and has been cross-posted here.


DSL is a buzzword that's been around for a while and it stands for [D]omain-[S]pecific [L]anguage. That is, some tasks or "domains" are better described with their own language rather than using the same language for everything. This gives a name to what is actually already a standard practice: every time a program assumes a particular format for an input string (e.g. CSV or configuration files), it is using a DSL. On the surface, it's extremely logical to use a syntax and semantics most appropriate to the task at hand; it would be hard to argue with that. However, that's assuming that there are no hidden downsides.

DSL Drawbacks

And the downsides are not inconsequential. As an example, let's look at the DSL "Linq", which arrived with C# 3.5. What's the problem with Linq? Well, nothing, actually, but only because a lot of work went into avoiding the drawbacks of DSLs. Linq was written by Microsoft and they shipped it at the same time as they shipped a new IDE -- Visual Studio 2008 -- which basically upgraded Visual Studio 2005 in order to support Linq. All of the tools to which .NET developers have become accustomed worked seamlessly with Linq.

However, it took a little while before JetBrains released a version of ReSharper that understood Linq...and that right there is the nub of the problem. Developer tools need to understand a DSL or you might as well just write it in Notepad. The bar for integration into an IDE is quite high: developers expect a lot these days, including:

  • The DSL must include a useful parser that pinpoints problems exactly.
  • The DSL syntax must be clear and must support everything a developer may possibly want to do with it.1
  • The DSL must support code-completion.
  • ReSharper should also work with the DSL, if possible.
  • And so on...

What sounds, on the surface, like a slam-dunk of an idea, suddenly sounds like a helluva lot more work than just defining a little language2. That's why Encodo decided early on to just use C# for everything in its Quino framework, wherever possible. The main part of a Quino application is its metadata, or the model definition. However, instead of coming up with a language for defining the metadata, Encodo lets the developer define the metadata using a .NET-API, which gives that developer the full power of code-completion, ReSharper and whatever other goodies they may have installed to help them get their jobs done.

Designing a C#-based DSL

Deciding to use C# for APIs doesn't mean, however, that your job is done quickly: you still have to design an API that not only works, but is intuitive enough to let developers use it with as little error and confusion as possible.

I recently extended the API for building metadata to include being able to group other metadata into hierarchies called "layouts". Though the API is implementation-agnostic, its primary use will initially be to determine how the properties of a meta-class are laid out in a form. That is, most applications will want to have more control over the appearance than simply displaying the properties of a meta-class in a form from first-to-last, one to a line.

In the metadata itself, a layout is a group of other elements; an element can be a meta-property or another group. A group can have a caption. Essentially, it should look like this when displayed (groups are surrounded by []; elements with <>):

[MainTab]
-----------------------------------
|  <Company>
|  [MainFieldSet]
|  --------------------------------
|  |  <Contact>
|  |  [ <FirstName> <LastName> ]
|  |  <Picture>
|  |  <Birthdate>
|  --------------------------------
|  [ <IsEmployee> <Active> ]
-----------------------------------

From the example above, we can extract the following requirements:

  1. Groups can be nested.
  2. Groups can have captions, but a caption is not required.
  3. An element can be an anonymous group, a named group or an individual metadata element.

Design Considerations

One way of constructing this in a traditional programming language like C# is to create a new group where needed, using a constructor that takes a caption or not, as appropriate. However, I also wanted the DSL to have as little cruft as possible; that is, I wanted to avoid redundant parameters and unnecessary constructors. I also wanted to avoid forcing the developer to provide direct references to meta-property elements where it would be more comfortable to just use the name of the property instead.

To that end, I decided to avoid making the developer create or necessarily provide the actual destination objects (i.e. the groups and elements); instead, I would build a parallel set of throwaway objects that the developer would either implicitly or explicitly create. The back-end could then use those objects to resolve references to elements and create the target object-graph with proper error-checking and so on. This approach also avoids getting the target metadata "dirty" with properties or methods that are only needed during this particular style of construction.
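
To make the idea concrete, here is a rough sketch of what such throwaway builder objects might look like; the names mirror those used in the examples below, but the actual Quino types differ and this is only an illustration:

public class LayoutGroup
{
  public LayoutGroup(string caption, params LayoutItem[] items)
  {
    Caption = caption;
    Items = items;
  }

  public string Caption { get; private set; }
  public LayoutItem[] Items { get; private set; }
}

public class LayoutItem
{
  public LayoutItem(string identifier) { Identifier = identifier; }
  public LayoutItem(LayoutGroup group) { Group = group; }

  // Exactly one of these is set; the back-end checks which one when it
  // resolves the items against the actual metadata.
  public string Identifier { get; private set; }
  public LayoutGroup Group { get; private set; }
}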

Defining the Goal

I started by writing some code in C# that I thought was both concise enough and offered visual hints to indicate what was being built. That is, I used whitespace to indicate grouping of elements, exactly as in the diagram from the requirements above.

Here's a simple example, with very little grouping:

builder.AddLayout(
  personClass, "Basic", 
  Person.Relations.Contact,
  new LayoutGroup(Person.Fields.FirstName, Person.Fields.LastName),
  Person.Fields.Picture,
  Person.Fields.Birthdate,
  new LayoutGroup(Person.Fields.IsEmployee, Person.Fields.Active)
);

The code above creates a new "layout" for the class personClass named "Basic". That takes care of the first two parameters; the much larger final parameter is an open array of elements. These are primarily the names of properties to include from personClass (or they could also be the properties themselves). In order to indicate that two properties are on the same line, the developer must group them using a LayoutGroup object.

Here's a more complex sample, with nested groups (this one corresponds to the original requirement from above):

builder.AddLayout(
  personClass, "Details", 
  new LayoutGroup("MainTab",
    Person.Relations.Company,
    new LayoutGroup("MainFieldSet",
      Person.Relations.Contact,
      new LayoutGroup(Person.Fields.FirstName, Person.Fields.LastName),
      Person.Fields.Picture,
      Person.Fields.Birthdate
    ),
    new LayoutGroup(Person.Fields.IsEmployee, Person.Fields.Active)
  )
);

In this example, we see that the developer can also use a LayoutGroup to attach a caption to a group of other items, but that otherwise everything pretty much stays the same as in the simpler example.

Finally, a developer should also be able to refer to other layout definitions in order to avoid repeating code (adhering to the D.R.Y. principle3). Here's the previous example redefined using a reference to another layout:

builder.AddLayout(
  personClass, "Basic", 
  Person.Relations.Contact,
  new LayoutGroup(Person.Fields.FirstName, Person.Fields.LastName),
  Person.Fields.Picture,
  Person.Fields.Birthdate
);

builder.AddLayout(
  personClass, "Details", 
  new LayoutGroup("MainTab",
    Person.Relations.Company,
    new LayoutGroup("MainFieldSet",
      new LayoutReference("Basic")
    ),
    new LayoutGroup(Person.Fields.IsEmployee, Person.Fields.Active)
  )
);

Implementation

Now that I had an API I thought was good enough to use, I had to figure out how to get the C# compiler to not only accept it, but also to give me the opportunity to build the actual target metadata I wanted.

The trick ended up being to define a few objects for the different possibilities -- groups, elements, references, etc. -- and make them implicitly convert to a basic LayoutItem. Using implicit operators allowed me to even convert strings to meta-property references, like this:

public static implicit operator LayoutItem(string identifier)
{
  return new LayoutItem(identifier);
}

Each of these items has a reference to each possible type of data and a flag to indicate which of these data are valid and can be extracted from this item. The builder receives a list of such items, each of which may have a sub-list of other items. Processing the list is now as simple as iterating them with foreach, something like this:

private void ProcessItems(IMetaGroup group, IMetaClass metaClass, LayoutItem[] items)
{
  foreach (var item in items)
  {
    if (!String.IsNullOrEmpty(item.Identifier))
    {
      var element = metaClass.Properties[item.Identifier];
      group.Elements.Add(element);
    }
    else if (item.Items != null)
    {
      var subGroup = CreateNextSubGroup(group);
      group.Elements.Add(subGroup);
      ProcessItems(subGroup, metaClass, item.Items.Items);
    }
    else if (item.Group != null)
    {
      ...
    }
    else { ... }
  }
}

If the item was created from a string, the builder looks up the property to which it refers in the meta-class and adds that to the current group. If the item corresponds to an anonymous group, the builder creates a new group and adds the items to it recursively. Here we can see how this solution spares the application developer the work of looking up each and every referenced property in application code. Instead, the developer's code stays clean and short.

Naturally, my solution has many more cases but the sample above should suffice to show how the full solution works.

Cleaning it up

The story didn't just end there, as there are limits to how far C# can be bent to do everything we'd like. The primary problem came from distinguishing the string that serves as the caption from strings that are references to meta-properties. To avoid this ambiguity, I was forced to introduce a LayoutItems class for anonymous groups and reserve LayoutGroup for groups with captions.

I was not able to get the implementation to support my requirements exactly as I'd designed them, but it ended up being pretty close. Below is the first example from the requirements, but changed to accommodate the final API.

builder.AddLayout(
  personClass, "Details", 
  new LayoutGroup("MainTab", new LayoutItems(
    Person.Relations.Company,
    new LayoutGroup("MainFieldSet", new LayoutItems(
      Person.Relations.Contact,
      new LayoutItems(Person.Fields.FirstName, Person.Fields.LastName),
      Person.Fields.Picture,
      Person.Fields.Birthdate
    )),
    new LayoutItems(Person.Fields.IsEmployee, Person.Fields.Active)
  ))
);

All in all, I'm pretty happy with how things turned out: the API is clear enough that the developer should be able to both visually debug the layouts and easily adjust them to accommodate changes. For example, it's quite obvious how to add a new property to a group, move a property to another line or put several properties on the same line. Defining this pseudo-DSL in C# lets the developer use code-completion, popup documentation and the full power of ReSharper and frees me from having to either write or maintain a parser or development tools for a DSL.



  1. Even Linq has its limitations, of course, notably when used together with Linq-to-Entities in the Entity Framework. One obvious limitation in the first version is that "Contains" or "In" are not directly supported, requiring the developer to revert to yet another DSL, ESQL (Entity-SQL).

  2. Before getting the moniker "DSL", the literature referred to such languages as "little languages".

  3. On a side note, Encodo recently looked into the Spark View Engine for .NET MVC. Though we decided not to use it because we don't really need it yet, we were also concerned that it has only nascent support for code-completion and ReSharper in its view-definition language.

Designing a small API: Bit manipulation in C#

This article originally appeared on earthli News and has been cross-posted here.


A usable API doesn't usually spring forth in its entirety on the first try. A good, usable API generally arises iteratively, improving over time. Naturally, when using words like good and usable, I'm obliged to define what exactly I mean by that. Here are the guidelines I use when designing an API, in decreasing order of importance:

Static typing & Compile-time Errors

Wherever possible, make the compiler stop the user from doing something incorrectly instead of letting the runtime handle it.

Integrates into standard practices

That is, do not invent whole new ways of doing things; instead, reuse or build on the paradigms already present in the language.

Elegance

Ideally, using the API should be intuitive, read like natural language and not involve a bunch of syntactic tricks or hard-to-remember formulations or parameter lists.

Clean Implementation

The internals should be as generalized and understandable as possible and involve as little repetition as possible.

CLS-Compliance

Cross-language compliance is also interesting and easily achieved for all but the most low-level of APIs.

Using those guidelines, I designed an API to manage bits and sets of bits in C#. Having spent a lot of time using Delphi Pascal, I'd become accustomed to set and bit operations with static typing. In C#, the .NET framework provides the generic HashSet type, but that seems like overkill when the whole idea behind using bits is to use less space. That means using enumerated types and the FlagsAttribute; however, there are some drawbacks to using the native bit-operations directly in code:

  1. Bit-manipulation is more low-level than most of the rest of the coding a C#-developer typically does. That, combined with doing it only rarely, makes direct testing of bits error-prone.
  2. The syntax for testing, setting and removing bits is heavy with special symbols and duplicated identifiers.

To demonstrate, here is a sample:

[Flags]
enum TestValues
{
  None = 0,
  One = 1,
  Two = 2,
  Three = 4,
  Four = 8,
  All = 15,
}

// Set bits one and two:
var bitsOneAndTwo = TestValues.One | TestValues.Two;

// Remove bit two:
var bitOneOnly = bitsOneAndTwo & ~TestValues.Two;

// Testing for bit two:
if ((bitsOneAndTwo & TestValues.Two) == TestValues.Two)
{
  ...
}

As you can see in the example above, setting a bit is reasonably intuitive (though it's understandable to get confused about using | instead of & to combine bits). Removing a bit is more esoteric, as the combination of & with the ~ (inverse) operator is easily forgotten if not often used. Testing for a bit is quite verbose and extending to testing for one of several flags even more so.
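
For example, a check for "at least one of several flags" with the raw operators might look like this (a hypothetical extension of the sample above):

// Test whether bit two or bit three (or both) is set:
if ((bitsOneAndTwo & (TestValues.Two | TestValues.Three)) != 0)
{
  ...
}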

Version One

Therefore, to make things easier, I decided to make some extension methods for these various functions and ended up with something like the following:

public static void Include<T>(this T flags, T value) { ... }
public static void Exclude<T>(this T flags, T value) { ... }
public static bool In<T>(this T flags, T value) { ... }
public static void ForEachFlag<T>(this T flags, Action<T> action) { ... }

These definitions compiled and worked as expected, but had the following major drawbacks:

  • At the time, we were only using them with enum values, but code completion was offering the methods for all objects because there was no generic constraint on T.
  • Not only that, but much of the bit-manipulation code needed to know the base type of the arguments in order to be able to cast it to and from the correct types. There were a lot of checks, but it all happened at runtime.
  • The ForEachFlag() function took a callback (an Action<T>) even though it is clearly an iteration. Using a lambda instead of a real loop makes it impossible to use break or continue with this method.

This version, although it worked, broke several of the rules outlined above; namely: while it did offer compile-time checking, the implementation had a lot of repetition in it and the iteration did not make use of the common library enumeration support (IEnumerable and foreach). That the operations were available for all objects and polluted code-completion only added insult to injury.

Version Two

A natural solution to the namespace-pollution problem is to add a generic constraint to the methods, restricting the operations to objects of type Enum, as follows:

public static void Include<T>(this T flags, T value)
  where T : Enum
{ ... }

public static void Exclude<T>(this T flags, T value)
  where T : Enum
{ ... }

public static bool In<T>(this T flags, T value)
  where T : Enum
{ ... }

public static void ForEachFlag<T>(this T flags, Action<T> action)
  where T : Enum
{ ... }

.NET enum-declarations, however, cannot be constrained this way. An enum type has an underlying integral type -- Int32 by default, but also byte, Int16 and so on -- which makes sense so that enum-values can be freely converted to and from those base values. And although every enum type does derive from System.Enum at runtime, the compiler explicitly disallows Enum as a generic constraint. So, that's a dead-end.

The other, more obvious way of restricting the target type of an extension method is to change the type of the first parameter from T to something else. But to what type? Well, it turns out that Enum is a strange type, indeed. It can't be used in a generic constraint and can't be named as the base type in an enum declaration but, when used as the target of an extension method, it magically applies to all enumerated types!

I took advantage of this loophole to build the next version of the API, as follows:

public static void Include<T>(this Enum flags, T value) { ... }
public static void Exclude<T>(this Enum flags, T value) { ... }
public static bool In<T>(this Enum flags, T value) { ... }
public static void ForEachFlag<T>(this Enum flags, Action<T> action) { ... }

This version had two advantages over the first version:

  1. The methods are only available for enumerated types instead of for all types, which cleans up the code-completion pollution.
  2. The implementation could take advantage of the Enum.GetTypeCode() method instead of the is and as-operators to figure out the type and cast the input accordingly.

Version Three

After using this version for a little while, it became obvious that there were still problems with the implementation:

  1. Though using Enum as the target type of the extension method was a clever solution, it turns out to be a huge violation of the first design-principle outlined above: The type T for the other parameters is not guaranteed to conform to Enum. That is, the compiler cannot statically verify that the bit being checked (value) is of the same type as the bit-set (flags).
  2. The solution only works with Enum objects, where it would also be appropriate for Int32, Int64 objects and so on.
  3. The ForEach method still has the same problems it had in the first version; namely, that it doesn't allow the use of break and continue and therefore violates the second design-principle above.

A little more investigation showed that the Enum.GetTypeCode() method is not unique to Enum but implements a method initially defined in the IConvertible interface. And, as luck would have it, this interface is implemented not only by the Enum class, but also by Int32, Int64 and all of the other types to which we would like to apply bit- and set-operations.

Knowing that, we can hope that the third time's a charm and redesign the API once again, as follows:

public static void Include<T>(this T flags, T value)
  where T : IConvertible
{ ... }

public static void Exclude<T>(this T flags, T value)
  where T : IConvertible
{ ... }

public static bool In<T>(this T flags, T value)
  where T : IConvertible
{ ... }

public static void ForEachFlag<T>(this T flags, Action<T> action)
  where T : IConvertible
{ ... }

Now we have methods that apply only to those types that support set- and bit-operations (more or less1). Not only that, but the value and action arguments are once again guaranteed to be statically compliant with the flags arguments.

With two of the drawbacks eliminated with one change, we converted the ForEachFlag method to return an IEnumerable<T> instead, as follows:

public static IEnumerable<T> GetEnabledFlags<T>(this T flags)
  where T : IConvertible
{ ... }

The result of this method can now be used with foreach and works with break and continue, as expected. Since the method also now applies to non-enumerated types, we had to re-implement it to return the set of possible bits for the type instead of simply iterating the possible enumerated values returned by Enum.GetValues().2
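
Here is a minimal sketch of how such an implementation might look -- it glosses over the sign bit and the exact underlying type, which the real implementation handles by switching on GetTypeCode():

public static IEnumerable<T> GetEnabledFlags<T>(this T flags)
  where T : IConvertible
{
  // Widen to 64 bits; works for enumerated types and integral types alike.
  var bits = flags.ToInt64(null);

  // Yield one value per set bit.
  for (var shift = 0; shift < 63; shift++)
  {
    var mask = 1L << shift;
    if ((bits & mask) != 0)
    {
      yield return (T)Convert.ChangeType(mask, flags.GetTypeCode());
    }
  }
}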

This version satisfies the first design principles (statically-typed, standard practice, elegant) relatively well, but is still forced to make concessions in implementation and CLS-compliance. It turns out that the IConvertible interface is somehow not CLS-compliant, so I was forced to mark the whole class as non-compliant. On the implementation side, I was avoiding the rather clumsy is-operator by using the IConvertible.GetTypeCode() method, but still had a lot of repeated code, as shown below in a sample from the implementation of Is:

switch (flags.GetTypeCode())
{
  case TypeCode.Byte:
    return (byte)(object)flags == (byte)(object)value;
  case TypeCode.Int32:
    return (int)(object)flags == (int)(object)value;
  ...
}

Unfortunately, bit-testing is so low-level that there is no (obvious) way to refine this implementation further. In order to compare the two convertible values, the compiler must be told the exact base type to use, which requires an explicit cast for each supported type, as shown above. Luckily, this limitation is in the implementation, which affects the maintainer and not the user of the API.

Since implementing the third version of these "BitTools", I've added support for Is (shown partially above), Has, HasOneOf and it looks like the third time might indeed be the charm, as the saying goes.



  1. The IConvertible interface is actually implemented by other types, to which our bit-operations don't apply at all, like double, bool and so on. The .NET library doesn't provide a more specific interface -- like "INumeric" or "IIntegralType" -- so we're stuck constraining to IConvertible instead.

  2. Which, coincidentally, fixed a bug in the first and second versions that had returned all detected enumerated values -- including combinations -- instead of individual bits. For example, given the type shown below, we only ever expect values One and Two, and never None, OneOrTwo or All.

[Flags]
enum TestValues
{
  None = 0,
  One = 1,
  Two = 2,
  OneOrTwo = 3,
  All = 3,
}

That is, foreach (Two.GetEnabledFlags()) { ... } should return only Two and foreach (All.GetEnabledFlags()) { ... } should return One and Two.

Waiting for C# 4.0: A casting problem in C# 3.5

This article originally appeared on earthli News and has been cross-posted here.


C# 3.5 has a limitation where generic classes don't necessarily conform to each other in the way that one would expect. This problem manifests itself classically in the following way:

class A { }
class B : A { }
class C : A { }

class Program
{
  void ProcessListOfA(IList<A> list) { }
  void ProcessListOfB(IList<B> list) { }
  void ProcessSequenceOfA(IEnumerable<A> sequence) { }
  void ProcessSequenceOfB(IEnumerable<B> sequence) { }

  void Main()
  {
    var bList = new List<B>();
    var aList = new List<A>();

    ProcessListOfA(aList); // OK
    ProcessListOfB(aList); // Compiler error, as expected
    ProcessSequenceOfA(aList); // OK
    ProcessSequenceOfB(aList); // Compiler error, as expected

    ProcessListOfA(bList); // Compiler error, unexpected!
    ProcessListOfB(bList); // OK
    ProcessSequenceOfA(bList); // Compiler error, unexpected!
    ProcessSequenceOfB(bList); // OK
  }
}

Why are those two compiler errors unexpected? Why shouldn't a program be able to provide an IList<B> where an IList<A> is expected? Well, that's where things get a little bit complicated. Whereas at first, it seems that there's no down side to allowing the assignment -- B can do everything expected of A, after all -- further investigation reveals a potential source of runtime errors.

Expanding on the example above, suppose ProcessListOfA() were to have the following implementation:

void ProcessListOfA(IList<A> list)
{
  if (SomeCondition(list))
  {
    list.Add(new C());
  }
}

With such an implementation, the call to ProcessListOfA(bList), which passes an IList<B> would cause a runtime error if SomeCondition() were to return true. So, the dilemma is that allowing co- and contravariance may result in runtime errors.

A language design includes a balance of features that permit good expressiveness while restricting bad expressiveness. C# has implicit conversions, but requires potentially dangerous conversions to be made explicit with casts. Similarly, the obvious type-compatibility outlined in the first example is forbidden and requires a call to the System.Linq.Enumerable.Cast<T>(this IEnumerable) method instead. Other languages -- most notably Eiffel -- have always allowed the logical conformance between generic types, at the risk of runtime errors.1

Some of these limitations will be addressed in C# 4.0 with the introduction of covariance. See Covariance and Contravariance (C# and Visual Basic) and LINQ Farm: Covariance and Contravariance in C# 4.0 for more information.
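
The gist, sketched very roughly below, is that C# 4.0 lets an interface declaration mark a type parameter as covariant with the out keyword; IEnumerable<T> gets this annotation in the base class library while IList<T> stays invariant:

// Simplified sketch of the C# 4.0 declaration; "out" marks T as covariant,
// which is safe because T only appears in output positions.
public interface IEnumerable<out T> : IEnumerable
{
  IEnumerator<T> GetEnumerator();
}

// With that in place, the sequence-based calls from the first example compile:
// ProcessSequenceOfA(bList); // OK in C# 4.0
// IList<T> remains invariant, so ProcessListOfA(bList) is still an error.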

A (Partial) Solution for C# 3.5

Until then, there's the aforementioned System.Linq.Enumerable.Cast<T>(this IEnumerable) method in the system library. However, that method, while very convenient, makes no effort to statically verify that the input and output types are compatible with one another. That is, a call such as the following is perfectly legal:

var numbers = new [] { 1, 2, 3, 4, 5 };
var objects = numbers.Cast<object>(); // OK
var strings = numbers.Cast<string>(); // runtime error (when enumerated)!

Instead of an unchecked cast, a method with a generic constraint on the input and output types would be much more appropriate in those situations where the program is simply avoiding the generic-typing limitation described in detail in the first section. The method below does the trick:

public static IEnumerable<TOutput> Convert<TInput, TOutput>(this IEnumerable<TInput> input)
  where TInput : TOutput
{
  if (input == null) { throw new ArgumentNullException("input"); }

  if (input is IList<TOutput>) { return (IList<TOutput>)input; }

  return input.Select(obj => (TOutput)(object)obj);
}

While it's entirely possible that the Cast() function from the Linq library is more highly optimized, it's not as safe as the method above. A check with Redgate's Reflector would probably reveal just how that method actually works. Correctness comes before performance, but YMMV.2

The initial examples can now be rewritten to compile without casting:

ProcessListOfA(bList.Convert<B, A>()); // OK
ProcessListOfB(bList); // OK
ProcessSequenceOfA(bList.Convert<B, A>()); // OK
ProcessSequenceOfB(bList); // OK

One More Little Snag

Unlike the Enumerable.Cast<TOutput>() method, which has no restrictions and can be used on any IEnumerable, there will be places where the compiler will not allow an application to use Convert<TInput, TOutput>(). This is because the generic constraint -- that TInput must conform to TOutput -- is, in some cases, not statically provable (i.e. at compile-time). A concrete example is shown below:

abstract class A
{
  public abstract IEnumerable<TResult> GetObject<TResult>();
}

class B<T> : A
{
  public override IEnumerable<TResult> GetObject<TResult>() 
  {
    return _objects.Convert<T, TResult>(); // Compile error!
  }

  private IList<T> _objects;
}

The example above does not compile because T does not provably conform to TResult. A suitable generic constraint cannot be applied because it would have to be declared on the original, abstract method, which knows nothing of T. In these cases, the application is forced to use the System.Linq.Enumerable.Cast<T>(this IEnumerable) method instead.



  1. I've addressed this issue before in Static-typing for languages with covariant parameters, which reviewed the paper, Type-safe covariance: Competent compilers can catch all catcalls, a proposal for statically identifying potential runtime errors and requiring them to be addressed with a recast definition. Similarly, another runtime plague -- null-references -- is also addressed in Eiffel, a feature extensively documented in the paper, Attached types and their application to three open problems of object-oriented programming.

  2. YMMV = "Your Mileage May Vary", but remember, Donald Knuth famously said: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

Creating fluent interfaces with inheritance in C#

This article originally appeared on earthli News and has been cross-posted here.


Fluent interfaces -- or "method chaining" as it's also called -- provide an elegant API for configuring objects. For example, the Quino query API provides methods to restrict (Where or WhereEquals), order (OrderBy), join (Join) and project (Select) data. The first version of this API was very traditional and applications typically contained code like the following:

var query = new Query(Person.Metadata);
query.WhereEquals(Person.Fields.Name, "Müller");
query.WhereEquals(Person.Fields.FirstName, "Hans");
query.OrderBy(Person.Fields.LastName, SortDirection.Ascending);
query.OrderBy(Person.Fields.FirstName, SortDirection.Ascending);
var contactsTable = query.Join(Person.Relations.ContactInfo);
contactsTable.Where(ContactInfo.Fields.Street, ExpressionOperator.EndsWithCI, "Strasse");

(This example gets all people named "Hans Müller" that live on a street with a name that ends in "Strasse" (case-insensitive) sorted by last name, then first name. Fields and Relations refer to constants generated from the Quino metadata model.)

Fluent Examples

The syntax above is very declarative and relatively easy-to-follow, but is a bit wordy. It would be nice to be able to chain together all of these calls and remove the repeated references to query. The local variable contactsTable also seems kind of superfluous here (it is only used once).

A fluent version of the query definition looks like this:

var query = new Query(Person.Metadata);
query.WhereEquals(Person.Fields.Name, "Müller")
  .WhereEquals(Person.Fields.FirstName, "Hans")
  .OrderBy(Person.Fields.LastName, SortDirection.Ascending)
  .OrderBy(Person.Fields.FirstName, SortDirection.Ascending)
  .Join(Person.Relations.ContactInfo)
    .Where(ContactInfo.Fields.Street, ExpressionOperator.EndsWithCI, "Strasse");

The example uses indenting to indicate that restriction after the join on the "ContactInfo" table applies to the "ContactInfo" table instead of to the "Person" table. The call to Join logically returns a reference to the joined table instead of the query itself. However, each such table also has a Query property that refers to the original query. Applications can use this to "jump" back up and apply more joins, as shown in the example below where the query only returns a person if he or she also works in the London office:

var query = new Query(Person.Metadata);
query.WhereEquals(Person.Fields.Name, "Müller")
  .WhereEquals(Person.Fields.FirstName, "Hans")
  .OrderBy(Person.Fields.LastName, SortDirection.Ascending)
  .OrderBy(Person.Fields.FirstName, SortDirection.Ascending)
  .Join(Person.Relations.ContactInfo)
    .Where(ContactInfo.Fields.Street, ExpressionOperator.EndsWithCI, "Strasse").Query
  .Join(Person.Relations.Office)
    .WhereEquals(Office.Fields.Name, "London");

A final example shows how even complex queries over multiple table levels can be chained together into one single call. The following example joins on the "ContactInfo" table to dig even deeper into the data by restricting to people whose web sites are owned by people with more than 10 years of experience:

var query = new Query(Person.Metadata);
query.WhereEquals(Person.Fields.Name, "Müller")
  .WhereEquals(Person.Fields.FirstName, "Hans")
  .OrderBy(Person.Fields.LastName, SortDirection.Ascending)
  .OrderBy(Person.Fields.FirstName, SortDirection.Ascending)
  .Join(Person.Relations.ContactInfo)
    .Where(ContactInfo.Fields.Street, ExpressionOperator.EndsWithCI, "Strasse")
    .Join(ContactInfo.Relations.WebSite)
      .Join(WebSite.Relations.Owner)
        .Where(Owner.Fields.YearsExperience, ExpressionOperator.GreaterThan, 10).Query
  .Join(Person.Relations.Office)
    .WhereEquals(Office.Fields.Name, "London");

This API might still be a bit too wordy for some (.NET 3.5 Linq would be less wordy), but it's refactoring-friendly and it's crystal-clear what's going on.

Implementation

When there's only one class involved, it's not that hard to conceive of how this API is implemented: each method just returns a reference to this when it has finished modifying the query. For example, the WhereEquals method would look like this:

IQuery WhereEquals(IMetaProperty prop, object value)
{
  Where(CreateExpression(prop, value));

  return this;
}

This isn't rocket science and the job is quickly done.

However, what if things in the inheritance hierarchy aren't that simple? What if, for reasons known to the Quino framework architects, IQuery actually inherits from IQueryCondition, which defines all of the restriction and ordering operations. The IQuery provides projection and joining operations, which can easily just return this, but what type should the operations in IQueryCondition return?

The problem area is indicated with question marks in the example below:

public interface IQueryCondition
{
  ??? WhereEquals(IMetaProperty prop, object value);
}

public interface IQueryTable : IQueryCondition
{
  IQueryTable Join(IMetaRelation relation);
}

public interface IQuery : IQueryTable
{
  IQueryTable SelectDefaultForAllTables();
}

The IQueryCondition can't simply return IQueryTable because it might be used elsewhere1, but it also can't return IQueryCondition: if it did, the table couldn't perform a join after a restriction, because applying the restriction would have narrowed the fluent interface to an IQueryCondition instead of an IQueryTable.

The solution is to make IQueryCondition generic and pass it the type that it should return instead of hard-coding it.

public interface IQueryCondition<TSelf>
{
  TSelf WhereEquals(IMetaProperty prop, object value);
}

public interface IQueryTable : IQueryCondition<IQueryTable>
{
  IQueryTable Join(IMetaRelation relation);
}

public interface IQuery : IQueryTable
{
  IQueryTable SelectDefaultForAllTables();
}

That takes care of the interfaces, on to the implementation. The standard implementation runs into a small problem when returning the generic type:

public class QueryCondition<TSelf> : IQueryCondition<TSelf>
{
  public TSelf WhereEquals(IMetaProperty prop, object value)
  {
    // Apply restriction

    return (TSelf)this; // causes a compile error
  }
}

public class QueryTable : QueryCondition<IQueryTable>, IQueryTable
{
  public IQueryTable Join(IMetaRelation relation) 
  {
    // Perform the join

    return result;
  }
}

public class Query : IQuery
{
  public IQueryTable SelectDefaultForAllTables()
  {
    // Perform the select

    return this;
  }
}

One simple solution to the problem is to cast down to object and back up to TSelf, but this is pretty bad practice as it short-circuits the static checker in the compiler and defers the problem to a potential runtime one.

public class QueryCondition<TSelf> : IQueryCondition<TSelf>
{
  public TSelf WhereEquals(IMetaProperty prop, object value)
  {
    // Apply restriction

    return (TSelf)(object)this;
  }
}

In this case, it's guaranteed by the implementation that this is compliant with TSelf, but it would be even better to solve the problem without resorting to the double-cast above. As it turns out, there is a simple and quite elegant solution, using an abstract method called ThisAsTSelf, as illustrated below:

public abstract class QueryCondition<TSelf> : IQueryCondition<TSelf>
{
  public TSelf WhereEquals(IMetaProperty prop, object value)
  {
    // Apply restriction

    return ThisAsTSelf();
  }

  protected abstract TSelf ThisAsTSelf();
}

public class QueryTable : QueryCondition<IQueryTable>, IQueryTable
{
  protected override IQueryTable ThisAsTSelf()
  {
    return this;
  }

  // ...Join() and the other IQueryTable members as before
}

The compiler is now happy without a single cast at all because QueryTable returns this, which the compiler knows conforms to IQueryTable -- the TSelf for that class. The power of a fluent API is now at your disposal without restricting inheritance hierarchies or making end-runs around the compiler. Naturally, the concept extends to multiple levels of inheritance (e.g. if all calls had to return IQuery instead of IQueryTable), but it gets much uglier, as it requires nested generic types in the return types, which makes it much more difficult to understand. With a single level, as in the example above, the complexity is still relatively low and the resulting API is very powerful.
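
To give a feel for what that next level might look like, here is a hypothetical sketch (not the actual Quino interfaces) in which the table interface is also parameterized, so that every call in the chain can keep returning the most-derived type:

public interface IQueryCondition<TSelf>
{
  TSelf WhereEquals(IMetaProperty prop, object value);
}

// The table interface is now generic over the most-derived type as well...
public interface IQueryTable<TSelf> : IQueryCondition<TSelf>
{
  TSelf Join(IMetaRelation relation);
}

// ...so that a query can promise IQuery from every call in the chain.
public interface IQuery : IQueryTable<IQuery>
{
  IQuery SelectDefaultForAllTables();
}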



  1. And, in Quino, it is used elsewhere, for the IQueryJoinCondition.

Pre-generating Entity Framework (EF) Views

These instructions apply to the 1.x release of EF and its designer integration into Visual Studio 2008.

Overview

The Entity Framework requires what it calls "views" in order to access a database. EF generates these views automatically if they are not available. In order to avoid generating these views at application startup, they can be pre-generated and stored as C# code.

A post-build step in the compilation process would be the ideal place for this, but there's a snag: the view generator needs the entity model data stored as files in the output directory whereas the deployment would rather have them stored as resources. An EF model has an option that indicates whether it is stored as files or as a resource; knowing that, we could set up the build like this:

  1. Toggle all model files to generate model data as files
  2. Build
  3. Generate the views
  4. Toggle all model files to generate model data as resources
  5. Build

Generating the Views

However, if the model has not changed (as is usually the case when a project is mature), there is no need to waste time regenerating the views and building twice. Therefore, we've settled on the following manual method for updating the views:

  1. Open each model (.edmx) file and change the "Metadata Artifact Processing" property to "Copy to Output Directory".
  2. Build the application to force the model files to be generated.
  3. Run a batch file to generate the views (shown below).
  4. Change the "Metadata Artifact Processing" property back to "Embed in Output Assembly".
  5. Build again to embed the model as a resource and bind the newly generated views.

The Batch File

Here is a sample command you can execute in order to generate your views:

"%windir%\Microsoft.NET\Framework\v3.5\EdmGen.exe" ^
  /mode:ViewGeneration ^
  /language:CSharp ^
  /nologo ^
  "/inssdl:D:\Encodo\projects\customer\project\bin\Debug\Models\EntityModel.ssdl" ^
  "/incsdl:D:\Encodo\projects\customer\project\bin\Debug\Models\EntityModel.csdl" ^
  "/inmsl:D:\Encodo\projects\customer\project\bin\Debug\Models\EntityModel.msl" ^
  "/outviews:D:\Encodo\projects\customer\project\Models\EntityModel.Views.cs"
  • Change "EntityModel" to the name of your own model file.
  • Make sure to include the "EntityModel.Views.cs" in your project to actually use the generated views.
Microsoft Code Contracts: Not with a Ten-foot Pole

This article originally appeared on earthli News and has been cross-posted here. In the meantime, a lot has changed and the major complaint -- a lack of explicit contracts in C# -- will finally be addressed in the next version of C#, 4.0.


After what seems like an eternity, a mainstream programming language will finally dip its toe in the Design-by-contract (DBC) pool. DBC is a domain amply covered in one less well-known language called Eiffel (see ISE Eiffel Goes Open-Source for a good overview), where preconditions, postconditions and invariants of various stripes have been available for over twenty years.

Why Contracts?

Object-oriented languages already include contracts; "classic" signature-checking involves verification of parameter counts and type-conformance. DBC generally means extending this mechanism to include assertions on a higher semantic level. A method's signature describes the obligations calling code must fulfill in order to execute the method. The degree of enforcement varies from language to language. Statically-typed languages verify types according to conformance at compile-time, whereas dynamically-typed languages do so at run-time. Even the level of conformance-checking differs from language to language, with statically-typed languages requiring hierarchical conformance via ancestors and dynamically-typed languages verifying signatures via duck-typing.

And that's only for individual methods; methods are typically collected into classes that also have a semantic meaning. DBC is about being able to specify the semantics of a class (e.g. can property A ever be false when property B is true?) as well as those of method parameters (can parameter a ever be null?) using the same programming language.

Poor-man's DBC

DBC is relatively tedious to employ without framework or language support. Generally, this takes the form of using Debug.Assert1 at the start of a method call to verify arguments, throwing ArgumentExceptions when the caller did not satisfy the contract. Post-conditions can also be added in a similar fashion, at the end of the function. Naturally, without library support, post-conditions must be added before any return-statements or enclosed in an artificial finally-clause around the rest of the method body. Class invariants are even more tedious, as they must be checked both at the beginning and end of every single "entering" method call, where the "entering" method call is the first on the given object. A proper implementation may not check the invariant for methods that an object calls on itself because it's perfectly all right for an object to be in an invalid state until the "entering" method returns.
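
A minimal sketch of this style, using an illustrative class rather than code from any particular framework:

public class Account
{
  private decimal _balance;

  public void Withdraw(decimal amount)
  {
    // Preconditions: reject callers that violate the contract.
    if (amount <= 0) { throw new ArgumentException("amount must be positive", "amount"); }
    Debug.Assert(amount <= _balance, "amount may not exceed the current balance");

    var originalBalance = _balance;
    _balance -= amount;

    // Postcondition: must be repeated before every return statement
    // (or wrapped in an artificial finally-clause, as described above).
    Debug.Assert(_balance == originalBalance - amount);
  }
}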

One assertion that arises quite often is that of requiring that a parameter be non-null in a precondition. An analysis of most code bases that use poor-man's DBC will probably reveal that the majority of their assertions are of this form. Therefore, it would be nice to handle this class of assertion separately using a language feature that indicates that a particular type can statically never be null. Eiffel has added this support with a separate notation for denoting "attached" types (types that are guaranteed to be attached to a non-null reference). Inclusion of such a feature not only improves the so-called "provability" of programs written in that language, it also transforms null-checking contracts to another notation (e.g. in Eiffel, objects are no longer nullable by default and the ?-operator is used to denote nullability) and removes much of the clutter from the precondition block.

Without explicit language support, a DBC solution couched in terms of assertions and/or exceptions quickly leads to clutter that obscures the actual program logic. Contracts should be easily recognizable as such by both tools and humans. Ideally, the contract can be extracted and included in documentation and code completion tooltips. Eiffel provides such support with separate areas for pre- and post-conditions as well as class invariants. All assertions can be labeled to give them a human-readable name, like "param1_not_null" or "list_contains_at_most_one_element". The Eiffel tools provide various views on the source code, including what they call the "short" view, showing method signatures and contracts without implementation, as well as the "short flat" view, which is the "short" view, but includes all inherited methods to present the full interface of a type.

Looking at "Code Contracts"

Other than Eiffel, no close-to-mainstream programming language2 has attempted to make the implicit semantics of a class explicit with DBC. Until now. Code Contracts will be included in C# 4.0, which will be released with Visual Studio 2010. It is available today as a separate assembly and compatible with C# 3.5 and Visual Studio 2008, so no upgrade is required to start using it. Given the lack of an upgrade requirement, we can draw the conclusion that this contracting solution is library-only without any special language support.

That does not bode well; as mentioned above, such implementations will be limited in their support of proper DBC. The user documentation provides an extensive overview of the design and proper use of Code Contracts.

There are, as expected, no new keywords or language support for contracts in C# 4.0. That means that tools and programmers will have to rely on convention in order to extract semantic meaning from the contracts. Pre- and postconditions are mixed together at the top of the method call. Post-conditions have support for accessing the method result and original values of arguments. Contracts can refer to fields not visible to other classes and there is an attribute-based hack to make these fields visible via a proxy property.
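
As an illustration (using a made-up class, but the real Contract API), a contracted method looks roughly like this; Contract.Result and Contract.OldValue provide the access to the return value and original values mentioned above:

public class Counter
{
  private int _count;

  public int Add(int amount)
  {
    // Pre- and postconditions sit together at the top of the method body.
    Contract.Requires(amount > 0);
    Contract.Ensures(Contract.Result<int>() == Contract.OldValue(_count) + amount);

    _count += amount;
    return _count;
  }
}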

Abstract classes and Interfaces

Contracts for abstract classes and interfaces are, simply put, a catastrophe. Since these constructs don't have method implementations, they can't contain contracts. Therefore, in order to attach contracts to these constructs -- and, to be clear, the mechanism would be no improvement over the current poor-man's DBC if there was no way to do this -- there is a ContractClass attribute. Attaching contracts to an interface involves making a fake implementation of that interface, adding contracts there, hacking expected results so that it compiles, presumably adding a private constructor so it can't be instantiated by accident, then referencing it from the interface via the attribute mentioned above. It works, but it's far from pretty and it moves the contracts far from the place where it would be intuitive to look for them.
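
Roughly, the pattern looks like this (a sketch; ITransmitter and Data are illustrative placeholders):

[ContractClass(typeof(ITransmitterContracts))]
public interface ITransmitter
{
  void SendData(Data data);
}

// The "fake" implementation exists only to carry the contracts for the interface.
[ContractClassFor(typeof(ITransmitter))]
internal abstract class ITransmitterContracts : ITransmitter
{
  public void SendData(Data data)
  {
    Contract.Requires(data != null);
  }
}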

No Support for Precondition Weakening

Just as the specification side is not so pretty, the execution side also suffers. Contracts are, at least, inherited, but preconditions cannot be weakened. That is, a sub-type -- and implementations of interfaces with contracts are sub-types -- cannot add preconditions; end of story. As soon as a type contains at least one contract on one method, all methods in that type without contracts are interpreted as specifying the "empty" contract.

Instead of simply acknowledging that precondition weakening could be a useful feature, the authors state:

While we could allow a weaker precondition, we have found that the complications of doing so outweigh the benefits. We just haven't seen any compelling examples where weakening the precondition is useful.

Let's have an example, where we want to extend an existing class with support for a fallback mechanism. In the following case we have a transmitter class that sends data over a server; the contracts require that the server be reachable before sending data. The descendant adds support for a second server over which to send, should the first be unreachable. All examples below have trimmed initialization code that guarantees non-null properties for clarity's sake. All contracts are included.

class Transmitter
{
  public Server Server { get; private set; }

  public virtual void SendData(Data data)
  {
     Contract.Requires(data != null);
     Contract.Requires(Server.IsReachable);
     Contract.Ensures(data.State == DataState.Sent);

     Server.Send(data);
  }

  [ContractInvariantMethod]
  protected void ObjectInvariant()
  {
    Contract.Invariant(Server != null);
  }
}

class TransmitterWithFallback : Transmitter
{
  public Server FallbackServer { get; private set; }

  public override void SendData(Data data)
  {
     // contract violation

     // If "Server" is not reachable, we will never be given
     // the opportunity to send using the fallback server
  }

  [ContractInvariantMethod]
  protected void ObjectInvariant()
  {
    Contract.Invariant(FallbackServer != null);
  }
}

We can't actually implement the fallback without adjusting the original contracts. With access to the code for the base class, we could address this shortcoming by moving the check for server availability to a separate method, as follows:

class Transmitter
{
  public Server Server { get; private set; }

  [Pure]
  public virtual bool ServerIsReachable 
  { 
    get { return Server.IsReachable; }
  }

  public virtual void SendData(Data data)
  {
     Contract.Requires(data != null);
     Contract.Requires(ServerIsReachable);
     Contract.Ensures(data.State == DataState.Sent);

     Server.Send(data);
  }

  [ContractInvariantMethod]
  protected void ObjectInvariant()
  {
    Contract.Invariant(Server != null);
  }
}

class TransmitterWithFallback : Transmitter
{
  public Server FallbackServer { get; private set; }

  [Pure]
  public override bool ServerIsReachable 
  { 
    get { return Server.IsReachable || FallbackServer.IsReachable; }
  }

  public override void SendData(Data data)
  {
    if (Server.IsReachable)
    {
      base.SendData(data);
    }
    else
    {
      FallbackServer.Send(data);
    }
  }

  [ContractInvariantMethod]
  protected void ObjectInvariant()
  {
    Contract.Invariant(FallbackServer != null);
  }
}

With careful planning in the class that introduces the first contract -- where precondition contracts are required to go -- we can get around the lack of extensibility of preconditions. Let's take a look at how Eiffel would address this. In Eiffel, the example above would look something like the following3:

class TRANSMITTER
  feature
    server: SERVER

    send_data(data: DATA) is
    require
      server.reachable
    do
      server.send(data)
    ensure
      data.state = DATA_STATE.sent;
    end
end

class TRANSMITTER_WITH_FALLBACK
  inherits
    TRANSMITTER
      redefine
        send_data
      end
  feature
    fallback_server: SERVER

    send_data (data: DATA) is
      require else
        fallback_server.reachable
      do
        if server.reachable then
          Precursor;
        else
          fallback_server.send(data)
        end
      end
end

The Eiffel version has clearly separated boundaries between contract code and implementation code. It also does not require a change to the base implementation in order to add a useful feature. In the C# version, only the author of the library has that luxury; users of the library do not and would be forced into less elegant solutions.

To sum up, it seems that, once again, the feature designers have taken the way out that is easiest for the compiler, framework and library authors rather than providing a full-featured design-by-contract implementation. It was the same with the initial generics implementation in C#, which lacked co- and contra-variance. The justification at the time was also that "no one really needed it". C# 4.0 will finally include this essential functionality, belying the original assertion.
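For reference, here is a minimal sketch of the variance in question, using the covariant IEnumerable<out T> from .NET 4.0 (names invented for the example):

**using** System.Collections.Generic;

**class** VarianceExample
{
  **static void** Main()
  {
    // C# 4.0 declares IEnumerable<out T> as covariant, so a sequence of a
    // more-derived type may stand in for a sequence of its base type.
    IEnumerable<string> names = **new** List<string> { "Alice", "Bob" };
    IEnumerable<object> objects = names; // a compile error before C# 4.0

    **foreach** (object o **in** objects)
    {
      System.Console.WriteLine(o);
    }
  }
}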

Thumbs Up or Thumbs Down?

The implementation is so easy to use that even the documentation leads off by warning that:

a word of caution: Static code checking or verification is a difficult endeavor. It requires a relatively large effort in terms of writing contracts, determining why a particular property cannot be proven, and finding a way to help the checker see the light. [...] If you are still determined to go ahead with contracts [...] To not get lost in a sea of warnings [...] (emphasis added)

Not only is that not a ringing endorsement, it's not even an endorsement.

Other notes on implementation include:

  • Testing frameworks require scaffolding to redirect contract exceptions to the framework instead of an assertion dialog (a minimal sketch of such scaffolding follows this list).
  • There is no support for edit-and-continue in contracted assemblies. Period. Contracting injects code into assemblies during the compile process, which makes them unusable for the edit-and-continue debugger.4
  • Because of this instrumentation, expect medium to massive slowdowns during compilation; the authors recommend enabling contracts in a special build instead of in all DEBUG builds. This is a ridiculous restriction as null-checks and other preconditions are useful throughout the development process, not just for pre-release testing. Poor-man's DBC is currently enabled in all builds; a move to MS Contracts with the recommended separate build would remove this support, weakening the development process.
  • Some generated code (e.g. Windows Forms code) currently causes spurious errors that must be suppressed by manually editing that generated code. Such changes will be wiped out as soon as a change is made in the Winforms designer.
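To give an idea of what the first point entails, here is a minimal sketch, not taken from any particular testing framework, of redirecting contract failures to exceptions via the Contract.ContractFailed event, so that a test runner reports a failure instead of popping up an assertion dialog:

**using** System.Diagnostics.Contracts;

**static class** ContractTestScaffolding
{
  // Call once from the test assembly's setup code.
  **public static void** Install()
  {
    Contract.ContractFailed += (sender, e) =>
    {
      e.SetHandled(); // suppress the default assertion dialog
      e.SetUnwind();  // escalate to an exception that the test runner can report
    };
  }
}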

Because the feature is not a proper language extension, the implementation is forced within the bounds of the existing language features. A more promising implementation was Spec# -- which extended the C# compiler itself -- but there hasn't been any activity on that project from Microsoft Research in quite some time. There are, however, a lot of interesting papers available there which offer a more developer-friendly insight into the world of design-by-contract than the highly compiler-oriented point-of-view espoused by the Contracts team.

This author will be taking a pass on the initial version of DBC as embodied by Microsoft Contracts.



  1. With which this author is acquainted.

  2. Examples use C# 3.0 (.NET 3.5) unless otherwise noted.

  3. Please excuse any and all compile errors, as I haven't got access to a current Eiffel installation and am piecing this example together from documentation and what I remember about writing Eiffel code.

  4. This admission goes a long way toward explaining why code with generics and lambdas cannot be changed in an edit-and-continue debugging session. These language features presumably also rely on rewriting, instrumentation and code-injection.

An analysis of C# language design

This article originally appeared on earthli News in 2004 and has been cross-posted here. In the meantime, a lot has changed and the major complaint -- a lack of explicit contracts in C# -- will finally be addressed in the next version of C#, 4.0.


A Conversation with Anders Hejlsberg (Part I: The C# Design Process --- the process used by the team that designed C#, and the relative merits of usability studies and good taste in language design.) is a four-part series on the ideas that drove the design of C#. (The link is to the first page of the first section; browse to Artima.com Interviews to see a list of all the sections.)

Virtual vs. Static

I found some points of interest in Part IV, Versioning, Virtual, and Override (Part IV: Versioning, Virtual, and Override --- why C# instance methods are non-virtual by default and why programmers must explicitly indicate an override.), in which Anders Hejlsberg (designer of both Delphi Pascal and C#) chats about the reasoning behind making methods non-virtual by default.

One answer is performance; he cites method usage in Java: "We can observe that as people write code in Java, they forget to mark their methods final. ... Because they're virtual, they don't perform as well." Another cited reason is 'versioning', which seems to be another term for formal contracts. Lack of versioning accounts for API instability in most software systems and C#'s approach, or lack thereof, is discussed in more detail later. First, let's examine the arguments supporting performance as a reason to make methods static by default.

In Java's case, methods are virtual by necessity; since classes can always be loaded into the namespace and their bytecode interpreted, methods must be virtual in case a descendant is loaded that overrides the method. In C#'s case, assemblies are built with a known 'universe' of classes (to borrow a term from the Eiffel world) -- there is no need to leave methods virtual in case other classes are loaded.

Leaving methods as statically linked by default puts the burden on the developer. That is, the developer must explicitly decide whether a method should be virtual or not. This prevents you from designing, then optimizing; you are immediately faced with the question: can a descendant legitimately redefine this method?

Private data

There are those who claim one can always answer this question. They are the same ones who squirrel variables away in 'private' areas, right when you would need them in your descendant most. Private features (data or methods visible only to the current class) limit the number of uses to which a class can be put: if a class has the correct interface, but an unacceptable implementation, a programmer is forced to define an entirely new, non-conforming class or, at the very least, to duplicate code in order to get the desired effect. Inheritance provides 'is a' semantics; if a class is another class, why shouldn't it be able to see parts of itself?

Marking methods as 'final' (Java) or leaving them non-virtual (C#) and using private fields is akin to saying "I have created an infallible design and the private implementation is beyond reproach".

This is an especially dangerous attitude to take in library code. Library code is incorporated into other products; clients of the library will often define classes based on library classes. What if some part of a class doesn't function correctly, or works differently than expected, or desired? Can a client class alter the behavior of the library class enough? Or does the client need to alter library source code, or, worse yet, do without functionality, because the library class doesn't allow that kind of modification? Is a client forced to simply rewrite the entire class in a non-conforming class to get functionality that the library almost provides?

Implicit contracts

To this you may say "there are certain things you should not be able to do with a class". Fair enough; a good design imbues every class with a purpose and provides it with an API that fulfills it. However, what does it mean to say "you should not be able to do" something with a class? Does your class explicitly define an intent? The intent of a class is, at best, stored explicitly in external documentation. At worst, it is defined implicitly in the API; the secret of a class's purpose is stored in the visibility of features (private/protected/public) and in the redefinability of methods. Even if the documentation is defined in the class itself, it is stored in comments that are beyond the reach of the compiler. The purpose of a class can't be verified or enforced at compile-time or run-time.

We come once again to the notion of contracts. Contracts to help the compiler, to help the developer, and to help the end-user or client. Contracts to make documentation simpler and clearer and explicit rather than implicit. All software enforces contracts; almost no programming language provides mechanisms for making these contracts explicit -- C# is no exception.

Easy Way Out

Language designers today have no imagination, creating clone after clone after clone. There's a reason C# looks so much like Java: given the choice between making writing software in the language easy and writing a compiler for the language easy, they go for an easy compiler every time. Neither of these languages lets a programmer express a design without immediately worrying about implementation. Anders Hejlsberg explains why C# took the easy way out:

The academic school of thought says, "Everything should be virtual, because I might want to override it someday." The pragmatic school of thought, which comes from building real applications that run in the real world, says, "We've got to be real careful about what we make virtual."

Now it's clear: whiners who are sick of working for their compilers are "academic, ... [not] pragmatic" and don't know about the "real world". That's a pretty specious argument, since he hasn't backed up his assertion with any evidence (other than the performance argument, which, while perfectly valid, is still addressable on the compiler side, as explained below).

Consider the question of whether to make methods static or dynamic by default. A compiler-friendly language makes everything static, forcing the programmer to explicitly mark redefinable methods with a keyword. A nicer language would make all methods dynamic. If a descendant redefines a method, it is compiled as dynamic. All methods in the program (the universe of classes available at compile time) that are not redefined can safely be compiled statically.

Helpful features

A corollary to this is how a language handles function inlining. Inlining replaces a function call with the body of the function itself to increase performance for smaller functions. C and C++ still have an explicit 'inline' keyword. C# thankfully does not and has rightly chosen to put the burden of choosing which functions to inline on a compiler that examines the heuristics of the entire program. Since it's a newer compiler, there are still a few kinks to work out (Poor inline optimization of C#/JIT), but C# is headed in the right direction.

Another issue affecting a language's usability is its redefinition policy. When is a method considered a redefinition of another method? C++ has the absolute worst policy in this respect, assuming a redefinition as soon as a method with the same signature in an ancestor is marked as 'virtual'. If the signature of the 'virtual' method or the redefinition changes, it is simply assumed to no longer be a redefinition. What fun!

C# has thankfully adopted the policy of explicit redefinition, forcing a method with the same signature to be marked as an 'override'. The method being redefined must, of course, be marked as 'virtual' when defined (as explained above).
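As a quick illustration (invented class names), C#'s policy makes both sides of the redefinition explicit:

**using** System;

**class** Greeter
{
  // The base class must explicitly opt in to redefinition.
  **public virtual void** Greet(string name)
  {
    Console.WriteLine("Hello, " + name);
  }
}

**class** PoliteGreeter : Greeter
{
  // The redefinition must be equally explicit: omitting 'override' merely hides
  // the inherited method (and draws a compiler warning), and marking a method
  // 'override' when the base method is not virtual is a compile error. A changed
  // signature therefore cannot silently detach a redefinition, as it can in C++.
  **public override void** Greet(string name)
  {
    Console.WriteLine("(bows)");
    **base**.Greet(name);
  }
}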

These are the language features that lighten the load for a programmer. Garbage collection is another such feature that C# got right. Given garbage collection, a developer can freely design structures without immediately considering which object is responsible for what. The accompanying complications of 'has' and 'uses' fall away in almost all cases and a design can be much more easily mapped to the language without accommodating memory management so early in the process. It is still possible to have memory problems with garbage collection (a stale reference no longer causes a crash, but instead causes inconsistent logic and excessive memory usage). Nevertheless, languages that provide garbage collection allow elegant designs that would require a lot of scaffolding code in non-memory-managed languages.

Back to versioning

Anders goes on at length about the problem of 'versioning':

When we make something virtual ... we're making an awful lot of promises about how it evolves in the future. ... When we publish a virtual method in an API, we not only promise that when you call this method, x and y will happen. We also promise that when you override this method, we will call it in this particular sequence with regard to these other ones and the state will be in this and that invariant.

What promises? C# has no contracting mechanism, so discussion of promises is limited to non-functional documentation and perhaps the method name, which implies what it does. Though he mentions an "invariant", which is presumably the class invariant, there is no mechanism for specifying one: how can you prove that code broke an implicit contract?

He continues talking about contracts, noting that "[v]irtual has two sides to it: the incoming and the outgoing". He talks all around the notion of contracts and documentation and the pitfalls associated with trusting developers to write documentation that shows "what you're supposed to do when you override a virtual method". Documentation should include information about "[w]hat are the invariants before you're called? What should be true after?". At this point, you're screaming with frustration that a man so seemingly knowledgeable of Design-by-Contract decided to leave everything implicit in his language. He acknowledges the problem of contracting, then, in the same breath, leaves the entirety of the solution up to the developer. Not only does C# have no way of specifying these obviously important and troublesome contracts, its designer has invented whole new terms (incoming/outgoing instead of precondition/postcondition) in a seemingly willful ignorance of existing Design-by-Contract theory.

As justification for this somewhat fuzzy 'versioning' concept he's espousing, he mentions that "[w]henever [Java or C++ languages] introduce a new method in a base class, if someone in a derived class had a method of that same name, that method is now an override". Honestly, that has nothing to do with contracts or making sure redefinitions enforce the same contracts; that's simply about explicit redefinition rather than implicit signature-matching. It's a trivial language feature that C# got exactly right, but, lacking contracts of any sort, how is C# any better at managing valid redefinitions than Java or C++? If a method is marked as 'virtual' in C#, a redefinition can do whatever it likes, including nothing.

The 'versioning' problem is not solved; it is simply no longer applicable to all methods because many more methods are static. That's not an advancement; that's removing functionality in order to prevent programmers from breaking things. Putting the burden on the developer limits the expressiveness of the language and constrains the power of solutions you can build in that language. Just because you might break a window with a hammer doesn't mean it's better to build a house without one.

Given a proper contracting mechanism in the language, "incoming and outgoing" contracts could be explicitly specified in the language. A redefinition of such a method would inherit the ancestor method's contracts and be forced to support them. A method redefinition is free to weaken the precondition, but must support or strengthen the inherited postcondition. In addition, contracts at the class scope should be defined in a class invariant, which is checked before and after each method call, to ensure that the class is in a stable state before executing code on it.
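In Hoare-triple terms (a standard formulation rather than anything from the interview): if the ancestor specifies {P} m {Q}, a redefinition {P'} m {Q'} is valid only if P implies P' and Q' implies Q; the weaker precondition accepts every call the original accepted, and the stronger postcondition still delivers everything the original promised.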

There is a Design by Contract Framework for C# available, but it's only a library extension, and like all non-language implementations of Design-by-Contract, is only a pale imitation of the power afforded by a language-level solution. It's a real shame to see a language designer who knows so much about the pitfalls of programming and does so little to help the users of his language avoid them.

It's not the first time this has happened and it won't be the last. So many programmers are sticking with C++ because it has at least some form of generics (C++ templates are not truly generic, but are nonetheless extremely useful). Java, a language whose programs are littered with typecasts because of a lack of generics, plans to finally introduce generics after ten years. C# also skipped generics in the first version, and introduces them in the next revision, 2.0. One can only wonder when and if either will ever support contracting or whether we have to sit back and wait another ten years for a new language.

If you can't wait that long and want a language that has real generics, allows no private data, compiles non-redefined methods as static, has automatic inlining, explicit redefinition, garbage collection and incorporates a rich contract mechanism, try Eiffel.

Remote Debugging with [ASP].NET

When a .NET application exhibits behavior on a remote server that cannot be reproduced locally, you'll need to debug the application directly on the server. The following article includes specific instructions for debugging ASP.NET applications, but applies just as well to standalone executables.

Prerequisites

There are several prerequisites for remote debugging; don't even bother trying until you have all of the items on the following list squared away or the Remote Debugger will just chortle at your naiveté.

  • The SERVER must have the Visual Studio Remote Debugging Monitor installed.
  • The firewall must be opened for Visual Studio on the client (which means that ReSharper sees other instances); remote debugging involves two-way communication.
  • A local user, BOB, with administrator rights on the client machine.
  • A server user, BOB, with administrator rights on the SERVER machine.
  • The names must match.
  • The monitor must be started on the SERVER using BOB (using "Run as...")
  • If you're not debugging in the same domain, then you have to change the server name in the debugging monitor's options to "BOB@SERVER".

Before you think you can get all fancy and simply debug remotely without authentication, know this: unauthenticated, native debugging does not support breakpoints, so forget it. You'll technically be able to connect to a running application but, without breakpoints, you'll only be able to watch any pre-existing debug output appear on the console, if that.

Firewall ports

The following ports must be open in order for Remote Debugging to function correctly in all situations:

**Protocol**    **Port**     **Service Name**
TCP         139      File and Printer Sharing
TCP         445      File and Printer Sharing
UDP         137      File and Printer Sharing
UDP         138      File and Printer Sharing
UDP         4500     IPsec (IKE NAT-T)
UDP         500      IPsec (IKE)
TCP         135      RPC Endpoint Mapper and DCOM infrastructure

Additionally, the application "Microsoft Visual Studio 2008" must be in the exceptions list on the client and "Visual Studio Remote Debugging Monitor" must be in the exceptions list on the server.

Recommendations

Once you've satisfied the requirements above, you should probably also heed the following tips; it's best to read about them now rather than learn them the hard way later:

  • Make sure to turn off recycling and auto-shutdown for the AppPool while debugging, so you don't run the risk of your PID suddenly being gone.
  • Make sure that you're using debug versions of all assemblies where you want to debug or you'll be staring at IL assembly code more often than you'd like.
  • Make sure your local source code is in sync with the source code on the SERVER or you'll be debugging on the wrong lines at best or staring at IL assembly at worst.
  • It's best if the path to your local symbols is also valid and writable on the server so that symbols cached during remote debugging can be stored on the server. Check "Options..Debugging..Symbols" to change that path if you need to (there's more about this below if this doesn't make sense).

Test Run

Here are steps you can follow to debug an application remotely. These steps worked for me, but the remote debugging situation seems to be extremely hit-or-miss, so your mileage may vary.

  1. Open your web project in Visual Studio and compile it in debug mode.
  2. Outside of Visual Studio, build a deployment version of the web site and copy it to the SERVER.
  3. If this is the first time setting it up, move the application to its own ApplicationPool, so you can detect it more easily later.
  4. If you haven't already, install the Visual Studio Remote Debugging Monitor to the SERVER.
  5. Make sure you have a local user on that machine with your own user name, USER.
  6. Start the Visual Studio Remote Debugging Monitor using "Run as..." and entering USER on that server.
  7. When it has started, select "File..Options" from the menu and change the server name to USER@SERVERNAME.
  8. From within Visual Studio, select "Debug..Attach to Process" from the menu.
  9. In the dialog, specify the USER@SERVERNAME you used in the debugging monitor above and hit Refresh.
  10. Scroll down until you see the "w3wp.exe" processes.

You've set up the server and attached to it so far. If anything has gone wrong, check the troubleshooting section below to see if your problem is addressed there. Now, the next steps are optional if you think you can identify your process without knowing the PID (Process ID). This is generally the case only when yours is the only .NET application deployed to that server. In that case, your process is the "w3wp.exe" process which includes "managed code". If you don't know your PID, follow the optional instructions below to figure out which one is yours.

  1. From the client machine, download the attached script "ASP.NET PID Detector" and open it in a text editor.
  2. Change the machineName, appPoolName and url to match the settings for your application on the SERVER. (this is the reason we put our application into its own application pool at the beginning.)
  3. Save the file as a different name (probably with the machine name and server in the title).
  4. Execute the file and follow instructions; it will probably launch your web site in IE. It will probably also claim to have failed. Run it again and it will give you the PID of your application on the server.

If that didn't work, then you probably aren't configured to query WMI remotely; your only options are to try to run it remotely using the instructions and tips below or to run it from the server. (A rough sketch of the kind of WMI query involved follows the list.)

  • If you have remote desktop access to the server, then copy the script to the server and configure the batch file to query the local script and server (recommended).
  • Turn off the Windows Firewall on the server completely (not recommended if the server is open to the internet).
  • Follow instructions at Enable WMI (Windows Management Instrumentation) to enable remote administration through the firewall. Not only must you execute a special command to configure the Firewall (only available from the command line) but your user must also have the correct permissions (also not recommended, as enabling WMI can open up the server in unexpected ways if you don't know what you're doing). I did not attempt this, as I could simply run the PID-detector from the server.
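For the curious, the heart of such a PID detector boils down to a WMI query for IIS worker processes. The following is a rough C# equivalent of that query (the attached script itself is not this code, and "SERVER" is a placeholder):

**using** System;
**using** System.Management; // add a reference to System.Management.dll

**class** FindWorkerProcessPids
{
  **static void** Main()
  {
    // "SERVER" is a placeholder; use "." to query the local machine instead.
    ManagementScope scope = **new** ManagementScope(@"\\SERVER\root\cimv2");
    scope.Connect();

    ObjectQuery query = **new** ObjectQuery(
      "SELECT ProcessId, CommandLine FROM Win32_Process WHERE Name = 'w3wp.exe'");

    **using** (ManagementObjectSearcher searcher = **new** ManagementObjectSearcher(scope, query))
    {
      **foreach** (ManagementObject process **in** searcher.Get())
      {
        // The command line includes the "-ap" switch naming the application
        // pool, which is how a worker process can be matched to your site.
        Console.WriteLine("{0}: {1}", process["ProcessId"], process["CommandLine"]);
      }
    }
  }
}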

Once you have the PID in hand, continue:

  1. Select the "w3wp.exe" process with your PID and double-click it to attach to that process.
  2. It will ask whether remote symbols can be stored on the server in the given location. You should say yes, but be aware that it will try to save those symbols on the server using the same path you use for storing symbols locally on your development machine.1
  3. Set a breakpoint where desired; the breakpoint should be solid red. If it is, you're done.
  4. Browse the application in IE to trigger the breakpoint and debug away.

Troubleshooting

As you can probably tell from the massive list of prerequisites and recommendations as well as the 20-step guide to triggering a breakpoint, there's a lot that can go wrong with Remote Debugging. It's not insurmountable, but it's not something you're going to want to attempt unless your job pretty much depends on it. These are some of the errors I encountered along the way and how I addressed them.

Unable to connect to the Microsoft Visual Studio Remote Debugging Monitor named 'USER@SERVER'. The Visual Studio Remote Debugger on the target computer cannot connect back to this computer. Authentication failed. Please see Help for assistance.

You need to create a local administrator with the same password as the one you're using on the server to run the debugging monitor.

Unable to connect to the Microsoft Visual Studio Remote Debugging Monitor named 'USER@SERVER'. The Visual Studio Remote Debugger on the target computer cannot connect back to this computer. A firewall may be preventing communication via DCOM to the local computer. Please see Help for assistance.

You opened the firewall, but only for computers on the same subnet. The computer to which you are connecting is probably not on the same subnet, so you'll need to go to the firewall settings and open them up all the way (Visual Studio will not ask again). To edit the firewall settings, do the following:

  1. Open the "Windows Firewall" control panel.
  2. Select the "Exceptions" tab.
  3. Scroll to the "Microsoft Visual Studio 2008" entry and double-click it.
  4. From the dialog, press the "Change Scope" button.
  5. Change it to "Any computer (including those on the Internet)".
  6. Press "Ok" three times to save changes.

It's also possible that the Remote Debugger is being blocked on the server side. To address this, run the "Visual Studio 2008 Remote Debugger Configuration Wizard" again; if the wizard wants to adjust firewall settings, let it do so (for internal or external networks, as appropriate to your situation -- if you're not sure, use external). To make sure that the settings were applied, run the wizard again; it should ask you about running the service, but should no longer complain about the firewall.

If it still complains about the firewall, then you've got another problem, which is that the setup is having trouble adjusting the settings for the firewall but isn't telling you that it's utterly failing when it attempts to do so. Verify that you're running the wizard as a user that has permission to adjust the firewall settings.

Unable to connect to the Microsoft Visual Studio Remote Debugging Monitor named 'USER@SERVER'. Logon failure: unknown user name or bad password. See help for more information.

The user with which you are executing Visual Studio on the client does not exist on the server or has a different password. In order to avoid adding useless user accounts to the server's domain, you should restart your IDE using "Run as..." to set the security context to the same user as you have on the server.

You can impersonate other users, but you have to set a registry key; see Remote Debugging Under Another User Account for more information. This doesn't help, though, if the user you are trying to use doesn't even have an account on the remote machine.

Conclusion

Remote debugging sounds way cool and is the major difference between the Standard and Professional versions of Visual Studio, but it's not for the faint of heart or the inexperienced. If you Google around a bit, you'll notice that most people get a big heap of epic fail when they try it, and I've tried to make as comprehensive a guide to remote debugging as my own experience and time constraints allowed.

Here's hoping you never have to do remote debugging (write a test instead!) but, if you do, I wish you the best of luck.


This article originally appeared on earthli News and has been cross-posted here.


  1. I'm honestly not sure whether this is required or not, but I allowed it and it worked. It may also work without caching the symbols if the path can't be written.

The Dark Side of Entity Framework: Mapping Enumerated Associations

At Encodo, we're using the Microsoft Entity Framework (EF) to map objects to the database. EF treats everything -- and I mean everything -- as an object; the foreign key fields by which objects are related aren't even exposed in the generated code. But I'm getting ahead of myself a bit. We wanted to figure out the most elegant way of mapping what we are going to call enumerated associations in EF. These are associations from a source table to a target table where the target is a lookup table whose rows can be identified by an integer value. That is, the enumerated association could be mapped to a C# enum instead of an object. We already knew what we wanted the solution to look like, as we'd implemented something similar in Quino, our metadata framework (see below for a description of how that works).

The goals are as follows:

  1. Properties of the enumerated type are stored in the database, including its identifier, its value and a mapping to translations.
  2. Relations to the enumerated value are defined in the database as constraints.
  3. The database is therefore internally consistent.
  4. C# code can work with an enumerated type rather than a sub-object; this avoids joining the enumerated type tables when retrieving the main object or restricting to the enumerated type's value.

EF encourages -- nay, requires -- that one develop the application model in the database. A database model consists of tables, fields and relationships between those tables. EF will map those tables, fields and relationships to classes, properties and sub-objects in your C# code. The properties used to map an association -- the foreign keys -- are not exposed by the Entity Framework and are simply unavailable in the generated code. You can, however, add custom code to your partial classes to expose those values1:

return Child.ParentReference.ID;

However, you can't use those properties in LINQ queries because EF cannot map those extra properties to the database. Since you can neither restrict nor order on those properties, they're as good as useless, so we'll have to work within EF itself.

Even though EF has already mapped the constraint from the database as a navigation property, let's add the property to the model as a scalar property anyway. You'll immediately be reprimanded for mapping the property twice, with something like the following error message:

Error 3007: Problem in Mapping Fragments starting at lines 1383, 1617: Non-Primary-Key column(s) [ColumnName] are being mapped in both fragments to different conceptual side properties - data inconsistency is possible because the corresponding conceptual side properties can be independently modified.

Since we're feeling adventurous, we open the XML file directly (instead of inside the designer) and remove the navigation property and association, then add the property to the conceptual model by hand. Now, we're reprimanded for not having mapped the association EF found in the database, with something like the following error message:

Error 11008: Association 'FOREIGN_KEY_NAME' is not mapped.

Not giving up yet, we open the model in the designer again and delete the offending foreign key from the diagram. Now, we get something like the following error message:

Error 3015: Problem in Mapping Fragments starting at lines 6680, 6699, 6716, 6724, 6801, 6807, 6815: Foreign key constraint 'FOREIGN_KEY_NAME' from table Source (SourceId) to table TargetType (Id):: Insufficient mapping: Foreign key must be mapped to some AssociationSet on the conceptual side.

The list of line numbers indicates where the foreign key we've deleted is still being referenced. Despite having used the designer to delete the key, EF has neglected to maintain consistency in the model, so it's time to re-open the model as XML and delete the remaining references to 'FOREIGN_KEY_NAME' manually.

We're finally in the clear as far as the designer and compiler are concerned, with the constraint defined as we want it in the database and EF exposing the foreign key as an integer -- to which we can assign a typecast enum -- instead of an object. This was the goal, so let's run the application and see what happens.

Everything works as expected and there are no nasty surprises waiting for us at runtime. We've got a much more comfortable way of working with the special case of enumerated types in EF. This special case, arguably, comes up quite a lot; in the model for our application, about half of the tables contain enumerated data, which are used as lookups for reports.

It wasn't easy and the solution involved switching from the designer to the XML file and back a few times2, but at least it works. However, before we jump for joy that we at least have a solution, let's pretend we've changed our database again and update the model from the database.

Oops.

The EF-Designer has detected the foreign key we so painstakingly deleted and re-established it without asking for so much as a by-your-leave, giving us the error of type 3007 shown above. We're basically back where we started ... and will be whenever anyone changes the database and updates the model automatically. At this point, it seems that the only way to actually expose the foreign key in the EF model is to remove the association from the database! Removing the constraint in the database, however, is unacceptable as that would destroy the relational integrity just to satisfy a crippled object mapper.

In a last-ditch effort, we can fool EF into thinking that the constraint has been dropped not by removing the constraint but by removing the related table from the EF model. That is, once EF no longer maps the destination table -- the one containing the enumerated data -- it will no longer try to map the constraint, mapping the foreign key as just another integer field.

This solution finally works and the model can be updated from the designer without breaking it -- as long as no one re-adds the table with the enumerated data. This is the solution we've chosen for all of our lookup data, establishing a second EF model to hold those tables.

  • The main model contains non-enumerated data; relations to enumerated data end in integer fields instead of objects.
  • The lookup model contains a list of enumerated data tables; these are queried for the contents of drop-down lists and so on.
  • We defined an enumerated type in C# for each table in the lookup model, with values corresponding to the values that go in the lookup table.
  • We wrote a synchronizer to keep the data in the lookup tables synchronized with the enum-values in C#.
  • Business logic uses these enumerated types to assign the values to the foreign-key integer fields (albeit with a cast); a sketch follows this list.
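As a rough sketch of that last point (all names invented; the real entities are generated by EF, and Report here is only a stand-in that exposes the integer foreign key):

**using** System;

// Hypothetical enum kept in sync with the corresponding lookup table by the
// synchronizer mentioned above; the values must match the rows in the database.
**public enum** ReportType
{
  Daily = 1,
  Weekly = 2,
  Monthly = 3
}

// Stand-in for a generated EF entity that exposes the foreign key as an int.
**public class** Report
{
  **public int** ReportTypeId { **get**; **set**; }
}

**class** Example
{
  **static void** Main()
  {
    Report report = **new** Report();

    // Business code assigns the enum directly to the integer foreign key
    // (with a cast); no lookup object has to be loaded or attached.
    report.ReportTypeId = (**int**)ReportType.Monthly;

    Console.WriteLine(report.ReportTypeId);
  }
}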

Using Quino to Solve the Problem

It's not a beautiful solution, but it works better than the alternative (using objects for everything). Quino, Encodo's metadata framework, includes an ORM that addresses this problem much more elegantly. In Quino, if you have the situation outlined above -- a data table with a relation to a lookup table -- you define two classes in the metadata, pretty much as you do with EF. However, in Quino, you can specify that one class corresponds to an enumerated type and both the code generator and schema migrator will treat that meta-class accordingly.

  • The code generator maps relations with the enumerated class as the target to the C# enum instead of an object, automatically converting the underlying integer foreign key to the enumerated type and back.
  • The schema migrator detects differences between the C# enumerated type and the values available in the lookup table in the database and keeps them synchronized.
  • As simple integer enums, the values can be easily restricted and ordered without joining extra tables.
  • Generated code uses the C# enumerated type, which ensures type-safety and code-completion, including documentation, in business code.

EF has a graphical designer, whereas Quino does not, but the designer only gets in the way for the situation outlined above. Quino offers an elegant solution for lookup values with only two lines of code: one to create the lookup class and indicate which C# enum it represents and one to create a property of that type on the target class. The Quino Demo (not yet publicly available) contains an example.



  1. You can also try to modify the T4 templates used to generate code, but that would be futile for reasons that follow.

  2. Which is, frankly, appalling, but hardly unexpected for a 1.0 product from Microsoft, which usually needs a few tries to get things working smoothly.