Saving & Loading Performance in Quino

The GenericObject in Quino recently underwent a performance overhaul, as documented in the article Improving performance in GenericObject... but we weren't finished yet.

I'm going to assume that you read the overview on "How Data Objects are Implemented" and understand what the GenericObject actually is. In the other article, we optimized performance when creating objects in-memory and when loading and setting values. Those optimizations were driven by an application that used Quino data in a highly specialized way. In this article, we address other performance issues that came up with another Quino application, this one a more classical client for a database.

To be more precise, the performance of the Quino application itself was satisfactory, but an importer for existing customer data was so slow as to be almost useless for testing -- because it took hours instead of minutes.

So out came the YourKit Profiler for .NET again. As mentioned in the other article, we ran a part of the tests below (the smallest dataset) with tracing enabled, had YourKit show us the "Hot Spots", fixed those. Rinse, lather, repeat.

Charts and Methodology

As to methodology, I'm just going to cite the other article:

The charts below indicate a relative improvement in speed and memory usage. The numbers are not meant to be compared in absolute terms to any other numbers. In fact, the application being tested was a simple console application we wrote that created a bunch of objects with a bunch of random data. Naturally we built the test to adequately approximate the behavior of the real-world application that was experiencing problems. This test application emitted the numbers you see below.

Note: The vertical axis for all graphs uses a logarithmic scale.

Even though the focus was not on optimizing performance of creating objects in memory, we managed to squeeze another 30% out of that operation as well. Creating objects in memory means creating the C# object and setting default values as required by the metadata.

image

The "Saving New Objects to PostgreSql" test does not indicate how many objects can be saved per second with Quino. The data is based on a real-world model and includes some data on a timeline, the maintenance of which requires queries to be made after an object is saved in order to maintain the integrity of the timeline. So, the numbers below include a lot of time spent querying for data as well.

Still, you can see from the numbers below that saving operations got slower the more objects there were. Saving 150k objects in one large graph is now 20x faster than in previous versions.

image

This final number is relatively "clean" in that it really only includes time spent reading data from the database and creating objects in memory from it. That there are more objects in the resulting graph than were saved in the previous step is due to the way the data was loaded, not due to an error. The important thing was to load a lot of data, not to maintain internal consistency between tests.

image

Again, though the focus was on optimizing save performance, loading 250k objects is now twice as fast as it was in previous versions.

These improvements are available to any application using Quino 1.6.2.1 and higher.

Improving performance in GenericObject

Quino is Encodo's metadata framework, written in C#/.NET 4.0. Since its inception four years ago, we've used it in several products and the code base has been updated continuously.

However, it was only in a recent product that one of the central features of the framework came under scrutiny for performance issues. It turned out that reading and writing to Quino data objects was a bit slower than we needed it to be.

How Data Objects are Implemented

A typical ORM (like Hibernate or Microsoft's Entity Framework) uses a C# class as the base entity in the model, decorating those classes with attributes to add to the model. The ORM then uses this information to communicate with the database, reading and writing values through reflection. Creating objects and getting and setting values -- including default values -- is all done through direct calls to property getters and setters.
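
For comparison, the class-first style might look roughly like the following sketch (the attribute names are purely illustrative and not tied to any specific ORM):

// Illustrative sketch of the class-first ORM style; the attribute names
// below are hypothetical and not taken from any particular ORM.
[Table("Person")]
public class Person
{
  [Key]
  public int Id { get; set; }

  [Column("LastName")]
  public string LastName { get; set; }
}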

Quino took a different approach, putting the model at the center of the framework and defining an in-memory structure for the model that is accessible through a regular API rather than reflection. The actual C# classes used by business logic are then generated from this model -- instead of the other way around.

This decoupling of metadata from the classes has a lot of advantages, not the least of which is that Quino provides generalized access to any of these business objects. Components that work with Quino data do not need to be aware of the actual classes: instead, those components use the metadata and an API to read and write values. Since the interface is generalized, these values are read and written through Quino code rather than through direct getters and setters.
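
A hypothetical sketch of what such generalized access looks like from a component's point of view (the member names are illustrative, not the actual Quino API):

// Hypothetical interface; the member names are illustrative, not the Quino API.
public interface IGenericData
{
  object GetValue(string propertyName);
  void SetValue(string propertyName, object value);
}

public static class GenericDataExtensions
{
  // A component can clear a property on any business object without
  // knowing the concrete generated class.
  public static void Clear(this IGenericData data, string propertyName)
  {
    data.SetValue(propertyName, null);
  }
}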

As you would expect, there is a base class from which all Quino data objects inherit that provides the support for this interface, called GenericObject. It was in this central class that we had to go to work with a profiler to squeeze out some more speed.

Improving Performance

The actual use case for our data objects didn't even use our ORM, as such. Instead, we were generating the objects from a data stream with 0 to n columns defined (a perfect situation to use an object that supports a flexible interface).

Once those objects were created, they were handed off to the user interface, which applied them to a grid, replacing rows or updating values as required.

So, we needed to improve things on several fronts:

  • We needed to improve speed when creating objects because data was arriving at a serious clip.
  • We needed to improve speed when applying values because there were often several grids open at once, and they all needed to be updated as quickly as possible.1
  • We also needed to decrease the memory footprint because when the data flow was heavy, there were a lot of objects in memory and the application was reaching the limit of its address space.2

As mentioned above, the data object we had worked fine. It was fast enough and slim enough that we never noticed any performance or memory issues in more classical client applications. It was only when using the data object in a very high-demand, high-performance product that the issue arose. That's actually the way we prefer working: get the code running correctly first, then make it faster if needed.

And how do you make it faster and slimmer without breaking everything else you've already written? You run each subsequent version against your unit, regression and integration tests to verify it, that's how. Quino has several thousand automated tests that we ran each step of the way to make sure that our performance improvements didn't break behavior.

Charts and Methodology

The charts below indicate a relative improvement in speed and memory usage. The numbers are not meant to be compared in absolute terms to any other numbers. In fact, the application being tested was a simple console application we wrote that created a bunch of objects with a bunch of random data. Naturally we built the test to adequately approximate the behavior of the real-world application that was experiencing problems. This test application emitted the numbers you see below.

We used the YourKit Profiler for .NET to find code points that still needed improvement and iterated until we were happy with the result. We are very happy with YourKit as a profiler. It's fast and works well for sampling and tracing as well as detecting memory leaks and tracking memory usage. To test performance, we would execute part of the tests below with tracing enabled (no recompilation necessary), show "Hot Spots" and fix those.

The tests focused on creating a certain number of objects with a certain number of columns (with total data fields = #objects * #columns), corresponding to the first two columns in the table. The other columns are v0 (the baseline) and v1--v3, which are various versions we made as we tried to hone performance. The final three columns show the speed of v1--v3 vs. v0.

image

image

Finally, not only did we make creating objects over 3 times faster and changing values more than twice as fast, but we also decreased the memory footprint of each object to just over 1/3 of the original size.

image

These improvements didn't come by magic: the major change we made was to move from using a dictionary as an internal representation to using arrays and direct indexing. The dictionary is the more natural choice as the generalized API maps property and relation names to values, but it uses more space and is slower than an array. It is, however, much easier to use if you don't have to worry about extreme performance situations. Using an array gives us the speed we need, but it also requires that we be much more careful about index-out-of-bounds situations. That's where our rich suite of tests came to the rescue and let us have our cake and eat it too.
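
The following sketch illustrates the general shape of that change (the names and layout are hypothetical, not Quino's actual implementation):

using System;
using System.Collections.Generic;

// Hypothetical sketch only; not the actual Quino implementation.
public class DictionaryBackedObject
{
  // The natural choice: every get/set pays for a hash lookup and the
  // dictionary itself costs extra memory per object.
  private readonly Dictionary<string, object> _values = new Dictionary<string, object>();

  public object GetValue(string name) { return _values[name]; }
  public void SetValue(string name, object value) { _values[name] = value; }
}

public class ArrayBackedObject
{
  // The faster choice: the metadata assigns each property a fixed index,
  // so values live in a flat array and access is direct indexing. The
  // index lookup must be carefully guarded against out-of-bounds access.
  private readonly object[] _values;
  private readonly Func<string, int> _indexOf; // supplied by the metadata

  public ArrayBackedObject(int propertyCount, Func<string, int> indexOf)
  {
    _values = new object[propertyCount];
    _indexOf = indexOf;
  }

  public object GetValue(string name) { return _values[_indexOf(name)]; }
  public void SetValue(string name, object value) { _values[_indexOf(name)] = value; }
}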

These improvements are available to any application using Quino 1.6.2.0 and higher.



  1. In a subsequent version of this product, we would move each grid/window into its own UI thread in order to parallelize the work and use all 8 cores on the target machine to make updates even faster.

  2. Because of the parallelization mentioned in the footnote above, the subsequent version was still reaching the limit of the 32-bit address space, even with the decreased memory footprint per object. So we compiled as 64-bit to remove that limitation as well.

v1.6.2.0: New reporting system and UI

The summary below describes major new features, items of note and breaking changes. The full list of issues is also available for those with access to the Encodo issue tracker.

Highlights

  • A new reporting system and UI (based on the DevExpress Winforms UI)

Breaking changes

No known breaking changes

v1.6.1.0: Introduced support for interdependent modules

The summary below describes major new features, items of note and breaking changes. The full list of issues is also available for those with access to the Encodo issue tracker.

Highlights

  • General performance improvements
  • GenericObject performance improvements
  • QNO-2926: Added HSV support for managing skinned colors
  • QNO-2894: Added a MAPIIncidentSubmitter to the Encodo library

Breaking changes

No known breaking changes

Encodo C# Handbook 7.17 -- Using System.Linq

I'm currently revising the Encodo C# Handbook to update it for the last year's worth of programming experience at Encodo, which includes a lot more experience with C# 4.0 features like optional parameters, dynamic types and more. The following is an expanded section on working with Linq. A final draft should be available by the middle of April or so.

7.17 -- Using System.Linq

When using Linq expressions, be careful not to sacrifice legibility or performance simply in order to use Linq instead of more common constructs. For example, the following loop sets a property for those elements in a list where a condition holds.

foreach (var pair in Data)
{
  if (pair.Value.Property is IMetaRelation)
  {
    pair.Value.Value = null;
  }
}

This seems like a perfect place to use Linq; assuming an extension method ForEach(this IEnumerable<T>), we can write the loop above using the following Linq expression:

Data.Where(pair => pair.Value.Property is IMetaRelation).ForEach(pair => pair.Value.Value = null);

This formulation, however, is more difficult to read because the condition and the loop are now buried in a single line of code, but a more subtle performance problem has been introduced as well. We have made sure to evaluate the restriction (Where) first so that we iterate the list (with ForEach) with as few elements as possible, but we still end up iterating twice instead of once. This could cause performance problems in border cases where the list is large and a large number of elements satisfy the condition.
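
The ForEach method assumed above is not part of System.Linq; a minimal sketch of such an extension method might look like this:

using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
  // Minimal sketch of the assumed ForEach extension method.
  public static void ForEach<T>(this IEnumerable<T> items, Action<T> action)
  {
    foreach (var item in items)
    {
      action(item);
    }
  }
}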

7.17.1 -- Lazy Evaluation

Linq is mostly a blessing, but you always have to keep in mind that Linq expressions are evaluated lazily. Therefore, be very careful when using the Count() method because it will iterate over the entire collection (if the backing collection is of base type IEnumerable<T>). Linq is optimized to check the actual backing collection, so if the IEnumerable<T> you have is a list and the count is requested, Linq will use the Count property instead of counting elements naively.
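
For example (a small illustration, not from the handbook, using plain Linq-to-Objects sequences):

var numbers = Enumerable.Range(0, 1000000);
var evens = numbers.Where(n => n % 2 == 0);

// Each call to Count() re-enumerates the lazily filtered sequence.
var firstCount = evens.Count();
var secondCount = evens.Count();

// Materializing once means subsequent counts use List<T>.Count instead.
var evenList = evens.ToList();
var cheapCount = evenList.Count;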

A few concrete examples of other issues that arise due to lazy evaluation are illustrated below.

7.17.2 -- Capturing Unstable Variables/Access to Modified Closure

You can accidentally change the value of a captured variable before the sequence is evaluated. Since ReSharper will complain about this behavior even when it does not cause unwanted side-effects, it is important to understand which cases are actually problematic.

var data = new[] { "foo", "bar", "bla" };
var otherData = new[] { "bla", "blu" };
var overlapData = new List<string>();

foreach (var d in data)
{
  if (otherData.Where(od => od == d).Any())
  {
    overlapData.Add(d);
  }
}

// We expect one element in the overlap, bla
Assert.AreEqual(1, overlapData.Count);

The reference to the variable d will be flagged by ReSharper and marked as an access to a modified closure. This is a reminder that a variable referenced -- or captured -- by the lambda expression (the closure) will have the last value assigned to it rather than the value that was assigned to it when the lambda was created. In the example above, the lambda is created with the first value in the sequence, but since we only use the lambda once, and always before the variable has been changed, we don't have to worry about side-effects. ReSharper can only detect that a variable referenced in a closure is being changed within the scope that it checks; it lets you know so you can verify that there are no unwanted side-effects.

Even though there isn't a problem, you can rewrite the foreach-statement above as the following code, eliminating the "Access to modified closure" warning.

var overlapData = data.Where(d => otherData.Where(od => od == d).Any()).ToList();

The example above was tame in that the program ran as expected despite capturing a variable that was later changed. The following code, however, will not run as expected:

var data = new[] { "foo", "bar", "bla" };
var otherData = new[] { "bla", "blu" };
var overlapData = new List<string>();

var threshold = 2;
var results = data.Where(d => d.Length == threshold);
var overlapData = data.Where(d => otherData.Where(od => od == d).Any());
if (overlapData.Any())
{
  threshold += 1;
}

// All elements are three characters long, so we expect no matches
Assert.AreEqual(0, results.Count());

Here we have a problem because the closure is evaluated after a local variable that it captured has been modified, resulting in unexpected behavior. While it's possible that this is exactly what you intended, it's not a recommended coding style. Instead, you should move the calculation that uses the lambda after any code that changes the variables that it captures:

var threshold = 2;
var overlapData = data.Where(d => otherData.Where(od => od == d).Any());
if (overlapData.Any())
{
  threshold += 1;
}
var results = data.Where(d => d.Length == threshold);

This is probably the easiest way to get rid of the warning and make the code clearer to read.

Encodo C# Handbook 7.30 -- Loose vs. Tight Coupling

I'm currently revising the Encodo C# Handbook to update it for the last year's worth of programming experience at Encodo, which includes a lot more experience with C# 4.0 features like optional parameters, dynamic types and more. The following is an expanded section on loose vs. tight coupling. A final draft should be available by the middle of April or so.

7.30 -- Loose vs. Tight Coupling

Whether to use loose or tight coupling for components depends on several factors. If a component on a lower-level must access functionality on a higher level, this can only be achieved with loose coupling: e.g. connecting the two by using one or more delegates or callbacks.

If the component on the higher level needs to be coupled to a component on a lower level, then it's possible to have them be more tightly coupled by using an interface. The advantage of using an interface over a set of one or more callbacks is that changes to the semantics of how the coupling should occur can be enforced. The example below should make this much clearer.

Imagine a class that provides a single event to indicate that it has received data from somewhere.

public class DataTransmitter
{
  public event EventHandler<DataBundleEventArgs> DataReceived;
}

This is the classic way of loosely coupling components; any component that is interested in receiving data can simply attach to this event, like this:

public class DataListener
{
  public DataListener(DataTransmitter transmitter)
  {
    transmitter.DataReceived += TransmitterDataReceived;
  }

  private void TransmitterDataReceived(object sender, DataBundleEventArgs args)
  {
    // Do something when data is received
  }
}

Another class could combine these two classes in the following, classic way:

var transmitter = new DataTransmitter();
var listener = new DataListener(transmitter);

The transmitter and listener can be defined in completely different assemblies and need no dependency on any common code (other than the .NET runtime) in order to compile and run. If this is an absolute must for your component, then this is the pattern to use for all events. Just be aware that the loose coupling may introduce semantic errors -- errors in usage that the compiler will not notice.

For example, suppose the transmitter is extended to include a new event, NoDataAvailableReceived.

public class DataTransmitter
{
  public event EventHandler<DataBundleEventArgs> DataReceived;
  public event EventHandler NoDataAvailableReceived;
}

Let's assume that the previous version of the interface threw a timeout exception when it had not received data within a certain time window. Now, instead of throwing an exception, the transmitter triggers the new event instead. The code above will no longer indicate a timeout error (because no exception is thrown) nor will it indicate that no data was transmitted.

One way to fix this problem (once detected) is to hook the new event in the DataListener constructor. If the code is to remain highly decoupled -- or if the interface cannot be easily changed -- this is the only real solution.

Imagine now that the transmitter becomes more sophisticated and defines more events, as shown below.

public class DataTransmitter
{
  public event EventHandler<DataBundleEventArgs> DataReceived;
  public event EventHandler NoDataAvailableReceived;
  public event EventHandler ConnectionOpened;
  public event EventHandler ConnectionClosed;
  public event EventHandler<DataErrorEventArgs> ErrorOccurred;
}

Clearly, a listener that attaches and responds appropriately to all of these events will provide a much better user experience than one that does not. The loose coupling of the interface thus far requires all clients of this interface to be proactively aware that something has changed and, once again, the compiler is no help at all.

If we can change the interface -- and if the components can include references to common code -- then we can introduce tight coupling by defining an interface with methods instead of individual events.

public interface IDataListener
{
  void DataReceived(IDataBundle bundle);
  void NoDataAvailableReceived();
  void ConnectionOpened();
  void ConnectionClosed();
  void ErrorOccurred(Exception exception, string message);
}

With a few more changes, we have a more tightly coupled system, but one that will enforce changes on clients:

  • Add a list of listeners to the DataTransmitter.
  • Add code to copy and iterate the listener list instead of triggering events from the DataTransmitter.
  • Make DataListener implement IDataListener.
  • Add the listener to the transmitter's list of listeners.

Now when the transmitter requires changes to the IDataListener interface, the compiler will enforce that all listeners are also updated.
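
A minimal sketch of the transmitter side of this tighter coupling might look like the following (the member names AddListener and NotifyDataReceived are illustrative):

using System.Collections.Generic;

public class DataTransmitter
{
  private readonly List<IDataListener> _listeners = new List<IDataListener>();

  public void AddListener(IDataListener listener)
  {
    _listeners.Add(listener);
  }

  protected void NotifyDataReceived(IDataBundle bundle)
  {
    // Copy the list so that listeners may add or remove themselves
    // while being notified.
    foreach (var listener in _listeners.ToArray())
    {
      listener.DataReceived(bundle);
    }
  }
}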

v1.6.0.0: Added support for Mongo/NoSQL databases

The summary below describes major new features, items of note and breaking changes. The full list of issues is also available for those with access to the Encodo issue tracker.

Highlights

  • Mongo/NoSql database support
  • Portable deployment format
  • Translation fixes
  • Improved dependency decoupling (parsers)
  • QNO-2654, QNO-2652, QNO-2653, QNO-2655: MetaEditPanel: Add more layout options, dynamic visibility for layout groups, dynamic ReadOnly property control, allow use of a rich text control
  • Many bug fixes

Breaking changes

No known breaking changes

Troubleshooting a misbehaving designer in Visual Studio 2010

This article originally appeared on earthli News and has been cross-posted here.


Anyone who's used Visual Studio 20101 for a non-trivial Windows Forms project has run into situations wherein the designer can no longer be opened. Usually, it's because the class encounters null-reference exceptions when referencing data that is unavailable until runtime. Those are easy to fix: just avoid referencing that data in the constructor or load-routine while in design-mode.

However, sometimes Visual Studio has problems loading assemblies that it seems it should have available. Sometimes Visual Studio seems to have a devil of a time loading assemblies whose location it has quite explicitly been told.

If you like, there is a walkthrough -- with screenshots! -- at the end of this article, which shows how to solve even the most intractable designer problems.

A Tale of Two Platforms

One of the troubles is that many developers have moved to 64-bit Windows in order to take advantage of the higher RAM limits. The move to 64-bit causes some issues with many .NET assemblies in that the developer (i.e. probably YOU) didn't remember to take into account that an assembly might be loaded by x86 code or x64 code or some combination thereof. The designer will sometimes be unable to load an assembly because it has been compiled in a way that cannot be loaded by the runtime currently being used by the designer as explicitly requested in the project settings. That said, the request is explicit as far as Visual Studio is concerned, but implicit as far as the developer is concerned.

The only long-lasting solution is to learn how assemblies are loaded and what the best compile settings are for different assemblies so that you will run into as few problems as possible.

There are several considerations:

  1. It would be nice to have class libraries that can be loaded by any executable instead of having separate versions for x64 and x86.
  2. It would also be nice to be able to benefit from as many debugging features of the environment as possible (e.g. the Edit & Continue feature does not work with x64 builds).
  3. It would be nice to have the most optimal executable for the target platform. (This is usually taken to mean an executable compiled to run natively on the target, but turns out not necessarily to be so, as shown below.)

In order to help decide what to do, it's best to go to the source: Microsoft. To that end, the article AnyCPU Exes are usually more trouble than they're worth by Rick Byers provides a lot of guidance.

  • "Running in two very different modes increases product complexity and the cost of testing": two different platforms equals two times as much testing. Build servers have to compile and run tests for all configurations because there can be subtle differences.2
  • "32-bit tends to be faster anyway": the current version of the WOW (Windows-on-Windows) runtime on 64-bit systems actually runs code faster than the native 64-bit runtime. That still holds true as of this writing.
  • "Some features aren't avai[l]able in 64-bit": the aforementioned Edit & Continue counts among these, as does historical debugging if you're lucky enough to have a high-end version of Visual Studio.

Given all of the points made above and assuming that your application does not actually need to be 64-bit (i.e. it needs to address more RAM than is available in the 32-bit address space), your best bet is to use the following rules as your guide when setting up default build and release settings.

  • Pure class libraries should always be compiled for "Any CPU" (i.e. able to be loaded by both x86 and x64 assemblies).
  • Executables should always be compiled as x86.
  • Unit-test assemblies should also be compiled as x86 in order to be able to use Edit & Continue.

Where Did You Find That?!

Once you've set up your build configuration appropriately and rebuilt everything, you will avoid many design-time errors.

Though not all of them.

Visual Studio has a nasty habit of loading assemblies wherever it can find one that matches your requirements, regardless of the location from which you linked in the assembly. If you look in the project file for a C# Visual Studio project (the .csproj-file), you'll actually see an XML element called <HintPath> after each assembly reference. The name is appropriately chosen: Visual Studio will look for an assembly in this location first, but will continue looking elsewhere if it's not there. It will look in the GAC and it will look in the bin/Debug or bin/x86/Debug folder to see if it can scrounge up something against which to link. Only if the assembly is not to be found anywhere will Visual Studio give up and actually emit an error message.

At Encodo, we stopped using the GAC entirely, relying instead on local folders containing all required third-party libraries. In this way, we try to control the build configuration and assemblies used when code is downloaded to a new environment (such as a build server). However, when working locally, it is often the case that a developer's environment is a good deal dirtier than that of a build server and must be cleaned.

Though Visual Studio offers an option to clean a project or solution, it doesn't do what you'd expect: assemblies remain in the bin/Debug or bin/x86/Debug folders. We've added a batch command that we use to explicitly delete all of these folders so that Visual Studio once again must rely on the HintPath to find its assemblies.

If you find yourself switching between x86 and x64 assemblies with any amount of frequency, you will run into designer loading errors when the designer manages to find an assembly compiled for the wrong platform. When this happens, you must shut down Visual Studio, clean all output folders as outlined above and re-open the solution.

Including References with ReSharper

A final note on references: if you adopt the same policy as Encodo of very carefully specifying the location of all external references, you have to watch out for ReSharper. If ReSharper offers to "reference assembly X" and "include the namespace Y", you should politely decline and reference the assembly yourself. ReSharper will reference the assembly as expected but will not include a HintPath so the reference will be somewhere in the bin/Debug or bin/x86/Debug folder and will break as soon as you clean all of those directories (as will be the case on a build server).

Designer Assemblies

This almost always works, but Visual Studio can still find ways of loading assemblies over which you have little to no control: the designer assemblies.

In all likelihood, you won't be including the designer assemblies in your third-party binaries folder for several reasons:

  1. They are not strictly required for compilation
  2. They are usually a good deal larger than the assemblies that they support and are only used during design-time
  3. Design-time assemblies are usually associated with visual component packages that must be installed anyway in order for a compiled executable to be considered licensed.3

For all of the reasons above, it's best not to even try to get Visual Studio to load designer assemblies out of a specific folder and just let it use the GAC instead.

Walkthrough: Solving a Problem in the Designer

Despite all of the precautions mentioned above, it is still possible to have a misbehaving designer. The designer can be so mischievous that it simply refuses to load, showing neither a stack nor an error message, keeping its reasons to itself. How do we solve such a problem?

You know you have a problem when the designer presents the following view instead of your form or user control.

image

In the worst case, you will be given neither a useful error message nor a stack from which to figure out what happened.

image

There's a little link at the top -- right in the middle -- that you can click; it may provide you with more information.

image

The designer will try to scare you off one last time before giving up its precious secrets; ignore it.

image

At this point, the designer will finally show the warnings and errors that describe the reason it cannot load.4

image

The text is a bit dense, but one thing pops out immediately:

image

It looks like Visual Studio is checking some cached location within your application settings to find referenced assemblies and their designer assemblies.5 This is a bit strange as Visual Studio has been explicitly instructed to load those assemblies from the third-party folder that we carefully prepared above. Perhaps this cache represents yet another location that must be cleared manually every once in a while in order to keep the designer running smoothly.

[A]DevExpress.XtraLayout.LayoutControl cannot be cast to [B]DevExpress.XtraLayout.LayoutControl. 
Type A originates from 'DevExpress.XtraLayout.v10.2, Version=10.2.5.0, Culture=neutral, PublicKeyToken=b88d1754d700e49a' 
in the context 'LoadNeither' at location 
'C:\Documents and Settings\Marco\Local Settings\Application Data\Microsoft\VisualStudio\10.0\ProjectAssemblies\kn8q9qdt01\DevExpress.XtraLayout.v10.2.dll'. 
Type B originates from 'DevExpress.XtraLayout.v10.2, Version=10.2.4.0, Culture=neutral, PublicKeyToken=b88d1754d700e49a'
in the context 'Default' at location
'C:\WINDOWS\assembly\GAC_MSIL\DevExpress.XtraLayout.v10.2\10.2.4.0__b88d1754d700e49a\DevExpress.XtraLayout.v10.2.dll'.

This will turn out to be a goose chase, however.6 The problem does not lie in the location of the assemblies, but rather in the version. We can see that the designer was attempting to load version 10.2.4.0 of the third-party component library for DevExpress. However, the solution and all projects were referencing the 10.2.5.0 version, which had not been officially installed on that workstation. It was unofficially available because the assemblies were included in the solution-local third-party folder, but the designer files were not.

Instead of simply showing an error message that the desired version of a required assembly could not be loaded, Visual Studio chose instead to first hide the warnings quite well, then to fail to mention the real reason the assembly could not be loaded (i.e. that it conflicted with a newer version already in memory). Instead, the designer left it up to the developer to puzzle out that the error message only mentioned versions that were older than the current one.7

From there, a quick check of the installed programs and the GAC confirmed that the required version was not installed, but the solution was eminently non-obvious.

That's about all for Visual Studio Designer troubleshooting tips. Hopefully, they'll be useful enough to prevent at least some hair from being torn out and some keyboards from being thrown through displays.



  1. All tests were performed with the SP1 Beta version available as of Mid-February 2010.

  2. One such difference is how hash codes are generated by the default implementation of GetHashCode(): the .NET implementation is optimized for speed, not portability, so the codes generated by the 32-bit and 64-bit runtimes are different.

  3. In the case of SyncFusion, this means the application won't even compile; in the case of DevExpress, the application will both compile and run, but will display a nag screen every once in a while.

  4. If you're lucky, of course. If you're unlucky, Visual Studio will already have crashed and helpfully offered to restart itself.

  5. Then it encountered a null-reference exception, which we can only hope will actually get fixed in some service pack or other.

  6. I tried deleting this folder, but it was locked by Visual Studio. I shut down Visual Studio and could delete the folder. When I restarted and reloaded the project as well as the designer, I found to my surprise that Visual Studio had exactly recreated the folder structure that I had just deleted. It appears that this is a sort of copy of the required assemblies, but the purpose of copying assemblies out of the GAC to a user-local temporary folder is unclear. It stinks of legacy workarounds.

  7. In the case of DevExpress, this didn't take too long because it's a large component package and the version number was well-known to the developers in the project. However, for third-party components that are not so frequently updated or which have a less recognizable version number, this puzzle could have remained insoluble for quite some time.

Overriding Equality Operators: A Cautionary Tale

This article originally appeared on earthli News and has been cross-posted here.


tl;dr: This is a long-winded way of advising you to always be sure what you're comparing when you build low-level algorithms that will be used with arbitrary generic arguments. The culprit in this case was the default comparator in a HashSet<T>, but it could be anything. It ends with cogitation about software processes in the real world.

Imagine that you have a framework (The Quino Metadata framework from Encodo Systems AG) with support for walking arbitrary object graphs in the form of a GraphWalker. Implementations of this interface complement a generalized algorithm.

This algorithm generates nodes corresponding to various events generated by the graph traversal, like beginning or ending a node or edge or encountering a previously processed node (in the case of graphs with cycles). Such an algorithm is eminently useful for formatting graphs into a human-readable format, cloning said graphs or other forms of processing.

A crucial feature of such a GraphWalker is to keep track of the nodes it has seen before in order to avoid traversing the same node multiple times and going into an infinite loop in graphs with cycles. For subsequent encounters with a node, the walker handles it differently -- generating a reference event rather than a begin node event.

A common object graph is the AST for a programming language. The graph walker can be used to quickly analyze such ASTs for nodes that match particular conditions.

Processing a Little Language

Let's take a look at a concrete example, with a little language that defines simple boolean expressions:

OR(
  (A < 2)
  (B > A)
)

It's just an example and we don't really have to care about what it does, where A and B came from or the syntax. What matters is the AST that we generate from it:

1 Operator (OR)
2  Operator (<)
3    Variable (A)
4    Constant (2)
5  Operator (>)
6    Constant (B)
7    Variable (A)

When the walker iterates over this tree, it generates the following events (note that the numbers at the front of each line correspond to the objects in the listing above):

1 begin node
1  begin edge
2    begin node
2      begin edge
3        begin node
3        end node
4        begin node
4        end node
2      end edge
2    end node
5    begin node
5      begin edge
6        begin node
6        end node
7        begin node
7        end node
5      end edge
5    end node
1  end edge

Now that's the event tree we expect. This is also the event tree that we get for the objects that we've chosen to represent our nodes (Operator, Variable and Constant in this case). If, for example, we process the AST and pass it through a formatter for this little language, we expect to get back exactly what we put in (namely the code in Listing 1). Given the event tree, it's quite easy to write such a formatter -- namely, by handling the begin node (output the node text), begin edge (output a "(") and end edge (output a ")") events.
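
A sketch of such a formatter might look like this (the handler names and signatures are simplified, hypothetical stand-ins for the actual walker events):

using System.Text;

// Hypothetical, simplified formatter; the real walker API differs.
public class LittleLanguageFormatter
{
  private readonly StringBuilder _builder = new StringBuilder();

  public void BeginNode(string nodeText)
  {
    _builder.Append(nodeText); // e.g. "OR", "<", "A", "2"
  }

  public void BeginEdge()
  {
    _builder.Append("(");
  }

  public void EndEdge()
  {
    _builder.Append(")");
  }

  public override string ToString()
  {
    return _builder.ToString();
  }
}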

So far, so good?

Running Into Trouble

However, now imagine that we discover a bug in other code that uses these objects: when two different objects refer to the same variable, we need them to be considered equal. That is, we update the equality methods -- in the case of .NET, Equals() and GetHashCode() -- for Variable.
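
For illustration, the change to Variable might look something like this (a sketch that assumes a simple node class with a Name property, not the actual implementation):

// Sketch only: a simple Variable node whose equality is now based on its name.
public class Variable
{
  public Variable(string name)
  {
    Name = name;
  }

  public string Name { get; private set; }

  public override bool Equals(object obj)
  {
    // Two variables are equal if they refer to the same name, even if
    // they are distinct object instances.
    var other = obj as Variable;
    return other != null && other.Name == Name;
  }

  public override int GetHashCode()
  {
    return Name.GetHashCode();
  }
}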

As soon as we do, however, the sample from Listing 1 now formats as:

OR(
  (A < 2)
  (B > )
)

Now we have to figure out what happened. A good first step is to see what the corresponding event tree looks like now. We discover the following:

1 begin node
1  begin edge
2    begin node
2      begin edge
3        begin node
3        end node
4        begin node
4        end node
2      end edge
2    end node
5    begin node
5      begin edge
6        reference
7        begin node
7        end node
5      end edge
5    end node
1  end edge

The change affects the sixth node, which has now become a reference because we changed how equality is handled for Variables. The algorithm now considers any two Variables with the same name to be equivalent even if they are two different object references.

Fix #1 -- Hack the Application Code

If we look back at how we wrote the simple formatter above, we only handled the begin node, begin edge and end edge events. If we throw in a handler for the reference event and output the text of the node, we're back in business and have "fixed" the formatter.

Fix #2 -- Fix the Algorithm

But we ignore the more subtle problem at our own peril: namely, that the graph walking code is fragile in that its behavior changes due to seemingly unrelated changes in the arguments that it is passed. Though we have a quick fix above, we need to think about providing more stability in the algorithm -- especially if we're providers of low-level framework functionality.1

The walker algorithm uses a HashSet<T> to track the nodes that it has previously encountered. However, the default comparator -- again, in .NET -- leans on the equality functions of the objects stored in the map to determine membership.

The first solution -- or rather, the second one, as we already "fixed" the problem with what amounts to a hack above by outputting references as well -- is to change the equality comparator for the HashSet<T> to explicitly compare references. We make that change and we can once again remove the hack because the algorithm no longer generates references for subsequent variable encounters.
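
A reference-only comparer is straightforward to write (a sketch; .NET 4.0 does not ship one out of the box):

using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Sketch of a comparer that ignores any overridden Equals()/GetHashCode()
// and compares object identity only.
public sealed class ReferenceComparer<T> : IEqualityComparer<T>
  where T : class
{
  public bool Equals(T x, T y)
  {
    return ReferenceEquals(x, y);
  }

  public int GetHashCode(T obj)
  {
    // Identity-based hash, independent of the object's own GetHashCode().
    return RuntimeHelpers.GetHashCode(obj);
  }
}

// The walker's visited set then tracks references rather than "equal" nodes:
// var visited = new HashSet<object>(new ReferenceComparer<object>());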

Fix #3 -- Giving the Caller More Control

However, we're still not done. We've now not only gotten our code running but we've fixed the code for the algorithm itself so the same problem won't crop up again in other instances. That's not bad for a day's work, but there's still a nagging problem.

What happens if the behavior that was considered unexpected in this case is exactly the behavior that another use of the algorithm expects? That is, it may well be that other types of graph walker will actually want to be able to control what is and is not a reference by changing the equivalence functions for the nodes.2

Luckily, callers of the algorithm already pass in the graph walker itself, the methods of which the algorithm already calls to process nodes and edges. A simple solution is to add a method to the graph walker interface to ask it to create the kind of HashSet<T> that it would like to use to track references.
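
Such an extension might look roughly like this (the method name is hypothetical; the actual Quino interface is more involved):

using System.Collections.Generic;

// Hypothetical sketch; the actual graph-walker interface is richer.
public interface IGraphWalker
{
  // Lets each walker decide how node identity is tracked: reference
  // equality by default, or the nodes' own Equals()/GetHashCode().
  HashSet<object> CreateVisitedNodeSet();

  // ... existing methods for handling nodes, edges and references ...
}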

Tough Decisions: Which Fix to Use?

So how much time does this all take to do? Well, the first solution -- the hack in application code -- is the quickest, with time spent only on writing the unit test for the AST and verifying that it once again outputs as expected.

If we make a change to the framework, as in the second solution where we change the equality operator, we have to create unit tests to test the behavior of the AST in application code, but using test objects in the framework unit tests. That's a bit more work and we may not have time for it.

The last suggestion -- to extend the graph walker interface -- involves even more work because we then have to create two sets of test objects: one set that tests a graph walker that uses reference equality (as the AST in the application code) and one that uses object equality (to make sure that works as well).

It is at this point that we might get swamped and end up working on framework code and unit tests that verify functionality that isn't even being used -- and certainly isn't being used by the application with the looming deadline. However, we're right there, in the code, and will never be better equipped to get this all right than we are right now. But what if we just don't have time? What if there's a release looming and we should just thank our lucky stars that we found the bug? What if there's no time to follow the process?

Well, sometimes the process has to take a back seat, but that doesn't mean we do nothing. Here are a few possibilities:

  1. Do nothing in the framework; add an issue to the issue tracker explaining the problem and the work that needs to be done so that it can be fixed at a more opportune time (or by a developer with time). This costs a few minutes of time and is the least you should do.
  2. Make the fix in the framework to prevent others from getting bitten by this relatively subtle bug and add an issue to the issue tracker describing the enhanced fix (adding a method to the graph walker) and the tests that need to be written.
  3. Add the method to the graph walker interface so that not only do others not get bitten by the bug but, should they need to control equivalence, they can do so. Add an issue describing the tests that need to be written to verify the new functionality.

What about those who quite rightly frown at the third possibility because it would provide a solution for what amounts to a potential -- as opposed to actual -- problem? It's really up to the developer here and experience really helps. How much time does it take to write the code? How much does it change the interface? How many other applications are affected? How likely is it that other implementations will need this fix? Are there potential users who won't be able to make the fix themselves? Who won't be able to recompile and just have to live with the reference-only equivalence? How likely is it that other code will break subtly if the fix is not made? It's not an easy decision either way, actually.

Though purists might be appalled at the fast and loose approach to correctness outlined above, pragmatism and deadlines play a huge role in software development. The only way to avoid missing deadlines is to have fallback plans to ensure that the code is clean as soon as possible rather than immediately as a more stringent process would demand.

And thus ends the cautionary tale of making assumptions about how objects are compared and how frameworks are made.



  1. Which we are (The Quino Metadata framework from Encodo Systems AG).

  2. This possibility actually didn't occur to me until I started writing this blog post, which just goes to show how important it is to document and continually think about the code you write/have written.

v1.5.0.0: Upgraded to .NET 4.0 and VS2010

The summary below describes major new features, items of note and breaking changes. The full list of issues is also available for those with access to the Encodo issue tracker.

Highlights

  • Improved support for transient data
  • Added Object graph/walker support
  • Moved to .NET 4.0/VS 2010

Breaking changes

No known breaking changes