3 4 5 6 7 8 9 10 11 12 13
ASP.Net MVC Areas

After some initial skepticism regarding Areas, I now use them more and more when building new Web-Applications using ASP.Net MVC. Therefore, I decided to cover some of my thoughts and experiences in a blog post so others may get some inspiration out of it.

Before we start, here's a link to a general introduction to the area feature of MVC. Check out this article if you are not yet familiar with Areas.

Furthermore, this topic is based on MVC 5 and C# 4 but may also apply to older versions too as Areas are not really a new thing and were first introduced with MVC 2.

Introduction

Areas are intended to structure an MVC web application. Let's say you're building a line-of-business application. You may want to separate your modules on one hand from each other and on the other hand from central pieces of your web application like layouts, HTML helpers, etc.

Therefore an area should be pretty much self-contained and should not have too much interaction with other areas -- as little as possible -- as otherwise the dependencies between the modules and their implementation grows more and more, undercutting separation, and resulting in less maintainability.

How one draw the borders of each area depends on the need of the application and company. For example, modules can be separated by functionality or by developer/developer-team. In our line-of-business application, we may have an area for "Customer Management", one for "Order Entry", one for "Bookkeeping" and one for "E-Banking" as they have largely separate functionality and will likely be built by different developers or even teams.

The nice thing about areas -- besides modularization/organization of the app -- is that they are built in MVC and therefore are supported by most tools like Visual Studio, R# and most libraries. On the negative side, I can count the lack of good support in the standard HtmlHelpers as one need to specify the area as a routing-object property and use the name of the area as a string. There is no dedicated parameter for the area. But to put that downside into perspective, this is only needed when making an URL to another area than the current one.

Application modularization using Areas

In my point of view, modularization using areas has two major advantages. The first one is the separation from other parts of the application and the second is the fact that area-related files are closer together in the Solution Explorer.

The separation -- apart from the usual separation advantages -- is helpful for reviews as the reviewer can easily see what has changed within the area and what changed on the core level of the application and therefore needs an even-closer look. Another point of the separation is that for larger applications and teams it results in fewer merge conflicts when pushing to the central code repository as each team has its own playground. Last but not least, its nice for me as an application-developer because I know that when I make changes to my area only I will not break other developers' work.

As I am someone who uses the Solution Explorer a lot, I like the fact that with areas I normally have to scroll and search less and have a good overview of the folder- and file-tree of the feature-set I am currently working on. This happens because I moved all area-related stuff into the area itself and leave the general libraries, layouts, base-classes and helpers outside. This results in a less cluttered folder-tree for my areas, where I normally spend the majority of my time developing new features.

Tips and tricks

  • Move all files related to the area into the area itself including style-sheets (CSS, LESS, SASS) and client-side scripts (Javascript, TypeScript).
  • Configure bundles for your area or even for single pages within your area in the area itself. I do this in an enhanced area-registration file.
  • Enhance the default area registration to configure more aspects of your area.When generating links in global views/layouts use add the area="" routing attribute so the URL always points to the central stuff instead being area-relative.

For example: if your application uses '@Html.ActionLink()' in your global _layout.cshtml, use:

@Html.ActionLink("Go to home", "Index", "Home", new { area = "" });

Area Registration / Start Up

And here is a sample of one of my application's area registrations:

public class BookkeepingAreaRegistration : AreaRegistration
{
  public override string AreaName
  {
    get { return "Bookkeeping"; }
  }

  public override void RegisterArea(AreaRegistrationContext context)
  {
    RegisterRoutes(context);
    RegisterBundles(BundleTable.Bundles);
  }

  private void RegisterRoutes(AreaRegistrationContext context)
  {
    if (context == null) { throw new ArgumentNullException("context"); }

    context.MapRoute(
      "Bookkeeping_default",
      "Bookkeeping/{controller}/{action}/{id}",
      new { controller = "Home", action = "Index", id = UrlParameter.Optional }
    );
  }

  private void RegisterBundles(BundleCollection bundles)
    {
      if (bundles == null) { throw new ArgumentNullException("bundles"); }

      // Bookings Bundles
      bundles.Add(new ScriptBundle("~/bundles/bookkeeping/booking")
          .Include("~/Areas/bookkeeping/Scripts/booking.js"
      ));

      bundles.Add(new StyleBundle("~/bookkeeping/css/booking")
          .Include("~/Areas/bookkeeping/Content/booking.css"));

      // Account Overview Bundle
      ...
  }
}

As you can see in this example, I enhanced the area registration a little so area-specific bundles are registered in the area-registration too. I try to place all areas-specific start-up code in here.

Folder Structure

As I wrote in one of the tips, I strongly recommend storing all area-related files within the area's folder. This includes style-sheets, client-side scripts (JS, TypeScript), content, controller, views, view-models, view-model builders, HTML Helpers, etc. My goal is to make an area self-contained so all work can be done within the area's folder.

So the folder structure of my MVC Apps look something like this:

  • App_Start
  • Areas
    • Content
    • Controllers
    • Core
    • Models
      • Builders
    • Scripts
    • Views
  • bin
  • Content
  • Controllers
  • HtmlHelpers
  • Models
    • Builders
  • Scripts
  • Views

As you can see, each area is something like a mini-application within the main application.

Add-ons using Areas deployed as NuGet packages

Beside structuring an entire MVC Application, another nice usage is for hosting "add-on's" in their own area.

For example, lately I wrote a web-user-interface for the database-schema migration of our meta-data framework Quino. Instead of pushing it all in old-school web-resources and do binary deployment of them, I built it as an area. This area I've packed into a NuGet package (.nupkg) and published it to our local NuGet repo.

Applications which want to use the web-based schema-migration UI can just install that package using the NuGet UI or console. The package will then add the area with all the sources I wrote and because the area-registration is called by MVC automatically, it's ready to go without any manual action required. If I publish an update to the NuGet package applications can get these as usual with NuGet. A nice side-effect of this deployment is that the web application contains all the sources so developers can have a look at it if they like. It doesn't just include some binary files. Another nice thing is that the add-on can define its own bundles which get hosted the same way as the MVC app does its own bundles. No fancy web-resources and custom bundling and minification is needed.

To keep conflicts to a minimum with such add-on areas, the name should be unique and the area should be self-contained as written above.

Is Encodo a .NET/C# company?

Encodo has never been about maintaining or establishing a monoculture in either operating system, programming language or IDE. Pragmatism drives our technology and environment choices.1

Choosing technology

Each project we work on has different requirements and we choose the tools and technologies that fit best. A good fit involves considering:

  • What exists in the project already?
  • How much work needs to be done?
  • What future directions could the project take?
  • How maintainable is the solution/are the technologies?
  • How appropriate are various technologies?
  • What do our developers know how to do best?
  • What do the developers who will maintain the project know best? What are they capable of?
  • Is there framework code available that would help?

History: Delphi and Java

When we started out in 2005, we'd also spent years writing frameworks and highly generic software. This kind of software is not really a product per se, but more of a highly configurable programmable "engine", in which other programmers would write their actual end-user applications.

A properly trained team can turn around products very quickly using this kind of approach. It is not without its issues, though: maintaining a framework involves a lot of work, especially producing documentation, examples and providing support. While this is very interesting work, it can be hard to make lucrative, so we decided to move away from this business and focus on creating individual products.

Still, we stuck to the programming environment and platform that we knew best2 (and that our customers were requesting): we developed software mostly in Delphi for projects that we already had.3 For new products, we chose Java.

Why did we choose Java as our "next" language? Simply because Java satisfied a lot of the requirements outlined above. We were moving into web development and found Delphi's offerings lacking, both in the IDE as well as the library support. So we moved on to using Eclipse with Jetty. We evaluated several common Java development libraries and settled on Hibernate for our ORM and Tapestry for our web framework (necessitating HiveMind as our IOC).

History: .NET

A few years later, we were faced with the stark reality that developing web applications on Java (at the time) was fraught with issues, the worst of which was extremely slow development-turnaround times. We found ourselves incapable of properly estimating how long it would take to develop a project. We accept that this may have been our fault, of course, but the reality was that (1 )we were having trouble making money programming Java and (2) we weren't having any fun anymore.

We'd landed a big project that would be deployed on both the web and Windows desktops, with an emphasis on the Windows desktop clients. At this point, we needed to reƫvaluate: such a large project required a development language, runtime and IDE strong on the Windows Desktop. It also, in our view, necessitated a return to highly generic programming, which we'd moved away from for a while.

Our evaluation at the time included Groovy/Grails/Gtk, Python/Django/Gtk, Java/Swing/SWT/Web framekworks, etc. We made the decision based on various factors (tools, platform suitability, etc.) and moved to .NET/C# for developing our metadata framework Quino, upon which we would build the array of applications required for this big project.

Today (2014)

We're still developing a lot of software in C# and .NET but also have a project that's built entirely in Python.4 We're not at all opposed to a suggestion by a customer that we add services to their Java framework on another project, because that's what's best there.

We've had some projects that run on a Linux/Mono stack on dedicated hardware. For that project, we made a build-server infrastructure in Linux that created the embedded OS with our software in it.

Most of our infrastructure runs on Linux with a few Windows VMs where needed to host or test software. We use PostgreSql wherever we can and MS-SQL when the customer requires it.5

We've been doing a lot of web projects lately, which means the usual client-side mix of technology (JS/CSS/HTML). We use jQuery, but prefer Knockout for data-binding. We've evaluated the big libraries -- Angular, Backbone, Ember -- and found them to be too all-encompassing for our needs.

We've evaluated both Dart and TypeScript to see if those are useful yet. We've since moved to TypeScript for all of our projects but are still keeping an eye on Dart.

We use LESS instead of pure CSS. We've used SCSS as well, but prefer LESS. We're using Bootstrap in some projects but find it to be too restrictive, especially where we can use Flexbox for layout on modern browsers.

And, with the web comes development, support and testing for iOS and other mobile devices, which to some degree necessitates a move from pure .NET/C# and toward a mix.

We constantly reƫvaluate our tools, as well. We use JetBrains WebStorm instead of Visual Studio for some tasks: it's better at finding problems in JavaScript and LESS. We also use PhpStorm for our corporate web site, including these blogs. We used the Java-based Jenkins build server for years but moved to JetBrains TeamCity because it better supports the kind of projects we need to build.

Conclusion

The description above is meant to illustrate flexibility, not chaos. We are quite structured and, again, pragmatic in our approach.

Given the choice, we tend to work in .NET because we have the most experience and supporting frameworks and software for it. We use .NET/C# because it's the best choice for many of the projects we have, but we are most definitely not a pure Microsoft development shop.

I hope that gives you a better idea of Encodo's attitude toward software development.



  1. If it's not obvious, we employ the good kind of pragmatism, where we choose the best tool for the job and the situation, not the bad kind, founded in laziness and unwillingness to think about complex problems. Just so we're clear.

  2. Remo had spent most of his career working with Borland's offerings, whereas I had started our with Borland's Object Pascal before moving on to the first version of Delphi, then Microsoft C++ and MFC for many years. After that came the original version of ASP.NET with the "old" VB/VBScript and, finally, back to Delphi at Opus Software.

  3. We were actually developing on Windows using Delphi and then deploying on Linux, doing final debugging with Borland's Linux IDE, Kylix. The software to be deployed on Linux was headless, which made it much easier to write cross-platform code.

  4. For better or worse; we inherited a Windows GUI in Python, which is not very practical, but I digress

  5. Which is almost always, unfortunately.

Should you return `null` or an empty list?

I've seen a bunch of articles addressing this topic of late, so I've decided to weigh in.

The reason we frown on returning null from a method that returns a list or sequence is that we want to be able to freely use these sequences or lists with in a functional manner.

It seems to me that the proponents of "no nulls" are generally those who have a functional language at their disposal and the antagonists do not. In functional languages, we almost always return sequences instead of lists or arrays.

In C# and other functional languages, we want to be able to do this:

var names = GetOpenItems()
  .Where(i => i.OverdueByTwoWeeks)
  .SelectMany(i => i.GetHistoricalAssignees()
    .Select(a => new { a.FirstName, a.LastName })
  );

foreach (var name in names)
{
  Console.WriteLine("{1}, {0}", name.FirstName, name.LastName);
}

If either GetHistoricalAssignees() or GetOpenItems() might return null, then we'd have to write the code above as follows instead:

var openItems = GetOpenItems();
if (openItems != null)
{
  var names = openItems
  .Where(i => i.OverdueByTwoWeeks)
  .SelectMany(i => (i.GetHistoricalAssignees() ?? Enumerable.Empty<Person>())
    .Select(a => new { a.FirstName, a.LastName })
  );

  foreach (var name in names)
  {
    Console.WriteLine("{1}, {0}", name.FirstName, name.LastName);
  }
}

This seems like exactly the kind of code we'd like to avoid writing, if possible. It's also the kind of code that calling clients are unlikely to write, which will lead to crashes with NullReferenceExceptions. As we'll see below, there are people that seem to think that's perfectly OK. I am not one of those people, but I digress.

The post, Is it Really Better to 'Return an Empty List Instead of null'? / Part 1 by Christian Neumanns serves as a good example of an article that seems to be providing information but is just trying to distract people into accepting it as a source of genuine information. He introduces his topic with the following vagueness.

If we read through related questions in Stackoverflow and other forums, we can see that not all people agree. There are many different, sometimes truly opposite opinions. For example, the top rated answer in the Stackoverflow question Should functions return null or an empty object? (related to objects in general, not specifically to lists) tells us exactly the opposite:

Returning null is usually the best idea ...

The statement "we can see that not all people agree" is a tautology. I would split the people into groups of those whose opinions we should care about and everyone else. The statement "There are many different, sometimes truly opposite opinions" is also tautological, given the nature of the matter under discussion -- namely, a question that can only be answered as "yes" or "no". Such questions generally result in two camps with diametrically opposed opinions.

As the extremely long-winded pair of articles writes: sometimes you can't be sure of what an external API will return. That's correct. You have to protect against those with ugly, defensive code. But don't use that as an excuse to produce even more methods that may return null. Otherwise, you're just part of the problem.

The second article Is it Really Better to 'Return an Empty List Instead of null'? - Part 2 by Christian Neumanns includes many more examples.

I just don't know what to say about people that write things like "Bugs that cause NullPointerExceptions are usually easy to debug because the cause and effect are short-distanced in space (i.e. location in source code) and time." While this is kind of true, it's also even more true that you can't tell the difference between such an exception being caused by a savvy programmer who's using it to his advantage and a non-savvy programmer whose code is buggy as hell.

He has a ton of examples that try to distinguish between a method that returns an empty sequence being different from a method that cannot properly answer a question. This is a concern and a very real distinction to make, but the answer is not to return null to indicate nonsensical input. The answer is to throw an exception.

The method providing the sequence should not be making decisions about whether an empty sequence is acceptable for the caller. For sequences that cannot logically be empty, the method should throw an exception instead of returning null to indicate "something went wrong".

A caller may impart semantic meaning to an empty result and also throw an exception (as in his example with a cycling team that has no members). If the display of such a sequence on a web page is incorrect, then that is the fault of the caller, not of the provider of the sequence.

  • If data is not yet available, but should be, throw an exception
  • If data is not available but the provider isn't qualified to decide, return an empty sequence
  • If the caller receives an empty sequence and knows that it should not be empty, then it is responsible for indicating an error.

That there exists calling code that makes assumptions about return values that are incorrect is no reason to start returning values that will make calling code crash with a NullPointerException.

All of his examples are similar: he tries to make the pure-data call to retrieve a sequence of elements simultaneously validate some business logic. That's not a good idea. If this is really necessary, then the validity check should go in another method.

The example he cites for getting the amount from a list of PriceComponents is exactly why most aggregation functions in .NET throw an exception when the input sequence is empty. But that's a much better way of handling it -- with a precise exception -- than by returning null to try to force an exception somewhere in the calling code.

But the upshot for me is: I am not going to write code that, when I call it, forces me to litter other code with null-checks. That's just ridiculous.

Optimizing data access for high-latency networks: part IV

imageIn the previous three articles, we sped up the opening of the calendar in Encodo's time-tracking product Punchclock. We showed how we reduced the number of queries from one very slow query per person to a single very fast query for all people at once.

Because we're talking about latency in these articles, we'd also like to clear away a few other queries that aren't related to time entries but are still slowing things down.

Lazy-loading unneeded values

In particular, the queries that "Load values" for person objects look quite suspicious. These queries don't take a lot of time to execute but they will definitely degrade performance in high-latency networks.1

image

As we did before, we can click on one of these queries to show the query that's being loaded. In the screenshot below, we see that the person's picture is being loaded for each person in the drop-down list.

image

We're not showing pictures in the drop-down list, though, so this is an extravagant waste of time. On a LAN, we hardly notice how wasteful we are with queries; on a WAN, the product will feel...sluggish.

What is a load-group?

In order to understand the cause of these queries, you must first know that Quino allows a developer to put metadata properties into different load-groups. A load-group has the following behavior: If the value for a property in a load-group is requested on an object, the values for all of the properties in the load-group are retrieved with a single query and set on that object.

The default load-group of an object's metadata determine the values that are initially retrieved and applied to objects materialized by the ORM.

The metadata for a person puts the "picture" property of a person into a separate load-group so that the value is not loaded by default when objects of type peron are loaded from the data driver. With this setting, business logic avoids downloading a lot of unwanted picture data by default.

Business logic that needs the pictures can either explicitly include the picture in the query or let the value be lazy-loaded by the ORM when it is accessed. The proper solution depends on the situation.

Lazy-loaded property values

As before, we can check the stack trace of the query to figure out which application component is triggering the call. In this case, the culprit is the binding list that we are using to attach the list of people to the drop-down control.

The binding list binds the values for all of the properties in a metaclass (e.g. "person"), triggering a lazy load when it accesses the "picture" property. To avoid the lazy-load, we can create a wrapper of the default metadata for a person and remove/hide the property so that the binding list will no longer access it.

This is quite easy2, as shown in the code below.

var personMetaClass = new WrapMetaClass(Person.Metadata);
personMetaClass.Properties.Remove(Person.MetaProperties.Picture);
var query = new Query(personMetaClass);

With this simple fix, the binding list no longer knows about the picture property, doesn't retrieve values for that property and therefore no longer triggers any queries to lazily load the pictures from the database for each person object.

The screenshot of the statistics window below shows us that we were successful. We have two main queries: one for the list of people to show in the drop-down control and one for the time entries to show in the calendar.

image

Final version

For completeness, here's the code that Punchclock is using in the current version of Quino (1.11).

var personMetaClass = new WrapMetaClass(Person.Metadata);
personMetaClass.Properties.Remove(Person.MetaProperties.Picture);

var accessToolkit = new PostgreSqlMetaDatabase().AccessToolkit;

var query = new Query(personMetaClass);
query.CustomCommandText = new CustomCommandText();
query.CustomCommandText.SetSection(
  CommandTextSections.Where, 
  CommandTextAction.Replace,
  string.Format(
    "EXISTS (SELECT id FROM {0} WHERE {1} = {2})", 
    accessToolkit.GetName(TimeEntry.Metadata), 
    accessToolkit.GetField(TimeEntry.MetaProperties.PersonId), 
    accessToolkit.GetField(Person.MetaProperties.Id)
  )>
);
var people = Session.GetList<Person>(query);

Future, improved version

Once we fix the bug in the WhereExists join type mentioned in the previous article and add the fluent methods for constructing wrappers mentioned in the footnote below, the code will be as follows:

var personMetaClass = 
  Person.Metadata.
  Wrap().
  RemoveProperty(Person.MetaProperties.Picture);

var people = 
  Session.GetList<Person>(
    new Query(personMetaClass).
    Join(Person.MetaRelations.TimeEntries, JoinType.WhereExists).
    Query
  );

This concludes our investigation into performance issues with Quino and Punchclock.


image

var personMetaClass = 
  Person.Metadata.
  Wrap().
  RemoveProperty(Person.MetaProperties.Picture);

var query = new Query(personMetaClass);

But you'll have to wait for Quino 1.12 for that.


  1. You may have noticed that these calls to "load values" are technically lazy-loaded but don't seem to be marked as such in the screenshots. This was a bug in the statistics viewer that I discovered and addressed while writing this article.

  2. This is a rather old API and hasn't been touched with the "fluent" wand that we've applied to other parts of the Quino API. A nicer way of writing it would be to create extension methods called Wrap() and RemoveProperty that return the wrapper class, like so:

v1.12.0: .Improvements to data-provider statistics and Windows 8.1 fixes

The summary below describes major new features, items of note and breaking changes. The full list of issues is also available for those with access to the Encodo issue tracker.

Highlights

imageimage

  • Window 8.1: fixed culture-handling for en-US and de-CH that is broken in Windows 8.1 (QNO-4534, QNO-4553)
  • Data-provider statistics: improved the WinForm-based statistics form (QNO-4231, QNO-4545, QNO-4546)
  • Data driver: bug fixes and improvements (QNO-4538, QNO-4554, QNO-4551)
  • Image-handling: the Encodo and Quino libraries now use the Windows Imaging Components instead of System.Drawing (QNO-4536)
  • Standard forms: updated the standard WinForm about window and splash screen to use Encodo web-site CI (QNO-4529)

Breaking changes

  • No known breaking changes.
Optimizing data access for high-latency networks: part III

imageIn the previous articles, we partially addressed a performance problem in the calendar of Encodo's time-tracking product, Punchclock. While we managed to drastically reduce the amount of time taken by each query (>95% time saved), we were still executing more queries than strictly necessary.

The query that we're trying to optimized further is shown below.

var people =
  Session.GetList<Person>().
  Where(p => Session.GetCount(p.TimeEntries.Query) > 0).
  ToList();

This query executes one query to get all the people and then one query per person to get the number of time entries per person. Each of these queries by itself is very fast. High latency will cause them to be slow. In order to optimize further, there's really nothing for it but to reduce the number of queries being executed.

Let's think back to what we're actually trying to accomplish: We want to get all people who have at least one time entry. Can't we get the database to do that for us? Some join or existence check or something? How about the code below?

var people = 
  Session.GetList<Person>(
    Session.CreateQuery<Person>().
    Join(Person.MetaRelations.TimeEntries, JoinType.WhereExists).
    Query
  );

What's happening in the code above? We're still getting a list of people but, instead of manipulating the related TimeEntries for each person locally, we're joining the TimeEntries relation with the Quino query Join() method and changing the join type from the default All to the restrictive WhereExists. This sounds like exactly what we want to happen! There is no local evaluation or manipulation with Linq and, with luck, Quino will be able to map this to a single query on the database.

This is the best possible query: it's purely declarative and will be executed as efficiently as the back-end knows how.

There's just one problem: the WhereExists join type is broken in Quino 1.11.

Never fear, though! We can still get it to work, but we'll have to do a bit of work until the bug is fixed in Quino 1.12. The code below builds on lessons learned in the earlier article, Mixing your own SQL into Quino queries: part 2 of 2 to use custom query text to create the restriction instead of letting Quino do it.

var accessToolkit = new PostgreSqlMetaDatabase().AccessToolkit;

var query = Session.CreateQuery<Person>();
query.CustomCommandText = new CustomCommandText();
query.CustomCommandText.SetSection(
  CommandTextSections.Where, 
  CommandTextAction.Replace,
  string.Format(
    "EXISTS (SELECT id FROM {0} WHERE {1} = {2})", 
    accessToolkit.GetName(TimeEntry.Metadata), 
    accessToolkit.GetField(TimeEntry.MetaProperties.PersonId), 
    accessToolkit.GetField(Person.MetaProperties.Id)
  )
);
var people = Session.GetList<Person>(query);

A look at the statistics is very encouraging:

image

We're down to one 29ms query for the people and an even quicker query for all the relevant time entries.1 We can see our query text appears embedded in the SQL generated by Quino, just as we expected.

There are a few other security-related queries that execute very quickly and hardly need optimization.

We've come much farther in this article and we're almost done. In the next article, we'll quickly clean up a few other queries that are showing up in the statistics and that have been nagging us since the beginning.



  1. The time-entry query is not representative because my testing data set didn't include time entries for the current day and I was too lazy to page around to older data.

Optimizing data access for high-latency networks: part II

imageIn the previous article, we discussed a performance problem in the calendar of Encodo's time-tracking product, Punchclock.

Instead of guessing at the problem, we profiled the application using the database-statistics window available to all Quino applications.1 We quickly discovered that most of the slowdown stems from the relatively innocuous line of code shown below.

var people = 
  Session.GetList<Person>().
  Where(p => p.TimeEntries.Any()).
  ToList();

First things first: what does the code do?

Before doing anything else, we should establish what the code does. Logically, it retrieves a list of people in the database who have recorded at least one time entry.

The first question we should ask at this point is: does the application even need to do this? The answer in this case is 'yes'. The calendar includes a drop-down control that lets the user switch between the calendars for different users. This query returns the people to show in this drop-down control.

With the intent and usefulness of the code established, let's dissect how it is accomplishing the task.

  1. The Session.GetList<Person>() portion retrieves a list of all people from the database
  2. The Where() method is applied locally for each object in the list2
  3. For a given person, the list of TimeEntries is accessed
  4. This access triggers a lazy load of the list
  5. The Any() method is applied to the full list of time entries
  6. The ToList() method creates a list of all people who match the condition

Though the line of code looks innocuous enough, it causes a huge number of objects to be retrieved, materialized and retained in memory -- simply in order to check whether there is at least one object.

This is a real-world example of a performance problem that can happen to any developer. Instead of blaming the developer who wrote this line of code, its more important to stay vigilant to performance problems and to have tools available to quickly and easily find them.

Stop creating all of the objects

The first solution I came up with3 was to stop creating objects that I didn't need. A good way of doing this and one that was covered in Quino: partially-mapped queries is to use cursors instead of lists. Instead of using the generated list TimeEntries, the following code retrieves a cursor on that list's query and materializes at most one object for the sub-query.

var people = Session.GetList<Person>().Select(p =>
{
  using (var cursor = Session.CreateCursor<TimeEntry>(p.TimeEntries.Query))[^4]
  {
    return cursor.Any();
  }
}).ToList();

A check of the database statistics shows improvement, as shown below.

image

Just by using cursors, we've managed to reduce the execution time for each query by about 75%.4 Since all we're interested in finding out is whether there is at least one time entry for a person, we could also ask the database to count objects rather than to return them. That should be even faster. The following code is very similar to the example above but, instead of getting a cursor based on the TimeEntries query, it gets the count.

var people =
  Session.GetList<Person>().
  Where(p => Session.GetCount(p.TimeEntries.Query) > 0).
  ToList();

How did we do? A check of the database statistics shows even more improvement, as shown below.

image

We're now down to a few dozen milliseconds for all of our queries, so we're done, right? A 95% reduction in query-execution time should be enough.

Unfortunately, we're still executing just as many queries as before, even though we're taking far less time to execute them. This is better, but still not optimal. In high-latency situations, the user is still likely to experience a significant delay when opening the calendar since each query's execution time is increased by the latency of the connection. In a local network, the latency is negligible; on a WAN, we still have a problem.

In the next article, well see if we can't reduce the number of queries being executed.


Anything formulated with the query API is guaranteed to be executed by the data provider (even if it must be evaluated locally) and anything formulated with Linq is naturally evaluated locally. In this way, the code is clear in what is sent to the server and what is evaluated locally. Quino only very, very rarely issues an "unmappable query" exception, unlike EF, which occasionally requires contortions until you've figured out which C# formulation of a particular expression can be mapped by EF.


  1. This series of articles shows the statistics window as it appears in Winforms applications. The data-provider statistics are also available in Quino web applications as a Glimpse plug-in.

  2. It is important for users of the Microsoft Entity Framework (EF) to point out that Quino does not have a Linq-to-Sql mapper. That means that any Linq expressions like Where() are evaluated locally instead of being mapped to the database. There are various reasons for this but the main one is that we ended up preferring a strict boundary between the mappable query API and the local evaluation API.

  3. Well, the first answer I'm going to pretend I came up with. I actually thought of another answer first, but then quickly discovered that Quino wasn't mapping that little-used feature correctly. I added an issue to tackle that problem at a later date and started looking for workarounds. That fix will be covered in the next article in this series.

  4. Please ignore the fact that we also dropped 13 person queries. This was not due to any fix that we made but rather that I executed the test slightly differently...and was too lazy to make a new screenshot. The 13 queries are still being executed and we'll tackle those in the last article in this series.

Optimizing data access for high-latency networks: part I

imagePunchclock is Encodo's time-tracking and invoicing tool. It includes a calendar to show time entries (shown to the left). Since the very first versions, it hasn't opened very quickly. It was fast enough for most users, but those who worked with Punchclock over the WAN through our VPN have reported that it often takes many seconds to open the calendar. So we have a very useful tool that is not often used because of how slowly it opens.

That the calendar opens slowly in a local network and even more slowly in a WAN indicates that there is not only a problem with executing many queries but also with retrieving too much data.

Looking at query statistics

This seemed like a solvable problem, so I fired up Punchclock in debug mode to have a look at the query-statistics window.

To set up the view shown below, I did the following:

  1. Start your Quino application (Punchclock in this case) in debug mode (so that the statistics window is available)
  2. Open the statistics window from the debug menu
  3. Reset the statistics to clear out anything logged during startup
  4. Group the grid by "Meta Class"
  5. Open the calendar to see what kind of queries are generated
  6. Expand the "TimeEntry" group in the grid to show details for individual queries

image

I marked a few things on the screenshot. It's somewhat suspicious that there are 13 queries for data of type "Person", but we'll get to that later. Much more suspicious is that there are 52 queries for time entries, which seems like quite a lot considering we're showing a calendar for a single user. We would instead expect to have a single query. More queries would be OK if there were good reasons for them, but I feel comfortable in deciding that 52 queries is definitely too many.

A closer look at the details for the time-entry queries shows very high durations for some of them, ranging from a tenth of a second to nearly a second. These queries are definitely the reason the calendar window takes so long to load.

Why are these queries taking so long?

If I select one of the time-entry queries and show the "Query Text" tab (see screenshot below), I can see that it retrieves all time entries for a single person, one after another. There are almost six years of historical data in our Punchclock database and some of our employees have been around for all of them.1 That's a lot of time entries to load.

image

I can also select the "Stack Trace" tab to see where the call originated in my source code. This feature lets me pinpoint the program component that is causing these slow queries to be executed.

image

As with any UI-code stack, you have to be somewhat familiar with how events are handled and dispatched. In this stack, we can see how a MouseUp command bubbled up to create a new form, then a new control and finally, to trigger a call to the data provider during that control's initialization. We don't have line numbers but we see that the call originates in a lambda defined in the DynamicSchedulerControl constructor.

The line of code that I pinpoint as the culprit is shown below.

var people = Session.GetList<Person>().Where(p => p.TimeEntries.Any()).ToList();

This looks like a nicely declarative way of getting data, but to the trained eye of a Quino developer, it's clear what the problem is.

In the next couple of articles, we'll take a closer look at what exactly the problem is and how we can improve the speed of this query. We'll also take a look at how we can improve the Quino query API to make it harder for code like the line above to cause performance problems.



  1. Encodo just turned nine years old, but we used a different time-entry system for the first couple of years. If you're interested in our time-entry software history, here it is:

     1. 06.2005 -- Start off with Open Office spreadsheets
     2. 04.2007 -- Switch to a home-grown, very lightweight time tracker based on an older framework we'd written (Punchclock 1.0)
     3. 08.2008 -- Start development of Quino
     4. 04.2010 -- Initial version of Punchclock 2.0; start dogfooding Quino
    

Working with EF Migrations and branches

The version of EF Migrations discussed in this article is 5.0.20627. The version of Quino is less relevant: the features discussed have been supported for years. For those in a hurry, there is a tl;dr near the end of the article.

We use Microsoft Entity Framework (EF) Migrations in one of our projects where we are unable to use Quino. We were initially happy to be able to automate database-schema changes. After using it for a while, we have decidedly mixed feelings.

As developers of our own schema migration for the Quino ORM, we're always on the lookout for new and better ideas to improve our own product. If we can't use Quino, we try to optimize our development process in each project to cause as little pain as possible.

EF Migrations and branches

We ran into problems in integrating EF Migrations into a development process that uses feature branches. As long as a developer stays on a given branch, there are no problems and EF functions relatively smoothly.1

However, if a developer switches to a different branch -- with different migrations -- EF Migrations is decidedly less helpful. It is, in fact, quite cryptic and blocks progress until you figure out what's going on.

Assume the following not-uncommon situation:

  • The project is created in the master branch
  • The project has an initial migration BASE
  • Developers A and B migrate their databases to BASE
  • Developer A starts branch feature/A and includes migration A in her database
  • Developer B starts branch feature/B and includes migration B in his database

We now have the situation in which two branches have different code and each has its own database schema. Switching from one branch to another with Git quickly and easily addresses the code differences. The database is, unfortunately, a different story.

Let's assume that developer A switches to branch feature/B to continue working there. The natural thing for A to do is to call "update-database" from the Package Manager Console2. This yields the following message -- all-too-familiar to EF Migrations developers.

image

Unable to update database to match the current model because there are pending changes and automatic migration is disabled. Either write the pending changes to a code-based migration or enable automatic migration. [...]

This situation happens regularly when working with multiple branches. It's even possible to screw up a commit within a single branch, as illustrated in the following real-world example.

  • Add two fields to an existing class
  • Generate a migration with code that adds two fields
  • Migrate the database
  • Realize that you don't need one of the two fields
  • Remove the C# code from the migration for that field
  • Tests run green
  • Commit everything and push it

As far as you're concerned, you committed a single field to the model. When your co-worker runs that migration, it will be applied, but EF Migrations immediately thereafter complains that there are pending model changes to make. How can that be?

Out-of-sync migrations != outdated database

Just to focus, we're actually trying to get real work done, not necessarily debug EF Migrations. We want to answer the following questions:

  1. Why is EF Migrations having a problem updating the schema?
  2. How do I quickly and reliably update my database to use the current schema if EF Migrations refuses to do it?

The underlying reason why EF Migrations has problems is that it does not actually know what the schema of the database is. It doesn't read the schema from the database itself, but relies instead on a copy of the EF model that it stored in the database when it last performed a successful migration.

That copy of the model is also stored in the resource file generated for the migration. EF Migrations does this so that the migration includes information about which changes it needs to apply and about the model to which the change can be applied.

If the model stored in the database does not match the model stored with the migration that you're trying to apply, EF Migrations will not update the database. This is probably for the best, but leads us to the second question above: what do we have to do to get the database updated?

Generate a migration for those "pending changes"

The answer has already been hinted at above: we need to fix the model stored in the database for the last migration.

Let's take a look at the situation above in which your colleague downloaded what you thought was a clean commit.

From the Package Manager Console, run add-migration foo to scaffold a migration for the so-called "pending changes" that EF Migrations detected. That's interesting: EF Migrations thinks that your colleague should generate a migration to drop the column that you'd only temporarily added but never checked in.

That is, the column isn't in his database, it's not in your database, but EF Migrations is convinced that it was once in the model and must be dropped.

How does EF Migrations even know about a column that you added to your own database but that you removed from the code before committing? What dark magic is this?

The answer is probably obvious: you did check in the change. The part that you can easily remove (the C# code) is only half of the migration. As mentioned above, the other part is a binary chunk stored in the resource file associated with each migration. These BLOBS are stored in the table _MigrationHistory table in the database.

imageimage

How to fix this problem and get back to work

Here's the tl;dr: generate a "fake" migration, remove all of the C# code that would apply changes to the database (shown below) and execute update-database from the Package Manager Console.

image

This may look like it does exactly nothing. What actually happens is that it includes the current state of the EF model in the binary data for the last migration applied to the database (because you just applied it).

Once you've applied the migration, delete the files and remove them from the project. This migration was only generated to fix your local database; do not commit it.

Everything's cool now, right?

Applying the fix above doesn't mean that you won't get database errors. If your database schema does not actually match the application model, EF will crash when it assumes fields or tables are available which do not exist in your database.

Sometimes, the only way to really clean up a damaged database -- especially if you don't have the code for the migrations that were applied there3 -- is to remove the misapplied migrations from your database, undo all of the changes to the schema (manually, of course) and then generate a new migration that starts from a known good schema.

Conclusions and comparison to Quino

The obvious answer to the complaint "it hurts when I do this" is "stop doing that". We would dearly love to avoid these EF Migrations-related issues but developing without any schema-migration support is even more unthinkable.

We'd have to create upgrade scripts manually or would have to maintain scripts to generate a working development database and this in each branch. When branches are merged, the database-upgrade scripts have to be merged and tested as well. This would be a significant addition to our development process, has maintainability and quality issues and would probably slow us down even more.

And we're certainly not going to stop developing with branches, either.

We were hoping to avoid all of this pain by using EF Migrations. That EF Migrations makes us think of going back to manual schema migration is proof that it's not nearly as elegant a solution as our own Quino schema migration, which never gave us these problems.

Quino actually reads the schema in the database and compares that model directly against the current application model. The schema migrator generates a custom list of differences that map from the current schema to the desired schema and applies them. There is user intervention but it's hardly ever really required. This is an absolute godsend during development where we can freely switch between branches without any hassle.4

Quino doesn't recognize "upgrade" versus "downgrade" but instead applies "changes". This paradigm has proven to be a much better fit for our agile, multi-branch style of development and lets us focus on our actual work rather than fighting with tools and libraries.



  1. EF Migrations as we use it is tightly bound to SQL Server. Just as one example, the inability of SQL Server to resolve cyclic cascade dependencies is in no way shielded by EF Migrations. Though the drawback originates in SQL Server, EF Migrations simply propagates it to the developer, even though it purports to provide an abstraction layer. Quino, on the other hand, does the heavy lifting of managing triggers to circumvent this limitation.

  2. As an aside, this is a spectacularly misleading name for a program feature. It should just be called "Console".

  3. I haven't ever been able to use the Downgrade method that is generated with each migration, but perhaps someone with more experience could explain how to properly apply such a thing. If that doesn't work, the method outlined above is your only fallback.

  4. The aforementioned database-script maintenance or having only very discrete schema-update points or maintaining a database per branch and switching with configuration files or using database backups or any other schemes that end up distracting you from working.

Question to consider when designing APIs: Part II

In the previous article, we listed a lot of questions that you should continuously ask yourself when you're writing code. Even when you think you're not designing anything, you're actually making decisions that will affect either other team members or future versions of you.

In particular, we'd like to think about how we can reconcile a development process that involves asking so many questions and taking so many facets into consideration with YAGNI.

Designing != Implementing

The implication of this principle is, that if you aren't going to need something, then there's no point in even thinking about it. While it's absolutely commendable to adopt a YAGNI attitude, not building something doesn't mean not thinking about it and identifying potential pitfalls.

A feature or design concept can be discussed within a time-box. Allocate a fixed, limited amount of time to determine whether the feature or design concept needs to be incorporated, whether it would be nice to incorporate it or possibly to jettison it if it's too much work and isn't really necessary.

The overwhelming majority of time wasted on a feature is in the implementation, debugging, testing, documentation and maintenance of it, not in the design. Granted, a long design phase can be a time-sink -- especially a "perfect is the enemy of the good" style of design where you're completely blocked from even starting work. With practice, however, you'll learn how to think about a feature or design concept (e.g. extensibility) without letting it ruin your schedule.

If you don't try to anticipate future needs at all while designing your API, you may end up preventing that API from being extended in directions that are both logical and could easily have been anticipated. If the API is not extensible, then it will not be used and may have to be rewritten in the future, losing more time at that point rather than up front. This is, however, only a consideration you must make. It's perfectly acceptable to decide that you currently don't care at all and that a feature will have to be rewritten at some point in the future.

You can't do this kind of cost-benefit analysis and risk-management if you haven't taken time to identify the costs, benefits or risks.

Document your process

At Encodo, we encourage the person who's already spent time thinking about this problem to simply document the drawbacks and concessions and possible ideas in an issue-tracker entry that is linked to the current implementation. This allows future users, maintainers or extenders of the API to be aware of the thought process that underlies a feature. It can also help to avoid misunderstandings about what the intended audience and coverage of an API are.

The idea is to eliminate assumptions. A lot of time can be wasted when maintenance developers make incorrect assumptions about the intent of code.

If you don't have time to do any of this, then you can write a quick note in a task list that you need to more fully document your thoughts on the code you're writing. And you should try to do that soon, while the ideas are still relatively fresh in your mind. If you don't have time to think about what you're doing even to that degree, then you're doing something wrong and need to get organized better.

That is, you if you can't think about the code you're writing and don't have time to document your process, even minimally, then you shouldn't be writing that code. Either that, or you implicitly accept that others will have to clean up your mess. And "others" includes future versions of you. (E.g. the you who, six months from now, is muttering, "who wrote this crap?!?")

Be Honest about Hacking

As an example, we can consider how we go from a specific feature in the context of a project to thinking about where the functionality could fit in to a suite of products -- that may or may not yet exist. And remember, we're only thinking about these things. And we're thinking about them for a limited time -- a time-box. You don't want to prevent your project from moving forward, but you also don't want to advance at all costs.

Advancing in an unstructured way is called hacking and, while it can lead to a short-term win, it almost always leads to short-to-medium term deficits. You can still write code that is hacked and looks hacked, if that is the highest current priority, but you're not allowed to forget that you did so. You must officially designate what you're doing as a hot-zone of hacking so that the Hazmat team can clean it up later, if needed.

A working prototype that is hacked together just so it works for the next demonstration is great as long as you don't think that you can take it into production without doing the design and documentation work that you initially skipped.

If you fail to document the deficits that prevent you from taking a prototype to production, then how will you address those deficits? It will cost you much more time and pain to determine the deficits after the fact. Not only that, but unless you do a very good job, it is your users that will most likely be finding deficits -- in the form of bugs.

If your product is just a hacked mess of spaghetti code with no rhyme or reason, another developer will be faster and produce more reliable code by just starting over. Trying to determine the flaws, drawbacks and hacks through intuition and reverse-engineering is slower and more error-prone than just starting with a clean slate. Developers on such a project will not be able to save time -- and money -- by building on what you've already made.

A note on error-handling

Not to be forgotten is a structured approach to error-handling. The more "hacked" the code, the more stringent the error-checking should be. If you haven't had time yet to write or test code sufficiently, then that code shouldn't be making broad decisions about what it thinks are acceptable errors.

Fail early, fail often. Don't try to make a hacked mess of code bullet-proof by catching all errors in an undocumented manner. Doing so is deceptive to testers of the product as well as other developers.

If you're building a demo, make sure the happy path works and stick to it during the demo. If you do have to break this rule, add the hacks to a demo-specific branch of the code that will be discarded later.

Working with a documented project

If, however, the developer can look at your code and sees accompanying notes (either in an issue tracker, as TODOs in the code or some other form of documentation), that developer knows where to start fixing the code to bring it to production quality.

For example, it's acceptable to configure an application in code as long as you do it in a central place and you document that the intent is to move the configuration to an external source when there's time. If a future developer finds code for support for multiple database connections and tests that are set to ignore with a note/issue that says "extend to support multiple databases", that future developer can decide whether to actually implement the feature or whether to just discard it because it has been deprecated as a requirement.

Without documentation or structure or an indication which parts of the code were thought-through and which are considered to be hacked, subsequent developers are forced to make assumptions that may not be accurate. They will either assume that hacked code is OK or that battle-tested code is garbage. If you don't inform other developers of your intent when your're writing the code -- best done with documentation, tests and/or a cleanly designed API -- then it might be discarded or ignored, wasting even more time and money.

If you're on a really tight time-budget and don't have time to document your process correctly, then write a quick note that you think the design is OK or the code is OK, but tell your future self or other developers what they're looking at. It will only take you a few minutes and you'll be glad you did -- and so will they.