Optimizing data access for high-latency networks: part II

In the previous article, we discussed a performance problem in the calendar of Encodo's time-tracking product, Punchclock.

Instead of guessing at the problem, we profiled the application using the database-statistics window available to all Quino applications.1 We quickly discovered that most of the slowdown stems from the relatively innocuous line of code shown below.

var people = Session.GetList<Person>().Where(p => p.TimeEntries.Any()).ToList();

First things first: what does the code do?

Before doing anything else, we should establish what the code does. Logically, it retrieves a list of people in the database who have recorded at least one time entry.

The first question we should ask at this point is: does the application even need to do this? The answer in this case is 'yes'. The calendar includes a drop-down control that lets the user switch between the calendars for different users. This query returns the people to show in this drop-down control.

With the intent and usefulness of the code established, let's dissect how it is accomplishing the task.

  1. The Session.GetList<Person>() portion retrieves a list of all people from the database
  2. The Where() method is applied locally for each object in the list2
  3. For a given person, the list of TimeEntries is accessed
  4. This access triggers a lazy load of the list
  5. The Any() method is applied to the full list of time entries
  6. The ToList() method creates a list of all people who match the condition

Though the line of code looks innocuous enough, it causes a huge number of objects to be retrieved, materialized and retained in memory -- simply in order to check whether there is at least one object.
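Expanded into explicit steps, the one-liner behaves roughly like the following sketch (illustrative only; it is not the actual Quino implementation, and only `Session.GetList<Person>()` and the lazy-loaded `TimeEntries` list come from the example above):

```csharp
// Rough local equivalent of the one-liner:
var result = new List<Person>();
foreach (var p in Session.GetList<Person>())  // one query: load *every* person
{
  // Accessing p.TimeEntries triggers a lazy load: one query per person,
  // materializing that person's complete list of time entries...
  var entries = p.TimeEntries;

  // ...only for Any() to check whether the list is non-empty.
  if (entries.Any())
  {
    result.Add(p);
  }
}
```

Seen this way, it's clear that the cost is one query for the people plus one query (and one fully materialized list) per person, just to answer a yes/no question.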

This is a real-world example of a performance problem that can happen to any developer. Instead of blaming the developer who wrote this line of code, it's more important to stay vigilant about performance problems and to have tools available to find them quickly and easily.

Stop creating all of the objects

The first solution I came up with3 was to stop creating objects that I didn't need. A good way of doing this (one covered in Quino: partially-mapped queries) is to use cursors instead of lists. Instead of using the generated list TimeEntries, the following code retrieves a cursor on that list's query and materializes at most one object for the sub-query.

var people = Session.GetList<Person>().Where(p =>
{
  using (var cursor = Session.CreateCursor<TimeEntry>(p.TimeEntries.Query))
  {
    return cursor.Any();
  }
}).ToList();

A check of the database statistics shows improvement, as shown below.


Just by using cursors, we've managed to reduce the execution time for each query by about 75%.4 Since all we're interested in finding out is whether there is at least one time entry for a person, we could also ask the database to count objects rather than to return them. That should be even faster. The following code is very similar to the example above but, instead of getting a cursor based on the TimeEntries query, it gets the count.

var people = Session.GetList<Person>().
  Where(p => Session.GetCount(p.TimeEntries.Query) > 0).
  ToList();

How did we do? A check of the database statistics shows even more improvement, as shown below.


We're now down to a few dozen milliseconds for all of our queries, so we're done, right? A 95% reduction in query-execution time should be enough.

Unfortunately, we're still executing just as many queries as before, even though we're taking far less time to execute them. This is better, but still not optimal. In high-latency situations, the user is still likely to experience a significant delay when opening the calendar since each query's execution time is increased by the latency of the connection. In a local network, the latency is negligible; on a WAN, we still have a problem.
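To put rough numbers on it (the round-trip figure is assumed for illustration, not measured in the statistics window): with dozens of sequential queries, even a modest WAN latency dominates the total time.

```csharp
// Back-of-the-envelope latency estimate (all numbers assumed):
const int queryCount = 52;          // sequential queries observed earlier
const double wanRoundTripMs = 50.0; // assumed WAN/VPN round-trip time
var latencyOverheadMs = queryCount * wanRoundTripMs;
// 52 round trips at 50ms each add roughly 2.6 seconds of pure latency,
// no matter how quickly each individual query executes on the server.
```

This is why reducing per-query execution time alone isn't enough; the query count itself has to come down.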

In the next article, we'll see if we can't reduce the number of queries being executed.

  1. This series of articles shows the statistics window as it appears in Winforms applications. The data-provider statistics are also available in Quino web applications as a Glimpse plug-in.

  2. It is important for users of the Microsoft Entity Framework (EF) to point out that Quino does not have a Linq-to-Sql mapper. That means that any Linq expressions like Where() are evaluated locally instead of being mapped to the database. There are various reasons for this, but the main one is that we ended up preferring a strict boundary between the mappable query API and the local evaluation API. Anything formulated with the query API is guaranteed to be executed by the data provider (even if it must be evaluated locally) and anything formulated with Linq is evaluated locally. In this way, the code is clear about what is sent to the server and what is evaluated locally. Quino only very rarely issues an "unmappable query" exception, unlike EF, which occasionally requires contortions until you've figured out which C# formulation of a particular expression can be mapped by EF.

  3. Well, the first answer I'm going to pretend I came up with. I actually thought of another answer first, but then quickly discovered that Quino wasn't mapping that little-used feature correctly. I added an issue to tackle that problem at a later date and started looking for workarounds. That fix will be covered in the next article in this series.

  4. Please ignore the fact that we also dropped 13 person queries. This was not due to any fix that we made but rather that I executed the test slightly differently...and was too lazy to make a new screenshot. The 13 queries are still being executed and we'll tackle those in the last article in this series.

Optimizing data access for high-latency networks: part I

Punchclock is Encodo's time-tracking and invoicing tool. It includes a calendar to show time entries (shown to the left). Since the very first versions, it hasn't opened very quickly. It was fast enough for most users, but those who worked with Punchclock over the WAN through our VPN have reported that it often takes many seconds to open the calendar. So we have a very useful tool that is not often used because of how slowly it opens.

That the calendar opens slowly in a local network and even more slowly in a WAN indicates that there is not only a problem with executing many queries but also with retrieving too much data.

Looking at query statistics

This seemed like a solvable problem, so I fired up Punchclock in debug mode to have a look at the query-statistics window.

To set up the view shown below, I did the following:

  1. Start your Quino application (Punchclock in this case) in debug mode (so that the statistics window is available)
  2. Open the statistics window from the debug menu
  3. Reset the statistics to clear out anything logged during startup
  4. Group the grid by "Meta Class"
  5. Open the calendar to see what kind of queries are generated
  6. Expand the "TimeEntry" group in the grid to show details for individual queries


I marked a few things on the screenshot. It's somewhat suspicious that there are 13 queries for data of type "Person", but we'll get to that later. Much more suspicious is that there are 52 queries for time entries, which seems like quite a lot considering we're showing a calendar for a single user. We would instead expect to have a single query. More queries would be OK if there were good reasons for them, but I feel comfortable in deciding that 52 queries is definitely too many.

A closer look at the details for the time-entry queries shows very high durations for some of them, ranging from a tenth of a second to nearly a second. These queries are definitely the reason the calendar window takes so long to load.

Why are these queries taking so long?

If I select one of the time-entry queries and show the "Query Text" tab (see screenshot below), I can see that it retrieves all time entries for a single person, one after another. There are almost six years of historical data in our Punchclock database and some of our employees have been around for all of them.1 That's a lot of time entries to load.


I can also select the "Stack Trace" tab to see where the call originated in my source code. This feature lets me pinpoint the program component that is causing these slow queries to be executed.


As with any UI-code stack, you have to be somewhat familiar with how events are handled and dispatched. In this stack, we can see how a MouseUp command bubbled up to create a new form, then a new control and finally, to trigger a call to the data provider during that control's initialization. We don't have line numbers but we see that the call originates in a lambda defined in the DynamicSchedulerControl constructor.

The line of code that I pinpoint as the culprit is shown below.

var people = Session.GetList<Person>().Where(p => p.TimeEntries.Any()).ToList();

This looks like a nicely declarative way of getting data, but to the trained eye of a Quino developer, it's clear what the problem is.

In the next couple of articles, we'll take a closer look at what exactly the problem is and how we can improve the speed of this query. We'll also take a look at how we can improve the Quino query API to make it harder for code like the line above to cause performance problems.

  1. Encodo just turned nine years old, but we used a different time-entry system for the first couple of years. If you're interested in our time-entry software history, here it is:

     1. 06.2005 -- Start off with Open Office spreadsheets
     2. 04.2007 -- Switch to a home-grown, very lightweight time tracker based on an older framework we'd written (Punchclock 1.0)
     3. 08.2008 -- Start development of Quino
     4. 04.2010 -- Initial version of Punchclock 2.0; start dogfooding Quino

Working with EF Migrations and branches

The version of EF Migrations discussed in this article is 5.0.20627. The version of Quino is less relevant: the features discussed have been supported for years. For those in a hurry, there is a tl;dr near the end of the article.

We use Microsoft Entity Framework (EF) Migrations in one of our projects where we are unable to use Quino. We were initially happy to be able to automate database-schema changes. After using it for a while, we have decidedly mixed feelings.

As developers of our own schema migration for the Quino ORM, we're always on the lookout for new and better ideas to improve our own product. If we can't use Quino, we try to optimize our development process in each project to cause as little pain as possible.

EF Migrations and branches

We ran into problems in integrating EF Migrations into a development process that uses feature branches. As long as a developer stays on a given branch, there are no problems and EF functions relatively smoothly.1

However, if a developer switches to a different branch -- with different migrations -- EF Migrations is decidedly less helpful. It is, in fact, quite cryptic and blocks progress until you figure out what's going on.

Assume the following not-uncommon situation:

  • The project is created in the master branch
  • The project has an initial migration BASE
  • Developers A and B migrate their databases to BASE
  • Developer A starts branch feature/A and includes migration A in her database
  • Developer B starts branch feature/B and includes migration B in his database

We now have the situation in which two branches have different code and each has its own database schema. Switching from one branch to another with Git quickly and easily addresses the code differences. The database is, unfortunately, a different story.

Let's assume that developer A switches to branch feature/B to continue working there. The natural thing for A to do is to call "update-database" from the Package Manager Console2. This yields the following message -- all-too-familiar to EF Migrations developers.


Unable to update database to match the current model because there are pending changes and automatic migration is disabled. Either write the pending changes to a code-based migration or enable automatic migration. [...]

This situation happens regularly when working with multiple branches. It's even possible to screw up a commit within a single branch, as illustrated in the following real-world example.

  • Add two fields to an existing class
  • Generate a migration with code that adds two fields
  • Migrate the database
  • Realize that you don't need one of the two fields
  • Remove the C# code from the migration for that field
  • Tests run green
  • Commit everything and push it

As far as you're concerned, you committed a single field to the model. When your co-worker runs that migration, it will be applied, but EF Migrations immediately thereafter complains that there are pending model changes to make. How can that be?

Out-of-sync migrations != outdated database

Just to focus, we're actually trying to get real work done, not necessarily debug EF Migrations. We want to answer the following questions:

  1. Why is EF Migrations having a problem updating the schema?
  2. How do I quickly and reliably update my database to use the current schema if EF Migrations refuses to do it?

The underlying reason why EF Migrations has problems is that it does not actually know what the schema of the database is. It doesn't read the schema from the database itself, but relies instead on a copy of the EF model that it stored in the database when it last performed a successful migration.

That copy of the model is also stored in the resource file generated for the migration. EF Migrations does this so that the migration includes information about which changes it needs to apply and about the model to which the change can be applied.

If the model stored in the database does not match the model stored with the migration that you're trying to apply, EF Migrations will not update the database. This is probably for the best, but leads us to the second question above: what do we have to do to get the database updated?

Generate a migration for those "pending changes"

The answer has already been hinted at above: we need to fix the model stored in the database for the last migration.

Let's take a look at the situation above in which your colleague downloaded what you thought was a clean commit.

From the Package Manager Console, run add-migration foo to scaffold a migration for the so-called "pending changes" that EF Migrations detected. That's interesting: EF Migrations thinks that your colleague should generate a migration to drop the column that you'd only temporarily added but never checked in.

That is, the column isn't in his database, it's not in your database, but EF Migrations is convinced that it was once in the model and must be dropped.

How does EF Migrations even know about a column that you added to your own database but that you removed from the code before committing? What dark magic is this?

The answer is probably obvious: you did check in the change. The part that you can easily remove (the C# code) is only half of the migration. As mentioned above, the other part is a binary chunk stored in the resource file associated with each migration. These blobs are stored in the __MigrationHistory table in the database.


How to fix this problem and get back to work

Here's the tl;dr: generate a "fake" migration, remove all of the C# code that would apply changes to the database (shown below) and execute update-database from the Package Manager Console.
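The emptied migration ends up looking something like the following sketch (the class and migration names are placeholders; the important part is that both methods are empty):

```csharp
// A "fake" migration: applying it writes the current EF model blob to
// the __MigrationHistory table without touching the schema itself.
public partial class SyncModelOnly : DbMigration
{
    public override void Up()
    {
        // Intentionally empty: all scaffolded schema changes were removed.
    }

    public override void Down()
    {
        // Intentionally empty.
    }
}
```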


This may look like it does exactly nothing. What actually happens is that it includes the current state of the EF model in the binary data for the last migration applied to the database (because you just applied it).

Once you've applied the migration, delete the files and remove them from the project. This migration was only generated to fix your local database; do not commit it.

Everything's cool now, right?

Applying the fix above doesn't mean that you won't get database errors. If your database schema does not actually match the application model, EF will crash when it assumes fields or tables are available which do not exist in your database.

Sometimes, the only way to really clean up a damaged database -- especially if you don't have the code for the migrations that were applied there3 -- is to remove the misapplied migrations from your database, undo all of the changes to the schema (manually, of course) and then generate a new migration that starts from a known good schema.

Conclusions and comparison to Quino

The obvious answer to the complaint "it hurts when I do this" is "stop doing that". We would dearly love to avoid these EF Migrations-related issues but developing without any schema-migration support is even more unthinkable.

We'd have to create upgrade scripts manually or maintain scripts to generate a working development database, and we'd have to do this in each branch. When branches are merged, the database-upgrade scripts would have to be merged and tested as well. This would be a significant addition to our development process, would introduce maintainability and quality issues and would probably slow us down even more.

And we're certainly not going to stop developing with branches, either.

We were hoping to avoid all of this pain by using EF Migrations. That EF Migrations makes us think of going back to manual schema migration is proof that it's not nearly as elegant a solution as our own Quino schema migration, which never gave us these problems.

Quino actually reads the schema in the database and compares that model directly against the current application model. The schema migrator generates a custom list of differences that map from the current schema to the desired schema and applies them. User intervention is possible but hardly ever required. This is an absolute godsend during development, where we can freely switch between branches without any hassle.4

Quino doesn't recognize "upgrade" versus "downgrade" but instead applies "changes". This paradigm has proven to be a much better fit for our agile, multi-branch style of development and lets us focus on our actual work rather than fighting with tools and libraries.

  1. EF Migrations as we use it is tightly bound to SQL Server. Just as one example, the inability of SQL Server to resolve cyclic cascade dependencies is in no way shielded by EF Migrations. Though the drawback originates in SQL Server, EF Migrations simply propagates it to the developer, even though it purports to provide an abstraction layer. Quino, on the other hand, does the heavy lifting of managing triggers to circumvent this limitation.

  2. As an aside, this is a spectacularly misleading name for a program feature. It should just be called "Console".

  3. I haven't ever been able to use the Downgrade method that is generated with each migration, but perhaps someone with more experience could explain how to properly apply such a thing. If that doesn't work, the method outlined above is your only fallback.

  4. Alternatives include the aforementioned database-script maintenance, having only very discrete schema-update points, maintaining a database per branch and switching with configuration files, using database backups, or any other scheme that ends up distracting you from working.

Questions to consider when designing APIs: Part II

In the previous article, we listed a lot of questions that you should continuously ask yourself when you're writing code. Even when you think you're not designing anything, you're actually making decisions that will affect either other team members or future versions of you.

In particular, we'd like to think about how we can reconcile a development process that involves asking so many questions and taking so many facets into consideration with YAGNI.

Designing != Implementing

The implication of the YAGNI principle is that if you aren't going to need something, then there's no point in even thinking about it. While it's absolutely commendable to adopt a YAGNI attitude, not building something doesn't mean not thinking about it and identifying potential pitfalls.

A feature or design concept can be discussed within a time-box. Allocate a fixed, limited amount of time to determine whether the feature or design concept needs to be incorporated, whether it would be nice to incorporate it or possibly to jettison it if it's too much work and isn't really necessary.

The overwhelming majority of time wasted on a feature is in the implementation, debugging, testing, documentation and maintenance of it, not in the design. Granted, a long design phase can be a time-sink -- especially a "perfect is the enemy of the good" style of design where you're completely blocked from even starting work. With practice, however, you'll learn how to think about a feature or design concept (e.g. extensibility) without letting it ruin your schedule.

If you don't try to anticipate future needs at all while designing your API, you may end up preventing that API from being extended in directions that are both logical and could easily have been anticipated. If the API is not extensible, then it will not be used and may have to be rewritten in the future, losing more time at that point rather than up front. This is, however, only a consideration you must make. It's perfectly acceptable to decide that you currently don't care at all and that a feature will have to be rewritten at some point in the future.

You can't do this kind of cost-benefit analysis and risk-management if you haven't taken time to identify the costs, benefits or risks.

Document your process

At Encodo, we encourage the person who's already spent time thinking about this problem to simply document the drawbacks and concessions and possible ideas in an issue-tracker entry that is linked to the current implementation. This allows future users, maintainers or extenders of the API to be aware of the thought process that underlies a feature. It can also help to avoid misunderstandings about what the intended audience and coverage of an API are.

The idea is to eliminate assumptions. A lot of time can be wasted when maintenance developers make incorrect assumptions about the intent of code.

If you don't have time to do any of this, then you can write a quick note in a task list that you need to more fully document your thoughts on the code you're writing. And you should try to do that soon, while the ideas are still relatively fresh in your mind. If you don't have time to think about what you're doing even to that degree, then you're doing something wrong and need to get organized better.

That is, if you can't think about the code you're writing and don't have time to document your process, even minimally, then you shouldn't be writing that code. Either that, or you implicitly accept that others will have to clean up your mess. And "others" includes future versions of you. (E.g. the you who, six months from now, is muttering, "who wrote this crap?!?")

Be Honest about Hacking

As an example, we can consider how we go from a specific feature in the context of a project to thinking about where the functionality could fit into a suite of products -- which may or may not yet exist. And remember, we're only thinking about these things. And we're thinking about them for a limited time -- a time-box. You don't want to prevent your project from moving forward, but you also don't want to advance at all costs.

Advancing in an unstructured way is called hacking and, while it can lead to a short-term win, it almost always leads to short-to-medium term deficits. You can still write code that is hacked and looks hacked, if that is the highest current priority, but you're not allowed to forget that you did so. You must officially designate what you're doing as a hot-zone of hacking so that the Hazmat team can clean it up later, if needed.

A working prototype that is hacked together just so it works for the next demonstration is great as long as you don't think that you can take it into production without doing the design and documentation work that you initially skipped.

If you fail to document the deficits that prevent you from taking a prototype to production, then how will you address those deficits? It will cost you much more time and pain to determine the deficits after the fact. Not only that, but unless you do a very good job, it is your users that will most likely be finding deficits -- in the form of bugs.

If your product is just a hacked mess of spaghetti code with no rhyme or reason, another developer will be faster and produce more reliable code by just starting over. Trying to determine the flaws, drawbacks and hacks through intuition and reverse-engineering is slower and more error-prone than just starting with a clean slate. Developers on such a project will not be able to save time -- and money -- by building on what you've already made.

A note on error-handling

Not to be forgotten is a structured approach to error-handling. The more "hacked" the code, the more stringent the error-checking should be. If you haven't had time yet to write or test code sufficiently, then that code shouldn't be making broad decisions about what it thinks are acceptable errors.

Fail early, fail often. Don't try to make a hacked mess of code bullet-proof by catching all errors in an undocumented manner. Doing so is deceptive to testers of the product as well as other developers.
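The difference, sketched in C# (the method name is invented for illustration):

```csharp
// Deceptive: a hacked routine dressed up to look bullet-proof.
try
{
    ImportTimeEntries();  // untested code
}
catch (Exception)
{
    // Swallowing everything hides contract violations, null references
    // and real bugs from testers and other developers alike.
}

// Honest: let untested code fail early and loudly.
ImportTimeEntries();      // any error surfaces immediately and gets fixed
```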

If you're building a demo, make sure the happy path works and stick to it during the demo. If you do have to break this rule, add the hacks to a demo-specific branch of the code that will be discarded later.

Working with a documented project

If, however, the developer can look at your code and sees accompanying notes (either in an issue tracker, as TODOs in the code or some other form of documentation), that developer knows where to start fixing the code to bring it to production quality.

For example, it's acceptable to configure an application in code as long as you do it in a central place and you document that the intent is to move the configuration to an external source when there's time. If a future developer finds code for support for multiple database connections and tests that are set to ignore with a note/issue that says "extend to support multiple databases", that future developer can decide whether to actually implement the feature or whether to just discard it because it has been deprecated as a requirement.

Without documentation or structure or an indication which parts of the code were thought-through and which are considered to be hacked, subsequent developers are forced to make assumptions that may not be accurate. They will either assume that hacked code is OK or that battle-tested code is garbage. If you don't inform other developers of your intent when you're writing the code -- best done with documentation, tests and/or a cleanly designed API -- then it might be discarded or ignored, wasting even more time and money.

If you're on a really tight time-budget and don't have time to document your process correctly, then write a quick note that you think the design is OK or the code is OK, but tell your future self or other developers what they're looking at. It will only take you a few minutes and you'll be glad you did -- and so will they.

Questions to consider when designing APIs: Part I

A big part of an agile programmer's job is API design. In an agile project, the architecture is defined from on high only in broad strokes, leaving the fine details of component design up to the implementer. Even in projects that are specified in much more detail, implementers will still find themselves in situations where they have to design something.

This means that programmers in an agile team have to be capable of weighing the pros and cons of various approaches in order to avoid causing performance, scalability, maintenance or other problems as the API is used and evolves.

When designing an API, we consider some of the following aspects. This is not meant to be a comprehensive list, but should get you thinking about how to think about the code you're about to write.

Reusing Code

  • Will this code be re-used inside the project?
  • How about outside of the project?
  • If the code might be used elsewhere, where does that need lie on the time axis?
  • Do other projects already exist that could use this code?
  • Are there already other implementations that could be used?
  • If there are implementations, then are they insufficient?
  • Or perhaps not sufficiently encapsulated for reuse as written?
  • How likely is it that there will be other projects that need to do the same thing?
  • If another use is likely, when would the other project or projects need your API?

Organizing Code

  • Where should the API live in the code?
  • Is your API local to this class?
  • Is it private?
  • Protected?
  • Are you making it public in an extension method?
  • Or internal?
  • Which namespace should it belong to?
  • Which assembly?

Testing Code

  • What about testability?
  • How can the functionality be tested?

Even if you don't have time to write tests right now, you should still build your code so that it can be tested. It's possible that you won't be the one writing the tests, so you should prepare the code so that others can test it.

It's also possible that a future you will be writing the tests and will hate you for having made it so hard to automate testing.

Managing Dependencies

  • Is multi-threading a consideration?
  • Does the API manage state?
  • What kind of dependencies does the API have?
  • Which dependencies does it really need?
  • Is the API perhaps composed of several aspects?
  • With a core aspect that is extended by others?
  • Can core functionality be extracted to avoid making an API that is too specific?

Documenting Code

  • How do callers use the API?
  • What are the expected values?
  • Are these expectations enforced?
  • What is the error mechanism?
  • What guarantees does the API make?
  • Is the behavior of the API enforced?
  • Is it at least documented?
  • Are known drawbacks documented?


Handling Errors

This is a very important one and involves how your application handles situations outside of the design.

  • If you handle externally provided data, then you have to handle extant cases
  • Are you going to log errors?
  • In which format?
  • Is there a standard logging mechanism?
  • How are you going to handle and fix persistent errors?
  • Are you even going to handle weird cases?
  • Or are you going to fail early and fail often?
  • For which errors should your code even be responsible?
  • How does your chosen philosophy (and you should be enforcing contracts) fit with the other code in the project?

Fail fast; enforce contracts

While we're on the subject of error-handling, I want to emphasize that this is one of the most important parts of API design, regardless of which language or environment you use.1

Add preconditions for all method parameters; verify them as non-null and verify ranges. Do not catch all exceptions and log them or -- even worse -- ignore them. This is even more important in environments -- I'm looking at you, client-side web code in general and JavaScript in particular -- where the established philosophy is to run anything and to never rap a programmer on the knuckles for having written really knuckle-headed code.
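A minimal sketch of such preconditions (the method and its parameters are invented for illustration):

```csharp
public void AddTimeEntry(Person person, DateTime start, TimeSpan duration)
{
    // Enforce the contract up front instead of failing somewhere deep inside.
    if (person == null)
    {
        throw new ArgumentNullException("person");
    }
    if (duration <= TimeSpan.Zero)
    {
        throw new ArgumentOutOfRangeException("duration", "duration must be positive");
    }

    // ...the actual work can now assume valid inputs...
}
```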

You haven't tested the code, so you don't know what kind of errors you're going to get. If you ignore everything, then you'll also ignore assertions, contract violations, null-reference exceptions and so on. The code will never be improved if it never makes a noise. It will just stay silently crappy until someone notices a subtle logical error somewhere and must painstakingly track it down to your untested code.

You might say that production code shouldn't throw exceptions. This is true, but we're explicitly not talking about production code here. We're talking about code that has few to no tests and is acknowledged to be incomplete. If you move code like this into production, then it's better to crash than to silently corrupt data or degrade the user experience.

A crash will get attention and the code may even be fixed or improved. If you write code that will crash on all but the "happy path" and it never crashes, that's great. Do not preemptively write defensive code in fresh code. If you have established code that interfaces with other (possibly external) components and you sometimes get errors that you can't work around in any other way, then it's OK to catch and log those exceptions rather than propagating them. At least you tried.

In the next article, we'll take a look at how all of these questions and considerations can be reconciled with YAGNI at all. Spoiler alert: we think that they can.

  1. I recently read Erlang and code style by Jesper L. Andersen, which seems to have less to do with programming Erlang and much more to do with programming properly. The advice contained in it may seem to be aimed only at Erlang programmers, but the idea of strictly enforcing APIs between software components is neither new nor language-specific.

REST API Status codes (400 vs. 500)

In a project that we're working on, we're consuming REST APIs delivered by services built by another team working for the same customer. We had a discussion about what were appropriate error codes to return for various situations. The discussion boiled down to: should a service return a 500 error code or a 400 error code when a request cannot be processed?

I took a quick look at the documentation for a couple of the larger REST API providers and they are using the 500 code only for catastrophic failure and using the 400 code for anything related to query-input validation errors.

Microsoft Azure Common REST API Error Codes

Code 400:

  • The requested URI does not represent any resource on the server.
  • One of the request inputs is out of range.
  • One of the request inputs is not valid.
  • A required query parameter was not specified for this request.
  • One of the query parameters specified in the request URI is not supported.
  • An invalid value was specified for one of the query parameters in the request URI.

Code 500:

  • The server encountered an internal error. Please retry the request.
  • The operation could not be completed within the permitted time.
  • The server is currently unable to receive requests. Please retry your request.

Twitter Error Codes & Responses

Code 400:

The request was invalid or cannot be otherwise served. An accompanying error message will explain further.

Code 500:

Something is broken. Please post to the group so the Twitter team can investigate.

REST API Tutorial HTTP Status Codes

Code 400:

General error when fulfilling the request would cause an invalid state. Domain validation errors, missing data, etc. are some examples.

Code 500:

A generic error message, given when no more specific message is suitable. The general catch-all error when the server-side throws an exception. Use this only for errors that the consumer cannot address from their end; never return this intentionally.

REST HTTP status codes

For input validation failure: 400 Bad Request + your optional description. This is suggested in the book "RESTful Web Services".
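The consensus above reduces to a simple rule: if the consumer can fix the request, return 400; if the fault is on the server side, return 500. A minimal sketch of that rule follows; the exception type and mapping function are invented for illustration, and no specific web framework is assumed:

```csharp
using System;

// Thrown when the request itself is at fault (invalid input, missing parameter, etc.).
public class ValidationException : Exception
{
  public ValidationException(string message) : base(message) { }
}

public static class HttpStatus
{
  // 400: the consumer can fix the request; 500: the fault is on our side.
  public static int ForException(Exception exception)
  {
    return exception is ValidationException ? 400 : 500;
  }
}
```

A request handler would catch exceptions at its outermost layer and translate them with a mapping like this, so validation failures never surface as 500s.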

Dealing with improper disposal in WCF clients

There's an old problem in generated WCF clients in which the Dispose() method calls Close() on the client irrespective of whether there was a fault. If there was a fault, then the method should call Abort() instead. Failure to do so causes another exception, which masks the original one. Client code will see the subsequent fault rather than the original one, and a developer running the code in debug mode will be misled as to what really happened.

See WCF Clients and the "Broken" IDisposable Implementation by David Barrett for a more in-depth analysis, but that's the gist of it.

This issue is still present in the ClientBase implementation in .NET 4.5.1. The linked article shows how you can add your own implementation of the Dispose() method in each generated client. An alternative is to use a generic adaptor if you don't feel like adding a custom dispose to every client you create.1

public class SafeClient<T> : IDisposable
  where T : ICommunicationObject, IDisposable
{
  public SafeClient(T client)
  {
    if (client == null) { throw new ArgumentNullException("client"); }

    Client = client;
  }

  public T Client { get; private set; }

  public void Dispose()
  {
    Dispose(true);
  }

  protected virtual void Dispose(bool disposing)
  {
    if (disposing && Client != null)
    {
      // Abort() a faulted client; calling Close() would throw and mask the original fault.
      if (Client.State == CommunicationState.Faulted) { Client.Abort(); } else { Client.Close(); }

      Client = default(T);
    }
  }
}

To use your WCF client safely, you wrap it in the class defined above, as shown below.

using (var safeClient = new SafeClient<SystemLoginServiceClient>(new SystemLoginServiceClient(...)))
{
  var client = safeClient.Client;
  // Work with "client"
}

If you can figure out how to initialize your clients without passing parameters to the constructor, you could slim it down by adding a "new" generic constraint to the parameter T in SafeClient and then using the SafeClient as follows:

using (var safeClient = new SafeClient<SystemLoginServiceClient>())
{
  var client = safeClient.Client;
  // Work with "client"
}
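Under the same untested-sketch caveat as the rest of the code in this article, the modified SafeClient might look like the following, with the new() constraint supplying the parameterless constructor:

```csharp
using System;
using System.ServiceModel;

// Sketch only: a SafeClient<T> that constructs the client itself, which the
// "new()" constraint on T makes possible.
public class SafeClient<T> : IDisposable
  where T : ICommunicationObject, IDisposable, new()
{
  public SafeClient()
  {
    Client = new T();
  }

  public T Client { get; private set; }

  public void Dispose()
  {
    if (Client != null)
    {
      // Abort() a faulted channel; Close() would throw and mask the original fault.
      if (Client.State == CommunicationState.Faulted) { Client.Abort(); } else { Client.Close(); }
      Client = default(T);
    }
  }
}
```

The trade-off is that this version only works with clients that either need no configuration or can be configured after construction.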

  1. The code included in this article is a sketch of a solution and has not been tested. It does compile, though.

OpenBSD takes on OpenSSL

Much of the Internet has been affected by the Heartbleed vulnerability in the widely used OpenSSL server-side software. The bug effectively allows anyone to collect random data from the memory of machines running the affected software, which was about 60% of encrypted sites worldwide. A massive cleanup effort ensued, but the vulnerability has been in the software for two years, so there's no telling how much information was stolen in the interim.

The OpenSSL software is used not only to encrypt HTTPS connections to web servers but also to generate the certificates that undergird those connections as well as many PKIs. Since data could have been stolen over a period of two years, it should be assumed that certificates, usernames and passwords have been stolen as well. Pessimism is the only way to be sure.1

In fact, any data that was loaded into memory on a server running a pre-Heartbleed version of the OpenSSL software is potentially compromised.

How to respond

We should all generate new certificates, ensuring that the root certificate from which we generate has also been re-generated and is clean. We should also choose new passwords for all affected sites. I use LastPass to manage my passwords, which makes it much easier to use long, complicated and most importantly unique passwords. If you're not already using a password manager, now would be a good time to start.

And this goes especially for those who tend to reuse their password on different sites. If one of those sites is cracked, then the hacker can use that same username/password combination on other popular sites and get into your stuff everywhere instead of just on the compromised site.

Forking OpenSSL

Though there are those who are blaming open-source software, we should instead blame ourselves for using software of unknown quality to run our most trusted connections. That the software was designed and built without the required quality controls is a different issue. People are going to write bad software. If you use their free software and it ends up not being as secure as advertised, you have to take at least some of the blame on yourself.

Instead, the security experts and professionals who've written so many articles and done so many reviews over the years touting the benefits of OpenSSL should take more of the blame. They are the ones who misused their reputations by touting poorly written software to which they had source-code access, but were too lazy to perform a serious evaluation.

An advantage of open-source software is that we can at least pinpoint exactly when a bug appeared. Another is that the entire codebase is available to all, so others can jump in and try to fix it. Sure, it would have been nice if the expert security programmers of the world had jumped in earlier, but better late than never.

The site OpenSSL Rampage follows the efforts of the OpenBSD team to refactor and modernize the OpenSSL codebase. They are documenting their progress live on Tumblr, which collects commit messages, tweets, blog posts and official security warnings that result from their investigations and fixes.

They are working on a fork and are making radical changes, so it's unlikely that the changes will be taken up by the official OpenSSL project, but perhaps a new TLS/SSL tool will be available soon.2

VMS and custom memory managers

The messages tell tales of support for extinct operating systems like VMS, whose continued support makes for much more complicated code to support current OSs. This complexity, in turn, hides further misuses of malloc as well as misuses of custom buffer-allocation schemes that the OpenSSL team came up with because "malloc is too slow". Sometimes memory is freed twice for good measure.

The article Today's bugs have BRANDS? Be still my bleeding heart [logo] by Verity Stob has a (partially) humorous take on the most recent software errors that have reared their ugly heads. As also mentioned in that article, the Heartbleed Explained by Randall Munroe cartoon shows the Heartbleed issue well, even for non-technical people.

Lots o' cruft

This all sounds horrible, and one wonders how the software runs at all. Don't worry: the code base contains a tremendous amount of cruft that is never used. It is compiled and still included, but it acts as a cozy nest of code that is wrapped around the actual code.

There are vast swaths of script files that haven't been used for years, that can build versions of the software under compilers and with options that haven't been seen on this planet since before ... well, since before Tumblr or Facebook. For example, there's no need to retain a forest of macros at the top of many header files for the Metrowerks compiler for PowerPC on OS9. No reason at all.

There are also incompatibly licensed components in regular use as well as those associated with components that don't seem to be used anymore.

Modes and options and platforms: oh my!

There are compiler options for increasing resiliency that seem to work. Turning these off, however, yields an application that crashes immediately. There are clearly no tests for any of these modes. OpenSSL sounds like a classically grown system that has little in the way of code conventions, patterns or architecture. There seems to be no one who regularly cleans out and decides which code to keep and which to make obsolete. And, even when code is deemed obsolete, it remains in the code base over a decade later.

Security professionals wrote this?

This is to say nothing of how their encryption algorithm actually works. There are tales on that web site of the OpenSSL developers desperately having tried to keep entropy high by mixing in the current time every once in a while. Or even mixing in bits of the private key for good measure.

A lack of discipline (or skill)

The current OpenSSL codebase seems to be a minefield for security reviewers or for reviewers of any kind. A codebase like this is also terrible for new developers, the onboarding of which you want to encourage in such a widely used, distributed, open-source project.

Instead, the current state of the code says: don't touch, you don't know what to change or remove because clearly the main developers don't know either. The last person who knew may have died or left the project years ago.

It's clear that the code has not been reviewed in the way that it should be. Code on this level and for this purpose needs good developers/reviewers who constantly consider most of the following points during each review:

  • Correctness (does the code do what it should? Does it do it in an acceptable way?)
  • Patterns (does this code invent its own way of doing things?)
  • Architecture (is this feature in the right module?)
  • Security implications
  • Performance
  • Memory leaks/management (as long as they're still using C, which they honestly shouldn't be)
  • Supported modes/options/platforms
  • Third-party library usage/licensing
  • Automated tests (are there tests for the new feature or fix? Do existing tests still run?)
  • Comments/documentation (is the new code clear in what it does? Any tips for those who come after?)
  • Syntax (using braces can be important)

Living with OpenSSL (for now)

It sounds like it is high time that someone did what the OpenBSD team is doing. A spring cleaning can be very healthy for software, especially once it's reached a certain age. That goes double for software that was blindly used by 60% of the encrypted web sites in the world.

It's wonderful that OpenSSL exists. Without it, we wouldn't be as encrypted as we are. But the apparent state of this code bespeaks a failure of management at all levels. The developers of software this important must be of higher quality. They must be the best of the best, not just anyone who read about encryption on Wikipedia and "wants to help". Wanting to help is nice, but you have to know what you're doing.

OpenSSL will be with us for a while. It may be crap code and it may lack automated tests, but it has been manually (and possibly regression-) tested and used a lot, so it has earned a certain badge of reliability and predictability. The state of the code means only that future changes are riskier, not necessarily that the current software is not usable.

Knowing that the code is badly written should make everyone suspicious of patches -- which we now know are likely to break something in that vast pile of C code -- but not suspicious of the officially supported versions from Debian and Ubuntu (for example). Even if the developer team of OpenSSL doesn't test a lot (or not automatically for all options, at any rate -- they may just be testing the "happy path"), the major Linux distros do. So there's that comfort, at least.

  1. As Ripley so famously put it in the movie Aliens: "I say we take off and nuke the entire site from orbit. It's the only way to be sure."

  2. It will, however, be quite a while before the new fork is as battle-tested as OpenSSL.

The Internet of Things

This article originally appeared on earthli News and has been cross-posted here.

The article Smart TVs, smart fridges, smart washing machines? Disaster waiting to happen by Peter Bright discusses the potential downsides to having a smart home1: namely our inability to create smart software for our mediocre hardware. And once that software is written and spread throughout dozens of devices in your home, it will function poorly and quickly be taken over by hackers because "[h]ardware companies are generally bad at writing software -- and bad at updating it."

And, should hackers fail to crack your stove's firmware immediately, for the year or two where the software works as designed, it will, in all likelihood, "[...] be funneling sweet, sweet, consumer analytics back to the mothership as fast as it can", as one commentator on that article put it.

Manufacturers aren't in business to make you happy

Making you happy is at best incidental to their business model, now that monopolies have ensured that there is nowhere you can turn to get better service. Citing from the article above:

These devices will inevitably be abandoned by their manufacturers, and the result will be lots of "smart" functionality -- fridges that know what we buy and when, TVs that know what shows we watch -- all connected to the Internet 24/7, all completely insecure.

Manufacturers almost exclusively design hardware with extremely short lifetimes, hewing to planned obsolescence. While this is a great capitalist strategy, it is morally repugnant to waste so many resources and so much energy to create gadgets that will break in order to force consumers to buy new gadgets. Let's put that awful aspect of our civilization to the side for a moment and focus on other consequences.

These same manufacturers are going to take this bulletproof strategy to appliances that have historically had much longer lifetimes. They will also presumably take their extremely lackluster reputation for updating firmware and software into this market. The software will be terrible to begin with, it will be full of security holes and it will receive patches for only about 10% of its expected lifetime. What could possibly go wrong?

Either the consumer will throw away a perfectly good appliance in order to upgrade the software or the appliance will be an upstanding citizen of one, if not several, botnets. Or perhaps other, more malicious services will be funneling information about you and your household to others, all unbeknownst to you.

People are the problem2

These are not scare tactics; this is an inevitability. People have proven themselves to be wildly incapable of comprehending the devices that they already have. They have no idea how they work and have only vague ideas of what they're giving up. It might as well be magic to them. To paraphrase the classic Arthur C. Clarke citation: "Any sufficiently advanced technology is indistinguishable from magic", especially for a sufficiently technically oblivious audience.

Start up a new smart phone and try to create your account on it. Try to do so without accidentally giving away the keys to your data-kingdom. It is extremely difficult to do, even if you are technically savvy and vigilant.

Most people just accept any conditions, store everything everywhere, use the same terribly insecure password for everything and don't bother locking down privacy options, even if available. Their data is spread around the world in dozens of places and they've implicitly given away perpetual licenses to anything they've ever written or shot or created to all of the big providers.

They are sheep ready to be sheared by not only the companies they thought they could trust, but also by national spy agencies and technically adept hackers who've created an entire underground economy fueled by what can only be called deliberate ignorance, shocking gullibility and a surfeit of free time and disposable income.

The Internet of Things

The Internet of Things is a catch-phrase that describes a utopia where everything is connected to everything else via the Internet and a whole universe of new possibilities explode out of this singularity that will benefit not only mankind but the underlying effervescent glory that forms the strata of existence.

The article Ars readers react to Smart fridges and the sketchy geography of normals follows up the previous article and includes the following comment:

What I do want, is the ability to check what's in my fridge from my phone while I'm out in the grocery store to see if there's something I need.

That sounds so intriguing, doesn't it? How great would that be? The one time a year that you actually can't remember what you put in your refrigerator. On the other hand, how the hell can your fridge tell what you have? What are the odds that this technology will even come close to functioning as advertised? Would it not be more reasonable for your grocery purchases to go to a database and for you to tell that database when you've actually used or thrown out ingredients? Even if your fridge was smart, you'd have to wire up your dry-goods pantry in a similar way and commit to only storing food in areas that are under surveillance.

The commentator went on to write,

I do agree that security is a huge, huge issue, and one that needs to be addressed. But I really don't see how resisting the "Internet of things" is the longterm solution. The way technology seems to be trending, this is an inevitability, not a could be.

Resisting the "Internet of things" is not being proposed as the long-term solution. It is being proposed as a short- to medium-term solution because the purveyors of this shining vision of nirvana have proven themselves time and again to be utterly incapable of actually delivering the panaceas that they promise in a stream of consumption-inducing fraud. Instead, they consistently end up lining their own pockets while we all fritter away even more precious waking time ministering to the retarded digital children that they've birthed from their poisoned loins and foisted upon us.

Stay out of it, for now

Hand-waving away the almost-certain security catastrophe as if it can be easily solved is extremely disingenuous. This is not a world that anyone really wants to take part in until the security problems are solved. You do not want to be an early adopter here. And you most especially do not want to do so by buying the cheapest, most-discounted model available as people are also wont to do. Stay out of the fight until the later rounds: remove the SIM card, shut off Internet connectivity where it's not needed and shut down Bluetooth.

The best-case scenario is that early adopters will have their time wasted. Early rounds of software promise to be a tremendous time-suck for all involved. Managing a further herd of purportedly more efficient and optimized devices is a sucker's game. The more you buy, the less likely you are to be in charge of what you do with your free time.

As it stands, we already fight with our phones, begging them to connect to inadequate data networks and balky WLANs. We spend inordinate amounts of time trying to trick their garbage software into actually performing any of its core services. Failing that -- which is an inevitability -- we simply live with the mediocrity, wasting our time every day babysitting gadgets and devices and software that are supposed to be working for us.

Instead, it is we who end up performing the same monotonous and repetitive tasks dozens of times every day because the manufacturers have -- usually in a purely self-interested and quarterly revenue-report driven rush to market -- utterly failed to test the basic functions of their devices. Subsequent software updates do little to improve this situation, generally avoiding fixes for glaring issues in favor of adding social-network integration or some other marketing-driven hogwash.

Avoiding this almost-certain clusterf*#k does not make you a Luddite. It makes you a realist, an astute observer of reality. There has never been a time in history when so much content and games and media has been at the fingertips of anyone with a certain standard of living. At the same time, though, we seem to be so bedazzled by this wonder that we ignore the glaring and wholly incongruous dreadfulness of the tools that we are offered to navigate, watch and curate it.

If you just use what you're given without complaint, then things will never get better. Stay on the sidelines and demand better -- and be prepared to wait for it.

  1. Or a smart car or anything smart that works perfectly well without being smart.

  2. To be clear: the author is not necessarily excluding himself here. It's not easy to turn on, tune in and drop out, especially when your career is firmly in the tech world. It's also not easy to be absolutely aware of what you're giving up as you make use of the myriad of interlinked services offered to you every day.

Mixing your own SQL into Quino queries: part 2 of 2

In the first installment, we covered the basics of mixing custom SQL with ORM-generated queries. We also took a look at a solution that uses direct ADO database access to perform arbitrarily complex queries.

In this installment, we will see more elegant techniques that make use of the CustomCommandText property of Quino queries. We'll approach the desired solution in steps, proceeding from attempt #1 through attempt #5.

tl;dr: Skip to attempt #5 to see the final result without learning why it's correct.

Attempt #1: Replacing the entire query with custom SQL

An application can assign the CustomCommandText property of any Quino query to override some of the generated SQL. In the example below, we override all of the text, so that Quino doesn't generate any SQL at all. Instead, Quino is only responsible for sending the request to the database and materializing the objects based on the results.

public void TestExecuteCustomCommand()
{
  var people = Session.GetList<Person>();

  // The custom text must include the explicit SELECT list of person fields
  // expected by the object-materializer (elided here).
  people.Query.CustomCommandText = new CustomCommandText
  {
    Text = @"
FROM punchclock__person WHERE lastname = 'Rogers'"
  };

  Assert.That(people.Count, Is.EqualTo(9));
}

This example solves two of the three problems outlined above:

  • It uses only a single query.
  • It will work with a remote application server (although it makes assumptions about the kind of SQL expected by the backing database on that server).
  • But it is even more fragile than the previous example as far as hard-coded SQL goes. You'll note that the fields expected by the object-materializer have to be explicitly included in the correct order.

Let's see if we can address the third issue by getting Quino to format the SELECT clause for us.

Attempt #2: Generating the SELECT clause

The following example uses the AccessToolkit of the IQueryableDatabase to format the list of properties obtained from the metadata for a Person. The application no longer makes assumptions about which properties are included in the select statement, what order they should be in or how to format them for the SQL expected by the database.

public virtual void TestExecuteCustomCommandWithStandardSelect()
{
  var people = Session.GetList<Person>();

  var accessToolkit = DefaultDatabase.AccessToolkit;
  var properties = Person.Metadata.DefaultLoadGroup.Properties;
  var fields = properties.Select(accessToolkit.GetField);

  people.Query.CustomCommandText = new CustomCommandText
  {
    Text = string.Format(
      @"SELECT ALL {0} FROM punchclock__person WHERE lastname = 'Rogers'",
      string.Join(", ", fields))
  };

  Assert.That(people.Count, Is.EqualTo(9));
}

This example fixes the problem with the previous one but introduces a new problem: it no longer works with a remote application because it assumes that the client-side driver is a database with an AccessToolkit. The next example addresses this problem.

Attempt #3: Using a hard-coded AccessToolkit

The version below uses a hard-coded AccessToolkit so that it doesn't rely on the external data driver being a direct ADO database. It still makes an assumption about the database on the server but that is usually quite acceptable because the backing database for most applications rarely changes.1

public void TestCustomCommandWithPostgreSqlSelect()
{
  var people = Session.GetList<Person>();

  var accessToolkit = new PostgreSqlMetaDatabase().AccessToolkit;
  var properties = Person.Metadata.DefaultLoadGroup.Properties;
  var fields = properties.Select(accessToolkit.GetField);

  people.Query.CustomCommandText = new CustomCommandText
  {
    Text = string.Format(
      @"SELECT ALL {0} FROM punchclock__person WHERE lastname = 'Rogers'",
      string.Join(", ", fields))
  };

  Assert.That(people.Count, Is.EqualTo(9));
}

We now have a version that satisfies all three conditions to a large degree. The application uses only a single query and the query works with both local databases and remoting servers. It still makes some assumptions about database-schema names (e.g. "punchclock__person" and "lastname"). Let's see if we can clean up some of these as well.

Attempt #4: Replacing only the where clause

Instead of replacing the entire query text, an application can replace individual sections of the query, letting Quino fill in the rest of the query with its standard generated SQL. An application can append or prepend text to the generated SQL or replace it entirely. Because the condition for our query is so simple, the example below replaces the entire WHERE clause instead of adding to it.

public void TestCustomWhereExecution()
{
  var people = Session.GetList<Person>();

  people.Query.CustomCommandText = new CustomCommandText();
  people.Query.CustomCommandText.SetSection(
    CommandTextSections.Where,
    CommandTextAction.Replace,
    "lastname = 'Rogers'"
  );

  Assert.That(people.Count, Is.EqualTo(9));
}

That's much nicer -- still not perfect, but nice. The only remaining quibble is that the identifier lastname is still hard-coded. If the model changes in a way where that property is renamed or removed, this code will continue to compile but will fail at run-time. This is a not insignificant problem if your application ends up using these kinds of queries throughout its business logic.

Attempt #5: Replacing the where clause with generated field names

In order to fix this query and have a completely generic query that fails to compile should anything at all change in the model, we can mix in the technique that we used in attempts #2 and #3: using the AccessToolkit to format fields for SQL. To make the query 100% statically checked, we'll also use the generated metadata -- LastName -- to indicate which property we want to format as SQL.

public void TestCustomWhereExecution()
{
  var people = Session.GetList<Person>();

  var accessToolkit = new PostgreSqlMetaDatabase().AccessToolkit;
  var lastNameField = accessToolkit.GetField(Person.MetaProperties.LastName);

  people.Query.CustomCommandText = new CustomCommandText();
  people.Query.CustomCommandText.SetSection(
    CommandTextSections.Where,
    CommandTextAction.Replace,
    string.Format("{0} = 'Rogers'", lastNameField)
  );

  Assert.That(people.Count, Is.EqualTo(9));
}

The query above satisfies all of the conditions we outlined above. Granted, the condition here is quite simple, and real-world business logic will likely be much more complex. For those situations, the best approach is to fall back to the direct ADO approach, using Quino facilities like the AccessToolkit as much as possible to create a fully customized SQL text.

Many thanks to Urs for proofreading and suggestions on overall structure.

  1. If an application needs to be totally database-agnostic, then it will need to do some extra legwork that we won't cover in this post.