Schema migration in Quino 1.13

Quino is a metadata framework for .NET. It provides a means of defining an application-domain model in the form of metadata objects. Quino also provides many components and support libraries that work with that metadata to automate many services and functions. A few examples are an ORM, schema migration, automatically generated user interfaces and reporting tools.

The schema-migration tool

The component we're going to discuss is the automated schema-migration for databases. A question that recently came up with a customer was: what do all of the options mean in the console-based schema migrator?

Here's the menu you'll see in the console migrator:

Advanced Options
(1) Show migration plan
(2) Show significant mappings
(3) Show significant mappings with unique ids
(4) Show all mappings
(5) Show all mappings with unique ids

Main Options
(R) Refresh status
(M) Migrate database
(C) Cancel

The brief summary is:

  • The only action that actually makes changes is (M)
  • Option (1) is the only advanced option you will ever likely use; use it to show the changes that were detected

The other advanced options are more for debugging the migration recommendation if something looks wrong. In order to understand what that means, we need to know what the migrator actually does.

[diagram: the schema-migration pipeline]

  1. Provide the application model as input
  2. Import a model from the database as input
  3. Generate a mapping between the two models
  4. Create a migration plan to update the database to reflect the application model
  5. Generate a list of commands that can be applied to the database to enact the plan
  6. Execute the commands against the database
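
In code, the pipeline corresponds roughly to the following sketch. All of the names here are illustrative rather than the actual Quino API; the sketch simply restates the six steps above.

// Sketch only: all names are illustrative, not the actual Quino API.
var targetModel = application.Model;                                  // 1. application model
var sourceModel = importHandler.ImportModel(database);                // 2. model imported from the database
var mapping = mappingBuilder.BuildMapping(sourceModel, targetModel);  // 3. mapping between the two models
var plan = planBuilder.CreatePlan(mapping);                           // 4. migration plan (list of differences)
var commands = commandGenerator.GenerateCommands(plan);               // 5. database-specific commands
commandExecutor.Execute(commands, database);                          // 6. execute against the database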

The initial database-import and final command-generation parts of migration are very database-specific. The determination of differences is also partially database-specific (e.g. some databases do not allow certain features so there is no point in detecting a difference that cannot ever be repaired). The rest of the migration logic is database-independent.

Gathering data for migration

The migrator works with two models, the target model and the source model:

  • The target model is provided as part of the application and is usually loaded from a core assembly.
  • The source model is imported from the database schema by the "import handler".

Given these two models, the "mapping builder" creates a mapping. In the current implementation of Quino, there is no support for allowing the user to adjust mapping before a migration plan is built from it. However, it would be possible to allow the user to verify and possibly adjust the mapping. Experience has shown that this is not necessary. Anytime we thought we needed to adjust the mapping, the problem was instead that the target model had been configured incorrectly. That is, each time we had an unexpected mapping, it led us directly to a misconfiguration in the model.

The options to show mappings are used to debug exactly such situations. Before we talk about mapping, though, we should talk about what we mean by "unique ids". Every schema-relevant bit of metadata in a Quino model is associated with a unique id, in the form of a Guid and called a "MetaId" in Quino.

Importing a model from a database

What happens when the import handler generates a model?

The importer runs in two phases:

  1. Extract the "raw model" from the database schema
  2. Enhance the "raw model" with data pulled from the application-specific Quino metadata table in the same database

A Quino application named "demo" will have the following schema:

  • All modeled tables are named "demo__*"
  • The metadata table is named "demometadata__elementdescription"

The migrator reads the following information into a "raw model":

  • Tables => MetaClasses
  • Fields/Columns => MetaProperties
  • Indexes => MetaIndexes
  • Foreign Keys => MetaPaths

If there is no further information in the database, then the mapper will have to use the raw model only. If, however, the database was created or is being maintained by Quino, then there is additional information stored in the metadata table mentioned above. The importer enhances the raw model with this information in order to improve mapping and difference-recognition. The metadata table contains all of the Quino modeling information that is not reflected in a standard database schema (e.g. the aforementioned MetaId).

The data available in this table is currently:

  • SchemaIdentifier: the identifier used in the raw model/database schema
  • Identifier: the actual identifier of the metadata element that corresponds to the element identified by the SchemaIdentifier
  • MetaId: the unique id for the metadata element
  • ObjectType: the type of metadata (one of: class, property, index, path, model)
  • ParentMetaId: the unique id of the metadata element that is the logical parent of this one; only allowed to be empty for elements with ObjectType equal to "model"
  • Data: Custom data associated with the element, as key/value pairs
  • DataVersion: Identifies the format type of the "Data" element (1.0.0.0 corresponds to CSV)
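
As a concrete example, the row describing the class behind a "punchclock__person" table might look like the following; the Guid and the Data keys are invented for illustration.

SchemaIdentifier:  punchclock__person
Identifier:        Person
MetaId:            1e7f3c52-8a24-4f0e-9d1b-6c5a2b9e4d33
ObjectType:        class
ParentMetaId:      (the MetaId of the model element)
Data:              Caption=Person,PluralCaption=People
DataVersion:       1.0.0.0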

For each schema element in the raw model, the importer does the following:

  1. Looks up the data associated with that SchemaIdentifier and ObjectType (e.g. "punchclock__person" and "class")
  2. Updates the "Identifier"
  3. Sets the "MetaId"
  4. Loads the key/value pairs from the Data field and applies that data to the element
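
In pseudo-C#, the enhancement step looks something like the sketch below. The type and member names are invented for illustration and do not correspond to the actual Quino API.

// Sketch only: all names are illustrative, not the actual Quino API.
foreach (var element in rawModel.SchemaElements)
{
  // e.g. ("punchclock__person", "class")
  var row = metadataTable.Find(element.SchemaIdentifier, element.ObjectType);
  if (row != null)
  {
    element.Identifier = row.Identifier; // e.g. "Person"
    element.MetaId = row.MetaId;         // the stable Guid
    foreach (var pair in ParseCsv(row.Data))
    {
      element.SetCustomData(pair.Key, pair.Value);
    }
  }
}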

Generating a mapping

At this point, the imported model is ready and we can create a mapping between it and the application model. The imported model is called the source model while the application model is called the target model because we're migrating the "source" to match the "target".

We generate a mapping by iterating the target model:

  1. Find the corresponding schema element in the source model using MetaIds1
  2. If an element can be found, create a mapping for those two elements
  3. If no element can be found, create a mapping with the target element. This will cause the element to be created in the database.
  4. For all elements in the source model that have no corresponding element in the target model, create a mapping with only the source element. This will cause the element to be dropped from the database.
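
A sketch of this algorithm, with invented names rather than the actual Quino API, might look like this:

// Sketch only: all names are illustrative, not the actual Quino API.
var mappings = new List<Mapping>();
foreach (var target in targetModel.SchemaElements)
{
  var source = sourceModel.FindByMetaId(target.MetaId);
  mappings.Add(source != null
    ? new Mapping(source, target)  // matched: may still be renamed/altered
    : new Mapping(null, target));  // no source: create in the database
}
foreach (var source in sourceModel.SchemaElements)
{
  if (targetModel.FindByMetaId(source.MetaId) == null)
  {
    mappings.Add(new Mapping(source, null)); // no target: drop from the database
  }
}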

Creating a migration plan

The important decisions have already been made in the mapping phase. At this point, the migrator just generates a migration plan, which is a list of differences that must be addressed in order to update the database to match the target model.

  • If the mapping has both a source and a target element:
    • Create a difference if the element has been renamed
    • Create a difference if the element has been altered (e.g. a property has a different type or is now nullable; an index has new properties or is no longer unique; etc.)
  • If the mapping has only a source, generate a difference that the element is unneeded and should be dropped.
  • If the mapping has only a target, generate a difference that the element is missing and should be created.
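
Continuing the sketch from the previous section (again with invented names rather than the actual Quino API), plan generation is little more than a loop over the mappings:

// Sketch only: all names are illustrative, not the actual Quino API.
var plan = new List<Difference>();
foreach (var mapping in mappings)
{
  if (mapping.Source != null && mapping.Target != null)
  {
    if (mapping.Source.Identifier != mapping.Target.Identifier)
    {
      plan.Add(Difference.Renamed(mapping));
    }
    if (!mapping.Source.StructurallyEquals(mapping.Target))
    {
      plan.Add(Difference.Altered(mapping));
    }
  }
  else if (mapping.Target == null)
  {
    plan.Add(Difference.Unneeded(mapping)); // drop from the database
  }
  else
  {
    plan.Add(Difference.Missing(mapping)); // create in the database
  }
}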

This is the plan that is shown to the user by the various migration tools available with Quino.2

The advanced console-migrator commands

At this point, we can now understand what the advanced console-migrator commands mean. Significant mappings are those mappings which correspond to a difference in the database (create, drop, rename or alter).

  • Show significant mappings: show more detail about the names on each side of each significant mapping
  • Show significant mappings with unique ids: same as above, but also include the MetaIds for each side. Use this to debug when you suspect that you might have copy/pasted a MetaId incorrectly or inadvertently moved one.
  • Show all mappings: Same detail level as the first option, but with all mappings, including those that are 100% matches
  • Show all mappings with unique ids: same as above, but with MetaIds

As already stated, the advanced options are really there to help a developer see why the migrator might be suggesting a change that doesn't correspond to expectations.

Generating commands for the plan

At this point, the migrator displays the list of differences that it will address if the user chooses to proceed.

What happens when the user proceeds? The migrator generates database-specific commands that, when executed against the database, will modify the schema of the database.3

Commands are executed for different phases of the migration process. The phases are occasionally extended but currently comprise the following.

  • Initialize: perform any required initialization before doing anything to the schema
  • DropConstraintsAndIndexes: drop all affected constraints and indexes that would otherwise prevent the desired modification of the elements involved in the migration.
  • AddUpdateOrRenameSchema: Create new tables, columns and indexes and perform any necessary renaming. The changes in this phase are non-destructive.
  • UpdateData: Perform any necessary data updates before any schema elements are removed. This is usually the phase in which custom application code is executed, to copy existing data from other tables and fields before they are dropped in the next phase. For example, if there is a new required 1:1 relation, the custom code might analyze the other data in the rows of that table to determine which value that row should have for the new foreign key.
  • DropSchema: Drop any unneeded schema elements and data
  • CreatePrimaryKeys: Create primary keys required by the schema. This includes both new primary keys as well as reestablishing primary keys that were temporarily dropped in the second phase.
  • CreateConstraintsAndIndexes: Create constraints and indexes required by the schema. This includes both new constraints and indexes as well as reestablishing constraints and indexes that were temporarily dropped in the second phase.
  • UpdateMetadata: Update the Quino-specific metadata table for the affected elements.
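
For reference, the ordering above could be captured in an enumeration like the following; the names are taken from the list above, but whether Quino exposes exactly such a type is an assumption.

// Phase names from the list above; the exact Quino type is an assumption.
public enum MigrationPhase
{
  Initialize,
  DropConstraintsAndIndexes,
  AddUpdateOrRenameSchema,
  UpdateData,
  DropSchema,
  CreatePrimaryKeys,
  CreateConstraintsAndIndexes,
  UpdateMetadata
}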

Executing the migration plan

The commands are then executed and the results logged.

Afterward, the schema is imported again, to verify that there are no differences between the target model and the database. In some (ever-rarer) cases, there will still be differences, in which case you can execute the new migration plan to repair those differences as well.

In development, this works remarkably well and often without further intervention.

Fixing failed migrations

In some cases, there is data in the database that, while compatible with the current database schema, is incompatible with the updated schema. This usually happens when a new property or constraint is introduced. For example, a new required property is added that does not have a default value or a new unique index is added which existing data violates.

In these cases, there are two things that can be done:

  • Either the database data is cleaned up in a way that makes it compatible with the target schema4
  • Or the developer must add custom logic to the metadata elements involved. This usually means that the developer must set a default value on a property. In rarer cases, the developer must attach logic to the affected metadata (e.g. the property or index that is causing the issue) that runs during schema migration to create new data or copy it from elsewhere in order to ensure that constraints are satisfied when they are reestablished at the end of the migration.
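
A sketch of the most common fix, setting a default value, is shown below; the member names are assumptions based on the model used throughout these articles, not the actual Quino API.

// Sketch only: member names are assumptions, not the actual Quino API.
// Give the new required property a default value so that existing rows can
// be filled in when the NOT NULL column is added during migration.
var isEmployee = Person.MetaProperties.IsEmployee;
isEmployee.SetDefaultValue(true);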

In general, it's strongly advised to perform a migration against a replica of the true target database (e.g. a production database) in order to guarantee that all potential data situations have been anticipated with custom code, if necessary.

Quino Migration versus EF Migrations

It's important to point out that Quino's schema migration is considerably different from that employed by EF (which it picked up from Active Record migrations in Ruby, often used with Ruby on Rails). In those systems, the developer generates specific migrations to move from one model version to another. There is a clear notion of upgrading versus downgrading. Quino only recognizes migrating from one arbitrary model to another. This makes Quino's migration exceedingly friendly when moving between development branches, unlike EF, whose deficiencies in this area have been documented.



  1. The default is to use only MetaIds. There is a mode in which identifiers are used as a fallback but it is used only for tools that import schemas that were not generated by Quino. Again, if the Quino metadata table hasn't been damaged, this strict form of mapping will work extremely well.

  2. The Winform and Web user interfaces for Quino both include built-in feedback for interacting with the schema migration. There are also two standalone tools to migrate database schemas: a Winform application and a Windows console application.

  3. The form of these commands is currently a mix of SQL and custom C# code. A future feature of the migration will be to have all commands available as SQL text so that the commands, instead of being executed directly, could be saved as a file and reviewed and executed by DBAs instead of letting the tool do it. We're not quite there yet, but proceeding nicely.

  4. This is generally what a developer does with his or her local database. The data contained therein can usually be more or less re-generated. If there is a conflict during migration, a developer can determine whether custom code is necessary or can sometimes determine that the data situation that causes the problem isn't something that comes up in production anyway and just remove the offending elements or data until the schema migration succeeds.

Optimizing data access for high-latency networks: part IV

In the previous three articles, we sped up the opening of the calendar in Encodo's time-tracking product, Punchclock. We showed how we reduced the number of queries from one very slow query per person to a single very fast query for all people at once.

Because we're talking about latency in these articles, we'd also like to clear away a few other queries that aren't related to time entries but are still slowing things down.

Lazy-loading unneeded values

In particular, the queries that "Load values" for person objects look quite suspicious. These queries don't take a lot of time to execute but they will definitely degrade performance in high-latency networks.1

[screenshot: the statistics window showing several "Load values" queries for Person objects]

As we did before, we can click on one of these queries to show the query that's being loaded. In the screenshot below, we see that the person's picture is being loaded for each person in the drop-down list.

[screenshot: the query text, showing that the person's picture is loaded for each person in the drop-down list]

We're not showing pictures in the drop-down list, though, so this is an extravagant waste of time. On a LAN, we hardly notice how wasteful we are with queries; on a WAN, the product will feel...sluggish.

What is a load-group?

In order to understand the cause of these queries, you must first know that Quino allows a developer to put metadata properties into different load-groups. A load-group has the following behavior: If the value for a property in a load-group is requested on an object, the values for all of the properties in the load-group are retrieved with a single query and set on that object.

The default load-group of an object's metadata determines the values that are initially retrieved and applied to objects materialized by the ORM.

The metadata for a person puts the "picture" property into a separate load-group so that the value is not loaded by default when objects of type person are loaded from the data driver. With this setting, business logic avoids downloading a lot of unwanted picture data by default.

Business logic that needs the pictures can either explicitly include the picture in the query or let the value be lazy-loaded by the ORM when it is accessed. The proper solution depends on the situation.
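
Conceptually, the two options look like this; the Picture property name comes from this example, while the access pattern is a sketch rather than the literal Quino API.

// Option 1: let the ORM lazy-load the picture on first access. Accessing a
// property outside the default load-group triggers one query that loads all
// properties in that load-group for this object.
var person = people.First();
var picture = person.Picture;

// Option 2: explicitly include the "picture" load-group in the original
// query so that the data arrives with the initial result set (no extra
// round trip later).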

Lazy-loaded property values

As before, we can check the stack trace of the query to figure out which application component is triggering the call. In this case, the culprit is the binding list that we are using to attach the list of people to the drop-down control.

The binding list binds the values for all of the properties in a metaclass (e.g. "person"), triggering a lazy load when it accesses the "picture" property. To avoid the lazy-load, we can create a wrapper of the default metadata for a person and remove/hide the property so that the binding list will no longer access it.

This is quite easy2, as shown in the code below.

var personMetaClass = new WrapMetaClass(Person.Metadata);
personMetaClass.Properties.Remove(Person.MetaProperties.Picture);
var query = new Query(personMetaClass);

With this simple fix, the binding list no longer knows about the picture property, doesn't retrieve values for that property and therefore no longer triggers any queries to lazily load the pictures from the database for each person object.

The screenshot of the statistics window below shows us that we were successful. We have two main queries: one for the list of people to show in the drop-down control and one for the time entries to show in the calendar.

[screenshot: the statistics window showing the two remaining queries]

Final version

For completeness, here's the code that Punchclock is using in the current version of Quino (1.11).

var personMetaClass = new WrapMetaClass(Person.Metadata);
personMetaClass.Properties.Remove(Person.MetaProperties.Picture);

var accessToolkit = new PostgreSqlMetaDatabase().AccessToolkit;

var query = new Query(personMetaClass);
query.CustomCommandText = new CustomCommandText();
query.CustomCommandText.SetSection(
  CommandTextSections.Where, 
  CommandTextAction.Replace,
  string.Format(
    "EXISTS (SELECT id FROM {0} WHERE {1} = {2})", 
    accessToolkit.GetName(TimeEntry.Metadata), 
    accessToolkit.GetField(TimeEntry.MetaProperties.PersonId), 
    accessToolkit.GetField(Person.MetaProperties.Id)
  )
);
var people = Session.GetList<Person>(query);

Future, improved version

Once we fix the bug in the WhereExists join type mentioned in the previous article and add the fluent methods for constructing wrappers mentioned in the footnote below, the code will be as follows:

var personMetaClass = 
  Person.Metadata.
  Wrap().
  RemoveProperty(Person.MetaProperties.Picture);

var people = 
  Session.GetList<Person>(
    new Query(personMetaClass).
    Join(Person.MetaRelations.TimeEntries, JoinType.WhereExists).
    Query
  );

This concludes our investigation into performance issues with Quino and Punchclock.


  1. You may have noticed that these calls to "load values" are technically lazy-loaded but don't seem to be marked as such in the screenshots. This was a bug in the statistics viewer that I discovered and addressed while writing this article.

  2. This is a rather old API and hasn't been touched with the "fluent" wand that we've applied to other parts of the Quino API. A nicer way of writing it would be to create extension methods called Wrap() and RemoveProperty() that return the wrapper class, like so:

     var personMetaClass = 
       Person.Metadata.
       Wrap().
       RemoveProperty(Person.MetaProperties.Picture);

     var query = new Query(personMetaClass);

     But you'll have to wait for Quino 1.12 for that.

v1.12.0: Improvements to data-provider statistics and Windows 8.1 fixes

The summary below describes major new features, items of note and breaking changes. The full list of issues is also available for those with access to the Encodo issue tracker.

Highlights


  • Windows 8.1: fixed culture-handling for en-US and de-CH that is broken in Windows 8.1 (QNO-4534, QNO-4553)
  • Data-provider statistics: improved the WinForm-based statistics form (QNO-4231, QNO-4545, QNO-4546)
  • Data driver: bug fixes and improvements (QNO-4538, QNO-4554, QNO-4551)
  • Image-handling: the Encodo and Quino libraries now use the Windows Imaging Components instead of System.Drawing (QNO-4536)
  • Standard forms: updated the standard WinForm about window and splash screen to use Encodo web-site CI (QNO-4529)

Breaking changes

  • No known breaking changes.

Optimizing data access for high-latency networks: part III

In the previous articles, we partially addressed a performance problem in the calendar of Encodo's time-tracking product, Punchclock. While we managed to drastically reduce the amount of time taken by each query (>95% time saved), we were still executing more queries than strictly necessary.

The query that we're trying to optimize further is shown below.

var people =
  Session.GetList<Person>().
  Where(p => Session.GetCount(p.TimeEntries.Query) > 0).
  ToList();

This query executes one query to get all the people and then one query per person to get the number of time entries for that person. Each of these queries by itself is very fast, but on a high-latency connection each round trip adds a delay. In order to optimize further, there's really nothing for it but to reduce the number of queries being executed.

Let's think back to what we're actually trying to accomplish: We want to get all people who have at least one time entry. Can't we get the database to do that for us? Some join or existence check or something? How about the code below?

var people = 
  Session.GetList<Person>(
    Session.CreateQuery<Person>().
    Join(Person.MetaRelations.TimeEntries, JoinType.WhereExists).
    Query
  );

What's happening in the code above? We're still getting a list of people but, instead of manipulating the related TimeEntries for each person locally, we're joining the TimeEntries relation with the Quino query Join() method and changing the join type from the default All to the restrictive WhereExists. This sounds like exactly what we want to happen! There is no local evaluation or manipulation with Linq and, with luck, Quino will be able to map this to a single query on the database.

This is the best possible query: it's purely declarative and will be executed as efficiently as the back-end knows how.

There's just one problem: the WhereExists join type is broken in Quino 1.11.

Never fear, though! We can still get it to work, but we'll have to do a bit of work until the bug is fixed in Quino 1.12. The code below builds on lessons learned in the earlier article, Mixing your own SQL into Quino queries: part 2 of 2 to use custom query text to create the restriction instead of letting Quino do it.

var accessToolkit = new PostgreSqlMetaDatabase().AccessToolkit;

var query = Session.CreateQuery<Person>();
query.CustomCommandText = new CustomCommandText();
query.CustomCommandText.SetSection(
  CommandTextSections.Where, 
  CommandTextAction.Replace,
  string.Format(
    "EXISTS (SELECT id FROM {0} WHERE {1} = {2})", 
    accessToolkit.GetName(TimeEntry.Metadata), 
    accessToolkit.GetField(TimeEntry.MetaProperties.PersonId), 
    accessToolkit.GetField(Person.MetaProperties.Id)
  )
);
var people = Session.GetList<Person>(query);

A look at the statistics is very encouraging:

[screenshot: the statistics window showing the single fast query for people]

We're down to one 29ms query for the people and an even quicker query for all the relevant time entries.1 We can see that our query text appears embedded in the SQL generated by Quino, just as we expected.

There are a few other security-related queries that execute very quickly and hardly need optimization.

We've come much farther in this article and we're almost done. In the next article, we'll quickly clean up a few other queries that are showing up in the statistics and that have been nagging us since the beginning.



  1. The time-entry query is not representative because my testing data set didn't include time entries for the current day and I was too lazy to page around to older data.

Optimizing data access for high-latency networks: part II

In the previous article, we discussed a performance problem in the calendar of Encodo's time-tracking product, Punchclock.

Instead of guessing at the problem, we profiled the application using the database-statistics window available to all Quino applications.1 We quickly discovered that most of the slowdown stems from the relatively innocuous line of code shown below.

var people = 
  Session.GetList<Person>().
  Where(p => p.TimeEntries.Any()).
  ToList();

First things first: what does the code do?

Before doing anything else, we should establish what the code does. Logically, it retrieves a list of people in the database who have recorded at least one time entry.

The first question we should ask at this point is: does the application even need to do this? The answer in this case is 'yes'. The calendar includes a drop-down control that lets the user switch between the calendars for different users. This query returns the people to show in this drop-down control.

With the intent and usefulness of the code established, let's dissect how it is accomplishing the task.

  1. The Session.GetList<Person>() portion retrieves a list of all people from the database
  2. The Where() method is applied locally for each object in the list2
  3. For a given person, the list of TimeEntries is accessed
  4. This access triggers a lazy load of the list
  5. The Any() method is applied to the full list of time entries
  6. The ToList() method creates a list of all people who match the condition

Though the line of code looks innocuous enough, it causes a huge number of objects to be retrieved, materialized and retained in memory -- simply in order to check whether there is at least one object.

This is a real-world example of a performance problem that can happen to any developer. Instead of blaming the developer who wrote this line of code, it's more important to stay vigilant to performance problems and to have tools available to quickly and easily find them.

Stop creating all of the objects

The first solution I came up with3 was to stop creating objects that I didn't need. A good way of doing this, and one that was covered in Quino: partially-mapped queries, is to use cursors instead of lists. Instead of using the generated list TimeEntries, the following code retrieves a cursor on that list's query and materializes at most one object for the sub-query.

var people = Session.GetList<Person>().Select(p =>
{
  using (var cursor = Session.CreateCursor<TimeEntry>(p.TimeEntries.Query))
  {
    return cursor.Any();
  }
}).ToList();

A check of the database statistics shows improvement, as shown below.

[screenshot: the statistics window after switching to cursors]

Just by using cursors, we've managed to reduce the execution time for each query by about 75%.4 Since all we're interested in finding out is whether there is at least one time entry for a person, we could also ask the database to count objects rather than to return them. That should be even faster. The following code is very similar to the example above but, instead of getting a cursor based on the TimeEntries query, it gets the count.

var people =
  Session.GetList<Person>().
  Where(p => Session.GetCount(p.TimeEntries.Query) > 0).
  ToList();

How did we do? A check of the database statistics shows even more improvement, as shown below.

[screenshot: the statistics window after switching to counts]

We're now down to a few dozen milliseconds for all of our queries, so we're done, right? A 95% reduction in query-execution time should be enough.

Unfortunately, we're still executing just as many queries as before, even though we're taking far less time to execute them. This is better, but still not optimal. In high-latency situations, the user is still likely to experience a significant delay when opening the calendar since each query's execution time is increased by the latency of the connection. In a local network, the latency is negligible; on a WAN, we still have a problem.

In the next article, we'll see if we can't reduce the number of queries being executed.


  1. This series of articles shows the statistics window as it appears in Winforms applications. The data-provider statistics are also available in Quino web applications as a Glimpse plug-in.

  2. It is important for users of the Microsoft Entity Framework (EF) to point out that Quino does not have a Linq-to-Sql mapper. That means that any Linq expressions like Where() are evaluated locally instead of being mapped to the database. There are various reasons for this but the main one is that we ended up preferring a strict boundary between the mappable query API and the local evaluation API. Anything formulated with the query API is guaranteed to be executed by the data provider (even if it must be evaluated locally) and anything formulated with Linq is naturally evaluated locally. In this way, the code is clear about what is sent to the server and what is evaluated locally. Quino only very, very rarely issues an "unmappable query" exception, unlike EF, which occasionally requires contortions until you've figured out which C# formulation of a particular expression can be mapped by EF.

  3. Well, the first answer I'm going to pretend I came up with. I actually thought of another answer first, but then quickly discovered that Quino wasn't mapping that little-used feature correctly. I added an issue to tackle that problem at a later date and started looking for workarounds. That fix will be covered in the next article in this series.

  4. Please ignore the fact that we also dropped 13 person queries. This was not due to any fix that we made but rather that I executed the test slightly differently...and was too lazy to make a new screenshot. The 13 queries are still being executed and we'll tackle those in the last article in this series.

Optimizing data access for high-latency networks: part I

Punchclock is Encodo's time-tracking and invoicing tool. It includes a calendar to show time entries (shown to the left). Since the very first versions, it hasn't opened very quickly. It was fast enough for most users, but those who worked with Punchclock over the WAN through our VPN have reported that it often takes many seconds to open the calendar. So we have a very useful tool that is not often used because of how slowly it opens.

That the calendar opens slowly in a local network and even more slowly in a WAN indicates that there is not only a problem with executing many queries but also with retrieving too much data.

Looking at query statistics

This seemed like a solvable problem, so I fired up Punchclock in debug mode to have a look at the query-statistics window.

To set up the view shown below, I did the following:

  1. Start your Quino application (Punchclock in this case) in debug mode (so that the statistics window is available)
  2. Open the statistics window from the debug menu
  3. Reset the statistics to clear out anything logged during startup
  4. Group the grid by "Meta Class"
  5. Open the calendar to see what kind of queries are generated
  6. Expand the "TimeEntry" group in the grid to show details for individual queries

[screenshot: the query-statistics window, grouped by "Meta Class"]

I marked a few things on the screenshot. It's somewhat suspicious that there are 13 queries for data of type "Person", but we'll get to that later. Much more suspicious is that there are 52 queries for time entries, which seems like quite a lot considering we're showing a calendar for a single user. We would instead expect to have a single query. More queries would be OK if there were good reasons for them, but I feel comfortable in deciding that 52 queries is definitely too many.

A closer look at the details for the time-entry queries shows very high durations for some of them, ranging from a tenth of a second to nearly a second. These queries are definitely the reason the calendar window takes so long to load.

Why are these queries taking so long?

If I select one of the time-entry queries and show the "Query Text" tab (see screenshot below), I can see that it retrieves all time entries for a single person, one after another. There are almost six years of historical data in our Punchclock database and some of our employees have been around for all of them.1 That's a lot of time entries to load.

[screenshot: the "Query Text" tab for one of the time-entry queries]

I can also select the "Stack Trace" tab to see where the call originated in my source code. This feature lets me pinpoint the program component that is causing these slow queries to be executed.

[screenshot: the "Stack Trace" tab for one of the time-entry queries]

As with any UI-code stack, you have to be somewhat familiar with how events are handled and dispatched. In this stack, we can see how a MouseUp command bubbled up to create a new form, then a new control and finally, to trigger a call to the data provider during that control's initialization. We don't have line numbers but we see that the call originates in a lambda defined in the DynamicSchedulerControl constructor.

The line of code that I pinpoint as the culprit is shown below.

var people = Session.GetList<Person>().Where(p => p.TimeEntries.Any()).ToList();

This looks like a nicely declarative way of getting data, but to the trained eye of a Quino developer, it's clear what the problem is.

In the next couple of articles, we'll take a closer look at what exactly the problem is and how we can improve the speed of this query. We'll also take a look at how we can improve the Quino query API to make it harder for code like the line above to cause performance problems.



  1. Encodo just turned nine years old, but we used a different time-entry system for the first couple of years. If you're interested in our time-entry software history, here it is:

     1. 06.2005 -- Start off with Open Office spreadsheets
     2. 04.2007 -- Switch to a home-grown, very lightweight time tracker based on an older framework we'd written (Punchclock 1.0)
     3. 08.2008 -- Start development of Quino
     4. 04.2010 -- Initial version of Punchclock 2.0; start dogfooding Quino
    

Mixing your own SQL into Quino queries: part 2 of 2

In the first installment, we covered the basics of mixing custom SQL with ORM-generated queries. We also took a look at a solution that uses direct ADO database access to perform arbitrarily complex queries.

In this installment, we will see more elegant techniques that make use of the CustomCommandText property of Quino queries. We'll approach the desired solution in steps, proceeding from attempt #1 through attempt #5.

tl;dr: Skip to attempt #5 to see the final result without learning why it's correct.

Attempt #1: Replacing the entire query with custom SQL

An application can assign the CustomCommandText property of any Quino query to override some of the generated SQL. In the example below, we override all of the text, so that Quino doesn't generate any SQL at all. Instead, Quino is only responsible for sending the request to the database and materializing the objects based on the results.

[Test]
public void TestExecuteCustomCommand()
{
  var people = Session.GetList<Person>();

  people.Query.CustomCommandText = new CustomCommandText
  {
    Text = @"
SELECT ALL 
""punchclock__person"".""id"", 
""punchclock__person"".""companyid"", 
""punchclock__person"".""contactid"", 
""punchclock__person"".""customerid"", 
""punchclock__person"".""initials"", 
""punchclock__person"".""firstname"", 
""punchclock__person"".""lastname"", 
""punchclock__person"".""genderid"", 
""punchclock__person"".""telephone"", 
""punchclock__person"".""active"", 
""punchclock__person"".""isemployee"", 
""punchclock__person"".""birthdate"", 
""punchclock__person"".""salary"" 
FROM punchclock__person WHERE lastname = 'Rogers'"
  };

  Assert.That(people.Count, Is.EqualTo(9));
}

This example solves two of the three problems outlined above:

  • It uses only a single query.
  • It will work with a remote application server (although it makes assumptions about the kind of SQL expected by the backing database on that server).
  • But it is even more fragile than the previous example as far as hard-coded SQL goes. You'll note that the fields expected by the object-materializer have to be explicitly included in the correct order.

Let's see if we can address the third issue by getting Quino to format the SELECT clause for us.

Attempt #2: Generating the SELECT clause

The following example uses the AccessToolkit of the IQueryableDatabase to format the list of properties obtained from the metadata for a Person. The application no longer makes assumptions about which properties are included in the select statement, what order they should be in or how to format them for the SQL expected by the database.

[Test]
public virtual void TestExecuteCustomCommandWithStandardSelect()
{
  var people = Session.GetList<Person>();

  var accessToolkit = DefaultDatabase.AccessToolkit;
  var properties = Person.Metadata.DefaultLoadGroup.Properties;
  var fields = properties.Select(accessToolkit.GetField);

  people.Query.CustomCommandText = new CustomCommandText
  {
    Text = string.Format(
      @"SELECT ALL {0} FROM punchclock__person WHERE lastname = 'Rogers'",
      fields.FlattenToString()
    )
  };

  Assert.That(people.Count, Is.EqualTo(9));
}

This example fixes the problem with the previous one but introduces a new problem: it no longer works with a remote application because it assumes that the client-side driver is a database with an AccessToolkit. The next example addresses this problem.

Attempt #3: Using a hard-coded AccessToolkit

The version below uses a hard-coded AccessToolkit so that it doesn't rely on the external data driver being a direct ADO database. It still makes an assumption about the database on the server but that is usually quite acceptable because the backing database for most applications rarely changes.1

[Test]
public void TestCustomCommandWithPostgreSqlSelect()
{
  var people = Session.GetList<Person>();

  var accessToolkit = new PostgreSqlMetaDatabase().AccessToolkit;
  var properties = Person.Metadata.DefaultLoadGroup.Properties;
  var fields = properties.Select(accessToolkit.GetField);

  people.Query.CustomCommandText = new CustomCommandText
  {
    Text = string.Format(
      @"SELECT ALL {0} FROM punchclock__person WHERE lastname = 'Rogers'",
      fields.FlattenToString()
    )
  };

  Assert.That(people.Count, Is.EqualTo(9));
}

We now have a version that satisfies all three conditions to a large degree. The application uses only a single query and the query works with both local databases and remoting servers. It still makes some assumptions about database-schema names (e.g. "punchclock__person" and "lastname"). Let's see if we can clean up some of these as well.

Attempt #4: Replacing only the where clause

Instead of replacing the entire query text, an application can replace individual sections of the query, letting Quino fill in the rest of the query with its standard generated SQL. An application can append or prepend text to the generated SQL or replace it entirely. Because the condition for our query is so simple, the example below replaces the entire WHERE clause instead of adding to it.

[Test]
public void TestCustomWhereExecution()
{
  var people = Session.GetList<Person>();

  people.Query.CustomCommandText = new CustomCommandText();
  people.Query.CustomCommandText.SetSection(
    CommandTextSections.Where, 
    CommandTextAction.Replace, 
    "lastname = 'Rogers'"
  );

  Assert.That(people.Count, Is.EqualTo(9));
}

That's much nicer -- still not perfect, but nice. The only remaining quibble is that the identifier lastname is still hard-coded. If the model changes in a way where that property is renamed or removed, this code will continue to compile but will fail at run-time. This is a not insignificant problem if your application ends up using these kinds of queries throughout its business logic.

Attempt #5: Replacing the where clause with generated field names

In order to fix this query and have a completely generic query that fails to compile should anything at all change in the model, we can mix in the technique that we used in attempts #2 and #3: using the AccessToolkit to format fields for SQL. To make the query 100% statically checked, we'll also use the generated metadata -- LastName -- to indicate which property we want to format as SQL.

[Test]
public void TestCustomWhereExecution()
{
  var people = Session.GetList<Person>();

  var accessToolkit = new PostgreSqlMetaDatabase().AccessToolkit;
  var lastNameField = accessToolkit.GetField(Person.MetaProperties.LastName);

  people.Query.CustomCommandText = new CustomCommandText();
  people.Query.CustomCommandText.SetSection(
    CommandTextSections.Where, 
    CommandTextAction.Replace, 
    string.Format("{0} = 'Rogers'", lastNameField)
  );

  Assert.That(people.Count, Is.EqualTo(9));
}

The query above satisfies all of the conditions we outlined above. It's clear that the condition is quite simple and that real-world business logic will likely be much more complex. For those situations, the best approach is to fall back to using the direct ADO approach mixed with using Quino facilities like the AccessToolkit as much as possible to create a fully customized SQL text.

Many thanks to Urs for proofreading and suggestions on overall structure.



  1. If an application needs to be totally database-agnostic, then it will need to do some extra legwork that we won't cover in this post.

Mixing your own SQL into Quino queries: part 1 of 2

The Quino ORM1 manages all CrUD -- Create, Update, Delete -- operations for your application. This basic behavior is generally more than enough for standard user interfaces. When a user works with a single object in a window and saves it, there really isn't that much to optimize.

Modeled methods

A more complex editing process may include several objects at once and perhaps trigger events that create additional auditing objects. Even in these cases, there are still only a handful of save operations to execute. To keep the architecture clean, an application is encouraged to model these higher-level operations with methods in the metadata (modeled methods).

The advantage to using modeled methods is that they can be executed in an application server as well as locally in the client. When an application uses a remote application server rather than a direct connection to a database, modeled methods are executed in the service layer and therefore have much less latency to the database.

When Quino's query language isn't enough

If an application needs even more optimization, then it may be necessary to write custom SQL -- or even to use stored procedures to move the query into the database. Mixing SQL with an ORM can be a tricky business. It's even more of a challenge with an ORM like that in Quino, which generates the database schema and shields the user from tables, fields and SQL syntax almost entirely.

What are the potential pitfalls when using custom query text (e.g. SQL) with Quino?

  • Schema element names: An application needs to figure out the names of database objects like table and columns. It would be best not to hard-code them so that when the model changes, the custom code will be automatically updated.

    • If the query is in a stored procedure, then the database may ensure that the code is updated or at least checked when the schema changes.2
    • If the query is in application code, then care can be taken to keep that query in-sync with the model
  • Materialization: In particular, the selected fields in a projection must match the expectations of the ORM exactly so that it can materialize the objects properly. We'll see how to ensure this in examples below.

There are two approaches to executing custom code:

  • ADO: Get a reference to the underlying ADO infrastructure to execute queries directly without using Quino at all. With this approach, Quino can still help an application retrieve properly configured connections and commands.
  • CustomCommandText: An application commonly adds restrictions and sorts to the IQuery object using expressions, but can also add text directly to enhance or replace sections of the generated query.

All of the examples below are taken directly from the Quino test suite. Some variables -- like DefaultDatabase -- are provided by the Quino base testing classes but their purpose, types and implementation should be relatively obvious.

Using ADO directly

You can use the AdoDataConnectionTools to get the underlying ADO connection for a given Session so that any commands you execute are guaranteed to be executed in the same transactions as are already active on that session. If you use these tools, your ADO code will also automatically use the same connection parameters as the rest of your application without having to use hard-coded connection strings.

The first example shows a test from the Quino framework that shows how easy it is to combine results returned from another method into a standard Quino query.

[Test]
public virtual void TestExecuteAdoDirectly()
{
  var ids = GetIds().ToList();
  var people = Session.GetList<Person>();

  people.Query.Where(Person.MetaProperties.Id, ExpressionOperator.In, ids);

  Assert.That(people.Count, Is.EqualTo(9));
}

The ADO-access code is hidden inside the call to GetIds(), the implementation for which is shown below. Your application can get the connection for a session as described above and then create commands using the same helper class. If you call CreateCommand() directly on the ADO connection, you'll have a problem when running inside a transaction on SQL Server. The SQL Server ADO implementation requires that you assign the active transaction object to each command. Quino takes care of this bookkeeping for you if you use the helper method.

private IEnumerable<int> GetIds()
{
  using (var helper = AdoDataConnectionTools.GetAdoConnection(Session, "Name"))
  {
    using (var command = helper.CreateCommand())
    {
      command.AdoCommand.CommandText = 
        @"SELECT id FROM punchclock__person WHERE lastname = 'Rogers'";

      using (var reader = command.AdoCommand.ExecuteReader())
      {
        while (reader.Read())
        {
          yield return reader.GetInt32(0);
        }
      }
    }
  }
}

There are a few drawbacks to this approach:

  • Your application will make two queries instead of one.
  • The hard-coded SQL will break if you make model changes that affect those tables and fields.
  • The ADO approach only works if the application has a direct connection to the database. An application that uses ADO will not be able to switch to an application-server driver without modification.

In the second part, we will improve on this approach by using the CustomCommandText property of a Quino query. This will allow us to use only a single query. We will also improve maintainability by reducing the amount of code that isn't checked by the compiler (e.g. the SQL text above).

Stay tuned for part 2, coming soon!

Many thanks to Urs for proofreading and suggestions on overall structure.



  1. This article uses features of Quino that will only become available in version 1.12. Almost all of the examples will also work in earlier versions but the AdoDataConnectionTools is not available until 1.12. The functionality of this class can, however, be back-ported if necessary.

  2. More likely, though, is that the Quino schema migration will be prevented from applying updates if there are custom stored procedures that use tables and columns that need to be changed.

v1.11.0: Improvements to local evaluation & remoting

The summary below describes major new features, items of note and breaking changes. The full list of issues is also available for those with access to the Encodo issue tracker.

Highlights

  • Local evaluation: improved support for local evaluation in combination with remoting, sorting, limits and offsets (QNO-4330, QNO-3655, QNO-4224)
  • Many other bug fixes and minor improvements

Breaking changes

  • No known breaking changes.

Quino: efficiency, hinting and local sorting

In Quino: partially-mapped queries we took a look at how Quino seamlessly maps as much as possible to the database, while handling unmappable query components locally as efficiently as possible.

Correctness is more important than efficiency

"As efficiently as possible" can be a bit of a weasel statement. We saw that partial application of restrictions could significantly reduce the data returned. And we saw that efficient handling of that returned data could minimize the impact on both performance and memory, keeping in mind, of course, that the primary goal is correctness.

However, as we saw in the previous article, it's still entirely possible that even an optimally mapped query will result in an unacceptable memory-usage or performance penalty. In these cases, we need to be able to hint or warn the developer that something non-optimal is occurring. It would also be nice if the developer could indicate whether or not queries with such deficiencies should even be executed.

When do things slow down?

Why would this be necessary? Doesn't the developer have ultimate control over which queries are called? The developer has control over queries in business-logic code. But recall that the queries that we are using are somewhat contrived in order to keep things simple. Quino is a highly generic metadata framework: most of the queries are constructed by standard components from expressions defined in the metadata.

For example, the UI may piece together a query from various sources in order to retrieve the data for a particular view. In such cases, the developer has less direct control to "repair" queries with hand-tuning. Instead, the developer has to view the application holistically and make repairs in the metadata. This is one of many reasons why Quino has local evaluation and does not simply throw an exception for partially mapped queries, as EF does.

Debugging data queries

It is, in general, far better to continue working while executing a possibly sub-optimal and performance-damaging query than it is to simply crash out. Such behavior would increase the testing requirements for generated UIs considerably. Instead, the UI always works and the developer can focus on optimization and fine-tuning in the model, using tools like the Statistics Viewer, shown to the left.

The statistics viewer shows all commands executed in an application, with a stack trace, messages (hints/warnings/info) and the original query and mapped SQL/remote statement for each command. The statistics are available for SQL-based data drivers, but also for remoting drivers for all payload types (including JSON).

The screenshot above is for the statistics viewer for Winform applications; we've also integrated statistics into web applications using Glimpse, a plugin architecture for displaying extra information for web-site developers. The screenshot to the right shows a preview-release version that will be released with Quino 1.11 at the end of March.

Sorting is all or nothing

One place where an application can run into efficiency problems is when the sort order for entities is too complex to map to the server.

If a single restriction cannot be mapped to the database, we can map all of the others and evaluate the unmappable ones locally. What happens if a single sort cannot be mapped to the database? Can we do the same thing? Again, to avoid being too abstract, let's start with an example.

var query = Session.GetQuery<Person>();
query
  .Where(Person.Fields.LastName, ExpressionOperator.StartsWith, "M")
  .OrderBy(Person.Fields.LastName)
  .OrderBy(Person.Fields.FirstName)
  .Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");

Assert.That(Session.GetList(query).Count, Is.Between(100, 120));

Both of these sorts can be mapped to the server, so the performance and memory hit is very limited. The ORM will execute a single query and will return data for, and create, about 100 objects.

Now, let's replace one of the mappable sorts with something unmappable:

var query = Session.GetQuery<Person>();
query
  .Where(Person.Fields.LastName, ExpressionOperator.StartsWith, "M")
  .OrderBy(new DelegateExpression(c => c.GetObject<Person>().FirstName))
  .OrderBy(Person.Fields.LastName)
  .Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");

Assert.That(Session.GetList(query).Count, Is.Between(100, 120));

What's happening here? Instead of being able to map both sorts to the database, now only one can be mapped. Or can it? The primary sort can't be mapped, so there's obviously no point in mapping the secondary sort. Instead, all sorting must be applied locally.

What if we had been able to map the primary sort but not the secondary one? Then we could have the database apply the primary sort, returning the data partially ordered. We can apply the remaining sort in memory...but that won't work, will it? If we only applied the secondary sort in memory, then the data would end up sorted only by that value. It turns out that, unlike restrictions, sorting is all-or-nothing. If we can't map all sorts to the database, then we have to apply them all locally.1
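
A tiny worked example shows why. Suppose the primary sort (LastName) is mappable but the secondary sort (FirstName) is not. The database returns rows ordered by LastName:

  Marks, Zoe
  Meyer, Adam
  Meyer, Carol
  Miller, Bob

Applying only the remaining FirstName sort locally reorders the entire list:

  Meyer, Adam
  Miller, Bob
  Meyer, Carol
  Marks, Zoe

The result is ordered only by FirstName; the LastName ordering has been destroyed. To restore the (LastName, FirstName) ordering, the primary sort has to be re-applied locally as well, which is why sorting is all-or-nothing.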

In this case, the damage is minimal because the restrictions can be mapped and guarantee that only about 100 objects are returned. Sorting 100 objects locally isn't likely to show up on the performance radar.

Still, sorting is a potential performance-killer: as soon as you stray from the path of standard sorting, you run the risk of either:

  • Choosing a sort that is mappable but not covered by an index on the database
  • Choosing a sort that is unmappable and losing out on index-optimized sorting on the database

In the next article, we'll discuss how we can extract slices from a result set -- using limit and offset -- and what sort of effect this can have on performance in partially mapped queries.



  1. The mapper also doesn't bother adding any ordering to the generated query if at least one ordering is unmappable. There's no point in wasting time on the database with a sort that will be re-applied locally.