Encodo C# Handbook 7.17 -- Using System.Linq

3/19/2011 - Marco (updated on 11/13/2017)

I'm currently revising the Encodo C# Handbook to update it for the last year's worth of programming experience at Encodo, which includes a lot more experience with C# 4.0 features like optional parameters, dynamic types and more. The following is an expanded section on working with Linq. A final draft should be available by the middle of April or so.

7.17 -- Using System.Linq

When using Linq expressions, be careful not to sacrifice legibility or performance simply in order to use Linq instead of more common constructs. For example, the following loop sets a property for those elements in a list where a condition holds.

foreach (var pair in Data)
{
  if (pair.Value.Property is IMetaRelation)
  {
    pair.Value.Value = null;
  }
}

This seems like a perfect place to use Linq; assuming an extension method ForEach(this IEnumerable<T>), we can write the loop above using the following Linq expression:

Data.Where(pair => pair.Value.Property is IMetaRelation).ForEach(pair => pair.Value.Value = null);

This formulation, however, is more difficult to read because the condition and the loop are now buried in a single line of code. It can also carry a more subtle performance cost. We have made sure to evaluate the restriction (Where) first so that ForEach sees as few elements as possible. Because Where is evaluated lazily, the source is still enumerated only once, but every element now passes through an additional iterator and a delegate call for the condition, and every match triggers a second delegate call for the action. This could cause performance problems in border cases where the list is large and a large number of elements satisfy the condition.
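The ForEach extension method assumed above is not part of System.Linq. A minimal sketch of what such a helper might look like follows; the class name EnumerableExtensions and the exact signature are assumptions made for illustration, not Encodo's actual implementation.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Hypothetical helper: System.Linq does not ship a ForEach for IEnumerable<T>.
    // It enumerates the source exactly once and applies the action to each element.
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (action == null) throw new ArgumentNullException("action");

        foreach (var item in source)
        {
            action(item);
        }
    }
}
```

Unlike Where or Select, a helper like this runs immediately when called; it is the point at which a lazily built expression actually gets evaluated.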

7.17.1 -- Lazy Evaluation

Linq is mostly a blessing, but you always have to keep in mind that Linq expressions are evaluated lazily. Be very careful when using the Count() method: on a sequence that is only an IEnumerable<T>, such as the result of a Where, it iterates over the entire sequence, and it does so again on every call. Linq does check the actual backing collection, so if the IEnumerable<T> you have is a list (or any other ICollection<T>) and the count is requested, Linq will use the Count property instead of counting elements naively.
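The difference can be made visible with a small sketch. The class CountDemo, its Yielded counter and the Numbers() generator are hypothetical names introduced only for this illustration; the counter records how many elements the lazy source has produced so far.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CountDemo
{
    // Number of elements the generator below has produced so far.
    public static int Yielded;

    // A lazy sequence: nothing runs until someone enumerates it.
    public static IEnumerable<int> Numbers()
    {
        for (var i = 0; i < 5; i++)
        {
            Yielded++;
            yield return i;
        }
    }

    // Returns a snapshot of Yielded after each step.
    public static int[] Run()
    {
        Yielded = 0;
        var snapshots = new List<int>();

        var lazy = Numbers().Where(n => n % 2 == 0);
        snapshots.Add(Yielded);   // 0: building the query enumerates nothing

        lazy.Count();
        snapshots.Add(Yielded);   // 5: Count() walked the whole source

        lazy.Count();
        snapshots.Add(Yielded);   // 10: and walked it again on the second call

        var list = lazy.ToList();
        snapshots.Add(Yielded);   // 15: ToList() enumerates once more

        list.Count();
        snapshots.Add(Yielded);   // 15: List<T> reports its Count property directly

        return snapshots.ToArray();
    }
}
```

If the count is needed more than once, materializing the sequence first (as with ToList() here) avoids the repeated enumeration.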

A few concrete examples of other issues that arise due to lazy evaluation are illustrated below.

7.17.2 -- Capturing Unstable Variables/Access to Modified Closure

You can accidentally change the value of a captured variable before the sequence is evaluated. Since ReSharper will complain about this behavior even when it does not cause unwanted side-effects, it is important to understand which cases are actually problematic.

var data = new[] { "foo", "bar", "bla" };
var otherData = new[] { "bla", "blu" };
var overlapData = new List<string>();

foreach (var d in data)
{
  if (otherData.Where(od => od == d).Any())
  {
    overlapData.Add(d);
  }
}

// We expect one element in the overlap, bla
Assert.AreEqual(1, overlapData.Count);

The reference to the variable d will be flagged by ReSharper and marked as an "access to a modified closure". This is a reminder that a variable referenced (or captured) by the lambda expression (the closure) will have the last value assigned to it rather than the value that was assigned to it when the lambda was created. In the example above, the lambda captures d on each pass through the loop, but since we use each lambda only once, and always before the variable has been changed, we don't have to worry about side-effects. ReSharper can only detect that a variable referenced in a closure is being changed within the scope that it checks; it lets you know so that you can verify that there are no unwanted side-effects.
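For contrast, here is a sketch of a case where the same kind of capture does cause trouble, because the lambdas are stored and only invoked after the loop has finished. The class ClosureDemo and its method names are hypothetical. The sketch uses a for loop because, since C# 5, foreach declares a fresh variable for each iteration, whereas a for loop variable is still shared across iterations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureDemo
{
    // Every lambda captures the same variable i, which is 3 once the loop ends.
    public static List<int> SharedVariable()
    {
        var getters = new List<Func<int>>();
        for (var i = 0; i < 3; i++)
        {
            getters.Add(() => i);
        }

        // The lambdas run only now, after the loop: 3, 3, 3
        return getters.Select(g => g()).ToList();
    }

    // Copying the value into a variable declared inside the loop body gives each lambda its own.
    public static List<int> CopiedVariable()
    {
        var getters = new List<Func<int>>();
        for (var i = 0; i < 3; i++)
        {
            var copy = i;
            getters.Add(() => copy);
        }

        // 0, 1, 2
        return getters.Select(g => g()).ToList();
    }
}
```

The handbook's foreach example above never stores its lambda; each one is evaluated by Any() before the loop moves on, which is why it is unaffected.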

Even though there isn't a problem, you can rewrite the foreach-statement above as the following code, eliminating the "access to modified closure" warning.

var overlapData = data.Where(d => otherData.Where(od => od == d).Any()).ToList();

The example above was tame in that the program ran as expected despite capturing a variable that was later changed. The following code, however, will not run as expected:

var data = new[] { "foo", "bar", "bla" };
var otherData = new[] { "bla", "blu" };

var threshold = 2;
var results = data.Where(d => d.Length == threshold);
var overlapData = data.Where(d => otherData.Where(od => od == d).Any());
if (overlapData.Any())
{
  threshold += 1;
}

// All elements are three characters long, so we expect no matches
Assert.AreEqual(0, results.Count());

Here we have a problem because the closure is evaluated after a local variable that it captured has been modified, resulting in unexpected behavior. While it's possible that this is exactly what you intended, it's not a recommended coding style. Instead, you should move the calculation that uses the lambda after any code that changes the variables that it captures:

var threshold = 2;
var overlapData = data.Where(d => otherData.Where(od => od == d).Any());
if (overlapData.Any())
{
  threshold += 1;
}
var results = data.Where(d => d.Length == threshold);

This is probably the easiest way to get rid of the warning and make the code clearer to read.
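If the query cannot easily be moved, another option (not one the text above proposes) is to force evaluation with ToList() before the captured variable changes. The sketch below reuses the data from the example; the class ThresholdDemo and the variable names deferred and snapshot are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ThresholdDemo
{
    // Returns { matches seen by the deferred query, matches seen by the snapshot }.
    public static int[] Run()
    {
        var data = new[] { "foo", "bar", "bla" };
        var threshold = 2;

        // Deferred: the lambda reads threshold only when the query is enumerated.
        IEnumerable<string> deferred = data.Where(d => d.Length == threshold);

        // Materialized: ToList() evaluates immediately, while threshold is still 2.
        List<string> snapshot = data.Where(d => d.Length == threshold).ToList();

        threshold += 1;

        // deferred now matches all three strings; snapshot stays empty: 3, 0
        return new[] { deferred.Count(), snapshot.Count };
    }
}
```

The trade-off is that ToList() allocates a list and gives up lazy evaluation, so reordering the statements as shown above is usually the simpler fix.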
