Anything Counts

by casperOne 27. March 2009 14:26

Daniel Cazzulino has an entry about how LINQ can be used to transform nasty-looking code expressions like embedded for loops into something much more readable (and by extension, maintainable).

In it, he transforms this piece of code:

private bool IsPresentationModel(CodeClass2 baseClass)
{
foreach (CodeClass2 pc in baseClass.PartialClasses)
{
foreach (CodeAttribute2 attr in pc.Attributes)
{
if (attr.FullName == typeof(GeneratedCodeAttribute).FullName &&
attr.Value.Contains(ThisAssembly.Title))
{
return true;
}
}
}

foreach (CodeAttribute2 attr in baseClass.Attributes)
{
if (attr.FullName == typeof(GeneratedCodeAttribute).FullName &&
attr.Value.Contains(ThisAssembly.Title))
{
return true;
}
}

return false;
}
To this:
private bool IsPresentationModel(CodeClass2 baseClass)
{
return baseClass.PartialClasses
.OfType<CodeClass2>()
.Select(cc => cc.Attributes)
.SelectMany(ce => ce.OfType<CodeAttribute2>())
.Concat(baseClass.Attributes.OfType<CodeAttribute2>())
.Where(attr =>
attr.FullName == typeof(GeneratedCodeAttribute).FullName &&
attr.Value.Contains(ThisAssembly.Title))
.Count() != 0;
}

All-around, it’s a much better solution, easier to read, easier to maintain, easier to understand.

But there’s a problem with it.

It’s in the last line:

.Count() != 0;

Here, Daniel is trying to check for the existence of an item, the actual count of items is irrelevant.

The documentation for the Count method on the Enumerable class states:

Returns the number of elements in a sequence.

Knowing that an IEnumerable<T> implementation only represents the sequence of elements, and nothing more, it’s safe to surmise that the only way to get the count of the number of elements in the sequence is to iterate through the entire sequence.  This is an O(N) operation, which for high values of N, can be pricey.

Now, it’s true that the actual implementation of the Count method checks whether or not the IEnumerable<T> implementation also implements ICollection<T>.  If so, it returns the value from the Count property on ICollection<T> instead of iterating through every element.  This reduces the method to an O(1) operation.

Unfortunately, that’s an implementation detail.

And we all know that coding against implementation details, except when absolutely necessary, is a bad, bad, BAD (say it again with me, BAD) idea.

Fortunately, the Enumerable class offers a solution in this case, the Any method.  From the documentation:

Determines whether a sequence contains any elements.

Which is the semantic equivalent of checking the number of elements in the sequence to zero.

As stated before, because IEnumerable<T> is representative only of the sequence of elements (and nothing more), the simplest way to implement this would be to iterate for just a single element.  If an element is returned, then the method returns true and false otherwise.  Since only one element is involved, this is an O(1) operation, which is always better than the O(N) operation that the Enumerable.Count method is.

So what have we learned here?

  • DO use the Enumerable.Any extension method to check for the existence of elements in the sequence.
  • DO NOT use Enumerable.Count extension method in comparisons against zero, as the following are semantically equivalent:
    • sequence.Count() == 0
    • !sequence.Any()
  • DO NOT use the Enumerable.Count extension method in comparisons against the “not zero” condition, as the following are semantically equivalent:
    • sequence.Count != 0
    • sequence.Any()

Tags: , , , , , ,

programming