Anything Counts

by casperOne 27. March 2009 14:26

Daniel Cazzulino has an entry about how LINQ can be used to transform nasty-looking code expressions like embedded for loops into something much more readable (and by extension, maintainable).

In it, he transforms this piece of code:

private bool IsPresentationModel(CodeClass2 baseClass)
{
foreach (CodeClass2 pc in baseClass.PartialClasses)
{
foreach (CodeAttribute2 attr in pc.Attributes)
{
if (attr.FullName == typeof(GeneratedCodeAttribute).FullName &&
attr.Value.Contains(ThisAssembly.Title))
{
return true;
}
}
}

foreach (CodeAttribute2 attr in baseClass.Attributes)
{
if (attr.FullName == typeof(GeneratedCodeAttribute).FullName &&
attr.Value.Contains(ThisAssembly.Title))
{
return true;
}
}

return false;
}
To this:
private bool IsPresentationModel(CodeClass2 baseClass)
{
return baseClass.PartialClasses
.OfType<CodeClass2>()
.Select(cc => cc.Attributes)
.SelectMany(ce => ce.OfType<CodeAttribute2>())
.Concat(baseClass.Attributes.OfType<CodeAttribute2>())
.Where(attr =>
attr.FullName == typeof(GeneratedCodeAttribute).FullName &&
attr.Value.Contains(ThisAssembly.Title))
.Count() != 0;
}

All-around, it’s a much better solution, easier to read, easier to maintain, easier to understand.

But there’s a problem with it.

It’s in the last line:

.Count() != 0;

Here, Daniel is trying to check for the existence of an item, the actual count of items is irrelevant.

The documentation for the Count method on the Enumerable class states:

Returns the number of elements in a sequence.

Knowing that an IEnumerable<T> implementation only represents the sequence of elements, and nothing more, it’s safe to surmise that the only way to get the count of the number of elements in the sequence is to iterate through the entire sequence.  This is an O(N) operation, which for high values of N, can be pricey.

Now, it’s true that the actual implementation of the Count method checks whether or not the IEnumerable<T> implementation also implements ICollection<T>.  If so, it returns the value from the Count property on ICollection<T> instead of iterating through every element.  This reduces the method to an O(1) operation.

Unfortunately, that’s an implementation detail.

And we all know that coding against implementation details, except when absolutely necessary, is a bad, bad, BAD (say it again with me, BAD) idea.

Fortunately, the Enumerable class offers a solution in this case, the Any method.  From the documentation:

Determines whether a sequence contains any elements.

Which is the semantic equivalent of checking the number of elements in the sequence to zero.

As stated before, because IEnumerable<T> is representative only of the sequence of elements (and nothing more), the simplest way to implement this would be to iterate for just a single element.  If an element is returned, then the method returns true and false otherwise.  Since only one element is involved, this is an O(1) operation, which is always better than the O(N) operation that the Enumerable.Count method is.

So what have we learned here?

  • DO use the Enumerable.Any extension method to check for the existence of elements in the sequence.
  • DO NOT use Enumerable.Count extension method in comparisons against zero, as the following are semantically equivalent:
    • sequence.Count() == 0
    • !sequence.Any()
  • DO NOT use the Enumerable.Count extension method in comparisons against the “not zero” condition, as the following are semantically equivalent:
    • sequence.Count != 0
    • sequence.Any()

Tags: , , , , , ,

programming

Desirable Attributes

by casperOne 10. February 2009 18:41

The documentation for .NET Framework Core Development is a little thin when it comes to guidance on writing custom attributes.  Attributes are classes (just like everything else), and in that sense all of the framework documentation on designing classes is still very relevant.  Attributes however, play an important role in that they are part of the static definition of the assembly/type/member that they are applied to.

At its core, .NET is a statically-typed runtime (although .NET 4.0 will change that with the introduction of the Dynamic Language Runtime).  Types are verified at compile-time and their definition remains constant throughout the lifetime of any application domain that they are loaded into (yes, there are dynamic types, but once those types are defined, those definitions are static as well).

Given that, it comes as no surprise that the following code doesn’t compile:

using System;
using System.Reflection;

namespace DesirableAttributes
{
    public class Person
    {
        public string Name { get; set; }
        public int Age { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // Get the type of the person.
            Type personType = typeof(Person);

            // Get the age property.
            PropertyInfo agePropertyInfo = personType.GetProperty("Age");

            // Try to change the property to be of type string.
            agePropertyInfo.PropertyType = typeof(string);
        }
    }
}

The C# compiler outputs the error “Property or indexer 'System.Reflection.PropertyInfo.PropertyType' cannot be assigned to -- it is read only”.  And true to its word, the PropertyType property is read-only.  But if you stop to think about why it is read-only, it is the manifestation of the fact that .NET is a statically-typed system; changing the definition of the type at runtime would violate basic guarantees that are made by the CLR.

That being said, the following is permissible:

using System;
using System.Linq;
using System.Reflection;

namespace DesirableAttributes
{
    [AttributeUsage(AttributeTargets.All)]
    public class HelpAttribute : Attribute
    {
        /// <summary>The url where the documentation 
        /// for the member is located.</summary>
        public string DocumentationUrl { get; set; }
    }
    
    [Help(DocumentationUrl="http://www.tempuri.org/Person.html")]
    public class Person
    {
        public string Name { get; set; }
        public int Age { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // Get the type of the person.
            Type personType = typeof(Person);

            // Get the help attribute attached to the Person class.
            HelpAttribute helpAttribute = GetHelpAttribute(personType);

            // Outputs:
            //
            // Documentation Url Before: http://www.tempuri.org/Person.html
            Console.WriteLine("Documentation Url Before: {0}", helpAttribute.DocumentationUrl);

            // Change the type definition.
            helpAttribute.DocumentationUrl = "http://www.malicious.org/Person.html";

            // Outputs:
            //
            // Documentation Url After: http://www.malicious.org/Person.html
            Console.WriteLine("Documentation Url After: {0}", helpAttribute.DocumentationUrl);
        }

        static HelpAttribute GetHelpAttribute(Type type)
        {
            // Get the help attribute attached to the Person class.
            return type.GetCustomAttributes(typeof(HelpAttribute), false).
                Cast<HelpAttribute>().FirstOrDefault();
        }
    }
}

In the code sample above, the property on the attribute is permitted to be changed, effectively changing the type definition as long as that instance of the attribute is used.  This can cause a problem if instances of an attribute with read/write properties are held for extended periods of time.

Fortunately, the CLR does not let us down completely, in that further calls to GetCustomAttributes will not persist the property values that are changed on attributes:

using System;
using System.Linq;
using System.Reflection;

namespace DesirableAttributes
{
    [AttributeUsage(AttributeTargets.All)]
    public class HelpAttribute : Attribute
    {
        /// <summary>The url where the documentation 
        /// for the member is located.</summary>
        public string DocumentationUrl { get; set; }
    }
    
    [Help(DocumentationUrl="http://www.tempuri.org/Person.html")]
    public class Person
    {
        public string Name { get; set; }
        public int Age { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // Get the type of the person.
            Type personType = typeof(Person);

            // Get the help attribute attached to the Person class.
            HelpAttribute helpAttribute = GetHelpAttribute(personType);

            // Outputs:
            //
            // Documentation Url Before: http://www.tempuri.org/Person.html
            Console.WriteLine("Documentation Url Before: {0}", helpAttribute.DocumentationUrl);

            // Change the type definition.
            helpAttribute.DocumentationUrl = "http://www.malicious.org/Person.html";

            // Get the attribute again.
            helpAttribute = GetHelpAttribute(personType);

            // Outputs:
            //
            // Documentation Url After: http://www.tempuri.org/Person.html
            Console.WriteLine("Documentation Url After: {0}", helpAttribute.DocumentationUrl);
        }

        static HelpAttribute GetHelpAttribute(Type type)
        {
            // Get the help attribute attached to the Person class.
            return type.GetCustomAttributes(typeof(HelpAttribute), false).
                Cast<HelpAttribute>().FirstOrDefault();
        }
    }
}

The output from the code sample above shows that calling GetCustomAttributes will return different instances on each call.

So what can we gather from this?

The first thing that we can gather from this is the following tenant:

DO NOT cache instances of attributes returned from calls to GetCustomAttributes.

But what about our own attributes?  What can we do to ensure that clients of attributes that we define are not changed mistakenly or maliciously?

MSIL allows for attributes to have properties on values set in the application of the attribute.  C# has syntax in the language to realize this.  Recall the earlier code sample:

[Help(DocumentationUrl="http://www.tempuri.org/Person.html")]
public class Person

Note how the DocumentationUrl property is used to specify which property is being set.  While it is a nice, simple syntax for assigning attributes, it requires the property on the attribute to have read/write accessors, which presents the original issue of allowing the type to be mutated.

This leads us to the following tenants when creating our own attributes:

DO NOT expose properties on attributes that have read AND write accessors.

DO expose properties on attributes that are read-only.

DO expose constructors that take as parameters all the values that will be exposed through the read-only properties on the attribute.

With those in mind, the HelpAttribute would be redefined as:

[AttributeUsage(AttributeTargets.All)]
public class HelpAttribute : Attribute
{
    public HelpAttribute(string documentationUrl)
        : base()
    {
        // Set the documentationUrl.
        this.documentationUrl = documentationUrl;
    }

    string documentationUrl;

    /// <summary>The url where the documentation 
    /// for the member is located.</summary>
    public string DocumentationUrl { get { return documentationUrl; } }
}

And applying it to a type would change to:

[Help("http://www.tempuri.org/Person.html")]
public class Person

In defining the attribute this way, the attribute (and by extension, the assembly/type/member it is applied to) would not change over the life of the application domain, enforcing the ideal of a statically-typed environment.

But what are the drawbacks?

(Note: None of the code samples from this point on will compile, as they are all recommendations for the C# language)

The drawback here is that for attributes with a large number of parameters, you would have to define a constructor with a large number of parameters, which can become unwieldy.  Unfortunately, there currently isn’t a solution to this right now.

To address this, I would recommend having a class who’s sole purpose is to pass values to the constructor of the attribute:

public class HelpAttributeValues
{
    public string DocumentationUrl { get; set; }
}

[AttributeUsage(AttributeTargets.All)]
public class HelpAttribute : Attribute
{
    public HelpAttribute(HelpAttributeValues values)
        : base()
    {
        // Set the values.
        this.values = values;
    }

    HelpAttributeValues values;

    /// <summary>The url where the documentation 
    /// for the member is located.</summary>
    public string DocumentationUrl { get { return values.DocumentationUrl; } }
}

And then to apply it, one would use the new object initializer syntax introduced in C# 3.0 to give the appearance of named parameters:

[Help(new HelpAttributeValues() { 
    DocumentationUrl = "http://www.tempuri.org/Person.html" })]
public class Person

Combine this with my recommendation for enhancements to the new keyword, and it could be shortened to this:

[Help(new { DocumentationUrl = "http://www.tempuri.org/Person.html" })]
public class Person

Tags: , ,

programming

New, Var, the Chicken and the Egg

by casperOne 27. January 2009 12:16

Eric Lippert's most recent blog entry proposed an interesting suggestion for enhancing the C# language:

If our goal is to remove the redundancy, I would therefore prefer to remove it the other way. Make this legal:

private static readonly Dictionary<TokenKind, string> niceNames = new()...

That is, state the type unambiguously in the declaration and then have the "new" operator be smart about figuring out what type it is constructing based on what type it is being assigned to. This would be much the same as how the lambda operator is smart about figuring out what its body means based on what it is being assigned to.

I can't tell you how much I love this.  While I absolutely love the var keyword in C#, there is a massive potential for abuse here.  It is too easy to use it to obfuscate code and use it in situations where you don’t need it (you need it when generating an anonymous type, most often for projections in LINQ).

I love it so much that I’ve already suggested it to Microsoft at least a year ago.

Last year, at the MVP Global Summit, I proposed this idea to Mads Torgersenafter his presentation on possible future enhancements to the C# language.  This conversation resulted in a very long email that was sent to Mads on the subject, which I had hoped would result in its distribution inside Microsoft for discussion.

I even made a suggestion through the Microsoft Connect site, elaborating on the language feature, as well as it’s impact (or lack thereof on existing code).  The suggestion can be found here:

https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=388649

Addtionally, I posted the link in the microsoft.public.dotnet.csharp newsgroup, asking for comments on the recommendation as well.  The full thread can be found here:

http://groups.google.com/group/microsoft.public.dotnet.languages.csharp/browse_thread/thread/f8ab748deca4737

Needless to say, I’m a bit frustrated about the situation, given that there is no reference to the email, or the suggestion, both of which predated his blog post.

Of course, I’m completely cognizant of the fact ideas are a spontaneous thing and that the same ones can sprout anywhere.

However, what is frustrating to me is that this idea is coming from a person in the organization and on the team in that organization that I submitted the idea to, and that kind of coincidence is suspicious, at best.

Oh, and I will post the full details of the idea at a later date.

Tags: , , ,

programming

A Better Implementation Pattern for IDisposable

by casperOne 4. January 2009 19:18

During the first few years of programming in .NET, I sheepishly followed the Design Guidelines for Class Library Developers in regards to implementing IDisposable in my class libraries.

On the face of things, the recommendation is solid. A base class exposing IDisposable looks like this:

// Design pattern for a base class.
public class Base : IDisposable
{
    //Implement IDisposable.
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Free other state (managed objects).
        }
        // Free your own state (unmanaged objects).
        // Set large fields to null.
    }

    // Use C# destructor syntax for finalization code.
    ~Base()
    {
        // Simply call Dispose(false).
        Dispose(false);
    }
}

So what’s going on here?

First, a non-virtual method is exposed on the class so derived classes don’t have the opportunity to muck up cleanup code.

Next, a protected virtual override of Dispose handles the disposal of unmanaged resources/large objects as well as any other state that needs to be managed (if not called from the finalizer, then any references to other IDisposable implementations have their Dispose methods called as well).  This allows for base classes to be able to hook into the cleanup code without burdening it with the details of implementing the finalizer or the method on IDisposable:

// Design pattern for a derived class. 
public class Derived : Base
{
    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Release managed resources. 
        }
        // Release unmanaged resources. 
        // Set large fields to null. 
        // Call Dispose on your base class. 
        base.Dispose(disposing);
    }
    // The derived class does not have a Finalize method 
    // or a Dispose method with parameters because it inherits 
    // them from the base class. 
}

(Both code samples were copied from “Implementing Finalize and Dispose to Clean Up Unmanaged Resources”)

Last, the finalizer calls the same protected virtual overload of Dispose, indicating that it should only clean up unmanaged resources and large objects (not references that implement IDisposable).

The effect here is twofold.  The first is that calling Dispose has the effect of freeing any state that needs freeing at a determined point in time (usually when you are done with the resource).

The second is that if you forget to call Dispose, the garbage collector will do it for you.

The problem lies with the statement “if you forget to call Dispose.”

Pop quiz, how many of the following errors in code would be tolerated in a production environment:

  • Not hooking up an event handler for an event that code needs to react to
  • Not calling Monitor.Exit (unlikely, but a possibility if not using the lock statement or calling TryEnter) after a call to Monitor.Enter/Monitor.TryEnter
  • Not throwing an exception when an error occurs (one that is worthy of an exception, that is)
  • Not synchronizing access to a resource that is accessed by multiple threads of execution

In any of these cases, this would be considered a bug and fixed.

But why is this a bug?

If the finalizer executes (remember, the object is removed from the finalization queue if Dispose is called) then that means that the instance did not have the Dispose method called on it.  In this case, you have circumvented the entire reason for implementing IDisposable in the first place, which is to make the developer aware that attention needs to be paid to instances of the implementing type.

Microsoft exacerbates the situation with the current guidelines because it actually obfuscates the issue.  With the recommendation, if Dispose is never called then the resource is eventually freed up at some indeterminate point of time in the future.

That recommendation ends up placing this on the same level as the race-condition bug noted in the last bullet point .  Your app might run very smoothly for the most part, but at random times, you have exceptions that you can’t trace (because the resource in question is corrupted and leads to states in your application that you aren’t prepared for most likely).

In the case of not calling Dispose, this will most likely manifest itself in the form of starvation of the resource managed by the IDisposable implementation at random times (sockets, database connections, etc., etc.) and even then, if at all, depending on the usage pattern of your app.  You might never see the exception, but then (at a critical moment, of course) your app fails, and you haven’t the faintest reason why.

So what can I do to prevent this?

The bad news is that outside of your code, not much.  You have to live with the fact that everything in the framework and most other library code written follows this guideline.  If you miss cleaning up one of these resources and you run into an intermittent problem involving a resource that isn’t cleaned up properly, a great deal of painful debugging awaits you.

To add to that, I haven’t seen many static-analysis tools that catch this either.

Quite simply, there isn’t a definite way to catch this at compile-time (or to be more specific, without running code).

What about inside my code?

That is a different story.  You can simply throw an exception in the finalizer.

You are surely thinking now “woah, that’s a massively bad idea”.  Well, yes, it is.  You are going to hose your app in a major way either during the garbage collection that occurs with the IDisposable implementation which hasn’t had Dispose called on it, or on exit of the app.

But that’s the point.

Surely you aren’t going to deliver your app without running it at least once, right?  And if you are in a more formalized software development environment, you are going to run it through all sorts of unit tests, quality assurance tests, and so on.

So even if you can’t catch the error at compile time, you can catch it much earlier, and hopefully, before your app ships (assuming you have proper test coverage of your app).

Here is the code sample.  For all intents and purposes, the derived class remains the same:

// Design pattern for a derived class.
public class Derived : Base
{
    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Release managed resources.
        }
        // Release unmanaged resources.
        // Set large fields to null.
        // Call Dispose on your base class.
        base.Dispose(disposing);
    }
    // The derived class does not have a Finalize method
    // or a Dispose method with parameters because it inherits
    // them from the base class.
}

The changes come in the base class:

// A better design pattern for a base class.
public class Base : IDisposable
{
    // Implement IDisposable.
    // This remains the same, this instance should not be finalized if Dispose is called.
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Free other state (managed objects).
        }
        // Free your own state (unmanaged objects).
        // Set large fields to null.
    }

    // The change is here, you want to throw your exception if you do not call Dispose.
    ~Base()
    {
        // Throw an exception.
throw new InvalidOperationException("Dispose method not called."); } }

Things to note:

  • The public overload of Dispose is the same.
  • The protected overload has not been changed, but only to make it easier for those that already have Dispose implementations following the original guideline.  Ideally, it would be changed to a method with no parameter (since only true will ever be passed here, there is no point in having it, as well as the conditional with the check against the parameter) and renamed (since it would conflict with the public Dispose method).  Do NOT refactor the public method into a virtual method, as that would give derived classes the ability to screw things up.
  • The exception is thrown in the finalizer.  I’d recommend resource strings for localization purposes of course.

Tags: , ,

programming