Friday, December 19, 2008

Out for Christmas Break

Well, I'll be gone for a while for Christmas Break. When I come back, I'll have a wonderful series about Exceptions. I'll go through specific exceptions and describe what causes them, and what you can do to debug them. But for now, I'll leave you with some programmer's Christmas Carols:

If Programming Languages Were Christmas Carols

Thursday, December 18, 2008

Linq Subqueries

It's a rather simple problem, but one that's a little difficult to figure out at first with LINQ: the problem of subquerying. Let's say I have a FamilyMember class that I've defined like this:

public class FamilyMember
{
    public FamilyMember(string name)
    {
        Name = name;
    }

    public string Name { get; private set; }
}


Now, let's say that I've defined a Family class like this:

public class Family
{
    public Family(string state, FamilyMember husband, FamilyMember wife)
    {
        State = state;
        Members = new List<FamilyMember>();
        Members.Add(husband);
        Members.Add(wife);
    }

    public Family(string state, FamilyMember husband, FamilyMember wife, params FamilyMember[] children) :
        this(state, husband, wife)
    {
        Members.AddRange(children);
    }

    public string State { get; private set; }

    public List<FamilyMember> Members { get; private set; }
}


Now, let's say I add a whole bunch of families to a list:

// Create the family members.
FamilyMember glenn = new FamilyMember("Glenn");
FamilyMember debi = new FamilyMember("Debi");

FamilyMember daniel = new FamilyMember("Daniel");
FamilyMember amanda = new FamilyMember("Amanda");
FamilyMember rachel = new FamilyMember("Rachel");
FamilyMember joshua = new FamilyMember("Joshua");

FamilyMember matthew = new FamilyMember("Matthew");
FamilyMember shannon = new FamilyMember("Shannon");
FamilyMember elizabeth = new FamilyMember("Elizabeth");
FamilyMember abigail = new FamilyMember("Abigail");

FamilyMember david = new FamilyMember("David");
FamilyMember jennifer = new FamilyMember("Jennifer");

// Create the families.
Family springMortons = new Family("TX", glenn, debi);
Family bostonMortons = new Family("MA", daniel, amanda, rachel, joshua);
Family aggieMortons = new Family("TX", matthew, shannon, elizabeth, abigail);
Family houstonMortons = new Family("TX", david, jennifer);

// Make a list of all the families
List<Family> allFamilies = new List<Family>()
    { springMortons, bostonMortons, aggieMortons, houstonMortons };


Now that I have this list, I want to pull out all of the family members located in Texas whose names begin with "D". How would I do this? LINQ allows for nested subqueries, and they're rather simple to use once you get the hang of them. Here's the answer:

var dPeople = from family in allFamilies
              where family.State == "TX"
              from member in family.Members
              where member.Name.StartsWith("D")
              select member;


That's it! It's as simple as writing two nested from clauses. The first from clause selects the families, and the second selects the individuals within each family. The where clause immediately below the family selection filters the families, and the where clause immediately below the member selection filters the members. The resulting dPeople is an IEnumerable<FamilyMember>, and you can easily iterate through it to access each instance.
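For anyone who wants to run the query end to end, here is a minimal sketch. It uses trimmed-down copies of the two classes (a single params constructor instead of the husband/wife overloads, which is my simplification), and the SubqueryDemo class and its method names are made up for illustration. It also shows the same filter written with extension methods, where SelectMany plays the role of the second from clause. The compiler's actual translation of the query is a little more involved, so treat the method-syntax version as equivalent in result rather than identical in form.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down copies of the classes from the post (simplified constructors).
public class FamilyMember
{
    public FamilyMember(string name) { Name = name; }
    public string Name { get; private set; }
}

public class Family
{
    public Family(string state, params FamilyMember[] members)
    {
        State = state;
        Members = new List<FamilyMember>(members);
    }

    public string State { get; private set; }
    public List<FamilyMember> Members { get; private set; }
}

// Hypothetical helper class, not part of the original post.
public static class SubqueryDemo
{
    // Builds the same four families used in the post.
    public static List<Family> SampleFamilies()
    {
        return new List<Family>
        {
            new Family("TX", new FamilyMember("Glenn"), new FamilyMember("Debi")),
            new Family("MA", new FamilyMember("Daniel"), new FamilyMember("Amanda"),
                             new FamilyMember("Rachel"), new FamilyMember("Joshua")),
            new Family("TX", new FamilyMember("Matthew"), new FamilyMember("Shannon"),
                             new FamilyMember("Elizabeth"), new FamilyMember("Abigail")),
            new Family("TX", new FamilyMember("David"), new FamilyMember("Jennifer"))
        };
    }

    // Query syntax: two nested from clauses, each followed by its own where.
    public static List<string> QuerySyntax(List<Family> families)
    {
        var names = from family in families
                    where family.State == "TX"
                    from member in family.Members
                    where member.Name.StartsWith("D")
                    select member.Name;
        return names.ToList();
    }

    // Method syntax: SelectMany flattens each family's Members into one sequence.
    public static List<string> MethodSyntax(List<Family> families)
    {
        return families
            .Where(family => family.State == "TX")
            .SelectMany(family => family.Members)
            .Where(member => member.Name.StartsWith("D"))
            .Select(member => member.Name)
            .ToList();
    }

    public static void Main()
    {
        List<Family> families = SampleFamilies();
        Console.WriteLine(string.Join(", ", QuerySyntax(families).ToArray()));  // Debi, David
        Console.WriteLine(string.Join(", ", MethodSyntax(families).ToArray())); // Debi, David
    }
}
```

Daniel is skipped even though his name starts with "D", because his family is filtered out by the first where clause before its members are ever examined.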

Happy coding!

Wednesday, December 17, 2008

Resetting Default Values in the Visual Studio IDE

Resetting default values in the Visual Studio IDE is simple. Just right-click any property whose value has been changed from its default, and choose "Reset" from the associated context menu. This is a lifesaver when you really want to remove an icon from a form, and it can save you a trip to the designer.cs file.




Tuesday, December 16, 2008

The Joys of the Path Class

I'm beginning to think that the Path class is sadly one of the most overlooked classes out there. I've seen plenty of good developers coming up with all sorts of crazy string parsing code to split apart filenames, when the Path class is quietly sitting alone, with no friends. Because I want the Path class to be social and have as many friends as its brothers, FileInfo and DirectoryInfo, allow me to introduce you to a few of the Path class's methods:
  • Path.GetDirectoryName - Takes a string parameter, and returns a string representing only the directory name of the file. Say goodbye to filename.Substring(0, filename.LastIndexOf(@"\")), and say hello to this method.
  • Path.GetFileName - How many of you have found yourself typing string filename = path.Substring(path.LastIndexOf(@"\") + 1); when you could have just been using this method, which nicely extracts the filename out of the whole path for you?
  • Path.GetFileNameWithoutExtension - Oh, this is a nice one. If given the string "C:\Program Files\CompanyName\Filename.txt", Path.GetFileNameWithoutExtension will return simply "Filename".
  • Path.GetTempPath - Returns a string containing the full path to the system's temporary folder.
  • Path.ChangeExtension - Will change the extension of any filename in a path, and return to you the new full path.
  • Path.Combine - One of my favorites, this will allow me to take a string like "C:\Program Files\CompanyName" and another string like "Filename.txt", and easily get "C:\Program Files\CompanyName\Filename.txt".

Again, this is sadly overlooked by many developers, but it's there, so use it when you need to.
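To see these methods working together, here is a small sketch. The PathDemo class and the BuildSamplePath helper are names I've invented for the example, and the path is built under the temp folder so that it uses whatever separator the current operating system expects (the hard-coded C:\ paths above only split correctly on Windows).

```csharp
using System;
using System.IO;

// Hypothetical demo class, not part of any library.
public static class PathDemo
{
    // Builds <temp folder>/CompanyName/Filename.txt using Path.Combine,
    // so no separator characters are typed by hand.
    public static string BuildSamplePath()
    {
        string directory = Path.Combine(Path.GetTempPath(), "CompanyName");
        return Path.Combine(directory, "Filename.txt");
    }

    public static void Main()
    {
        string fullPath = BuildSamplePath();

        // The directory portion depends on the machine, so no output is shown for it.
        Console.WriteLine(Path.GetDirectoryName(fullPath));

        Console.WriteLine(Path.GetFileName(fullPath));                 // Filename.txt
        Console.WriteLine(Path.GetFileNameWithoutExtension(fullPath)); // Filename

        // ChangeExtension returns a new string; the file on disk is not touched.
        string backupPath = Path.ChangeExtension(fullPath, ".bak");
        Console.WriteLine(Path.GetFileName(backupPath));               // Filename.bak
    }
}
```

None of these calls touch the file system, so the file doesn't need to exist for the example to run.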

Monday, December 15, 2008

How Your Memory Affects Code Understanding

In a 1956 article, psychologist George Miller argued that human short-term memory is limited to roughly 7 "bits" of information at a time. Beyond this, long-term memory has to kick in.

I've started thinking about this in terms of how I refactor my code. Let's say I have a piece of code that looks like this:

Console.WriteLine(string.Format("Hello {0}, {1} has called you {2} time{3} today",
    callee.FirstName, caller.FirstName, callcount, (callcount == 1 ? "" : "s")));


While this one line does exactly what I want, it's incredibly confusing, and I have to spend a lot of time analyzing it to understand its purpose. It's short, but it's relatively unreadable.

Miller also talks of how combining bits of information together with each other can help memory retention. He calls these groups of bits chunks, and states that a chunk is about as difficult to remember as a bit. This means that the average programmer can remember, in short term memory, approximately 7 items, which could be bits or chunks.

How does this apply to your code? The code I posted above is one long line. The average reader is going to read code in bits, and then mentally group those pieces of code together to create one coherent whole. This process is made more difficult when the number of bits in the code exceeds the number of bits that the human mind can hold at once. When the reader then has to analyze the text to find ways to chunk the information, this can drastically slow the process of reading code. In fact, the very act of analyzing the code will take a slot or two of this short-term 7-item maximum, and will replace what's already there. Here's an example of how one developer might mentally separate the bits of data:

Console.WriteLine ... string.Format ... "Hello {0}, {1} has called you {2} time ...
    {3} ...  today" ... callee ... FirstName ... caller ... FirstName ...
    callcount ... callcount == 1 ... ? ... "" ... : ... "s"


This is too many bits of information for the average person to hold in short-term memory without chunking. So the reader is forced to reread the code in order to chunk the bits of information and better understand it. The reader will have to focus on a small portion of the text, group that portion, then move to a larger portion and group that, and finally put it all together. This may take multiple passes and significant time, and it can all but destroy efficiency in code review. Here's how our reader might chunk the information:

Console.WriteLine ... string.Format ... "Hello {0}, {1} has called you {2} time ...
    {3} today" ... callee.FirstName ... caller.FirstName ...
    callcount ... (callcount == 1 ?  "" : "s")


Now the reader has 8 distinct chunks. This is somewhat manageable on a good day (says Miller). If I'm really tired, I might have to chunk it again:

Console.WriteLine(string.Format("Hello {0}, {1} has called you {2} time{3} today" ...
    callee.FirstName ... caller.FirstName ...
    callcount ... (callcount == 1 ?  "" : "s")


Now I've gotten the code down to 5 chunks, but it took me two passes to get there. The question is: would it be easier for the readers of your code to group it in their own minds, or would it be easier for you to chunk the code for them by setting some local variables, so that they read your code in small bits? Compare the code I first posted with the following:

string helloMessage = "Hello {0}, {1} has called you {2} {3} today";

string timeText = "time";
if (callcount != 1)
    timeText = "times";

string displayText = string.Format(helloMessage,
    callee.FirstName, caller.FirstName, callcount, timeText);

Console.WriteLine(displayText);


Now, while this code is significantly longer than the original, and while small shortcuts could be taken without significantly affecting how quickly it can be understood, the above code does the grouping for the reader. It makes the code somewhat simpler to comprehend. Also, note that I've respected the natural chunking of the words "times" and "time". I haven't separated "time" from its plural specifier. Splitting the word would turn one bit of information into two, and they would need to be rejoined mentally before the code could be fully understood. Finally, notice that I've indented a line when declaring the displayText variable. The indentation creates a natural mental break for the reader. It makes it simpler to chunk the data, and the break is placed at a logical location. (The top line pertains to the format string, and the bottom line pertains to the objects to be inserted into the format string.)

In conclusion, while the one-line version of the code is less verbose and has fewer lines than the later version, it has to be grouped and chunked to be understood. Doing so can easily lead to fatigue or misunderstanding of the code. It's also harder to debug. So, in general, it's a better idea to write your code as though you're writing it for a five-year-old than to conglomerate and nest code statements ad infinitum in an attempt to "shorten" your code.

On a side note, you can naturally avoid many of the pitfalls of long one-line statements by setting a larger font in your IDE and reducing the amount of horizontal space available in your editor window. Doing this will cause you to naturally desire shorter, more easily digested, code statements.

Friday, December 12, 2008

Understanding Lambda Expressions

C# 3.0 introduced a new feature called lambda expressions. Using this syntax, you can shorten your code immensely. However, if you've never seen it before, it can be extremely confusing. I'm going to do my best in this post to explain how they work, but first, it's important to lay the groundwork. In that spirit, let's have a history lesson...

C# 1.0: Delegates and Methods

Delegates are essentially placeholders for methods. They define incoming and outgoing parameter types to create a specific signature for a method. They can be constructed, and in their constructor, you would pass in a pointer to a method whose signature matches the signature of the delegate. For instance, take the following delegate as an example:

public delegate int GetLengthDelegate(string value);


This delegate could be constructed just as any class instance can be constructed. Let's say you have the following method:

public int GetLength(string value)
{
    return value.Length;
}


Now, I can do the following:

public void RunDelegate()
{
    GetLengthDelegate getLength = new GetLengthDelegate(this.GetLength);

    Console.WriteLine(this.GetLength("Hello World!"));
    Console.WriteLine(getLength("Hello World!"));
}


In the code above, I'm first calling GetLength without using a delegate, and then I'm calling the very same method via the use of a delegate. In both situations, I'm getting the same result, because I'm passing in the same value.

The joy of delegates is that they can be passed into a method. Let's say I added this code:

public void PrintLengths(GetLengthDelegate del)
{
    // Call the delegate and print the length it returns.
    Console.WriteLine(del("Hello World!"));
}


And then I changed my RunDelegate method to look like this:

public void RunDelegate()
{
    GetLengthDelegate getLength = new GetLengthDelegate(this.GetLength);

    PrintLengths(getLength);
}


Now, I'm creating a delegate called getLength that points to the method this.GetLength, and then I'm passing it into another method, called PrintLengths, which will then call the delegate.

Delegates can be combined with each other, meaning that you could set up a delegate so that when you call del("Hello World"), you could trigger off several methods, which would be called one after another. This is called combining delegates, and it's a little out of scope for our topic.
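Even though it's out of scope, a quick sketch may make the idea concrete. Everything here is invented for the illustration: the NotifyDelegate type (which returns void, to sidestep the question of which return value a combined delegate hands back), the CombineDemo class, and its two recording methods. It sticks to the C# 1.0 style of wrapping named methods.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical delegate type for this sketch; it returns void on purpose.
public delegate void NotifyDelegate(string value);

public static class CombineDemo
{
    // Collects what each method did, in the order the methods ran.
    private static readonly List<string> log = new List<string>();

    private static void RecordValue(string value)
    {
        log.Add("Value: " + value);
    }

    private static void RecordLength(string value)
    {
        log.Add("Length: " + value.Length);
    }

    public static List<string> Run()
    {
        log.Clear();

        // Start with one method, then combine a second one onto the same delegate.
        NotifyDelegate del = new NotifyDelegate(RecordValue);
        del += new NotifyDelegate(RecordLength);

        // A single call invokes both methods, in the order they were added.
        del("Hello World");

        return new List<string>(log);
    }

    public static void Main()
    {
        foreach (string entry in Run())
            Console.WriteLine(entry);
        // Value: Hello World
        // Length: 11
    }
}
```

The += operator is shorthand for Delegate.Combine, and -= removes a method from the list again.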

In any case, all the syntax you see above is old, and has been around since the inception of C# 1.0.

C# 2.0: Anonymous Delegates

C# 2.0 introduced anonymous methods. This enabled programmers to write methods inline, without actually naming them. Anonymous methods are passed into the constructors of delegates, so that a method that takes a delegate as a parameter can be handed logic that is too short or too simple to really merit being written as a named member of the class. Anonymous methods have the following syntax:

delegate(string value) { return value.Length; }


The first thing you'll notice here is a lack of definition for the return value. This is because the C# compiler would infer the type of the return value, based on the return values coded into the delegate.

The second thing you might notice is the use of the word delegate. Yes, the C# team decided that the term delegate can be used in two contexts. Used in one context delegate defines a delegate type, and in another context it defines an anonymous method. I personally feel they could have named it something else to be less confusing, but it is what it is.

The last thing you should notice about the anonymous method above is that it has the exact same functionality as the GetLength method. In fact, I can actually use this anonymous method in the constructor of the GetLengthDelegate. This means that I could change my RunDelegate method to this:

public void RunDelegate()
{
    GetLengthDelegate getLength =
        new GetLengthDelegate(delegate(string value) { return value.Length; });

    PrintLengths(getLength);
}


In fact, I could even go further and declare it without explicitly calling the GetLengthDelegate constructor, nor storing the instance of the delegate in the local getLength variable.

public void RunDelegate()
{
    PrintLengths(delegate(string value) { return value.Length; });
}


As you can see, the anonymous method syntax was a great improvement over the previous way of doing things, where an actual method had to be declared.

One caveat, however: anonymous methods cannot be declared without a context. That is, you can't just type an anonymous method into another method and expect it to work. They're designed to be assigned to delegates, not called straight out.

C# 3.0: Lambda Expressions

In C# 3.0, lambda expressions were introduced. Lambda expressions are nothing but syntactic sugar built upon anonymous methods. Our hero, the GetLength method, would look like this as a lambda expression:

s => s.Length


That's it. Just like the anonymous methods, however, they need to be assigned to a delegate. Thus, you could do this to create the same method as we've had above:

GetLengthDelegate getLength = s => s.Length;


Now, your first question is going to be obvious: "What is s?" Let me put it this way, I could have also written the above line as follows:

GetLengthDelegate getLength = value => value.Length;


Do you see it now? s and value actually refer to the string parameter sent into the lambda expression. We know that this is a string parameter, because we're assigning this lambda expression to the delegate GetLengthDelegate, which expects an incoming string parameter. Therefore, the C# compiler infers the type not only of the return value, but also of the incoming parameter.

What's more, the return keyword is conspicuously absent from the lambda expression. The C# compiler is smart enough to know that the value created on the right side of the => symbol is the return value of the delegate.

So, to put it briefly, the value on the left side of the => is the incoming parameter, and the expression on the right is the return value.
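To tie the history lesson together, here is a short sketch that builds the same GetLengthDelegate three ways (named method, anonymous method, and lambda expression) and calls each one. The LambdaDemo class and the MeasureAllThreeWays method are names I've made up for the example.

```csharp
using System;

public delegate int GetLengthDelegate(string value);

// Hypothetical demo class for comparing the three syntaxes.
public static class LambdaDemo
{
    public static int GetLength(string value)
    {
        return value.Length;
    }

    public static int[] MeasureAllThreeWays(string text)
    {
        // C# 1.0: wrap a named method.
        GetLengthDelegate named = new GetLengthDelegate(GetLength);

        // C# 2.0: anonymous method; the parameter type is spelled out.
        GetLengthDelegate anonymous = delegate(string value) { return value.Length; };

        // C# 3.0: lambda expression; parameter and return types are inferred
        // from GetLengthDelegate.
        GetLengthDelegate lambda = value => value.Length;

        return new int[] { named(text), anonymous(text), lambda(text) };
    }

    public static void Main()
    {
        foreach (int length in MeasureAllThreeWays("Hello World!"))
            Console.WriteLine(length); // prints 12 three times
    }
}
```

All three delegates are interchangeable from the caller's point of view; only the amount of ceremony needed to create them differs.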

Putting It All Together

In .NET 3.5, LINQ was introduced. LINQ lets you filter, search, and do all sorts of other funky things to collections of items, simply by providing a set of extension methods that accept delegates. For instance, there's the Where method, which takes a delegate of type Func<T, bool>. Now, what is Func? Basically, Func<T, bool> is a generic delegate whose second type parameter has been defined (as bool), but whose first has not been. The actual definition of Func is the following:

public delegate TResult Func<T, TResult>(T arg);


So it's a generic delegate. This means that declaring it as a Func<string, bool> would mean that you're passing in a string as the incoming parameter, and expecting a bool as the return value. What does this mean for the Where method?

Well, basically, Where's return type is IEnumerable<T>, where T refers to the generic type parameter of the collection on which you call the method. So, if you have a List<string>, calling Where will return an IEnumerable<string>, and you should pass in a method with the same signature as Func<string, bool>. If you have a List<int>, the return value will be IEnumerable<int>, and you should pass in a method with the same signature as Func<int, bool>. The Where method will call the delegate you pass in to it for every object contained in your enumerable, and will return an IEnumerable<T> that contains all the objects for which the delegate returns true. Maybe an example will help:

List<string> list = new List<string>();

list.Add("David");
list.Add("Bob");
list.Add("Michael");

foreach (string name in list.Where(s => s.Length == 5))
    Console.WriteLine(name);


In the example above, I'm creating a list and populating it with three names: "David", "Bob" and "Michael". I'm then creating an IEnumerable<T> (in this case, an IEnumerable<string>) containing only the names that are 5 characters long. In my example, that would only be "David", but if I had added "Jimmy" to the list, that would also be returned. The Where method will go through each of the items in list and evaluate the lambda expression I passed into it. In other words, it will call that method on each of the items in list. If the call to the lambda expression returns true, the Where method will return that value as part of its result, thus filtering the list to include only the string values that are 5 characters long.
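To make the connection between the lambda and Func<string, bool> explicit, here is a sketch that stores the lambda in a variable of that type before handing it to Where. The WhereDemo class and the FiveLetterNames method are hypothetical names, and I've added "Jimmy" to the list to show a second match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the original example.
public static class WhereDemo
{
    public static List<string> FiveLetterNames(IEnumerable<string> names)
    {
        // The lambda is just a Func<string, bool>: string in, bool out.
        Func<string, bool> isFiveLettersLong = s => s.Length == 5;

        // Where calls the delegate once per item and keeps the items
        // for which it returns true.
        return names.Where(isFiveLettersLong).ToList();
    }

    public static void Main()
    {
        List<string> list = new List<string> { "David", "Bob", "Michael", "Jimmy" };

        foreach (string name in FiveLetterNames(list))
            Console.WriteLine(name);
        // David
        // Jimmy
    }
}
```

Passing the lambda inline, as in list.Where(s => s.Length == 5), does the same thing; the compiler simply creates the Func<string, bool> for you.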

I hope this explanation has been enlightening if you've been wondering what lambda expressions are. There is a bit more information on lambda expressions in the MSDN Library, as well as elsewhere on the internet. Nevertheless, there's no substitute for learning the C# language by coding in it, so pull out your Visual Studio, load up a list, and give it a go. I think you'll find that these expressions can save you a lot of time, and simplify your code greatly.

Thursday, December 11, 2008

On Return Values and Readability

Recently, I’ve been doing a lot of refactoring, and the importance of variable names has been re-impressed on my mind. Even in relatively clean code, changing the names of variables can improve things dramatically. Let’s start with the common return variable named result. The following is some relatively easy to understand code that can be altered to improve readability.

public Person GetPersonById(int id)
{
    if (id < 0)
        throw new ArgumentException("Id cannot be less than 0");

    Person result = null;

    using (SqlConnection cn = new SqlConnection(Config.SqlConnectionString))
    {
        cn.Open();
        using (SqlCommand cm = cn.CreateCommand())
        {
            cm.CommandText = "Person_GetById";
            cm.CommandType = CommandType.StoredProcedure;
            // Pass the id to the stored procedure ("@Id" is an assumed parameter name).
            cm.Parameters.AddWithValue("@Id", id);
            using (SqlDataReader dr = cm.ExecuteReader())
            {
                if (dr.Read())
                {
                    result = new Person();
                    result.Id = dr.GetInt32(dr.GetOrdinal("Id"));
                    result.FirstName = dr.GetString(dr.GetOrdinal("FirstName"));
                    result.LastName = dr.GetString(dr.GetOrdinal("LastName"));
                }
            }
        }
    }

    return result;
}


Now, maybe you think I’m picking a little bit too much, but I don’t like the variable result here. It’s not that result isn’t descriptive (we obviously know throughout the code that result is going to be returned from the method), it’s that it causes a simple readability issue that could be changed. Take the last line for instance:

return result;


Although this line is straightforward and simple, it tells me absolutely nothing in and of itself about the type of result that I’m returning. In order to get that, I either have to look up a few lines at the return type of the method, or I have to look at the declaration of the variable earlier on in the code. While that doesn’t seem like a whole lot to ask of your fellow developers who are reading your code after you, using result isn’t doing them any favor when they have literally hundreds of methods that end this exact same way. Simply changing this variable name can make that single line more readable. Which looks better, the above return statement, or the following:

return person;


Obviously, the second is more descriptive. In one statement, it’s clear I’m returning a person. It’s true that I could have looked at the method declaration at the top to determine that I will be returning a person, however, that’s going to add another split second of brain power (and perhaps mouse scrolling) to glance back up to the top of the method. Multiply this several times for each method, and you'll quickly have a difficult-to-skim code base.

This leads me to my next complaint about the naming conventions in this code. There are three variables named cn, cm, and dr respectively, and they’re all acronyms. I hate acronyms with a passion, unless they’re very well-known acronyms to the rest of the world. There’s a difference between using SQL and using cn. SQL is well-known, whereas cn is proprietary to my own mind. I might have always used cn, but another developer may not. This is not an accepted standard. Never assume that other people use the same variable naming standards and habits as you. This needs to be named something more descriptive, such as sqlConnection.

Before I show you my final code, I should tell you I decided to make one more change in the way the code is laid out. I separated the retrieval of the column ordinal numbers from the code that fetches the value of the columns. There are two reasons for this. The first is (of course) readability. It’s easier to visually understand one line that gets the ordinal number for a column than it is to understand one line that does that and fetches the value at the same time.

The final version looks like this:

public Person GetPersonById(int id)
{
    if (id < 0)
        throw new ArgumentException("Id cannot be less than 0");

    Person person = null;

    using (SqlConnection sqlConnection = new SqlConnection(Config.SqlConnectionString))
    {
        sqlConnection.Open();

        using (SqlCommand sqlCommand = sqlConnection.CreateCommand())
        {
            sqlCommand.CommandText = "Person_GetById";
            sqlCommand.CommandType = CommandType.StoredProcedure;
            // Pass the id to the stored procedure ("@Id" is an assumed parameter name).
            sqlCommand.Parameters.AddWithValue("@Id", id);

            using (SqlDataReader sqlDataReader = sqlCommand.ExecuteReader())
            {
                int idColumn = sqlDataReader.GetOrdinal("Id");
                int firstNameColumn = sqlDataReader.GetOrdinal("FirstName");
                int lastNameColumn = sqlDataReader.GetOrdinal("LastName");

                if (sqlDataReader.Read())
                {
                    person = new Person();
                    person.Id = sqlDataReader.GetInt32(idColumn);
                    person.FirstName = sqlDataReader.GetString(firstNameColumn);
                    person.LastName = sqlDataReader.GetString(lastNameColumn);
                }
            }
        }
    }

    return person;
}


Now, you may disagree with this approach. However, I believe the major advantage here is that every single line can be easily understood as a unit all to itself. I could feed the average developer any one line out of this whole block of code, and he should be able to tell me exactly what the line is doing. This makes for easy skimming and easy understandability.

Hopefully, in reading this, you’ve come to the same conclusion that I have. Naming your variables properly is of utmost importance. Here are a couple of rules to keep in mind:

1. Don’t use result (or any of its variants such as rtn, returnValue, or ret). Even if the result value is a primitive type, the value should stand for something. Name your return values based on what they stand for in the business world. C# is an object-oriented language. Think in objects, and name your parameters, including your return value, appropriately.

2. Read your code aloud. Does it make sense when you read it aloud? If not, you could probably change something to make it more readable.

3. Move outside of “this-week-thinking”. We developers tend to get caught up in what we’re doing right now without concern for the future. In the excited frenzy of creativity and laying out new code, we can often forget that we’re going to have to read our own code sometime in the future, after we forget what we originally wrote. Try to remove yourself from the current situation and read the method as though you’ve never seen it before. Does it describe itself? If not, do the refactoring immediately after writing the method.

Over the next few posts, I hope to be writing more about clean code production and refactoring. I hope they help.

Friday, December 5, 2008

Something Every .NET Developer Should Do Today

Ever heard of the Microsoft Shared Source Initiative? If not, you need to check it out. Microsoft has released the source code for several of the .NET Base Class Libraries, plus some that aren't in the BCL. With the Shared Source Initiative, you can access .NET source code directly from the debugger in VS2008. Visit the link above for more information, and how to set up source code access from Visual Studio 2008. It takes all of 5 minutes, and might save you tons of headaches in the future.