Wednesday, June 24, 2009

How to Load an XmlDocument and Completely Ignore DTD

A question came up on the Forums today from someone who wanted to ignore the DOCTYPE declaration while loading an XML file into an XmlDocument instance, without first reading the whole file and using something like Regex to strip the declaration out. In other words, he was looking for a fast solution.

The XmlDocument class loads XML via the Load and LoadXml methods, which ultimately wrap their input in an XmlTextReader before reading the XML. There's one exception to this rule, however, and that's the Load overload that accepts an XmlReader.

More than this, it's the XmlReader, and not the XmlDocument, that resolves the DTD referenced by the DOCTYPE declaration. It does this by using the resolver set in the XmlReaderSettings.XmlResolver property.

To solve this issue, create an instance of XmlReaderSettings and allow DTD processing by setting ProhibitDtd to false, but then remove the XmlReader's ability to resolve the address specified in the DOCTYPE declaration by setting the XmlResolver property to null. After doing this, you can safely create an XmlReader and pass it into the Load method of the XmlDocument, and the XmlDocument will load the specified XML file without fetching the DTD or validating the document against it.

The following code assumes you have your XML file loaded into a Stream named "xmlStream".


// Create an XmlReaderSettings object.  
XmlReaderSettings settings = new XmlReaderSettings();

// Set XmlResolver to null, and ProhibitDtd to false.
settings.XmlResolver = null;
settings.ProhibitDtd = false;

// Now, create an XmlReader. This is a forward-only reader of XML.
// Passing in the settings ensures that the DTD is not resolved and
// that validation is not performed.
XmlReader reader = XmlReader.Create(xmlStream, settings);

// Create your document, and load the reader.
XmlDocument doc = new XmlDocument();
doc.Load(reader);
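
On .NET Framework 4.0 and later, ProhibitDtd is marked obsolete in favor of the DtdProcessing property, so the same idea might look like the sketch below. The DtdFreeLoader class, its Load method, and the sample XML with its deliberately unreachable DTD address are made up for illustration; only XmlReaderSettings, XmlReader.Create, and XmlDocument.Load come from the framework. If the internal subset isn't needed either, DtdProcessing.Ignore should skip the DOCTYPE altogether.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the framework.
public static class DtdFreeLoader
{
    public static XmlDocument Load(Stream xmlStream)
    {
        XmlReaderSettings settings = new XmlReaderSettings();

        // A null resolver means the reader cannot fetch the external DTD.
        settings.XmlResolver = null;

        // .NET 4.0+ counterpart of "ProhibitDtd = false".
        settings.DtdProcessing = DtdProcessing.Parse;

        using (XmlReader reader = XmlReader.Create(xmlStream, settings))
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(reader);
            return doc;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample document (an assumption); the DTD address cannot be reached.
        string xml =
            "<?xml version=\"1.0\"?>" +
            "<!DOCTYPE note SYSTEM \"http://example.invalid/note.dtd\">" +
            "<note><to>Reader</to></note>";

        using (Stream xmlStream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            XmlDocument doc = DtdFreeLoader.Load(xmlStream);

            // Print the root element name to show the document loaded.
            Console.WriteLine(doc.DocumentElement.Name);
        }
    }
}
```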

Tuesday, June 16, 2009

Forums Browser Latest Release!

I just released another version of the Forums Browser application, and this one seems relatively stable. Most of the issues from the first release are resolved, and I believe I have exactly one item left on the issues list at this point.

Anyways, here's the link to the latest version:

Forums Browser

Enjoy!

Wednesday, June 10, 2009

ForumsBrowser - It's Out!

I'm pleased to announce the opening of the ForumsBrowser page on CodePlex. On it, I've added the source code, and an initial release of the software as an installable msi file.

Be forewarned, however, that this software is still in development, so it's probably going to crash on you. :) Lemme know if it does. (A complete stack trace would be wonderful if you can manage to get it :) ).

Anyways, here it is!

http://forumsbrowser.codeplex.com/

Also, let me know if you're interested in contributing.

Monday, June 8, 2009

An Updated Preview of a Forums Browser

Well, I've been working on the MSDN Forums Browser a bit more. Here's what it looks like so far. Still needs some refactoring though....

Sunday, June 7, 2009

A Recent Project I've Been Toying With

So I thought I would finally let everyone in on a recent project I've been toying with for the past few weeks. It's far from ready at this point, but I'm planning on throwing it up on CodePlex in the coming weeks to open it up to all those who would want to help me get it up and running. It's a Forums Client for MSDN. One has been attempted before by a group of people at Microsoft, but the APIs all changed and all the original developers working on the project have long since moved on to other projects. In fact, Rob Johnson, who showed me how to access the data more simply than by screen-scraping, has himself already moved to another group within Microsoft.

I feel it's important that those of us who use the Forums heavily ought to be the most invested in coming up with some kind of desktop solution, as we're the ones whom such a client would most directly benefit. I myself have written three or four clients for the MSDN Forums just for my own personal use, and have found them to save me tons of time in not having to switch windows and refresh pages. Unfortunately for my clients, the first few no longer function properly as they were mainly based on screen scraping, and the site has changed a few times in the last couple of years.

Anyways, without further ado, here's a small screenshot of what I've got so far. Again, as I've said, at this point the solution is very half-baked and not even close to ready for prime-time.

Microsoft, if you're listening, I'd love to have a way to use the Live Id Client to post responses directly from the client.

Thursday, June 4, 2009

A Brief Aside on Type Conversion Errors

There are four similar errors in C# that are related to type conversions. Three of them are compiler errors, and one of them is a runtime error. I think it's high time to clear up the differences between them, and lay them side by side so they make sense to some of you who may be confused by the differences between them.

1. CS0030 (compiler error) "Cannot convert type 'Foo' to 'Bar'" - This error occurs when trying to explicitly cast one type to a completely unrelated type. This can be reproduced with this code:


class Foo { }
class Bar { }

class Program
{
    static void Main(string[] args)
    {
        Foo foo = new Foo();
        Bar bar = (Bar)foo;
    }
}


The part that the compiler is complaining about here is specifically the "(Bar)foo" code. There is no conversion, so it complains there.

2. CS0029 (compiler error) "Cannot implicitly convert type 'Foo' to 'Bar'" - This error occurs when trying to implicitly cast one type to a completely unrelated type. This is reproduced with the following code:


class Foo { }
class Bar { }

class Program
{
    static void Main(string[] args)
    {
        Foo foo = new Foo();
        Bar bar = foo;
    }
}


This one is similar to the next; however, it doesn't tell you that an explicit conversion exists, and for good reason: there isn't one. The compiler here is complaining about the whole assignment on the second line of Main, namely the conflict between the declared type Bar and the type Foo of the value being assigned, as the two have no conversion between them.

3. CS0266 (compiler error) "Cannot implicitly convert type 'double' to 'int'. An explicit conversion exists (are you missing a cast?)" - This error will only occur in C# 2.0 and later. It was added in C# 2.0 to add a hint as to whether or not an explicit conversion exists. Think of this one as CS0029 with a tip. It occurs when a conversion exists between the two types, but only as an explicit cast. This error can be remedied by simply adding an explicit cast. This can be reproduced with the following couple of examples:


class Program
{
    static void Main(string[] args)
    {
        double d = 0;
        int i = d;
    }
}


And the other:


class Foo { }
class Bar : Foo { }

class Program
{
    static void Main(string[] args)
    {
        Bar foo = new Foo();
    }
}


In both of these situations, a conversion does exist, but you're simply not using it. The error is reported when an implicit conversion doesn't exist for the type, but an explicit one does. This is a tricky error, though, because in some situations the cast might cause the compiler to shut up, only to make the application crash at runtime with the next exception I'll explain.
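
As a small illustration of the fix for the first example, and of what the cast actually does to the value, here is a sketch. The CastDemo class and its Narrow helper are made-up names; the point is only that an explicit double-to-int cast compiles and discards the fractional part.

```csharp
using System;

public static class CastDemo
{
    // Hypothetical helper: the explicit cast satisfies the compiler (no CS0266).
    public static int Narrow(double d)
    {
        return (int)d; // the fractional part is discarded
    }

    public static void Main()
    {
        Console.WriteLine(Narrow(3.9));  // prints 3
        Console.WriteLine(Narrow(-3.9)); // prints -3
    }
}
```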

4. InvalidCastException (runtime error) "Unable to cast object of type 'Foo' to type 'Bar'" - This error occurs when a variable of a parent type is cast to an inherited type, but the object it refers to is actually an instance of the parent, or of another type inherited from the parent that is not within the inheritance tree of the type to which the cast is attempted. This can be reproduced with the following code:


class Foo { }
class Bar : Foo { }

class Program
{
    static void Main(string[] args)
    {
        Foo foo = new Foo();
        Bar bar = (Bar)foo;
    }
}


The complaint happens not because of the declared type of foo, but because of its instantiated type. The instantiated type of foo is Foo in this example. If I change that first line to the following, the error will disappear:


Foo foo = new Bar();


The error here goes away, because I've changed the instantiated type of "foo" to Bar, so the conversion is valid, and the runtime marches onward.
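
When the instantiated type isn't known ahead of time, the as operator offers a way to attempt the conversion without risking the exception. This is a sketch rather than part of the examples above; the SafeCastDemo class and its TryGetBar helper are hypothetical names.

```csharp
using System;

public class Foo { }
public class Bar : Foo { }

public static class SafeCastDemo
{
    // Hypothetical helper: yields null instead of throwing InvalidCastException.
    public static Bar TryGetBar(Foo foo)
    {
        return foo as Bar;
    }

    public static void Main()
    {
        Foo plainFoo = new Foo();  // instantiated type is Foo
        Foo reallyBar = new Bar(); // instantiated type is Bar

        Console.WriteLine(TryGetBar(plainFoo) == null);  // prints True
        Console.WriteLine(TryGetBar(reallyBar) != null); // prints True
        Console.WriteLine(plainFoo is Bar);              // prints False
    }
}
```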

For a full reference of all the compiler errors, check out this link.

Tuesday, June 2, 2009

Exploring a Hierarchy in F# with Immutability

Recently a task was given to me where I had to extract a new code file from an existing object. The requirements were to extract only the basics of a type: its public properties on classes and public fields on enumerations. Here's the catch, however: the new code should not reference any of the existing types. What this means is that my code is going to have to recurse through my custom types and generate those as well. This created a situation where I had to explore the entire hierarchy of types associated with one specific type, and generate a class file for each of them.

To make matters worse, there are cycles in the object model. A parent class may have a List of child class instances, and each child may hold a reference to the parent via a Parent property. Exploring the hierarchy and simply adding each and every property, then exploring each particular type as they come along would most certainly cause a stack overflow as I go back and forth between the parent and child types. So I needed the list generated to be filled with only unique values, and the output of the method should have only the unique values within the entire hierarchy.

My next challenge (which, I admit, was self-imposed) was to do this whole hierarchical search in a completely immutable way. Though the OO developer in me really wanted to simply add the types to a generic list of types, and use the handy Contains method to determine if I've already added a specific type to the list as I recurse the hierarchy, this, I felt, was not a purely "functional" solution to the problem. In the end, I came up with this generic method to explore a hierarchy and return the distinct values from within the hierarchy.


let rec ExploreHierarchy (x:'a) (f:'a -> seq<'a>) (list:list<'a>) =
    let newlist = x::list
    let notExists x = List.exists ((=) x) newlist |> not
    let children = f x |> Seq.filter notExists |> Seq.distinct
    let rec innerloop (possible:seq<'a>) (completed:list<'a>) =
        match possible |> Seq.toList with
        | h::t -> if List.exists ((=) h) completed
                  then innerloop t completed
                  else ExploreHierarchy h f completed |> innerloop t
        | [] -> completed
    innerloop children newlist


The first argument passed into the function is the starting point for exploring the hierarchy. The second argument is a function that, taking an instance of the same type as the first argument, will return a sequence of the same type. The last argument is an F# list containing the existing items so far. When this function is first called, you'll want to pass in an empty list ([]).

Inside the function, the first thing to happen is that a new list, containing x combined with the list passed in is created.

Next, a function is created to determine if an item exists or not.

Next, a distinct list of children that don't already exist in the list passed in is created. This list of children is going to be used to recursively call the innerloop function, which itself will call the ExploreHierarchy function, so it's important that the items on this list haven't already been explored.

The innerloop function takes two collections: a possible sequence and a completed list. The idea here is to populate the completed list, while removing items from the possible sequence one at a time until there are none left, at which point the completed list is returned.

Lastly, the innerloop function is called, passing in the children, as well as the newlist.

A simple exploration might look like this. First, I have to define a function to get the children of a particular node:


open System

let getProperties (x:Type) =
    x.GetProperties() |> Seq.map (fun x -> x.PropertyType)


Next, I could display all the types referenced by calling something like this:


ExploreHierarchy typeof<System.Windows.Forms.Form> getProperties [] |> List.iter (printfn "%A")  


Of course, my final implementation for my purposes was much more complex, involving some CodeDom and a much more sophisticated getProperties method, but I think you can see the general gist of it.

As you can see, F# is ideal for a situation like this. I've managed to create a completely generic function that explores a hierarchy and returns distinct children from any hierarchy. I think the big advantage here is the ability to treat a function as a first-class citizen. It makes the code much more succinct.

Why You Should Never Make Your Business Objects DataContracts

Once upon a time, there was a fledgling young developer. He wrote a nice, clean business object.


public class NiceCleanBusinessObject
{
    public string Value { get; set; }

    public string LastUpdatedBy { get; set; }

    public DateTime? LastUpdatedOn { get; set; }
}


He liked to share, so one day, he decided to use WCF so his business object can be shared with other people. Not seeing a point in having a separate DataContract class from his business object, and wanting to save some time, he decorated his business objects with DataContract and DataMember attributes so he could avoid writing a translator class, which he considered an extra level of abstraction. His nice, clean business object then looked like this:


[DataContract]
public class NiceCleanBusinessObject
{
    [DataMember]
    public string Value { get; set; }

    [DataMember]
    public string LastUpdatedBy { get; set; }

    [DataMember]
    public DateTime? LastUpdatedOn { get; set; }
}


"That was easy!", he thought to himself, and went home for the afternoon.

The next day, his employer asked him to be sure to clear the LastUpdatedBy and LastUpdatedOn properties whenever the Value property was set, so he changed his business object again:


[DataContract]
public class NiceCleanBusinessObject
{
    private string _value;

    [DataMember]
    public string Value
    {
        get { return _value; }
        set
        {
            _value = value;
            LastUpdatedBy = null;
            LastUpdatedOn = null;
        }
    }

    [DataMember]
    public string LastUpdatedBy { get; set; }

    [DataMember]
    public DateTime? LastUpdatedOn { get; set; }
}


He sighed a happy sigh and went home.

The next day, his boss expresses interest in creating a holder class to transport several nice clean business objects across the wire. These business objects should be on a list in another object. Our hero creates this class:


[DataContract]
[KnownType(typeof(NiceCleanBusinessObject))]
public class NiceCleanParentObject
{
    public NiceCleanParentObject()
    {
        NiceCleanObjects = new List<NiceCleanBusinessObject>();
    }

    [DataMember]
    public List<NiceCleanBusinessObject> NiceCleanObjects { get; set; }
}


He's happy everything works, so he goes home.

The next day, he gets a request to create a new property, Value2. Value2 is set whenever Value is set, but can also be overridden by the user after Value is changed. But whenever Value is changed now, information from the NiceCleanParentObject must be used to set the information on Value2. So his code now looks like this:


[DataContract]
[KnownType(typeof(NiceCleanBusinessObject))]
public class NiceCleanParentObject
{
    public NiceCleanParentObject()
    {
        NiceCleanObjects = new List<NiceCleanBusinessObject>();
    }

    [DataMember]
    public string Value { get; set; }

    [DataMember]
    public List<NiceCleanBusinessObject> NiceCleanObjects { get; set; }
}

[DataContract]
public class NiceCleanBusinessObject
{
    private string _value;

    [DataMember]
    public string Value
    {
        get { return _value; }
        set
        {
            _value = value;
            LastUpdatedBy = null;
            LastUpdatedOn = null;
            RunSomethingComplicated();
        }
    }

    [DataMember]
    public string Value2 { get; set; }

    [DataMember]
    public string LastUpdatedBy { get; set; }

    [DataMember]
    public DateTime? LastUpdatedOn { get; set; }

    public NiceCleanParentObject Parent { get; set; }

    private void RunSomethingComplicated()
    {
        Value2 = Parent.Value;
    }
}


He compiles his code, and it won't run properly. Oh no! What shall he do? Every time he tries to pass the object over WCF, he gets a NullReferenceException. In hunting it down (which takes significantly longer than it should), it turns out that while serializing and deserializing a property, the business logic associated with the property actually gets executed. This is because setters and getters are actually called whenever the DataContractSerializer is running, and it's starting to affect his business object's data. He has a couple of options at this point:

1. He can serialize just the fields and automatic properties. He doesn't particularly like this idea, because other people are relying on the DataContract at this point, and relying on it to be in a particular format. This could become a little sticky for them.
2. He can create a base class that carries a flag that determines the current state of the object, whether it's being serialized or not. He decides this is a good idea.
3. He could try to serialize the parent as well. This has the advantage of giving access to the parent properties, but it still relies on the internal ordering of the DataContractSerializer, in hoping that the Parent object's properties that are needed are serialized before the List of NiceCleanBusinessObjects. Also, serializing the Parent property when the parent carries a property containing a list of children creates cycles in the serialization process, because the parent serializes the children, which serialize the parent, and so on and so forth, so a workaround has to be made. Nevertheless, his UI architecture depends on the fact that this business object updates itself whenever a property is set, so he still needs a reference to the parent. Our hero finds his workaround here.
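
The claim that property setters run during deserialization can be checked with a small round trip through DataContractSerializer. This is a sketch with made-up names (Tracked, SetterCalls, SetterDemo), not the hero's actual code; it only counts how often the setter fires while the object is read back.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical type used only to observe setter calls.
[DataContract]
public class Tracked
{
    public static int SetterCalls;

    private string _value;

    [DataMember]
    public string Value
    {
        get { return _value; }
        set
        {
            _value = value;
            SetterCalls++; // stands in for "business logic" in the setter
        }
    }
}

public static class SetterDemo
{
    // Serializes and deserializes one object, returning how many times
    // the Value setter ran during the round trip.
    public static int CountSetterCallsOnRoundTrip()
    {
        DataContractSerializer serializer = new DataContractSerializer(typeof(Tracked));
        Tracked original = new Tracked { Value = "hello" };

        // Ignore the call made by the initializer above.
        Tracked.SetterCalls = 0;

        using (MemoryStream stream = new MemoryStream())
        {
            serializer.WriteObject(stream, original);
            stream.Position = 0;
            serializer.ReadObject(stream);
        }

        return Tracked.SetterCalls;
    }

    public static void Main()
    {
        // A non-zero count means the setter ran inside ReadObject.
        Console.WriteLine(CountSetterCallsOnRoundTrip());
    }
}
```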

So he implements numbers 2 and 3 above, and comes up with this:


[DataContract]
[KnownType(typeof(NiceCleanBusinessObject))]
public class NiceCleanParentObject
{
    public NiceCleanParentObject()
    {
        NiceCleanObjects = new List<NiceCleanBusinessObject>();
    }

    [DataMember]
    public string Value { get; set; }

    [DataMember]
    public List<NiceCleanBusinessObject> NiceCleanObjects { get; set; }
}

[DataContract, KnownType(typeof(NiceCleanBusinessObject))]
public class NiceCleanBusinessObjectBase
{
    [DataMember]
    protected bool _isSerializing;

    [OnSerializing]
    protected void OnSerializing(StreamingContext context)
    {
        _isSerializing = true;
    }

    [OnSerialized]
    protected void OnSerialized(StreamingContext context)
    {
        _isSerializing = false;
    }

    [OnDeserializing]
    protected void OnDeserializing(StreamingContext context)
    {
        _isSerializing = true;
    }

    [OnDeserialized]
    protected void OnDeserialized(StreamingContext context)
    {
        _isSerializing = false;
    }
}

[DataContract]
public class NiceCleanBusinessObject : NiceCleanBusinessObjectBase
{
    private string _value;

    [DataMember]
    public string Value
    {
        get { return _value; }
        set
        {
            _value = value;

            if (!_isSerializing)
            {
                LastUpdatedBy = null;
                LastUpdatedOn = null;
                RunSomethingComplicated();
            }
        }
    }

    [DataMember]
    public string Value2 { get; set; }

    [DataMember]
    public string LastUpdatedBy { get; set; }

    [DataMember]
    public DateTime? LastUpdatedOn { get; set; }

    [DataMember]
    public NiceCleanParentObject Parent { get; set; }

    private void RunSomethingComplicated()
    {
        Value2 = Parent.Value;
    }
}


Oh wow. This isn't so much of a nice clean business object anymore, now is it?

Well, I wish I could post more, but our time is short, so I'll tell you what ultimately happened. The business logic inside the business objects became so complicated, due to the fact that the data contract is the business object itself, that our hero ultimately decides that it would be better to switch to an anemic domain model, blasted by such architecture greats as Martin Fowler and several others.

Ultimately, he and his team members have to support the model that resulted from his bad decision early on in the project, and he's had to learn a difficult lesson:

Don't make your business objects into data contracts. The translation layer is less work in the long run.
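
For comparison, a translation layer for the early version of the business object might look something like the sketch below. The NiceCleanDto and NiceCleanTranslator names are hypothetical, and this is only one way to arrange it; the point is that the serializer only ever touches the plain DTO, so the setter logic on the business object never runs during serialization.

```csharp
using System;
using System.Runtime.Serialization;

// Business object: no serialization attributes, logic stays in the setter.
public class NiceCleanBusinessObject
{
    private string _value;

    public string Value
    {
        get { return _value; }
        set
        {
            _value = value;
            LastUpdatedBy = null;
            LastUpdatedOn = null;
        }
    }

    public string LastUpdatedBy { get; set; }

    public DateTime? LastUpdatedOn { get; set; }
}

// Hypothetical data contract: plain properties, safe for the serializer.
[DataContract]
public class NiceCleanDto
{
    [DataMember]
    public string Value { get; set; }

    [DataMember]
    public string LastUpdatedBy { get; set; }

    [DataMember]
    public DateTime? LastUpdatedOn { get; set; }
}

// Hypothetical translator between the two shapes.
public static class NiceCleanTranslator
{
    public static NiceCleanDto ToDto(NiceCleanBusinessObject source)
    {
        return new NiceCleanDto
        {
            Value = source.Value,
            LastUpdatedBy = source.LastUpdatedBy,
            LastUpdatedOn = source.LastUpdatedOn
        };
    }

    public static NiceCleanBusinessObject FromDto(NiceCleanDto dto)
    {
        NiceCleanBusinessObject result = new NiceCleanBusinessObject();

        // Set Value first: its setter clears the audit properties,
        // which are then restored from the DTO.
        result.Value = dto.Value;
        result.LastUpdatedBy = dto.LastUpdatedBy;
        result.LastUpdatedOn = dto.LastUpdatedOn;
        return result;
    }
}

public static class TranslatorDemo
{
    public static void Main()
    {
        NiceCleanBusinessObject original = new NiceCleanBusinessObject();
        original.Value = "some value";
        original.LastUpdatedBy = "alice";

        NiceCleanBusinessObject copy =
            NiceCleanTranslator.FromDto(NiceCleanTranslator.ToDto(original));

        Console.WriteLine(copy.LastUpdatedBy); // prints alice
    }
}
```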