Tuesday, March 03, 2009

Serialize and Deserialize objects in .NET

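I'm not sure why the XML standard forbids certain characters from appearing in an XML document, but it does, and it causes problems. XML 1.0 rules out most control characters below 0x20 (everything except tab, line feed, and carriage return), so a form feed (0x0C), for example, is rejected.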

This C# code:

XmlSerializer xs = new XmlSerializer(typeof(T));
using (MemoryStream memoryStream = new MemoryStream(StringToUTF8ByteArray(objString)))
{
    obj = xs.Deserialize(memoryStream);
}

throws the following exception when the serialized string contains a form feed:
System.InvalidOperationException: There is an error in XML document (1, 50). ---> System.Xml.XmlException: ' ', hexadecimal value 0x0C, is an invalid character. Line 1, position 50.
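
To see up front which characters will trip the parser, here is a minimal sketch that scans a string for the first character XML 1.0 forbids. It assumes .NET Framework 4 or later, where XmlConvert.IsXmlChar is available; the class and method names are hypothetical and not part of the original code.

```csharp
using System;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class XmlCharCheck
{
    // Returns the index of the first character that XML 1.0 does not allow,
    // or -1 if every character is legal.
    public static int FirstInvalidXmlCharIndex(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            // A well-formed surrogate pair encodes one legal code point; skip both halves.
            if (char.IsSurrogatePair(text, i))
            {
                i++;
                continue;
            }
            if (!XmlConvert.IsXmlChar(text[i]))
            {
                return i;
            }
        }
        return -1;
    }
}
```

For the failing input above, XmlCharCheck.FirstInvalidXmlCharIndex("\f") reports index 0, while tab, carriage return, and line feed pass the check.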

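The fix is to put an XmlTextReader between the MemoryStream and the XmlSerializer. As far as I can tell, this works because a plain XmlTextReader has Normalization turned off by default, which also turns off range checking for numeric character entities such as &#xC;, whereas Deserialize(Stream) appears to create its own reader with that checking turned on. Here's the fully working version: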


[TestMethod()]
public void SerializeDeserializeObjectTest()
{
    SerializeDeserializeObjectTest("test");
    SerializeDeserializeObjectTest("\f");
}

private void SerializeDeserializeObjectTest(string input)
{
    string serialized = Serializer.SerializeObject(input);
    string deserialized = Serializer.DeserializeObject<string>(serialized);
    Assert.AreEqual(input, deserialized, input);
}


using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public static class Serializer
{
    // Serializes obj to an XML string. XmlTextWriter writes control characters
    // such as form feed as character entities (&#xC;) rather than rejecting them.
    public static string SerializeObject(object obj)
    {
        XmlSerializer xs = new XmlSerializer(obj.GetType());
        using (MemoryStream memoryStream = new MemoryStream())
        // Note: Encoding.UTF8 emits a byte-order mark at the start of the stream.
        using (XmlTextWriter xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8))
        {
            xs.Serialize(xmlTextWriter, obj);
            xmlTextWriter.Flush();
            return UTF8ByteArrayToString(memoryStream.ToArray());
        }
    }

    // Deserializes an XML string produced by SerializeObject.
    // Wrapping the stream in an XmlTextReader is what avoids the exception.
    public static T DeserializeObject<T>(string objString)
    {
        XmlSerializer xs = new XmlSerializer(typeof(T));
        using (MemoryStream memoryStream = new MemoryStream(StringToUTF8ByteArray(objString)))
        using (XmlTextReader xtr = new XmlTextReader(memoryStream))
        {
            return (T)xs.Deserialize(xtr);
        }
    }

    private static string UTF8ByteArrayToString(byte[] characters)
    {
        UTF8Encoding encoding = new UTF8Encoding();
        return encoding.GetString(characters);
    }

    private static byte[] StringToUTF8ByteArray(string xmlString)
    {
        UTF8Encoding encoding = new UTF8Encoding();
        return encoding.GetBytes(xmlString);
    }
}
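
A possible alternative to XmlTextReader, sketched here as an assumption rather than something this post tested: XmlReader.Create with XmlReaderSettings.CheckCharacters set to false is documented to skip range checking of character entity references, so it should accept &#xC; as well. The class name LenientDeserializer is hypothetical.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical variant of DeserializeObject, for illustration only.
public static class LenientDeserializer
{
    public static T DeserializeObject<T>(string xml)
    {
        // CheckCharacters = false disables range checking for character
        // entity references such as &#xC;.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.CheckCharacters = false;

        XmlSerializer xs = new XmlSerializer(typeof(T));
        using (StringReader stringReader = new StringReader(xml))
        using (XmlReader reader = XmlReader.Create(stringReader, settings))
        {
            return (T)xs.Deserialize(reader);
        }
    }
}
```

Reading from a StringReader also avoids the round trip through a UTF-8 byte array, since the XML is already a string.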



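A side note on SerializeObject: Encoding.UTF8 writes a byte-order mark, and UTF8Encoding.GetString does not strip it, so the returned string starts with U+FEFF (a reader raises this in the comments). A minimal sketch of one way around it, assuming a BOM-free string is what you want, is to pass a UTF8Encoding constructed with encoderShouldEmitUTF8Identifier set to false. The class name BomFreeSerializer is hypothetical.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical variant of Serializer.SerializeObject, for illustration only.
public static class BomFreeSerializer
{
    public static string SerializeObject(object obj)
    {
        // UTF8Encoding(false) has an empty preamble, so no BOM is written.
        UTF8Encoding utf8NoBom = new UTF8Encoding(false);
        XmlSerializer xs = new XmlSerializer(obj.GetType());
        using (MemoryStream memoryStream = new MemoryStream())
        using (XmlTextWriter writer = new XmlTextWriter(memoryStream, utf8NoBom))
        {
            xs.Serialize(writer, obj);
            writer.Flush();
            return utf8NoBom.GetString(memoryStream.ToArray());
        }
    }
}
```

XmlTextWriter is kept here on purpose: it is the writer the post relies on to emit the form feed as a character entity.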
Thanks to Tom Goff for his "XML Serialization Sorrows" article.

Thanks to Andrew Gunn for his "XML Serialization in C#" article.

5 comments:

RT said...

The Serialize method creates a string with two issues for me:

1. Creates as UTF-16 instead of UTF-8

2. Has a BOM coding at the beginning

Any thoughts on overcoming these issues?

Thanks.

RT said...

I found a solution to remove the BOM at http://baleinoid.com/whaly/tag/c/ (bottom of the page)

RT said...

Sorry...the code returns utf-8 not utf-16...I was looking at something else

Dennis Gorelik said...

RT -- thank you for your additions.

Dennis Gorelik said...

Vilasini, it's better to have support of simple queries than none at all...
