Monday, October 10, 2011

Defining a file format using XML and XML Schema (XSD) in C#/Java - III

The complete implementation of the XML file format is divided into the following steps:
  1. XML Schema (XSD) definition
    • Choose an extension for the file type: .xef (XML example file-type)
    • Generate an XML file with data that complies with the previously defined schema
  2. Validate the XML against the schema
  3. Compress huge data fragments
  4. Encrypt sensitive data
  5. Implementation in Java (Compliance with XSD 1.1)
Compressing Data
In the first post of this series we defined an XML schema and created an example file that complies with it; in the second post we wrote C# code to manipulate this XML file and validate it against the predefined schema.

The verbose, tag-based representation of XML has long been debated because of the overhead it adds during serialization and transfer over networks. A number of groups have worked on this problem. The W3C Efficient XML Interchange (EXI) Working Group published its recommendation for binary-encoded XML only recently, in March 2011. Few mature tools implement this recommendation so far. Two open source projects, both written in Java, are actively working on it: EXIficient and OpenEXI. Efficient XML from AgileDelta is a popular commercial alternative for XML compression.

In today's post, we will use a general-purpose compression mechanism, GZip, to compress a large dataset into binary form, encode the result as Base64 text, and save it into the XML file. The drawback of this approach is that the existing schema has to be reworked; otherwise the document may no longer validate against it, because the compressed element holds text instead of child elements. The following code snippet returns the XML fragment as a string, compressed when m_compress is set:

public string GetXmlFragmentForValueBulks()
{
    XmlDocument xmlDocument = new XmlDocument();
    XmlElement xmlElementCompress = xmlDocument.CreateElement("ValueBulks");
    xmlDocument.AppendChild(xmlElementCompress);
           
    m_bulk.ForEach(f =>
    {
        XmlElement xmlElementChild = xmlDocument.CreateElement("ValueBulk");
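        // Use the invariant culture so the decimal separator is always "." regardless of the machine's locale
        xmlElementChild.InnerText = f.ToString("#.0", CultureInfo.InvariantCulture); // needs using System.Globalization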
        xmlElementCompress.AppendChild(xmlElementChild);
    });

    if (m_compress)
    {
        // UTF-16 bytes of the child elements; the reader must decode with the same encoding
        byte[] byteArray = Encoding.Unicode.GetBytes(xmlElementCompress.InnerXml);
        using (MemoryStream outStream = new MemoryStream())
        {
            // leaveOpen = true keeps outStream usable after the GZipStream is disposed
            using (GZipStream compression = new GZipStream(outStream, CompressionMode.Compress, true))
            {
                compression.Write(byteArray, 0, byteArray.Length);
            } // disposing the GZipStream flushes the remaining compressed data
            return Convert.ToBase64String(outStream.ToArray());
        }
    }

    return xmlElementCompress.InnerXml;
}
Now load the compressed XML fragment and add it to the document in uncompressed form:
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.Load(fileName);

// "//ValueBulks" finds the element at any depth; a bare "ValueBulks" would only match the root element
XmlElement xmlElementChild = xmlDocument.SelectSingleNode("//ValueBulks") as XmlElement;

// Base64 text -> GZip bytes -> UTF-16 XML text
byte[] byteArray = Convert.FromBase64String(xmlElementChild.InnerText);
using (MemoryStream stream = new MemoryStream(byteArray))
using (MemoryStream instream = new MemoryStream())
{
    using (GZipStream decompression = new GZipStream(stream, CompressionMode.Decompress))
    {
        decompression.CopyTo(instream); // Stream.CopyTo requires .NET 4.0 or later
    }
    // Must match the encoding used when compressing (Encoding.Unicode)
    string value = Encoding.Unicode.GetString(instream.ToArray());
    xmlElementChild.InnerXml = value;
}
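
The two snippets above can be checked against each other with a small round trip. The following is only a sketch: the class name BulkCodec and its two methods are hypothetical and not part of the downloadable project. It assumes the same choices as the code above (UTF-16 text, GZip, Base64).

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration; not taken from the project files.
public static class BulkCodec
{
    // XML text -> UTF-16 bytes -> GZip -> Base64 text
    public static string Compress(string xml)
    {
        byte[] raw = Encoding.Unicode.GetBytes(xml);
        using (MemoryStream output = new MemoryStream())
        {
            // leaveOpen = true so the MemoryStream survives disposal of the GZipStream
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing flushes the remaining compressed data
            return Convert.ToBase64String(output.ToArray());
        }
    }

    // Base64 text -> GZip bytes -> UTF-16 bytes -> XML text
    public static string Decompress(string base64)
    {
        using (MemoryStream input = new MemoryStream(Convert.FromBase64String(base64)))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return Encoding.Unicode.GetString(output.ToArray());
        }
    }
}
```

Calling BulkCodec.Decompress(BulkCodec.Compress(xml)) should return xml unchanged. Keep in mind that Base64 writes every 3 bytes as 4 characters, so compression only pays off when GZip shrinks the data by more than that overhead; for very small fragments the "compressed" text may end up longer than the original.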
The following uncompressed XML fragment (the ... lines mark omitted elements):
<Settings2>
    <ValueX>30</ValueX>
    <ValueBulks Compress="false">
        <ValueBulk>.0</ValueBulk>
        <ValueBulk>1.0</ValueBulk>
        ...
        ...
        <ValueBulk>98.0</ValueBulk>
        <ValueBulk>99.0</ValueBulk>
    </ValueBulks>
</Settings2>

will be converted to the following compressed fragment:
<Settings2>
    <ValueX>30</ValueX>
    <ValueBulks Compress="true">H4sIAAAAAAAEAOy9B2A...ppD8CFQAA</ValueBulks>
</Settings2>
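
Since both forms carry a Compress attribute, a reader can use it to decide whether to decompress at all. A minimal sketch follows, assuming the attribute is declared as xs:boolean in the reworked schema; the class ValueBulksReader and its method are hypothetical names, not part of the project.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml;

// Hypothetical reader for illustration; not taken from the project files.
public static class ValueBulksReader
{
    // Returns the ValueBulk child elements as XML text,
    // decompressing first when Compress is set to true.
    public static string ReadInnerXml(XmlElement valueBulks)
    {
        // GetAttribute returns "" when the attribute is absent.
        // XmlConvert.ToBoolean accepts the xs:boolean literals "true", "false", "1" and "0".
        string flag = valueBulks.GetAttribute("Compress");
        bool compressed = flag.Length > 0 && XmlConvert.ToBoolean(flag);
        if (!compressed)
        {
            return valueBulks.InnerXml;
        }

        // Base64 text -> GZip bytes -> UTF-16 XML text (same pipeline as above, reversed)
        byte[] packed = Convert.FromBase64String(valueBulks.InnerText);
        using (MemoryStream input = new MemoryStream(packed))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return Encoding.Unicode.GetString(output.ToArray());
        }
    }
}
```

With this in place the same loading code handles files saved with or without compression, which is the reason the attribute is worth keeping in the schema.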
My upcoming post will be the last in this series; it will cover how to encrypt your sensitive data and still comply with the schema.


The complete code can be downloaded from: http://keensocial.freeiz.com/blogs/xmlfileformat/xmlfileformat.zip. The project file can only be opened in Visual Studio 2010. If you have other versions of Visual Studio, you have to create a new solution and add the extracted files from the zip file to the solution.
