Tuesday, October 04, 2011

Defining a file format using XML and XML Schema (XSD) in C#/Java - I

Occasionally I need a custom file type for a new application. I prefer OLE/COM Structured Storage, since it offers a compact, file-system-like hierarchy of Storages and Streams for saving application data in binary format. My open source application "UVFSEditor" can view a compound file along with other file formats: the storages are shown in a tree-like structure on the left and the streams are displayed in a hex editor.

Microsoft has not done much with OLE/COM Structured Storage for quite a long time, and the company itself is moving away from this file format for its own applications in favor of XML-based formats. I have also been looking for an alternative for new applications. XML is a viable alternative: the information can be viewed and edited with a simple text editor and validated with an XML schema validator, and the data can easily be exported to other file formats.

I divided the complete work into the following steps:
  1. XML Schema (XSD) definition
    • Choose an extension for my file type: .xef (XML example file-type)
    • Generate an XML file with data that conforms to the previously defined schema
  2. Validate my XML with the schema
  3. Compress huge data fragments
  4. Encrypt sensitive data
  5. Implementation in Java (Compliance with XSD 1.1)
Schema definition
A file type has a definite structure, which lets the application save and load data in a systematic way. An XML schema defines the types and sizes of the data along with their relationships to each other, which gives every generated file a predefined structure. An XML schema serves the same purpose for an XML file as a relational schema serves for a database. The World Wide Web Consortium (W3C) has a recommendation for defining XML schemas called XML Schema Definition Language (XSD). The most recent version, XSD 1.1, is currently a Candidate Recommendation. The language itself is written in XML.

Let's start with my sample file-type schema:

<?xml version="1.0"?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:element name="Document">
    <xsd:complexType>
     <xsd:sequence>
       <xsd:element name="Value1" type="xsd:integer"></xsd:element>
       <xsd:element name="Value2" type="xsd:string"></xsd:element>
       <xsd:element name="Value3" type="xsd:float"></xsd:element>
       <xsd:element name="Settings2">
         <xsd:complexType>
           <xsd:sequence minOccurs="0" maxOccurs="1">
             <xsd:element name="ValueX" type="xsd:integer">
             </xsd:element>
             <xsd:element name="ValueBulks">
               <xsd:complexType>
                 <xsd:sequence>
                   <xsd:element name="ValueBulk" type="xsd:float" minOccurs="0" maxOccurs="unbounded"/>
                 </xsd:sequence>
                 <xsd:attribute name="Compress" use="optional" type="xsd:boolean"></xsd:attribute>
               </xsd:complexType>
             </xsd:element>
           </xsd:sequence>
         </xsd:complexType>
       </xsd:element>
     </xsd:sequence>
    </xsd:complexType>
 </xsd:element>
</xsd:schema>
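
To get a feel for what this schema accepts and rejects, here is a minimal Java sketch that compiles it with the standard javax.xml.validation API and checks two small documents against it. The class name XefSchemaCheck, the helper isValid and both test documents are my own illustrative assumptions and are not part of the downloadable project. As far as I know, the validator bundled with the JDK implements XSD 1.0, which is enough here because the schema uses no 1.1 features.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class XefSchemaCheck {
    // The schema from above, written with single-quoted attributes so that it
    // fits into a Java string without escaping.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='Document'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='Value1' type='xsd:integer'/>"
      + "<xsd:element name='Value2' type='xsd:string'/>"
      + "<xsd:element name='Value3' type='xsd:float'/>"
      + "<xsd:element name='Settings2'><xsd:complexType>"
      + "<xsd:sequence minOccurs='0' maxOccurs='1'>"
      + "<xsd:element name='ValueX' type='xsd:integer'/>"
      + "<xsd:element name='ValueBulks'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='ValueBulk' type='xsd:float' minOccurs='0' maxOccurs='unbounded'/>"
      + "</xsd:sequence>"
      + "<xsd:attribute name='Compress' use='optional' type='xsd:boolean'/>"
      + "</xsd:complexType></xsd:element>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    // Hypothetical instance document that follows the schema.
    static final String SAMPLE =
        "<Document><Value1>10</Value1><Value2>Test File Format</Value2>"
      + "<Value3>101</Value3><Settings2><ValueX>30</ValueX><ValueBulks>"
      + "<ValueBulk>1.0</ValueBulk><ValueBulk>2.0</ValueBulk>"
      + "</ValueBulks></Settings2></Document>";

    // Compiled once; Schema objects are immutable and can be shared.
    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    // Returns true if the document is well-formed and valid against SCHEMA.
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // parse error or schema violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("sample valid:  " + isValid(SAMPLE));
        // Value1 is declared as xsd:integer, so text content should be rejected.
        String broken = SAMPLE.replace("<Value1>10<", "<Value1>ten<");
        System.out.println("broken valid:  " + isValid(broken));
    }
}
```

Note that Settings2 itself is required, while its content is optional as a whole because minOccurs="0" sits on the inner sequence: an empty Settings2 element is accepted, but a Document without Settings2 is not.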

Extension for New File Format
XML is a text format, and so is XSD. On Windows, a file almost always has an extension, which tells the Windows Shell which application to start when a file of that type is, for example, double-clicked in Windows Explorer. For my test application, I have chosen .xef (XML example file-type) as the file extension.

Sample XML file definition

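<Document>
 <Value1>10</Value1>
 <Value2>Test File Format</Value2>
 <Value3>101</Value3>
 <Settings2>
    <ValueX>30</ValueX>
    <ValueBulks>
     <ValueBulk>1.0</ValueBulk>
     <ValueBulk>2.0</ValueBulk>
    </ValueBulks>
 </Settings2>
</Document>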
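
As a rough illustration of how an application might load such a file, the following Java sketch parses the sample with the standard DOM parser and pulls out a few values. The class XefReader and its helpers are hypothetical names chosen for this example, the element names are taken from the schema, and the embedded document is one plausible tagging of the sample values. It does no schema validation, so it assumes the input has already been checked.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XefReader {
    // Sample values from the post, tagged with the element names of the schema.
    static final String SAMPLE =
        "<Document><Value1>10</Value1><Value2>Test File Format</Value2>"
      + "<Value3>101</Value3><Settings2><ValueX>30</ValueX><ValueBulks>"
      + "<ValueBulk>1.0</ValueBulk><ValueBulk>2.0</ValueBulk>"
      + "</ValueBulks></Settings2></Document>";

    // Parses an XML string into a DOM tree; malformed input is reported
    // as an unchecked exception to keep the callers short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Text content of the first element with the given tag name.
    static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // All ValueBulk entries, converted to floats in document order.
    static List<Float> bulkValues(Document doc) {
        NodeList nodes = doc.getElementsByTagName("ValueBulk");
        List<Float> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(Float.parseFloat(nodes.item(i).getTextContent().trim()));
        }
        return values;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("Value1 = " + Integer.parseInt(text(doc, "Value1")));
        System.out.println("Value2 = " + text(doc, "Value2"));
        System.out.println("bulk   = " + bulkValues(doc));
    }
}
```

Looking elements up by tag name is only safe here because every name in the schema is unique; a format that reuses names at different levels would need to walk the tree or use XPath instead.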


I prefer the Visual Studio IDE and C# whenever I need to try out a demo application that has a user interface. The next post on this blog will show how to read and write an XML file and validate it against our predefined schema using .NET Framework classes.

The complete code can be downloaded from: http://keensocial.freeiz.com/blogs/xmlfileformat/xmlfileformat.zip. The project file can only be opened in Visual Studio 2010. If you have another version of Visual Studio, create a new solution and add the files extracted from the zip file to it.
