Wednesday, June 27, 2012

How To Dump File Structure To XML. Part 1.

Goal: an XSD grammar that describes a file structure

Assume you want to reproduce (replicate) the file structure of some host you have no direct access to, or you just want to save the file structure so you can evaluate it later. A convenient way to do that is to store it in an XML file. This post shows how. As a side effect, the following topics are touched on briefly:

- JAXB: how to generate an XML file from scratch
- How to use an XSD to build a convenient XML parser via xjc

First of all we need to design the schema of the data we're going to keep. With a schema in hand we can build a parser that writes the data to XML and reads it back in a very simple way. The schema can be written as either a DTD or an XSD. The second option lets us use the xjc translator from the Java SE distribution to turn the schema into parser classes.

What meta information do we need?

Let's consider the data structure we're going to process. It should represent the minimal meta-information of a file system: a root node plus files and folders, where folders can in turn hold files and folders. The structural elements (files and folders) should also have attributes such as "name" (mandatory), "isHidden", "isArchived", etc.
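To make that structure concrete, here is a minimal in-memory sketch of it in Java. The class name FsNode and its fields are hypothetical and are not the classes xjc will generate; the sketch only uses java.io.File, which does not expose the archive flag, so "isArchived" is left out.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one file-system entry: a name, a few flags, and children.
class FsNode {
    final String name;
    final boolean folder;
    final boolean hidden;
    final boolean readOnly;
    final List<FsNode> children = new ArrayList<>();

    FsNode(String name, boolean folder, boolean hidden, boolean readOnly) {
        this.name = name;
        this.folder = folder;
        this.hidden = hidden;
        this.readOnly = readOnly;
    }

    // Recursively collects the meta-information below the given entry.
    // Assumption: the tree has no symbolic-link cycles.
    static FsNode scan(File entry) {
        FsNode node = new FsNode(entry.getName(), entry.isDirectory(),
                entry.isHidden(), !entry.canWrite());
        File[] entries = entry.listFiles(); // null for plain files and unreadable folders
        if (entries != null) {
            for (File child : entries) {
                node.children.add(scan(child));
            }
        }
        return node;
    }

    public static void main(String[] args) {
        FsNode root = scan(new File(args.length > 0 ? args[0] : "."));
        System.out.println(root.name + " has " + root.children.size() + " direct children");
    }
}
```

Folders and files share one class here only to keep the sketch short; the schema later in this post splits them into separate types.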

XSD is a format for describing the rules that a certain kind of XML document must follow. It's a sort of grammar for a language built on top of XML. An XSD is itself XML, so it can represent a hierarchical structure in which elements have types that dictate how they can (and are allowed to) be used in a document conforming to the schema. For example, you cannot assign a non-numeric string to an attribute if the schema declares a numeric type for it.
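The typed-attribute rule can be checked with the validation API that ships with the JDK (javax.xml.validation). The sketch below is only an illustration: the one-element schema with an "item" tag and a "count" attribute is made up for this example and is not part of the file-structure schema.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class TypedAttributeDemo {
    // Hypothetical schema: <item> must carry a "count" attribute of type xs:int.
    static final String XSD = String.join("\n",
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>",
        "  <xs:element name='item'>",
        "    <xs:complexType>",
        "      <xs:attribute name='count' type='xs:int' use='required'/>",
        "    </xs:complexType>",
        "  </xs:element>",
        "</xs:schema>");

    // Returns true if the document is well-formed and satisfies the schema.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a schema violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<item count='3'/>"));     // true
        System.out.println(isValid("<item count='three'/>")); // false: "three" is not an xs:int
    }
}
```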

XSD is a declarative language: you describe the types (which, by the way, can extend other types), the relationships, the restrictions, etc., and then use this information according to your current need (recall that our need is to build a parser).

This declarative nature lets dedicated processors (aka translators) consume an XSD and generate Java classes representing the types, dependencies, and relationships of the entities described in the schema. These classes carry special annotations and getter methods, so they are easy to use for parsing XML via JAXB (Java Architecture for XML Binding).

Let's look at an example:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.notifymeplease.org/fstoxml/schema"
 xmlns:tns="http://www.notifymeplease.org/fstoxml/schema" elementFormDefault="qualified">

 <xs:element name="root" type="tns:XTFSFolderRoot"/>
 
 <xs:complexType name="XTFSFolderRoot">
  <xs:sequence minOccurs="0" maxOccurs="unbounded">
   <xs:choice>
    <xs:element name="folder" type="tns:XTFSFolder"/>
    <xs:element name="file" type="tns:XTFSFile"/>
   </xs:choice>
  </xs:sequence>
  <xs:attribute name="path" use="required"/>
 </xs:complexType>
 
 <xs:complexType name="XTFSFolder">
  <xs:complexContent>
   <xs:extension base="tns:XTFSFile">
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
     <xs:choice>
      <xs:element name="folder" type="tns:XTFSFolder"/>
      <xs:element name="file" type="tns:XTFSFile"/>
     </xs:choice>
    </xs:sequence>
   </xs:extension>
  </xs:complexContent>
 </xs:complexType>

 <xs:complexType name="XTFSFile">
  <xs:attribute name="name" type="xs:string" use="required" />
  <xs:attribute name="isHidden" type="xs:boolean" use="optional"
   default="false" />
  <xs:attribute name="isReadOnly" type="xs:boolean" use="optional"
   default="false" />
  <xs:attribute name="isArchived" type="xs:boolean" use="optional"
   default="false" />
 </xs:complexType>

</xs:schema>

Mapping "types" to "tags"

This simple schema represents the data structure we're going to use. It consists of three type descriptions: XTFSFile, XTFSFolder (which extends XTFSFile), and XTFSFolderRoot. These types will later be translated to corresponding Java classes, each with a getter method per attribute that returns a value of the type specified for that attribute.

Note that the types are not the tags of the XML you're going to build according to this schema. To describe how tags are arranged you need so-called "elements". The example schema declares the element "root", which describes how the root tag is used in your XML. This element is of type XTFSFolderRoot. Notice that when we refer to this type we have to qualify it with the namespace of the document we're currently developing (the "tns:" prefix). This is required to distinguish your custom types from the types of the "http://www.w3.org/2001/XMLSchema" namespace.

XTFSFolderRoot says that the "root" tag can contain a sequence of "folder" or "file" tags. This sequence can be arbitrarily long or absent altogether (minOccurs="0" maxOccurs="unbounded"). Folder and file tags must follow the rules described in the XTFSFolder and XTFSFile types respectively. Note that the folder type can itself hold files or folders, which is a recursive dependency.

Types also contain attribute descriptions that state whether each attribute is required and, if it is not, which default value applies when the document omits it.

Now that we have the schema, we can create a sample XML file and validate it to make sure the schema works correctly and catches every place where a document doesn't conform. However, we are not going to use the schema to read files but rather to write them. Either way, we have to translate it to Java classes, add them to the classpath, and write the code that uses them.
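Such a validation check could look roughly like the sketch below. It uses the JDK's javax.xml.validation API rather than JAXB, inlines the schema from this post as a string (with single-quoted attributes) so that it is self-contained, and runs it against sample documents that are made up for illustration. The class name FsSchemaCheck is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class FsSchemaCheck {
    static final String NS = "http://www.notifymeplease.org/fstoxml/schema";

    // The schema from this post, inlined so the example needs no external files.
    static final String XSD = String.join("\n",
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'",
        "    targetNamespace='" + NS + "' xmlns:tns='" + NS + "' elementFormDefault='qualified'>",
        "  <xs:element name='root' type='tns:XTFSFolderRoot'/>",
        "  <xs:complexType name='XTFSFolderRoot'>",
        "    <xs:sequence minOccurs='0' maxOccurs='unbounded'>",
        "      <xs:choice>",
        "        <xs:element name='folder' type='tns:XTFSFolder'/>",
        "        <xs:element name='file' type='tns:XTFSFile'/>",
        "      </xs:choice>",
        "    </xs:sequence>",
        "    <xs:attribute name='path' use='required'/>",
        "  </xs:complexType>",
        "  <xs:complexType name='XTFSFolder'>",
        "    <xs:complexContent>",
        "      <xs:extension base='tns:XTFSFile'>",
        "        <xs:sequence minOccurs='0' maxOccurs='unbounded'>",
        "          <xs:choice>",
        "            <xs:element name='folder' type='tns:XTFSFolder'/>",
        "            <xs:element name='file' type='tns:XTFSFile'/>",
        "          </xs:choice>",
        "        </xs:sequence>",
        "      </xs:extension>",
        "    </xs:complexContent>",
        "  </xs:complexType>",
        "  <xs:complexType name='XTFSFile'>",
        "    <xs:attribute name='name' type='xs:string' use='required'/>",
        "    <xs:attribute name='isHidden' type='xs:boolean' use='optional' default='false'/>",
        "    <xs:attribute name='isReadOnly' type='xs:boolean' use='optional' default='false'/>",
        "    <xs:attribute name='isArchived' type='xs:boolean' use='optional' default='false'/>",
        "  </xs:complexType>",
        "</xs:schema>");

    // Made-up sample: nested folders and files, all inside the schema's namespace.
    static final String VALID_SAMPLE = String.join("\n",
        "<root xmlns='" + NS + "' path='/home/user'>",
        "  <folder name='docs'>",
        "    <file name='notes.txt' isReadOnly='true'/>",
        "    <folder name='empty'/>",
        "  </folder>",
        "  <file name='.profile' isHidden='true'/>",
        "</root>");

    // Returns true if the document is well-formed and conforms to the schema.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a schema violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(VALID_SAMPLE)); // true
        // A file without the mandatory "name" attribute is rejected.
        System.out.println(isValid("<root xmlns='" + NS + "' path='/'><file isHidden='true'/></root>")); // false
    }
}
```

Because the schema sets elementFormDefault="qualified", the "folder" and "file" tags must live in the schema's namespace as well; the default xmlns declaration on "root" takes care of that.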

All of this is covered in Part 2 of this post.