Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Friday, September 28, 2007

Canonical XML

XML is a widely accepted format for exchanging structured information between heterogeneous and homogeneous systems. The same structure or piece of information can be represented in more than one way in XML, so there is often a need to check whether two XML documents represent the same information. For example:

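Two documents that differ only in the order of their attributes, in the quote character used around attribute values, or in the short versus full form of an empty element represent the same information.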

By definition, an XML document is passed through a set of rules, and the resulting document is called its canonical form. To check two XML documents for logical equivalence, the canonical form of each is derived and the two canonical forms are then compared.
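The derive-and-compare check can be sketched in Python. This is only a minimal illustration, not the method used by any particular tool. It assumes Python 3.8 or later, whose standard library function `xml.etree.ElementTree.canonicalize` implements the newer C14N 2.0 variant of the W3C rules. The helper name `logically_equal` and the sample documents are made up for this example.

```python
from xml.etree.ElementTree import canonicalize


def logically_equal(xml_a: str, xml_b: str) -> bool:
    """Hypothetical helper: True when both documents share one canonical form."""
    return canonicalize(xml_a) == canonicalize(xml_b)


# Two made-up serializations of the same information: attribute order,
# quote style and the empty-element form differ.
doc_a = '<book id="1" lang="en"><title>XML</title><note/></book>'
doc_b = "<book lang='en' id='1'><title>XML</title><note></note></book>"

# A made-up document whose content really is different (id="2").
doc_c = '<book id="2" lang="en"><title>XML</title><note/></book>'

print(logically_equal(doc_a, doc_b))
print(logically_equal(doc_a, doc_c))
```

The first comparison should succeed and the second should fail. Note that whitespace between elements counts as content and is retained, so two documents that differ only in indentation will not compare equal in this sketch unless `strip_text=True` is passed to `canonicalize`.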

Stylus Studio is one IDE that provides integrated support for converting an XML document to its canonical form.

Some of the rules for deriving a canonical form, as per the W3C specification, are:
  • The XML document must be encoded in UTF-8.
  • All line breaks (a lone #xD or the sequence #xD #xA) must be replaced with #xA before canonicalization of the XML document starts.
  • Attribute values must be normalized as per the W3C specification; for example, each tab is replaced with a single space.
  • Attribute values must be delimited by double quotes, not single quotes.
  • Special characters must be replaced with character or entity references; for example, a " inside an attribute value becomes &quot;.
  • All entity references must be replaced by their replacement text.
  • Default attributes must be added to the elements of the XML document.
  • The XML declaration and the document type declaration (DTD) must be removed.
  • The short form of an empty element, such as <tag/>, must be converted to the full start-end pair, that is <tag></tag>.
  • Namespace declarations must be kept in the canonical form, except for superfluous ones, which are removed.
  • The namespace declarations and attributes of each element must be in lexicographic order.
Following these rules, and a few more from the W3C specification, yields a canonical form of an XML document against which the canonical form of another XML document can be compared for logical equivalence.
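A few of the rules above can be observed one at a time with a short Python sketch. It assumes Python 3.8 or later and uses the standard library's `xml.etree.ElementTree.canonicalize`, which follows the C14N 2.0 variant of the specification, so details may differ from other tools. All sample inputs and the `samples` dictionary are made up for illustration.

```python
from xml.etree.ElementTree import canonicalize

# Each made-up input exercises one rule from the list above.
samples = {
    "attribute order": '<a z="1" b="2"/>',
    "quote style": "<a b='2'/>",
    "empty element": "<a/>",
    "XML declaration": '<?xml version="1.0"?><a/>',
    "special character": "<a b='say \"hi\"'/>",
    "character reference": "<a>&#65;</a>",
}

# Print every input next to the canonical form derived from it.
for rule, source in samples.items():
    print(f"{rule}: {source} -> {canonicalize(source)}")
```

Printing the input and output side by side makes it easy to see which rule changed what: attributes come out sorted and double-quoted, the empty element is written as a start-end pair, and the XML declaration disappears.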

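This helps in areas such as digital signature verification, since digital signatures are among the most widely used ways of signing or authenticating electronic documents on the Internet, whether those documents are XML or not. A signature computed over the canonical form remains valid when the XML document is re-serialized in a logically equivalent way.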
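Why the canonical form matters for signatures can be sketched with a plain digest. This is not an XML Signature implementation: a real signature also protects the digest with a key and names a specific canonicalization algorithm, and both steps are left out here. The sketch assumes Python 3.8 or later, whose standard library `canonicalize` follows C14N 2.0. The helper `canonical_digest` and the sample order documents are hypothetical.

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def canonical_digest(xml_text: str) -> str:
    """Hypothetical helper: SHA-256 hex digest of the UTF-8 canonical form."""
    canonical = canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def raw_digest(xml_text: str) -> str:
    """SHA-256 hex digest of the text exactly as serialized."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


# Made-up documents: the second is a harmless re-serialization of the
# first, the third changes the actual content.
signed = '<order id="7" currency="EUR"><total>10</total></order>'
reserialized = "<order currency='EUR' id='7'><total>10</total></order>"
tampered = '<order id="7" currency="EUR"><total>99</total></order>'

print(raw_digest(signed) == raw_digest(reserialized))
print(canonical_digest(signed) == canonical_digest(reserialized))
print(canonical_digest(signed) == canonical_digest(tampered))
```

Digesting the raw text treats the harmless re-serialization as a change, while digesting the canonical form accepts it and still detects the altered total.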
