
Versioning XML Schemas

XML versioning is a much-talked-about problem. Despite the volume of information available on XML and schema versioning, most people agree that this problem is hard and that there is no single "silver-bullet" solution. In this post, I would like to discuss some possible solutions that can be implemented with data binding frameworks such as XmlBeans. As always, this is a work in progress, and I welcome any comments.

I have two objectives in this post. My first objective is to explore a few possibilities for versioning XML schemas. My second objective is to examine the impact of schema versioning on XML processing applications. Note that, in most cases, it is unrealistic to assume that applications can maintain different code paths to process instance documents conforming to each version of the schema. So, we need versioning solutions that make it easy for the same code to deal with the current as well as older versions at the same time.

As I commented in one of my previous posts, XML extensibility solutions do not address versioning needs. Although you could still mix versioning solutions with extensibility solutions, in this post I would like to consider versioning alone.

The Schema Versioning Problem

What exactly is the schema versioning problem and why should anyone care about it?

Versioning describes evolutionary changes, and for a variety of reasons, applications creating or processing XML, as well as people creating XML documents, need to know which version of the schema a given XML instance document conforms to. For instance, when you are creating a J2EE deployment descriptor, you need to know whether you can use a feature that is only available in a specific version of an API. Since XML Schema is the most common way of describing XML documents, the XML versioning problem almost always becomes an XML Schema versioning problem.

Let’s say we have some XML instance documents conforming to an XML schema. Let’s call it the V1 schema. Let’s say we then made some changes to the schema and added some features to the application processing this XML. Let’s call the modified schema the V2 schema. To take advantage of these changes, we create some new instance documents that conform to the V2 schema. Now we have some instance documents that conform to the V1 schema, and some that conform to the V2 schema. If the original documents and the new documents are processed by different applications that are completely isolated from one another, we have no problem: there is no need to think of the changes as an evolution from the V1 schema to the V2 schema. Let us therefore consider the case of the same application processing both V1 instances and the new V2 instances. In this case, the schema modifications must be treated as an evolution from V1 to V2. The question is how to design the V2 schema such that the following conditions are met:

  • The XML processing application should be able to process instance documents conforming to current and older schema versions, without requiring any changes to instance documents. This addresses backwards compatibility.
  • Applications/users should be able to update an instance document conforming to one version to the next version with minimal changes. Tools/users creating these documents should not be required to modify the XML content (e.g., adding or removing elements or attributes) to conform to a new schema version. Think of upgrading a J2EE web app deployment descriptor from Servlet API 2.x to Servlet API 2.y: you should be able to migrate such a descriptor simply by changing the schema URI in it.

Ground Rules

I learned these ground rules from David Orchard’s excellent papers on XML extensibility and forwards/backwards compatibility. Here are the rules a new schema version must follow so that backwards compatibility is guaranteed:

  1. Must not add new required elements or attributes in the schema
  2. Must not remove any existing elements or attributes
  3. Must not make changes to the existing schema

The first and second rules guarantee that applications/users can adapt to the new schema without changing the XML content. This is also a general software design principle: once you publish an interface, it is set in stone, and you should not introduce incompatible changes.

The third rule guarantees that applications that use a given version of the schema do not break. This rule also helps maintain a consistent view of the types defined in a given schema.
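To make rules 1 and 2 concrete, here is a minimal Java sketch that checks them over a deliberately simplified, hypothetical model of one complex type: an ordered map from child element name to its minOccurs. It does not parse real XSD files, and the class and method names are illustrative only. Rule 3 concerns publishing a new schema rather than editing a published one, so it is not modeled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroundRules {
    /**
     * Compares two versions of one complex type, each modeled (hypothetically) as an
     * ordered map from child element name to minOccurs, and lists rule 1 and 2 violations.
     */
    public static List<String> violations(Map<String, Integer> v1, Map<String, Integer> v2) {
        List<String> problems = new ArrayList<>();
        for (String name : v1.keySet()) {
            if (!v2.containsKey(name)) {
                problems.add("removed: " + name);            // rule 2
            }
        }
        for (Map.Entry<String, Integer> e : v2.entrySet()) {
            if (!v1.containsKey(e.getKey()) && e.getValue() > 0) {
                problems.add("new required: " + e.getKey()); // rule 1
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Integer> v1 = new LinkedHashMap<>();
        v1.put("name", 1);
        v1.put("id", 1);
        v1.put("homeAddress", 1);

        Map<String, Integer> compatible = new LinkedHashMap<>(v1);
        compatible.put("ssn", 0);   // optional addition: allowed

        Map<String, Integer> incompatible = new LinkedHashMap<>(v1);
        incompatible.remove("id");  // removal: breaks rule 2
        incompatible.put("ssn", 1); // required addition: breaks rule 1

        System.out.println(violations(v1, compatible));   // []
        System.out.println(violations(v1, incompatible)); // [removed: id, new required: ssn]
    }
}
```

Adding an optional ssn passes, while dropping id or making the new ssn required is flagged.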

Note that changing the value of the xsi:schemaLocation attribute in instance documents is not a versioning solution. The schemaLocation is just a hint, and it need not resolve to any physical schema document.

Example

For the sake of illustration, let me consider the following schema (version 1).

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="my:v1" xmlns:v1="my:v1"
elementFormDefault="qualified" attributeFormDefault="qualified">
<xs:complexType name="NameType">
<xs:sequence>
<xs:element name="fName" type="xs:string"/>
<xs:element name="lName" type="xs:string"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="street" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="zip" type="xs:string"/>
<xs:element name="state" type="xs:string"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="PersonInfoType">
<xs:sequence>
<xs:element name="name" type="v1:NameType"/>
<xs:element name="address" type="v1:AddressType"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="EmployeeType">
<xs:sequence>
<xs:element name="name" type="v1:NameType"/>
<xs:element name="id" type="xs:string"/>
<xs:element name="homeAddress" type="v1:AddressType"/>
</xs:sequence>
</xs:complexType>
<xs:element name="Employee" type="v1:EmployeeType"/>
</xs:schema>    

Note that, in this schema, I declared all complex types globally so that we could reuse these types in other schemas. Here is an instance document that conforms to the above schema.

<Employee xmlns="my:v1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="my:v1 v1.xsd">
<name>
<fName>Foo</fName>
<lName>Bar</lName>
</name>
<id>12345</id>
<homeAddress>
<street>1 Main Street</street>
<city>Nameless</city>
<zip>01010</zip>
<state>ZA</state>
</homeAddress>
</Employee>    
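One quick way to check that the V1 schema and instance above agree is the JDK's built-in javax.xml.validation API. The sketch below is a minimal example, assuming Java 15 or later (for text blocks); the class name ValidateV1 and its helper methods are hypothetical. It writes the schema to a temporary file, compiles it once, and validates the instance against the compiled Schema. The instance's xsi:schemaLocation deliberately points at a file that does not exist, illustrating that the hint is not needed when the application supplies the schema itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class ValidateV1 {
    // The V1 schema from above (whitespace compacted).
    public static final String V1_XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="my:v1" xmlns:v1="my:v1"
                   elementFormDefault="qualified" attributeFormDefault="qualified">
          <xs:complexType name="NameType"><xs:sequence>
            <xs:element name="fName" type="xs:string"/><xs:element name="lName" type="xs:string"/>
          </xs:sequence></xs:complexType>
          <xs:complexType name="AddressType"><xs:sequence>
            <xs:element name="street" type="xs:string"/><xs:element name="city" type="xs:string"/>
            <xs:element name="zip" type="xs:string"/><xs:element name="state" type="xs:string"/>
          </xs:sequence></xs:complexType>
          <xs:complexType name="PersonInfoType"><xs:sequence>
            <xs:element name="name" type="v1:NameType"/><xs:element name="address" type="v1:AddressType"/>
          </xs:sequence></xs:complexType>
          <xs:complexType name="EmployeeType"><xs:sequence>
            <xs:element name="name" type="v1:NameType"/><xs:element name="id" type="xs:string"/>
            <xs:element name="homeAddress" type="v1:AddressType"/>
          </xs:sequence></xs:complexType>
          <xs:element name="Employee" type="v1:EmployeeType"/>
        </xs:schema>
        """;

    // The V1 instance from above, except that the schemaLocation hint deliberately points nowhere.
    public static final String V1_DOC = """
        <Employee xmlns="my:v1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xsi:schemaLocation="my:v1 no-such-file.xsd">
          <name><fName>Foo</fName><lName>Bar</lName></name>
          <id>12345</id>
          <homeAddress><street>1 Main Street</street><city>Nameless</city>
            <zip>01010</zip><state>ZA</state></homeAddress>
        </Employee>
        """;

    /** Writes the XSD to a fresh temp directory and compiles it; a broken schema is a hard error. */
    public static Schema compile(String fileName, String xsd) {
        try {
            Path dir = Files.createTempDirectory("xsd-versioning");
            Path file = Files.writeString(dir.resolve(fileName), xsd);
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(file.toFile());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** With no ErrorHandler set, validation errors surface as SAXException. */
    public static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema v1 = compile("v1.xsd", V1_XSD);
        System.out.println(isValid(v1, V1_DOC));                               // true
        System.out.println(isValid(v1, V1_DOC.replace("<id>12345</id>", ""))); // false: id is required
    }
}
```

The second call shows why the first ground rule matters: a document lacking a required element is rejected, so a new version that added a required element would reject every existing V1 document.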

Extending the Schema

The first solution I tried was to create a new V2 schema that extends the V1 schema. This approach consists of the following steps:

  1. Create a new schema, and import the V1 schema into it
  2. For each modified type, create an extended type in the new schema

For instance, if we want to add a social security number to the employee, we could do the following:

  • Define a new EmployeeType extending the EmployeeType from the V1 schema
  • Add an optional ssn element to the new EmployeeType
  • Declare a global element for the new EmployeeType

Here is the outcome.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="my:v2" xmlns:v2="my:v2"
xmlns:v1="my:v1" elementFormDefault="qualified" attributeFormDefault="qualified">

<xs:import namespace="my:v1" schemaLocation="v1.xsd"/>

<xs:complexType name="EmployeeType">
<xs:complexContent>
<xs:extension base="v1:EmployeeType">
<xs:sequence>
<xs:element name="ssn" type="xs:string" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:element name="Employee" type="v2:EmployeeType"/>
</xs:schema>  

This schema declares a new EmployeeType in a new namespace, my:v2, leaving the other types from the V1 schema unchanged. We can easily convert an existing V1 instance document to conform to the V2 schema: the root element moves to the my:v2 namespace, while its child elements stay in my:v1. Here is an example.

<v2:Employee xmlns:v2="my:v2" xmlns="my:v1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="my:v2 v2.xsd">
<name>
<fName>Jon</fName>
<lName>Doe</lName>
</name>
<id>12345</id>
<homeAddress>
<street>1 Main Street</street>
<city>Nameless</city>
<zip>01010</zip>
<state>ZA</state>
</homeAddress>
<v2:ssn>00-11-0000</v2:ssn>
</v2:Employee>    

This instance document adds an ssn element in the V2 namespace.
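The same kind of sketch can confirm the converted instance, again assuming Java 15+ and using hypothetical class and method names. Both schema files are written to one temporary directory so that the relative schemaLocation="v1.xsd" in the import resolves. To keep the listing short, the V1 schema here omits the unused PersonInfoType.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class ValidateV2Extension {
    // V1 schema from above, minus the unused PersonInfoType.
    public static final String V1_XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="my:v1" xmlns:v1="my:v1"
                   elementFormDefault="qualified" attributeFormDefault="qualified">
          <xs:complexType name="NameType"><xs:sequence>
            <xs:element name="fName" type="xs:string"/><xs:element name="lName" type="xs:string"/>
          </xs:sequence></xs:complexType>
          <xs:complexType name="AddressType"><xs:sequence>
            <xs:element name="street" type="xs:string"/><xs:element name="city" type="xs:string"/>
            <xs:element name="zip" type="xs:string"/><xs:element name="state" type="xs:string"/>
          </xs:sequence></xs:complexType>
          <xs:complexType name="EmployeeType"><xs:sequence>
            <xs:element name="name" type="v1:NameType"/><xs:element name="id" type="xs:string"/>
            <xs:element name="homeAddress" type="v1:AddressType"/>
          </xs:sequence></xs:complexType>
          <xs:element name="Employee" type="v1:EmployeeType"/>
        </xs:schema>
        """;

    // The V2 schema from above: v2:EmployeeType extends v1:EmployeeType with an optional ssn.
    public static final String V2_XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="my:v2" xmlns:v2="my:v2"
                   xmlns:v1="my:v1" elementFormDefault="qualified" attributeFormDefault="qualified">
          <xs:import namespace="my:v1" schemaLocation="v1.xsd"/>
          <xs:complexType name="EmployeeType"><xs:complexContent>
            <xs:extension base="v1:EmployeeType"><xs:sequence>
              <xs:element name="ssn" type="xs:string" minOccurs="0" maxOccurs="1"/>
            </xs:sequence></xs:extension>
          </xs:complexContent></xs:complexType>
          <xs:element name="Employee" type="v2:EmployeeType"/>
        </xs:schema>
        """;

    // Shared V1 content; each root element declares my:v1 as the default namespace.
    static final String CHILDREN = "<name><fName>Jon</fName><lName>Doe</lName></name><id>12345</id>"
        + "<homeAddress><street>1 Main Street</street><city>Nameless</city>"
        + "<zip>01010</zip><state>ZA</state></homeAddress>";

    public static final String V1_DOC = "<Employee xmlns=\"my:v1\">" + CHILDREN + "</Employee>";

    // The converted instance: only the root moves to my:v2, plus the new optional v2:ssn.
    public static final String V2_DOC = "<v2:Employee xmlns:v2=\"my:v2\" xmlns=\"my:v1\">" + CHILDREN
        + "<v2:ssn>00-11-0000</v2:ssn></v2:Employee>";

    /** Writes both files side by side so schemaLocation="v1.xsd" resolves, then compiles v2.xsd. */
    public static Schema compileV2() {
        try {
            Path dir = Files.createTempDirectory("xsd-versioning");
            Files.writeString(dir.resolve("v1.xsd"), V1_XSD);
            Path v2 = Files.writeString(dir.resolve("v2.xsd"), V2_XSD);
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(v2.toFile());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** With no ErrorHandler set, validation errors surface as SAXException. */
    public static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema v2 = compileV2();
        System.out.println(isValid(v2, V1_DOC));                             // true: v1:Employee comes in via the import
        System.out.println(isValid(v2, V2_DOC));                             // true
        System.out.println(isValid(v2, V2_DOC.replace("v2:ssn", "ssn")));    // false: ssn must be in my:v2
    }
}
```

Because the V2 schema imports the V1 schema, the compiled Schema still declares v1:Employee, so an unchanged V1 instance validates against it as well.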

Adding an ssn element to the schema turns out to be a simple problem. A more common issue is adding an optional element to a nested element, such as adding an optional country code to the employee’s home address. This turns out to be harder.

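First of all, since we want to add an optional element to AddressType, we have to create a new type that extends the V1 AddressType and adds the country code. Since the homeAddress element must now use this extended type instead of the original AddressType, we cannot derive the new EmployeeType from the V1 EmployeeType: extension can only append content to the end of the base type, and restriction cannot replace an element's type with one derived by extension. This leaves us with declaring a new EmployeeType in the V2 schema, as shown below.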

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="my:v2" xmlns:v2="my:v2"
xmlns:v1="my:v1" elementFormDefault="qualified" attributeFormDefault="qualified">
<xs:import namespace="my:v1" schemaLocation="v1.xsd"/>

<xs:complexType name="AddressType">
<xs:complexContent>
<xs:extension base="v1:AddressType">
<xs:sequence>
<xs:element name="country" type="xs:string" default="US" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>

<xs:complexType name="EmployeeType">
<xs:sequence>
<xs:element name="name" type="v1:NameType"/>
<xs:element name="id" type="xs:string"/>
<xs:element name="homeAddress" type="v2:AddressType"/>
<xs:element name="ssn" type="xs:string" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
<xs:element name="Employee" type="v2:EmployeeType"/>
</xs:schema>  
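Validating an instance against this schema shows what the change means for document authors: the elements declared in the V2 schema (Employee, name, id, homeAddress, ssn) are in my:v2, while the elements inherited from V1 types (fName, lName, street, city, zip, state) stay in my:v1, so a single homeAddress mixes both namespaces. As before, this is a minimal sketch assuming Java 15+, with hypothetical names and the V1 schema trimmed of PersonInfoType.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class ValidateV2Nested {
    // V1 schema from above, minus the unused PersonInfoType.
    public static final String V1_XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="my:v1" xmlns:v1="my:v1"
                   elementFormDefault="qualified" attributeFormDefault="qualified">
          <xs:complexType name="NameType"><xs:sequence>
            <xs:element name="fName" type="xs:string"/><xs:element name="lName" type="xs:string"/>
          </xs:sequence></xs:complexType>
          <xs:complexType name="AddressType"><xs:sequence>
            <xs:element name="street" type="xs:string"/><xs:element name="city" type="xs:string"/>
            <xs:element name="zip" type="xs:string"/><xs:element name="state" type="xs:string"/>
          </xs:sequence></xs:complexType>
          <xs:complexType name="EmployeeType"><xs:sequence>
            <xs:element name="name" type="v1:NameType"/><xs:element name="id" type="xs:string"/>
            <xs:element name="homeAddress" type="v1:AddressType"/>
          </xs:sequence></xs:complexType>
          <xs:element name="Employee" type="v1:EmployeeType"/>
        </xs:schema>
        """;

    // The V2 schema from above: v2:AddressType adds country; v2:EmployeeType is a new, unrelated type.
    public static final String V2_XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="my:v2" xmlns:v2="my:v2"
                   xmlns:v1="my:v1" elementFormDefault="qualified" attributeFormDefault="qualified">
          <xs:import namespace="my:v1" schemaLocation="v1.xsd"/>
          <xs:complexType name="AddressType"><xs:complexContent>
            <xs:extension base="v1:AddressType"><xs:sequence>
              <xs:element name="country" type="xs:string" default="US" minOccurs="0" maxOccurs="1"/>
            </xs:sequence></xs:extension>
          </xs:complexContent></xs:complexType>
          <xs:complexType name="EmployeeType"><xs:sequence>
            <xs:element name="name" type="v1:NameType"/><xs:element name="id" type="xs:string"/>
            <xs:element name="homeAddress" type="v2:AddressType"/>
            <xs:element name="ssn" type="xs:string" minOccurs="0" maxOccurs="1"/>
          </xs:sequence></xs:complexType>
          <xs:element name="Employee" type="v2:EmployeeType"/>
        </xs:schema>
        """;

    // A V2 instance: elements declared in the V2 schema are in my:v2, inherited leaves stay in my:v1.
    public static final String V2_DOC = """
        <Employee xmlns="my:v2" xmlns:v1="my:v1">
          <name><v1:fName>Jon</v1:fName><v1:lName>Doe</v1:lName></name>
          <id>12345</id>
          <homeAddress><v1:street>1 Main Street</v1:street><v1:city>Nameless</v1:city>
            <v1:zip>01010</v1:zip><v1:state>ZA</v1:state><country>US</country></homeAddress>
          <ssn>00-11-0000</ssn>
        </Employee>
        """;

    /** Writes both files side by side so schemaLocation="v1.xsd" resolves, then compiles v2.xsd. */
    public static Schema compileV2() {
        try {
            Path dir = Files.createTempDirectory("xsd-versioning");
            Files.writeString(dir.resolve("v1.xsd"), V1_XSD);
            Path v2 = Files.writeString(dir.resolve("v2.xsd"), V2_XSD);
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(v2.toFile());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** With no ErrorHandler set, validation errors surface as SAXException. */
    public static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema v2 = compileV2();
        System.out.println(isValid(v2, V2_DOC)); // true
        // false: country is declared in the V2 schema, so it must be in my:v2
        System.out.println(isValid(v2, V2_DOC.replace("<country>US</country>", "<v1:country>US</v1:country>")));
    }
}
```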

This works, but there is a disadvantage with this approach. This schema defines a new EmployeeType complex type that is not related to the EmployeeType defined in the V1 schema. If you are using XML data binding frameworks such as XmlBeans, you will notice that the Java interface generated for the EmployeeType in the V2 schema does not extend the Java interface generated for the EmployeeType in the V1 schema. So, if you have code that uses the V1 generated interfaces, you will have to rewrite or refactor portions of it to deal with the new types, which is not always easy.

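The problem gets harder with the degree of nesting, because every type that encloses the changed element, all the way up to the root, has to be redeclared in the same way. So, a lesson to learn from this approach is to avoid making changes to inner elements if you want to maintain a high degree of type reusability (and hence code reusability).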
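The effect on application code can be sketched without XmlBeans itself. The interfaces below are hand-written, hypothetical stand-ins for generated types, not XmlBeans output; they only mirror the type relationships of the two V2 schemas above. In the extension approach the V2 employee type derives from the V1 type, so existing V1 code accepts it directly. In the nested-change approach only the address type derives from its V1 counterpart, so V1 code needs an adapter (or a rewrite).

```java
public class BindingReuse {
    // Hypothetical stand-ins for generated binding interfaces (NOT XmlBeans-generated code).
    public interface V1Address { String getStreet(); }
    public interface V1Employee { String getId(); V1Address getHomeAddress(); }

    // Extension approach: v2:EmployeeType derives from v1:EmployeeType.
    public interface V2ExtEmployee extends V1Employee { String getSsn(); }

    // Nested-change approach: v2:AddressType derives from v1:AddressType,
    // but v2:EmployeeType is a brand-new type with no base.
    public interface V2Address extends V1Address { String getCountry(); }
    public interface V2NestedEmployee { String getId(); V2Address getHomeAddress(); String getSsn(); }

    /** Existing application code, written against the V1 binding. */
    public static String describe(V1Employee e) {
        return e.getId() + " @ " + e.getHomeAddress().getStreet();
    }

    /** Extension approach: a V2 object can be passed straight to V1 code. */
    public static String extDemo() {
        V2ExtEmployee e = new V2ExtEmployee() {
            public String getId() { return "12345"; }
            public V1Address getHomeAddress() { return () -> "1 Main Street"; }
            public String getSsn() { return "00-11-0000"; }
        };
        return describe(e);
    }

    /** Nested approach: describe(nested) would not compile, so V1 code needs an adapter. */
    public static String adapterDemo() {
        V2Address addr = new V2Address() {
            public String getStreet() { return "1 Main Street"; }
            public String getCountry() { return "US"; }
        };
        V2NestedEmployee nested = new V2NestedEmployee() {
            public String getId() { return "12345"; }
            public V2Address getHomeAddress() { return addr; }
            public String getSsn() { return "00-11-0000"; }
        };
        V1Employee adapted = new V1Employee() {
            public String getId() { return nested.getId(); }
            public V1Address getHomeAddress() { return nested.getHomeAddress(); } // a V2Address is a V1Address
        };
        return describe(adapted);
    }

    public static void main(String[] args) {
        System.out.println(extDemo());     // 12345 @ 1 Main Street
        System.out.println(adapterDemo()); // 12345 @ 1 Main Street
    }
}
```

The adapter has to be written by hand for every unrelated type, and the deeper the change, the more such adapters (or rewritten code paths) accumulate.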

Create a New Schema without Extensions

Let me now try a different approach. This approach consists of copying the V1 schema into a V2 schema with a new target namespace (my:v2new) and making the changes there. Here is the V2 schema.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="my:v2new"
xmlns:v2="my:v2new" elementFormDefault="qualified" attributeFormDefault="qualified">
<xs:complexType name="NameType">
<xs:sequence>
<xs:element name="fName" type="xs:string"/>
<xs:element name="lName" type="xs:string"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="street" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="zip" type="xs:string"/>
<xs:element name="state" type="xs:string"/>
<xs:element name="country" type="xs:string" default="US" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="PersonInfoType">
<xs:sequence>
<xs:element name="name" type="v2:NameType"/>
<xs:element name="address" type="v2:AddressType"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="EmployeeType">
<xs:sequence>
<xs:element name="name" type="v2:NameType"/>
<xs:element name="id" type="xs:string"/>
<xs:element name="homeAddress" type="v2:AddressType"/>
<xs:element name="ssn" type="xs:string" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
<xs:element name="Employee" type="v2:EmployeeType"/>
</xs:schema>    

The first thing to note is that this approach does not care about type reusability. This schema still honors the same constraints that we used with the first approach: it does not introduce any new required attributes or elements, and it does not remove any existing required or optional elements or attributes. All additional elements introduced are optional.
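A minimal sketch of what this means in practice, assuming Java 15+ and hypothetical names (and again dropping PersonInfoType for brevity): an unchanged V1 instance is rejected by the copied schema, because nothing declares an Employee element in my:v1, but the same document validates once only its namespace URI is switched to my:v2new. That works because the copy adds nothing but optional elements. The upgrade helper is a naive textual replacement that assumes the namespace is declared exactly as xmlns="my:v1".

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class ValidateV2Copy {
    // The copied V2 schema from above (namespace my:v2new), minus the unused PersonInfoType.
    public static final String V2NEW_XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="my:v2new"
                   xmlns:v2="my:v2new" elementFormDefault="qualified" attributeFormDefault="qualified">
          <xs:complexType name="NameType"><xs:sequence>
            <xs:element name="fName" type="xs:string"/><xs:element name="lName" type="xs:string"/>
          </xs:sequence></xs:complexType>
          <xs:complexType name="AddressType"><xs:sequence>
            <xs:element name="street" type="xs:string"/><xs:element name="city" type="xs:string"/>
            <xs:element name="zip" type="xs:string"/><xs:element name="state" type="xs:string"/>
            <xs:element name="country" type="xs:string" default="US" minOccurs="0" maxOccurs="1"/>
          </xs:sequence></xs:complexType>
          <xs:complexType name="EmployeeType"><xs:sequence>
            <xs:element name="name" type="v2:NameType"/><xs:element name="id" type="xs:string"/>
            <xs:element name="homeAddress" type="v2:AddressType"/>
            <xs:element name="ssn" type="xs:string" minOccurs="0" maxOccurs="1"/>
          </xs:sequence></xs:complexType>
          <xs:element name="Employee" type="v2:EmployeeType"/>
        </xs:schema>
        """;

    // The V1 instance from above (schemaLocation hint omitted).
    public static final String V1_DOC = """
        <Employee xmlns="my:v1">
          <name><fName>Foo</fName><lName>Bar</lName></name><id>12345</id>
          <homeAddress><street>1 Main Street</street><city>Nameless</city><zip>01010</zip><state>ZA</state></homeAddress>
        </Employee>
        """;

    /** Hypothetical migration step: point the default namespace at the new version. */
    public static String upgrade(String v1Doc) {
        return v1Doc.replace("xmlns=\"my:v1\"", "xmlns=\"my:v2new\"");
    }

    /** Writes the XSD to a fresh temp directory and compiles it; a broken schema is a hard error. */
    public static Schema compile(String xsd) {
        try {
            Path file = Files.writeString(Files.createTempDirectory("xsd-versioning").resolve("v2new.xsd"), xsd);
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(file.toFile());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** With no ErrorHandler set, validation errors surface as SAXException. */
    public static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema v2 = compile(V2NEW_XSD);
        System.out.println(isValid(v2, V1_DOC));          // false: no Employee declared in my:v1
        System.out.println(isValid(v2, upgrade(V1_DOC))); // true: only the namespace URI changed
    }
}
```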

As I mentioned above, this kind of versioning forces the application to have separate code paths for dealing with each version of the schema. More about this in my next post.

Other Solutions

While working on this problem, I have come across a number of other solutions for the versioning problem. I would like to briefly comment on these solutions.

  • XML Schema Versioning mentions some approaches for capturing a version identifier within the schema, either by using the version attribute on the schema element, or using a schemaVersion attribute on the root element of a document. However, as this document rightly points out, XML parsers and validators cannot enforce versioning with these approaches.
  • In this XML 2004 presentation, Jim Gabriel discusses the motivation for versioning very well. However, I do not find the versioning techniques it discusses very useful: techniques like comments, file naming conventions, or version attributes are difficult to enforce in code. In the same bucket, he also talks about storing schemas in databases, and I must admit I don’t understand how this can address versioning. But since I did not attend XML 2004, I cannot comment further on the rationale.
  • XML Schema Best Practices talks about the conditions under which you can reuse a namespace for new versions of the schema. The conditions essentially say that you can reuse the namespace as long as the changes are compatible. But such an approach may break applications written to older versions of the schema. For example, think of an application that routes XML documents based on their contents. Such an application may fail if it finds elements declared in a later version of the schema.
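A simpler case of the same breakage is easy to demonstrate: an older application that validates against the V1 schema. If a later, "compatible" version reused the my:v1 namespace and added an optional ssn element, a V1 validator would reject documents that use it, because the V1 content model has no wildcard to absorb the new element. A minimal sketch, assuming Java 15+; the class name and the "v1.1" document are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class NamespaceReuse {
    // V1 schema from above, minus the unused PersonInfoType.
    public static final String V1_XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="my:v1" xmlns:v1="my:v1"
                   elementFormDefault="qualified" attributeFormDefault="qualified">
          <xs:complexType name="NameType"><xs:sequence>
            <xs:element name="fName" type="xs:string"/><xs:element name="lName" type="xs:string"/>
          </xs:sequence></xs:complexType>
          <xs:complexType name="AddressType"><xs:sequence>
            <xs:element name="street" type="xs:string"/><xs:element name="city" type="xs:string"/>
            <xs:element name="zip" type="xs:string"/><xs:element name="state" type="xs:string"/>
          </xs:sequence></xs:complexType>
          <xs:complexType name="EmployeeType"><xs:sequence>
            <xs:element name="name" type="v1:NameType"/><xs:element name="id" type="xs:string"/>
            <xs:element name="homeAddress" type="v1:AddressType"/>
          </xs:sequence></xs:complexType>
          <xs:element name="Employee" type="v1:EmployeeType"/>
        </xs:schema>
        """;

    // The V1 instance from above (schemaLocation hint omitted).
    public static final String V1_DOC = """
        <Employee xmlns="my:v1">
          <name><fName>Foo</fName><lName>Bar</lName></name><id>12345</id>
          <homeAddress><street>1 Main Street</street><city>Nameless</city><zip>01010</zip><state>ZA</state></homeAddress>
        </Employee>
        """;

    // Hypothetical "v1.1" document: same namespace, plus an optional ssn added by a later version.
    public static final String V1_1_DOC =
        V1_DOC.replace("</homeAddress>", "</homeAddress><ssn>00-11-0000</ssn>");

    /** Writes the XSD to a fresh temp directory and compiles it; a broken schema is a hard error. */
    public static Schema compile(String xsd) {
        try {
            Path file = Files.writeString(Files.createTempDirectory("xsd-versioning").resolve("v1.xsd"), xsd);
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(file.toFile());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** With no ErrorHandler set, validation errors surface as SAXException. */
    public static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema v1 = compile(V1_XSD);
        System.out.println(isValid(v1, V1_DOC));   // true
        System.out.println(isValid(v1, V1_1_DOC)); // false: ssn is not allowed after homeAddress in V1
    }
}
```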

Concluding Thoughts

If type reuse is important, I would recommend using extensions. I experimented with XmlBeans, and found that the generated interfaces take advantage of extensions.

If esthetics are important and if the instance documents are required to be human-editable, I would recommend redefining all types in a new schema version. But this may require maintaining different code paths for each version. In my next post, I will write about a solution that lets the same code deal with current as well as older versions of the schema.


Comment

  1. Hello,
    As you say, this is an interesting problem. Unfortunately, I don’t understand it fully yet, and I have a related problem!

    Application 1 uses schema v1. Application 2 uses schema v2, which contains exactly the same information as v1 but adds one optional element.

    Application 1 is already deployed and running on v1.

    What happens if application 2 sends an XML document formatted according to v2 to application 1?
    Will it parse the document at all? Will it just drop the additional element?

    I guess it depends on the application and the parser, or is there anything in the XML standards that defines the behaviour?