Validating with XML Schema
You're now ready to take a deeper look at the process of XML Schema validation. Although a full treatment of XML Schema is beyond the scope of this tutorial, this section shows you the steps you take to validate an XML document using an XML Schema definition. (To learn more about XML Schema, you can review the online tutorial, XML Schema Part 0: Primer, at http://www.w3.org/TR/xmlschema-0/. You can also examine the sample programs that are part of the JAXP download. They use a simple XML Schema definition to validate personnel data stored in an XML file.)
At the end of this section, you'll also learn how to use an XML Schema definition to validate a document that contains elements from multiple namespaces.
Overview of the Validation Process
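To be notified of validation errors in an XML document, two things must be true: the factory must be configured and an appropriate error handler set, and the document must be associated with at least one schema (and possibly more).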
Configuring the DocumentBuilder Factory
It's helpful to start by defining the constants you'll use when configuring the factory. (These are the same constants you define when using XML Schema for SAX parsing.)
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";
Next, you configure DocumentBuilderFactory to generate a namespace-aware, validating parser that uses XML Schema:
    ...
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    factory.setValidating(true);
    try {
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
    } catch (IllegalArgumentException x) {
        // Happens if the parser does not support JAXP 1.2
        ...
    }
Because JAXP-compliant parsers are not namespace-aware by default, you must call setNamespaceAware(true) for schema validation to work. You also set a factory attribute to specify the schema language the parser should use. (For SAX parsing, on the other hand, you set a property on the parser generated by the factory.)
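The factory configuration described above can be collected into one small helper. This is a minimal sketch, not part of the original sample programs: the class name SchemaFactorySketch and the method newSchemaValidatingFactory are hypothetical, and turning the IllegalArgumentException into an IllegalStateException is only one possible way to react to a parser that lacks JAXP 1.2 support.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

class SchemaFactorySketch {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";

    // Hypothetical helper: returns a factory that is namespace-aware,
    // validating, and set to use XML Schema as its schema language.
    static DocumentBuilderFactory newSchemaValidatingFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);  // off by default in JAXP
        factory.setValidating(true);
        try {
            factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        } catch (IllegalArgumentException x) {
            // The parser does not recognize the JAXP 1.2 property.
            throw new IllegalStateException(
                "Parser does not support JAXP 1.2 schema properties", x);
        }
        return factory;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        DocumentBuilder builder = newSchemaValidatingFactory().newDocumentBuilder();
        System.out.println("namespace-aware: " + builder.isNamespaceAware());
        System.out.println("validating: " + builder.isValidating());
    }
}
```

Keeping the three settings in one place makes it harder to forget setNamespaceAware(true), which is the setting most easily missed.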
Associating a Document with a Schema
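Now that the program is ready to validate with an XML Schema definition, it is necessary only to ensure that the XML document is associated with (at least) one. There are two ways to do that: by including a schema declaration in the XML document, or by specifying the schema(s) to use in the application.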
Note: When the application specifies the schema(s) to use, it overrides any schema declarations in the document.
To specify the schema definition in the document, you create XML like this:
    <documentRoot
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation='YourSchemaDefinition.xsd'
    >
    ...
The first attribute defines the XML namespace (xmlns) prefix, xsi, which stands for "XML Schema instance." The second line specifies the schema to use for elements in the document that do not have a namespace prefix--that is, for the elements you typically define in any simple, uncomplicated XML document. (You'll see how to deal with multiple namespaces in the next section.)
You can also specify the schema file in the application:
    static final String schemaSource = "YourSchemaDefinition.xsd";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    ...
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    ...
    factory.setAttribute(JAXP_SCHEMA_SOURCE, new File(schemaSource));
Here, too, there are mechanisms at your disposal that will let you specify multiple schemas. We'll take a look at those next.
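Before moving on, the single-schema case can be sketched end to end: configure the factory, point the schema source at a File, install an error handler, and parse. Everything specific in this sketch is an assumption made for illustration: the class name SingleSchemaSketch, the tiny employee schema, and the two sample documents are hypothetical, and the schema is written to a temporary file only so that the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class SingleSchemaSketch {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Hypothetical schema: an employee with a name and a required integer id.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='employee'><xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "</xs:sequence>"
      + "<xs:attribute name='id' type='xs:int' use='required'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    static final String VALID_DOC = "<employee id='7'><name>Ann</name></employee>";
    static final String INVALID_DOC = "<employee id='seven'><name>Ann</name></employee>";

    // Parses the document against SCHEMA and returns the validation errors reported.
    static List<String> validate(String xml) throws Exception {
        File schemaFile = File.createTempFile("employee", ".xsd");
        schemaFile.deleteOnExit();
        Files.write(schemaFile.toPath(), SCHEMA.getBytes(StandardCharsets.UTF_8));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        factory.setAttribute(JAXP_SCHEMA_SOURCE, schemaFile);

        final List<String> errors = new ArrayList<>();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid document errors:   " + validate(VALID_DOC));
        System.out.println("invalid document errors: " + validate(INVALID_DOC));
    }
}
```

Validation problems arrive through the error() callback rather than as exceptions, so the parse still completes for the invalid document. Collecting the messages in a list is one simple way to inspect them afterward.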
Validating with Multiple Namespaces
Namespaces let you combine elements that serve different purposes in the same document without having to worry about overlapping names.
Note: The material discussed in this section also applies to validating when using the SAX parser. You're seeing it here because at this point you've learned enough about namespaces for the discussion to make sense.
To contrive an example, consider an XML data set that keeps track of personnel data. The data set may include information from the W2 tax form as well as information from the employee's hiring form, with both elements named <form> in their respective schemas.
If a prefix is defined for the tax namespace, and another prefix defined for the hiring namespace, then the personnel data could include segments like this:
    <employee id="...">
      <name>....</name>
      <tax:form>
        ...w2 tax form data...
      </tax:form>
      <hiring:form>
        ...employment history, etc....
      </hiring:form>
    </employee>
The contents of the tax:form element would obviously be different from the contents of the hiring:form element and would have to be validated differently.
Note, too, that in this example there is a default namespace that the unqualified element names employee and name belong to. For the document to be properly validated, the schema for that namespace must be declared, as well as the schemas for the tax and hiring namespaces.
Note: The "default" namespace is actually a specific namespace. It is defined as the "namespace that has no name." So you can't simply use one namespace as your default this week, and another namespace as the default later. This "unnamed namespace" (or "null namespace") is like the number zero. It doesn't have any value to speak of (no name), but it is still precisely defined. So a namespace that does have a name can never be used as the default namespace.
When parsed, each element in the data set will be validated against the appropriate schema, as long as those schemas have been declared. Again, the schemas can be declared either as part of the XML data set or in the program. (It is also possible to mix the declarations. In general, though, it is a good idea to keep all the declarations together in one place.)
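To see which namespace each element in such a data set belongs to, you can parse it with a namespace-aware (non-validating) builder and ask the DOM. This is a small sketch under stated assumptions: the class name NamespaceLookupSketch is hypothetical, the namespace URIs are the ones used as examples in this section, and the prefixes are declared directly on the employee element only to keep the document short.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class NamespaceLookupSketch {
    static final String TAX_NS = "http://www.irs.gov/";
    static final String HIRING_NS = "http://www.ourcompany.com/";

    // Hypothetical personnel record with two <form> elements in different namespaces.
    static final String XML =
        "<employee id='7' xmlns:tax='" + TAX_NS + "' xmlns:hiring='" + HIRING_NS + "'>"
      + "<name>Ann</name>"
      + "<tax:form>w2 tax form data</tax:form>"
      + "<hiring:form>employment history</hiring:form>"
      + "</employee>";

    static Document parse() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);  // required for getNamespaceURI() to be meaningful
        return factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Node root = parse().getDocumentElement();
        System.out.println(root.getNodeName() + " -> " + root.getNamespaceURI());
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                // Unqualified elements report null: the "namespace that has no name".
                System.out.println(child.getNodeName() + " -> " + child.getNamespaceURI());
            }
        }
    }
}
```

The two form elements share a local name but report different namespace URIs, which is what lets the parser pick a different schema for each.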
Declaring the Schemas in the XML Data Set
To declare the schemas to use for the preceding example in the data set, the XML code would look something like this:
    <documentRoot
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="employeeDatabase.xsd"
        xsi:schemaLocation=
            "http://www.irs.gov/ fullpath/w2TaxForm.xsd
             http://www.ourcompany.com/ relpath/hiringForm.xsd"
        xmlns:tax="http://www.irs.gov/"
        xmlns:hiring="http://www.ourcompany.com/"
    >
    ...
The noNamespaceSchemaLocation declaration is something you've seen before, as are the last two entries, which define the namespace prefixes tax and hiring. What's new is the entry in the middle, which defines the locations of the schemas to use for each namespace referenced in the document.
The xsi:schemaLocation declaration consists of entry pairs, where the first entry in each pair is a fully qualified URI that specifies the namespace, and the second entry contains a full path or a relative path to the schema definition. (In general, fully qualified paths are recommended. In that way, only one copy of the schema will tend to exist.)
Note that you cannot use the namespace prefixes when defining the schema locations. The xsi:schemaLocation declaration understands only namespace names and not prefixes.
Declaring the Schemas in the Application
To declare the equivalent schemas in the application, the code would look something like this:
    static final String employeeSchema = "employeeDatabase.xsd";
    static final String taxSchema = "w2TaxForm.xsd";
    static final String hiringSchema = "hiringForm.xsd";
    static final String[] schemas = {
        employeeSchema,
        taxSchema,
        hiringSchema,
    };
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    ...
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    ...
    factory.setAttribute(JAXP_SCHEMA_SOURCE, schemas);
Here, the array of strings that points to the schema definitions (.xsd files) is passed as the argument to the factory.setAttribute method. Note the differences from when you were declaring the schemas to use as part of the XML data set: there is no special declaration for the default (unnamed) schema, and you do not specify the namespace names. Instead, you give only pointers to the .xsd files.
To make the namespace assignments, the parser reads the .xsd files, and finds in them the name of the target namespace they apply to. Because the files are specified with URIs, the parser can use an EntityResolver (if one has been defined) to find a local copy of the schema.
If the schema definition does not define a target namespace, then it applies to the default (unnamed, or null) namespace. So, in our example, you would expect to see these target namespace declarations in the schemas:
    employeeDatabase.xsd: none
    w2TaxForm.xsd: http://www.irs.gov/
    hiringForm.xsd: http://www.ourcompany.com/
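At this point, you have seen two possible values for the schema source property when invoking the factory.setAttribute() method: a File object in factory.setAttribute(JAXP_SCHEMA_SOURCE, new File(schemaSource)) and an array of strings in factory.setAttribute(JAXP_SCHEMA_SOURCE, schemas). Here is a complete list of the possible values for that argument: a string that points to the URI of the schema, an InputStream with the contents of the schema, a SAX InputSource, a File, or an array of Objects, each of which is one of the types listed here.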
Note: An array of Objects can be used only when the schema language (like http://java.sun.com/xml/jaxp/properties/schemaLanguage) has the ability to assemble a schema at runtime. Also, when an array of Objects is passed it is illegal to have two schemas that share the same namespace.
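The application-side approach for multiple namespaces can be sketched as follows. This is an illustration under several assumptions, not the tutorial's own sample: the class name MultiSchemaSketch and the three small schemas are hypothetical, the schemas are written to temporary files to keep the example self-contained, and the employee schema admits elements from other namespaces through a strict xs:any wildcard, which is just one way to let the three schemas cooperate.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class MultiSchemaSketch {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String XS = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'";

    // No target namespace: applies to the unqualified employee and name elements.
    static final String EMPLOYEE_XSD = XS + ">"
      + "<xs:element name='employee'><xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:any namespace='##other' processContents='strict'"
      + " minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    // Hypothetical tax schema: an empty form element with a required year.
    static final String TAX_XSD = XS + " targetNamespace='http://www.irs.gov/'>"
      + "<xs:element name='form'><xs:complexType>"
      + "<xs:attribute name='year' type='xs:gYear' use='required'/>"
      + "</xs:complexType></xs:element></xs:schema>";
    // Hypothetical hiring schema: a form element holding plain text.
    static final String HIRING_XSD = XS + " targetNamespace='http://www.ourcompany.com/'>"
      + "<xs:element name='form' type='xs:string'/></xs:schema>";

    static File writeTemp(String prefix, String content) throws Exception {
        File file = File.createTempFile(prefix, ".xsd");
        file.deleteOnExit();
        Files.write(file.toPath(), content.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    // Validates a sample record whose tax form carries the given year value.
    static List<String> validate(String taxYear) throws Exception {
        // One schema per namespace; the parser reads each target namespace itself.
        Object[] schemas = {
            writeTemp("employeeDatabase", EMPLOYEE_XSD),
            writeTemp("w2TaxForm", TAX_XSD),
            writeTemp("hiringForm", HIRING_XSD),
        };
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        factory.setAttribute(JAXP_SCHEMA_SOURCE, schemas);

        final List<String> errors = new ArrayList<>();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        String xml = "<employee xmlns:tax='http://www.irs.gov/'"
          + " xmlns:hiring='http://www.ourcompany.com/'>"
          + "<name>Ann</name><tax:form year='" + taxYear + "'/>"
          + "<hiring:form>employment history</hiring:form></employee>";
        builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("year=2003 errors: " + validate("2003"));
        System.out.println("year=soon errors: " + validate("soon"));
    }
}
```

Note that the array never names a namespace. Each schema's targetNamespace attribute (or its absence) is what ties it to the elements it validates, and no two entries share a namespace.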