Reading XML Data into a DOM
In this section, you'll construct a Document Object Model by reading in an existing XML file. In the following sections, you'll see how to display the XML in a Swing tree component and practice manipulating the DOM.
Note: In Chapter 7, you'll see how to write out a DOM as an XML file. (You'll also see how to convert an existing data file into XML with relative ease.)
Creating the Program
The Document Object Model provides APIs that let you create, modify, delete, and rearrange nodes. So it is relatively easy to create a DOM, as you'll see later in Creating and Manipulating a DOM.
Before you try to create a DOM, however, it is helpful to understand how a DOM is structured. This series of exercises will make DOM internals visible by displaying them in a Swing JTree.
Create the Skeleton
Now let's build a simple program to read an XML document into a DOM and then write it back out again.
Note: The code discussed in this section is in DomEcho01.java. The file it operates on is slideSample01.xml. (The browsable version is slideSample01-xml.html.)
Start with the normal basic logic for an application, and check to make sure that an argument has been supplied on the command line:
    public class DomEcho {
        public static void main(String argv[]) {
            if (argv.length != 1) {
                System.err.println("Usage: java DomEcho filename");
                System.exit(1);
            }
        } // main
    } // DomEcho

Import the Required Classes
In this section, all the classes are individually named so that you can see where each class comes from when you want to reference the API documentation. In your own applications, you may well want to replace the import statements shown here with the shorter form, such as javax.xml.parsers.*.
Add these lines to import the JAXP APIs you'll use:
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.FactoryConfigurationError;
    import javax.xml.parsers.ParserConfigurationException;

Add these lines for the exceptions that can be thrown when the XML document is parsed:

    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;

Add these lines to read the sample XML file and identify errors:

    import java.io.File;
    import java.io.IOException;

Finally, import the W3C definition for a DOM and DOM exceptions:

    import org.w3c.dom.Document;
    import org.w3c.dom.DOMException;

Note: A DOMException is thrown only when traversing or manipulating a DOM. Errors that occur during parsing are reported using a different mechanism that is covered later.
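
To make the distinction in the note concrete, here is a minimal sketch that provokes a DOMException without parsing anything. The class and method names are hypothetical and are not part of the tutorial's DomEcho program. The sketch assumes a standard JAXP implementation, which rejects an element name containing spaces.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical demo class; not part of the tutorial's DomEcho program.
class DomExceptionDemo {

    // Returns the DOMException code raised by an illegal element name,
    // or -1 if no exception was thrown.
    static short tryBadElementName() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();   // an empty DOM; no XML is parsed here
        try {
            doc.createElement("not a legal name");   // spaces are not allowed
            return -1;
        } catch (DOMException e) {
            return e.code;   // public field defined by the W3C Java binding
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOMException code: " + tryBadElementName()
                + " (INVALID_CHARACTER_ERR is "
                + DOMException.INVALID_CHARACTER_ERR + ")");
    }
}
```

The exception comes from manipulating the DOM, not from the parser, which is why it is unrelated to the SAX-based error reporting described later in this section.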
Declare the DOM
The org.w3c.dom.Document interface is the W3C name for a DOM. Whether you parse an XML document or create one, a Document instance will result. You'll want to reference that object from another method later, so define it as a class-level field here:

    static Document document;

It needs to be static because you'll generate its contents from the main method in a few minutes.
Handle Errors
Next, put in the error-handling logic. This logic is basically the same as the code you saw in Handling Errors with the Nonvalidating Parser in Chapter 5, so we don't go into it in detail here. The major point is that a JAXP-conformant document builder is required to report SAX exceptions when it has trouble parsing the XML document. The DOM parser does not have to actually use a SAX parser internally, but because the SAX standard is already there, it makes sense to use it for reporting errors. As a result, the error-handling code for DOM applications is very similar to that for SAX applications:
    public static void main(String argv[]) {
        if (argv.length != 1) {
            ...
        }
        try {

        } catch (SAXParseException spe) {
            // Error generated by the parser
            System.out.println("\n** Parsing error"
                + ", line " + spe.getLineNumber()
                + ", uri " + spe.getSystemId());
            System.out.println("   " + spe.getMessage());

            // Use the contained exception, if any
            Exception x = spe;
            if (spe.getException() != null)
                x = spe.getException();
            x.printStackTrace();

        } catch (SAXException sxe) {
            // Error generated during parsing
            Exception x = sxe;
            if (sxe.getException() != null)
                x = sxe.getException();
            x.printStackTrace();

        } catch (ParserConfigurationException pce) {
            // Parser with specified options can't be built
            pce.printStackTrace();

        } catch (IOException ioe) {
            // I/O error
            ioe.printStackTrace();
        }
    } // main

Instantiate the Factory
Next, add the following highlighted code to obtain an instance of a factory that can give us a document builder:
    public static void main(String argv[]) {
        if (argv.length != 1) {
            ...
        }
        DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance();
        try {

Get a Parser and Parse the File
Now, add the following highlighted code to get an instance of a builder, and use it to parse the specified file:
    try {
        DocumentBuilder builder = factory.newDocumentBuilder();
        document = builder.parse(new File(argv[0]));

    } catch (SAXParseException spe) {
Note: By now, you should be getting the idea that every JAXP application starts in pretty much the same way. You're right! Save this version of the file as a template. You'll use it later on as the basis for an XSLT transformation application.
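
For reference, the steps so far can be assembled into one condensed sketch. This is not the tutorial's DomEcho01.java: the class name DomEchoSketch and the load helper are hypothetical, introduced only so the parsing steps can be called on their own, and the error handling is shortened.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical name; a condensed stand-in for the tutorial's DomEcho class.
class DomEchoSketch {
    // Static so that main (and, later, other methods) can reach the DOM.
    static Document document;

    // Factory -> builder -> parse: the three steps described above.
    static Document load(File file)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        document = builder.parse(file);
        return document;
    }

    public static void main(String[] argv) {
        if (argv.length != 1) {
            System.err.println("Usage: java DomEchoSketch filename");
            System.exit(1);
        }
        try {
            load(new File(argv[0]));
        } catch (SAXParseException spe) {
            // Error generated by the parser: report where it happened
            System.out.println("\n** Parsing error"
                    + ", line " + spe.getLineNumber()
                    + ", uri " + spe.getSystemId());
            System.out.println("   " + spe.getMessage());
        } catch (SAXException | ParserConfigurationException | IOException e) {
            // Shortened handling; the tutorial treats each case separately
            e.printStackTrace();
        }
    }
}
```

The SAXParseException clause has to come before the SAXException clause, because the former is a subclass of the latter.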
Run the Program
Throughout most of the DOM tutorial, you'll use the sample slide shows you saw in Chapter 5. In particular, you'll use slideSample01.xml, a simple XML file with nothing much in it, and slideSample10.xml, a more complex example that includes a DTD, processing instructions, entity references, and a CDATA section.
For instructions on how to compile and run your program, see Compiling and Running the Program from Chapter 5. Substitute DomEcho for Echo as the name of the program, and you're ready to roll.
For now, just run the program on slideSample01.xml. If it runs without error, you have successfully parsed an XML document and constructed a DOM. Congratulations!
Note: You'll have to take my word for it, for the moment, because at this point you don't have any way to display the results. But that feature is coming shortly...
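
If you'd rather not wait for the tree display, one quick sanity check is to ask the Document for its root element and print the name. The sketch below is an illustration only: the RootCheck class is hypothetical, and the inline XML string is a stand-in, not the contents of slideSample01.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper; not part of the tutorial's sample code.
class RootCheck {

    // Parses the given XML text and returns the name of its root element.
    static String rootName(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        return root.getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in document, assumed for illustration
        String xml = "<slideshow title=\"Sample\"><slide/></slideshow>";
        System.out.println(rootName(xml));   // prints "slideshow"
    }
}
```

In the DomEcho program itself, the equivalent check would be a single call to document.getDocumentElement() after the parse.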
Additional Information
Now that you have successfully read in a DOM, there are one or two more things you need to know in order to use DocumentBuilder effectively. You need to know about configuring the factory and handling validation errors.
Configuring the Factory
By default, the factory returns a nonvalidating parser that knows nothing about namespaces. To get a validating parser, or one that understands namespaces (or both), you configure the factory to set either or both of those options using the following commands:
    public static void main(String argv[]) {
        if (argv.length != 1) {
            ...
        }
        DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        factory.setNamespaceAware(true);
        try {
            ...
Note: JAXP-conformant parsers are not required to support all combinations of those options, even though the reference parser does. If you specify an invalid combination of options, the factory generates a ParserConfigurationException when you attempt to obtain a parser instance.
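
To see what the namespace option changes in practice, the following sketch parses the same one-element document twice and reports the namespace URI of the root element each time. The class name, the s prefix, and the urn:example:slides URI are all made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML and its namespace URI are invented.
class FactoryOptionsDemo {
    static final String XML =
            "<s:slideshow xmlns:s=\"urn:example:slides\"/>";

    // Returns the namespace URI the DOM reports for the root element.
    static String rootNamespace(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(XML)));
        return doc.getDocumentElement().getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("not namespace-aware: " + rootNamespace(false));
        System.out.println("namespace-aware:     " + rootNamespace(true));
    }
}
```

With the default (non-namespace-aware) setting, the element is built without namespace information, so getNamespaceURI returns null. With the option turned on, it returns the declared URI.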
You'll learn more about how to use namespaces in Validating with XML Schema. To complete this section, though, you'll want to learn something about handling validation errors.
Handling Validation Errors
Remember when you were wading through the SAX tutorial in Chapter 5, and all you really wanted to do was construct a DOM? Well, now that information begins to pay off.
Recall that the default response to a validation error, as dictated by the SAX standard, is to do nothing. The JAXP standard requires throwing SAX exceptions, so you use exactly the same error-handling mechanisms as you use for a SAX application. In particular, you use the DocumentBuilder's setErrorHandler method to supply it with an object that implements the SAX ErrorHandler interface.
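
As a self-contained illustration of that mechanism, the sketch below validates a small document against an internal DTD and uses an ErrorHandler that rethrows validation errors. The ValidationDemo class and both sample documents are assumptions made for this example; they are not taken from the slide-show samples.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; the DTD and documents are invented.
class ValidationDemo {
    static final String DTD =
            "<!DOCTYPE slideshow [\n"
          + "  <!ELEMENT slideshow (slide)>\n"
          + "  <!ELEMENT slide EMPTY>\n"
          + "]>\n";
    static final String VALID_XML = DTD + "<slideshow><slide/></slideshow>";
    // The DTD requires a <slide>, but this document omits it.
    static final String INVALID_XML = DTD + "<slideshow/>";

    // Returns true if the validating parser rejects the document.
    static boolean rejects(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored here */ }
            // Validation errors arrive here; rethrow to stop the parse.
            public void error(SAXParseException e) throws SAXException {
                throw e;
            }
            public void fatalError(SAXParseException e) throws SAXException {
                throw e;
            }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid document rejected:   " + rejects(VALID_XML));
        System.out.println("invalid document rejected: " + rejects(INVALID_XML));
    }
}
```

Without the rethrow in the error method, the parse of the invalid document would run to completion and hand back a DOM anyway, so the validity problem would not reach your catch clauses.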
Note: DocumentBuilder also has a setEntityResolver method you can use.
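
One common use of setEntityResolver is to control where an external DTD is loaded from. The sketch below, offered as one possible approach rather than a recommended policy, substitutes an empty DTD so that the parser never tries to fetch the one named in the DOCTYPE. The class name and the example.invalid URL are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical demo class; the DTD URL below is invented and unreachable.
class ResolverDemo {
    static final String XML =
            "<!DOCTYPE slideshow SYSTEM \"http://example.invalid/slideshow.dtd\">"
          + "<slideshow/>";

    // Parses the document, answering every external entity request
    // with empty content instead of letting the parser fetch it.
    static String parseWithoutFetching(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new StringReader(""));
            }
        });
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseWithoutFetching(XML));
    }
}
```

A resolver could just as well return an InputSource for a local copy of the DTD, which is the more useful choice when you are validating.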
The following code uses an anonymous inner class to define that ErrorHandler. The error method makes sure that validation errors generate an exception.

    builder.setErrorHandler(
        new org.xml.sax.ErrorHandler() {
            // ignore fatal errors (an exception is guaranteed)
            public void fatalError(SAXParseException exception)
                throws SAXException {
            }

            // treat validation errors as fatal
            public void error(SAXParseException e)
                throws SAXParseException {
                throw e;
            }

            // dump warnings too
            public void warning(SAXParseException err)
                throws SAXParseException {
                System.out.println("** Warning"
                    + ", line " + err.getLineNumber()
                    + ", uri " + err.getSystemId());
                System.out.println("   " + err.getMessage());
            }
        }
    );

This code uses an anonymous inner class to generate an instance of an object that implements the ErrorHandler interface. It's "anonymous" because it has no class name. You can think of it as an "ErrorHandler" instance, although technically it's a no-name instance that implements the specified interface. The code is substantially the same as that described in Handling Errors with the Nonvalidating Parser. For a more complete background on validation issues, refer to Using the Validating Parser.
Looking Ahead
In the next section, you'll display the DOM structure in a JTree and begin to explore its structure. For example, you'll see what entity references and CDATA sections look like in the DOM. And perhaps most importantly, you'll see how text nodes (which contain the actual data) reside under element nodes in a DOM.
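
As a small preview of that last point, this sketch parses a one-element document and looks at the node underneath the element. The TextNodePreview class and the sample title element are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical preview class; the sample XML is invented.
class TextNodePreview {

    // Returns the first child node of the document's root element.
    static Node firstChildOfRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        return root.getFirstChild();
    }

    public static void main(String[] args) throws Exception {
        Node text = firstChildOfRoot("<title>Wake up</title>");
        Node element = text.getParentNode();
        // The element itself carries no value; its text lives in a child node.
        System.out.println("element value: " + element.getNodeValue());
        System.out.println("child is text: "
                + (text.getNodeType() == Node.TEXT_NODE));
        System.out.println("child value:   " + text.getNodeValue());
    }
}
```

The element's own getNodeValue returns null. The string "Wake up" belongs to the text node one level down, which is the structure the tree display will make visible.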