The first step in parsing an XML file with the Simple API for XML (SAX) is to create an event handler class. I'll show you how to get started with this in the project SAXEventHandler. In this project, there's a main class called ReadXMLWithSAX. Its main method has a file name variable, constructed from the data provider's data directory constant and the name of the file, customers.xml. The main method has a throws Exception clause, because we'll be dealing with several exceptions as we work with the Simple API for XML.
There's also a plain old Java object, the Customer class. This version of the Customer class has just the setters and getters that are needed, the eight private fields, the constants for the names of the elements in XML or in JSON, and, down at the bottom, the toString method that outputs the customer information. To read data with SAX, you create a class that extends the class DefaultHandler. We'll do that with this class, SAXCustomerHandler.
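For reference, here is a pared-down sketch of such a POJO. It is an assumption, not the course's actual Customer class: only two of the eight fields are shown, and the constant names are hypothetical. The element names customer, name, and phone are the ones that show up later in the parser output.

```java
// Hypothetical, pared-down Customer class: two of the eight fields only.
// The constant identifiers are assumptions; the element names come from customers.xml.
class Customer {
    // Names of the elements as they appear in the XML file
    static final String CUSTOMER = "customer";
    static final String NAME = "name";
    static final String PHONE = "phone";

    private String name;
    private String phone;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getPhone() { return phone; }
    public void setPhone(String phone) { this.phone = phone; }

    // Outputs the customer information
    @Override
    public String toString() {
        return name + " (" + phone + ")";
    }
}
```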
Right now this class has two private members: a list of Customer objects named data, and a constant for the XML date format. We'll get to that later. It also has a method called readDataFromXML, which receives the name of the file to be read and returns the data object. To implement the Simple API for XML, add an extends clause to this class, extending the class DefaultHandler, which is a member of the package org.xml.sax.helpers.
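A minimal sketch of the handler at this stage, before any overrides are added. To keep it compilable on its own, the list holds String values rather than the course's Customer objects; that substitution is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.helpers.DefaultHandler;

// Handler class after adding the extends clause; no event methods overridden yet.
// List<String> stands in for the course's List<Customer>.
class SAXCustomerHandler extends DefaultHandler {

    // Records collected while parsing
    private final List<String> data = new ArrayList<>();

    // Receives the name of the file to read and returns the collected data
    public List<String> readDataFromXML(String fileName) {
        // Parsing code is added in a later step
        return data;
    }
}
```

Because DefaultHandler implements the SAX ContentHandler interface, an instance of this class can already be handed to a SAX parser; it just ignores every event until methods are overridden.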
When you select the class, the import statement is added at the top. Now, here's how SAX works. DefaultHandler is a concrete class with a set of default methods, but to handle any of the events that happen as an XML file is read, you override those methods. Now that we've extended the class, we can add the appropriate overrides, and I'll start by adding five of them. I'll place the cursor after the readDataFromXML method, press Ctrl+Space, and select the startDocument method.
That creates an overriding version of that method. Within the method, I'll get rid of the comment and the call to the superclass's version of the method, and replace them with some system output: the string Start document. Now I'll do the same thing and add an override of the endDocument method. I'll press Ctrl+Space, type end, and choose endDocument. Then I'll add some console output the same way I did with startDocument.
I'll use sysout with the string End document. Now I'll do the same thing for three more events. I'll override the startElement method. There are a couple of versions, and I'll choose this one, which receives four arguments named uri, localName, qName, and attributes. I'll once again remove the comment and the call to the superclass's method. Here's what's going to happen: as the SAX parser encounters a start element, it'll trigger a call to this method.
It'll pass in the name of the element and a collection of attributes. For an XML file without any namespaces or prefixes, the name of the element will be in the qName argument. So I'll add some console output: the label Start element, followed by qName. Next I'll add an override for the endElement event and do the same thing. I'll copy and paste that output code and change the label from Start element to End element.
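Here is a sketch of the four overrides described so far, as they might look inside SAXCustomerHandler. The exact label text is an assumption. The calls to super are left out because, for these four events, DefaultHandler's implementations do nothing.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Document-level and element-level overrides that just log each event
class SAXCustomerHandler extends DefaultHandler {

    @Override
    public void startDocument() {
        System.out.println("Start document");
    }

    @Override
    public void endDocument() {
        System.out.println("End document");
    }

    // Without namespace processing, the element name arrives in qName
    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        System.out.println("Start element: " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        System.out.println("End element: " + qName);
    }
}
```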
Finally, I'll add an override for the characters event. The characters event is triggered whenever a string of characters is encountered. This could be whitespace (spaces, tabs, and line feeds), or it could be meaningful data. And the characters event can happen more than once between a start element and an end element. Say that you have a string consisting of plain text plus an entity, such as &amp;, followed by some more plain text.
In some environments, that could trigger the characters event three times. For this exercise we won't worry about that; I'll just override the event and add output to say that the event happened. So now we have a usable event handler for SAX. The next step is to go back to the readDataFromXML method and add the code that actually parses the document. We'll use two classes here, SAXParserFactory and SAXParser. Start with the factory class.
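A sketch of the characters override described above. It goes slightly beyond just announcing the event: it also labels each chunk as whitespace or text, which is an addition of this sketch. Note that the ch array is a buffer the parser may reuse, so only the slice from start to start + length belongs to this event.

```java
import org.xml.sax.helpers.DefaultHandler;

class SAXCustomerHandler extends DefaultHandler {

    // May be called several times for one element's text content
    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy only this event's slice of the shared buffer
        String text = new String(ch, start, length);
        // Whitespace-only chunks are usually the indentation between tags
        String kind = text.trim().isEmpty() ? "whitespace" : "text";
        System.out.println("Characters (" + kind + "): [" + text.trim() + "]");
    }
}
```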
I'll type the beginning of the class name, choose SAXParserFactory, and name this object factory. I'll get its reference by calling a static method, SAXParserFactory.newInstance. At this point, I could change the behavior of the factory by calling one of its methods: setNamespaceAware, setValidating, setFeature, and so on. But I'm not going to do that; I'm just going to use a default factory object.
I'll use it to create a parser object. I'll declare a new object of type SAXParser named parser, and get its reference with the method factory.newSAXParser. Now I have a parser object. The next step is to tell the parser to parse the file. I already have the file name; it's being passed in as an argument of the readDataFromXML method. So I'll wrap it in a File object and pass it to the parser with parser.parse, passing a new File object wrapped around the file name as the first argument.
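The factory and parser steps can be sketched as a small helper. The class name ParserSetup is hypothetical, and the two setter calls are optional: the course doesn't call them, and the values shown are already the JAXP defaults, so this parser behaves like the one created from a plain default factory.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical helper showing the factory-then-parser sequence
class ParserSetup {

    static SAXParser createParser()
            throws ParserConfigurationException, SAXException {
        // Static factory method that locates the platform's JAXP implementation
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Optional configuration; these are the default values anyway
        factory.setNamespaceAware(false);
        factory.setValidating(false);
        // Creates a new parser using the factory's current settings
        return factory.newSAXParser();
    }
}
```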
Be sure to add an import for the File class. The next argument is the event handler object, which must extend DefaultHandler. This class does that, so I'll simply pass in this, meaning use the current object to handle all the events that the SAX parser emits. Now I have some error indicators, so I'll deal with them by pressing Ctrl+1 here, or Cmd+1 on a Mac.
I'll add a throws declaration to the method signature. Notice that there are two possible exceptions: SAXException, which can be thrown by the parser and by the SAX event methods, and IOException, which can be thrown when the file is read. I have an error on another line too, and when I add a throws declaration for that one, I get ParserConfigurationException. Those are all the exceptions that can be thrown by the code I have so far. So here's what this class is doing so far.
It has a public method called readDataFromXML, which receives the file name. It creates the SAXParserFactory and the parser, and then parses the file. As the file is parsed, the parser object calls all of these other methods as callback methods: at the beginning of the document, it calls startDocument; at the beginning and end of each element, it calls startElement and endElement; and so on. Now we'll go back to our main class, ReadXMLWithSAX, and call this class and this method.
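Putting those pieces together, readDataFromXML might look roughly like this, with the three exceptions in its throws clause. As a stand-in for the course's Customer-building logic (not shown yet), this sketch's startElement simply records each element name, so the method has something to return.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SAXCustomerHandler extends DefaultHandler {

    // Stand-in for the course's List<Customer>: element names in document order
    private final List<String> data = new ArrayList<>();

    public List<String> readDataFromXML(String fileName)
            throws ParserConfigurationException, SAXException, IOException {
        // Default factory: not namespace-aware, not validating
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Can throw ParserConfigurationException or SAXException
        SAXParser parser = factory.newSAXParser();
        // "this" receives the callbacks; can throw IOException (file can't
        // be read) or SAXException (malformed XML)
        parser.parse(new File(fileName), this);
        return data;
    }

    // Callback invoked by the parser at each opening tag
    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        data.add(qName);
    }
}
```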
I'll create a new instance of the class SAXCustomerHandler, the one I was just working on, name it saxHandler, and instantiate it by calling its no-argument constructor. Then I'll call the object's readDataFromXML method and pass in the file name that's already been defined above. I'll get rid of this @SuppressWarnings annotation; I don't need it anymore. Now I'm ready to save and test the code. When I run the code, I see a whole string of output in the console.
The top of the output has already been lost, but if you scroll down you'll see a pattern emerge. Here's a start element for the customer element, then some characters, which would be whitespace. Then there's a start element for name, some characters, and an end element for name; then a start element for phone, some characters, and an end element. And again you'll see a bunch of characters events, triggered by the whitespace between end elements and start elements. In the next exercise, I'll show you how to write the code that can figure out when the characters event is meaningful and when it isn't.
You'll also see how to track all this information and store it, so that you can put it into a form that's meaningful for your Java-based application.
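To reproduce that event pattern without the course's customers.xml, here is a self-contained sketch. The class name EventTrace and the sample document are hypothetical. Instead of printing from each callback, it records the events in a list, and it parses an in-memory string through the parse(InputStream, DefaultHandler) overload rather than a File.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records each callback as a short label
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("Start document"); }
    @Override public void endDocument() { events.add("End document"); }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        events.add("Start element: " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("End element: " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        events.add(text.isEmpty() ? "Characters (whitespace)" : "Characters: " + text);
    }

    // Parses an in-memory document and returns the recorded events
    static List<String> trace(String xml) throws Exception {
        EventTrace handler = new EventTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<customers>\n  <customer>\n    <name>Ann</name>\n"
                   + "    <phone>555-1234</phone>\n  </customer>\n</customers>";
        trace(xml).forEach(System.out::println);
    }
}
```

Without a DTD, a non-validating parser reports the indentation between tags through characters(), which is why the whitespace entries appear between the element events.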