Join David Gassner for an in-depth discussion in this video How SAX works, part of Java: XML Integration.
The first API I'll describe in this course is SAX, or the Simple API for XML. SAX is a streaming API. It's a read-only API, so you can use it to parse XML content, but it does not have the ability to write out or serialize XML. SAX is an event-based parser. As it reads an XML file, it emits events, and then you capture those events with your own code. It's one of the very earliest XML APIs, and it's called the Simple API for XML because, when it was created, it represented a much simpler approach to reading XML than hand-parsing plain text.
It was originally created by David Megginson, but it's a completely open source API, and in fact it's bundled with pretty much all versions of Java. The SAX API, as a streaming processor, is typically much faster and can use much less memory than a tree-based processor such as DOM. There are two kinds of streaming processors: push parsers and pull parsers. SAX is a push parser. That means that the primary control of the parsing process is handled by code that you don't own as the developer.
As that process reads the XML content, it pushes the data into your custom code that's encapsulated in callback methods that you define. SAX works fine on Android, and in fact is bundled as part of the Android runtime. Some developers like to use it, while others prefer the XML pull parser that's also a part of Android. Both are streaming processors, but they represent very different coding models. All streaming processors have certain benefits.
A streaming processor is a forward-only processor. It only knows how to go from the beginning of the XML file to the end. As the XML is read into memory, the processor emits events to share the data with the developer. But after each event is handled, the data that's associated with that event can be discarded from memory by the processor. So, streaming processors are capable of handling very large XML content. The entire document doesn't have to be in memory at the same time.
In order to read XML with SAX, you'll deal with events to read the data. As the SAX parser moves forward through the XML content, it will emit an event for each significant node in the XML file. Some of the most commonly used events to get the data from XML include the startDocument and endDocument events, startElement and endElement, and the characters event, which reports when usable text is available.
There are also error handling events, including warning, error, and fatalError. These events are handled for you by the super class if you ignore them, and a fatalError is just that. It'll stop the processing in its tracks. If you want some custom handling of the errors, you would override the methods for these events. Other events that are available include notations, processing instructions, ignorable white space, and entities. I won't cover these additional events in this course, but they're available if you need them for more complex XML content.
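As a rough sketch of what custom error handling might look like, the handler below overrides the three error callbacks and records what it was told. The class names `ReportingHandler` and `ErrorHandlingDemo`, the `problems` list, and the message format are all hypothetical. In the standard `DefaultHandler`, the inherited `warning` and `error` methods do nothing and `fatalError` throws the exception it receives, so this version only adds the bookkeeping.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records problems instead of relying on the defaults.
class ReportingHandler extends DefaultHandler {
    final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        problems.add("warning at line " + e.getLineNumber());
    }

    @Override
    public void error(SAXParseException e) {
        problems.add("error at line " + e.getLineNumber());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        problems.add("fatal at line " + e.getLineNumber());
        throw e; // rethrow so that parsing stops, as the default would
    }
}

class ErrorHandlingDemo {
    // Returns true if the XML parsed cleanly, false if a SAX error stopped it.
    static boolean parses(String xml, ReportingHandler handler) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not set up or read the input", e);
        }
    }

    public static void main(String[] args) {
        ReportingHandler handler = new ReportingHandler();
        // The <b> element is never closed, so this input is not well formed.
        boolean ok = parses("<a><b></a>", handler);
        System.out.println(ok + " " + handler.problems);
    }
}
```

Rethrowing from `fatalError` keeps the stop-in-its-tracks behavior while still giving your own code a chance to note where the problem was.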
To work with SAX, you'll create a custom Java class that extends a class called DefaultHandler. This is the super class for your event handler, and it has implementations of each of the event methods I described, such as startDocument, endDocument, startElement, and so on. When you extend the DefaultHandler class, you inherit all of its methods. And then if you want to handle any particular data, you override those methods and create your own custom code.
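Here is a minimal sketch of such a handler, assuming all you want is a record of which events arrived and in what order. The names `EventLogHandler`, `EventLogDemo`, and the `events` list are made up for illustration, and the parser setup in `parse` is kept as small as possible so the example can run on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: overrides only the events it cares about.
class EventLogHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        events.add("startElement:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement:" + qName);
    }
    // characters(), warning(), and the other callbacks keep their inherited behavior.
}

class EventLogDemo {
    // Parses an in-memory XML string and feeds the events to the given handler.
    static void parse(String xml, DefaultHandler handler) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        EventLogHandler handler = new EventLogHandler();
        parse("<order><item/></order>", handler);
        handler.events.forEach(System.out::println);
    }
}
```

Because the parser drives the process, the handler never asks for the next node. It simply reacts to whichever callback is invoked.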
So this is an example of a startDocument method. As the parser object starts to read the XML content, it'll call this method. And many of these methods will receive arguments that give you data. It's up to you to design your code to capture the data and save it in some way. In this chapter's movies, I'll show you some strategies for doing that. To launch the parsing of a document, you'll create an instance of a class called SAXParser. And you'll create an instance of your handler class.
Then, you'll call the parse method of the parser. When you call the parse method, you can pass in a file, an input stream, or a number of other sources. And then you pass in your handler object. The parse method then does its thing: reading the XML file and calling the methods of your handler object. Here are some things to know about the SAXParser. As I mentioned, it's up to you to figure out how to track the data. Each of the event callback methods is called individually.
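The launch sequence described above might look roughly like this. One detail worth knowing: `SAXParser` has no public constructor, so in practice the instance comes from `SAXParserFactory`. The `ElementCounter` handler, the `ParseLaunchDemo` class, and the temporary sample file are assumptions made up for this sketch.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts every element start it is told about.
class ElementCounter extends DefaultHandler {
    int count;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        count++;
    }
}

class ParseLaunchDemo {
    // Parser instances are obtained from a factory rather than with "new".
    static SAXParser newParser() throws ParserConfigurationException, SAXException {
        return SAXParserFactory.newInstance().newSAXParser();
    }

    // Passing a File as the source.
    static int countElements(File xmlFile)
            throws ParserConfigurationException, SAXException, IOException {
        ElementCounter handler = new ElementCounter();
        newParser().parse(xmlFile, handler);
        return handler.count;
    }

    // Passing an InputStream as the source.
    static int countElements(InputStream xmlStream)
            throws ParserConfigurationException, SAXException, IOException {
        ElementCounter handler = new ElementCounter();
        newParser().parse(xmlStream, handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        // Write a small sample document to a temporary file.
        File file = File.createTempFile("sample", ".xml");
        file.deleteOnExit();
        Files.write(file.toPath(), "<a><b/><c/></a>".getBytes(StandardCharsets.UTF_8));

        System.out.println(countElements(file));
        try (InputStream in = Files.newInputStream(file.toPath())) {
            System.out.println(countElements(in));
        }
    }
}
```

Both overloads take the source first and the handler second, and a fresh handler is created for each parse so that counts from one document don't leak into the next.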
There's no automatic sharing of data between those methods. So you'll need to create fields in your handler class to store data as it's collected. Again, I'll show you some strategies for this. Another thing to watch out for is that the characters event in SAX can be called more than once, even if there's only a single text node. One of the most common things you'll see is that if a text node has an entity, such as &amp;, some SAX processors will call three characters events: one for the text before the entity, one for the entity itself, and one for the text after the entity.
So it's up to you to design code that can capture the text from each event and then concatenate it together. So those are some things to know about the nature of the SAXParser. In the next set of movies, I'll show you some sample code for parsing XML files with SAX.
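One common way to handle both points, shared state and split text, is to keep a `StringBuilder` field in the handler and only read it when the element ends. The sketch below assumes a made-up document with `title` elements. The names `TitleCollector`, `TitleDemo`, and `titles` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <title> element.
class TitleCollector extends DefaultHandler {
    final List<String> titles = new ArrayList<>();

    // Fields carry state from one callback to the next.
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("title".equals(qName)) {
            inTitle = true;
            text.setLength(0); // start with an empty buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so append rather than assign.
        if (inTitle) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString()); // the text is complete only at this point
            inTitle = false;
        }
    }
}

class TitleDemo {
    // Parses an in-memory XML string and returns the collected titles.
    static List<String> titlesOf(String xml) {
        TitleCollector handler = new TitleCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.titles;
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>Tom &amp; Jerry</title></book>"
                + "<book><title>SAX</title></book></books>";
        System.out.println(titlesOf(xml));
    }
}
```

Appending in `characters` and reading the buffer in `endElement` gives the same result whether the parser delivers the text in one call or in several.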
- Choosing a Java-based XML API
- Reading XML as a string
- Comparing streaming and tree-based APIs
- Parsing XML with SAX
- Creating and reading XML with DOM
- Adding data to an XML document with JDOM
- Reading and writing XML with StAX
- Working with JAXB and annotated classes
- Comparing Simple XML Serialization to JAXB