SAX
XML Parsers
Standards for XML parsers
- DOM = Document Object Model (tree-based)
  W3C standard
- SAX = Simple API for XML (event-based)
  de facto standard, very popular
Versions for different programming languages (we only look at Java)
XML Parsers (General)
XML document → XML parser → App
XML parser: splits document into pieces and sends the “XML information set” to the application.
Event-based Parser
Events = callbacks sent to the application by the event-based parser (e.g. element start, element end)
Handlers = implemented by the application to deal with events
Used for large documents, because no in-memory data structure (tree) of the document is built.
- Sequential access
- Fast
- Constant memory
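As a minimal sketch of the event-based style, the handler below counts elements without ever building a tree. The class name ElementCounter and the sample document are made up for illustration; the only state kept is a single counter, regardless of document size.

```java
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class: counts elements while the document streams by.
class ElementCounter extends DefaultHandler {
    private int count = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++; // one event per start tag; nothing else is stored
    }

    // Parses the given XML text sequentially and returns the number of elements seen.
    static int count(String xml) throws Exception {
        ElementCounter handler = new ElementCounter();
        XMLReader parser = XMLReaderFactory.createXMLReader();
        parser.setContentHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("<course><title>Semi-structured Data</title></course>"));
    }
}
```

For the sample string this prints 2 (one course and one title element).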
Callbacks
We call the SAX parser and supply it with methods, which it then calls back:
startDocument(...)
beginning of parsing
startElement(...)
beginning of element
in depth
public void startElement(... , Attributes atts) throws SAXException
Attributes
is an interface with some useful methods:
getLength()
- number of attributes
getLocalName(index)
- attribute’s local name
getQName(index)
- attribute’s qualified name
getValue(index)
- attribute’s value
getType(index)
- attribute’s type (CDATA, NMTOKEN, etc.)
characters(...)
character data
endElement(...)
end of element
endDocument(...)
last method called by parser
processingInstruction(...)
each processing instruction
<?target data?>
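The callbacks listed above can be observed in order with a small logging handler. This is only a sketch: the class name EventLogger, the processing instruction and the sample document are assumptions made for illustration. It also uses the Attributes methods getLength(), getQName(index), getValue(index) and getType(index).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class: records each callback in the order the parser issues it.
class EventLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void endDocument() { events.add("endDocument"); }

    @Override
    public void processingInstruction(String target, String data) {
        events.add("processingInstruction: " + target + " " + data);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        StringBuilder sb = new StringBuilder("startElement: " + qName);
        // walk the attribute list by index
        for (int i = 0; i < atts.getLength(); i++) {
            sb.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i))
              .append(" (").append(atts.getType(i)).append(')');
        }
        events.add(sb.toString());
    }

    @Override
    public void characters(char[] text, int start, int length) {
        events.add("characters: " + new String(text, start, length));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement: " + qName);
    }

    // Parses the given XML text and returns the recorded events.
    static List<String> log(String xml) throws Exception {
        EventLogger handler = new EventLogger();
        XMLReader parser = XMLReaderFactory.createXMLReader();
        parser.setContentHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?><?render fast?><course id=\"42\">SSD</course>";
        log(xml).forEach(System.out::println);
    }
}
```

The XML declaration itself is not reported as a processing instruction. Attribute types come back as CDATA when no DTD declares them.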
Event handlers
ContentHandler
(crucial) handles basic parsing call-backs
e.g. element starts
we can use the
DefaultHandler
to implement stubs for all methods
then we only need to implement part of the interface
ErrorHandler
(crucial) handles parsing errors
DTDHandler
handles notation, unparsed entity declarations
EntityResolver
customized handling for external entities
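As one possible illustration of an EntityResolver, the sketch below answers every request for an external entity with empty content, so the parser does not fetch the external DTD itself. The class name OfflineResolver and the DTD URL are hypothetical; whether empty content is an acceptable substitute depends on the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical resolver: supplies empty content for every external entity
// instead of letting the parser open the system identifier.
class OfflineResolver implements EntityResolver {
    final List<String> requested = new ArrayList<>();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        requested.add(systemId); // remember what the parser asked for
        return new InputSource(new StringReader(""));
    }

    // Parses the given XML text and returns the system identifiers that were requested.
    static List<String> parse(String xml) throws Exception {
        OfflineResolver resolver = new OfflineResolver();
        XMLReader parser = XMLReaderFactory.createXMLReader();
        parser.setContentHandler(new DefaultHandler());
        parser.setEntityResolver(resolver);
        parser.parse(new InputSource(new StringReader(xml)));
        return resolver.requested;
    }

    public static void main(String[] args) throws Exception {
        // example.invalid is a placeholder host, not a real DTD location
        String xml = "<!DOCTYPE course SYSTEM \"http://example.invalid/course.dtd\">"
                   + "<course>Semi-structured Data</course>";
        System.out.println(parse(xml));
    }
}
```

Returning null from resolveEntity instead would tell the parser to open the system identifier in the usual way.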
Content handling: Examples
Example 1) Simple SAX program
<?xml version="1.0"?>
<course>Semi-structured Data</course>
startElement: course
characters: Semi-structured Data
endElement: course
Program consists of 2 classes:
CourseApp
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class CourseApp {
    public static void main(String[] args) throws Exception {
        // create XMLReader (the parser reads the input and invokes the callbacks in the handler)
        XMLReader parser = XMLReaderFactory.createXMLReader();

        // install the content handler
        MyHandler handler = new MyHandler();
        parser.setContentHandler(handler);

        // start parsing
        for (int i = 0; i < args.length; i++) {
            parser.parse(args[i]);
        }
    }
}
MyHandler
contains handlers for 3 kinds of callbacks
import org.xml.sax.*;
import org.xml.sax.helpers.*;

// DefaultHandler supplies empty stubs for the remaining ContentHandler methods
public class MyHandler extends DefaultHandler {

    // SAX calls this method when it encounters a start tag
    public void startElement(String namespaceURI, String localName, String qualifiedName, Attributes atts) throws SAXException {
        System.out.println("startElement: " + qualifiedName);
    }

    // SAX calls this method to pass in character data
    public void characters(char[] text, int start, int length) throws SAXException {
        System.out.println("characters: " + new String(text, start, length));
    }

    // SAX calls this method when it encounters an end tag
    public void endElement(String namespaceURI, String localName, String qualifiedName) throws SAXException {
        System.out.println("endElement: " + qualifiedName);
    }
}
Use the DefaultHandler class from the package
org.xml.sax.helpers.*
(implements all interface methods with empty body so we don’t have to).
Example 2)
Input
<products>
  <product>
    <name>Product A</name> <!-- fst -->
    <value>100</value>
    <product>
      <name>Product B</name>
      <value>200</value>
      <product>
        <name>Product E</name>
        <value>600</value>
      </product>
    </product>
    <product>
      <name>Product C</name>
      <value>400</value>
      <product>
        <name>Product E</name>
        <value>600</value>
      </product>
    </product>
  </product>
  <product>
    <name>Product D</name> <!-- snd -->
    <value>300</value>
    <product>
      <name>Product C</name>
      <value>400</value>
      <product>
        <name> Product E </name>
        <value>600</value>
      </product>
    </product>
  </product>
</products>
Output
Total value of Product A: 1900
Total value of Product D: 1300
Handler:
TopProducts
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;

public class TopProducts extends DefaultHandler {
    String eleText;         // text passed in by the most recent characters() call
    private int level = 0;  // nesting depth of product elements
    private int value = 0;  // running total for the current top-level product

    public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException {
        if ("product".equals(localName)) {
            level++;
        }
    }

    // SAX calls this method to pass in character data
    public void characters(char[] text, int start, int length) throws SAXException {
        eleText = new String(text, start, length);
    }

    public void endElement(String namespaceURI, String localName, String qName) throws SAXException {
        // name of a top-level product: start its output line
        if ("name".equals(localName) && level == 1) {
            System.out.print("Total value of " + eleText + ": ");
        }
        // values at every nesting depth count toward the total
        if ("value".equals(localName)) {
            value += Integer.parseInt(eleText);
        }
        if ("product".equals(localName)) {
            level--;
            // a top-level product just closed: print and reset the total
            if (level == 0) {
                System.out.println(value);
                value = 0;
            }
        }
    }
}
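The handler above keeps only the text of the latest characters() call. SAX allows a parser to deliver one run of text in several characters() calls, so a more defensive variant collects the pieces in a buffer and reads it in endElement. The following is a sketch of that pattern only; the class name TextCollector and its methods are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class: collects the trimmed text content of leaf elements.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // drop whitespace seen between elements
    }

    @Override
    public void characters(char[] text, int start, int length) {
        buffer.append(text, start, length); // may be called several times per text run
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String content = buffer.toString().trim();
        if (!content.isEmpty()) {
            texts.add(content);
        }
        buffer.setLength(0);
    }

    // Parses the given XML text and returns the collected element texts.
    static List<String> collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        XMLReader parser = XMLReaderFactory.createXMLReader();
        parser.setContentHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<product><name> Product E </name><value>600</value></product>";
        System.out.println(collect(xml));
    }
}
```

Trimming in endElement also takes care of names such as " Product E " in the input above.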
Error Handling
Programming an
ErrorHandler
to handle parsing errors (without one, errors and warnings go unreported; only fatal errors still raise a SAXParseException).
Example:
Installing error handler
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class CourseApp {
    public static void main(String[] args) throws Exception {
        // create XMLReader
        XMLReader parser = XMLReaderFactory.createXMLReader();

        // install the content and error handler
        MyHandler handler = new MyHandler();
        parser.setContentHandler(handler);
        parser.setErrorHandler(handler); // <------

        // start parsing
        for (int i = 0; i < args.length; i++) {
            parser.parse(args[i]);
        }
    }
}
Implementing methods
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class MyHandler extends DefaultHandler {

    public void fatalError(SAXParseException ex) throws SAXException {
        printError("FATAL ERROR", ex);
    }

    public void error(SAXParseException ex) throws SAXException {
        printError("ERROR", ex);
    }

    public void warning(SAXParseException ex) throws SAXException {
        printError("WARNING", ex);
    }

    // prints the severity, the position in the document and the parser's message
    private void printError(String err, SAXParseException ex) {
        System.out.printf("%s at %3d, %3d: %s \n", err, ex.getLineNumber(), ex.getColumnNumber(), ex.getMessage());
    }
}
ErrorHandler Methods
public void fatalError(SAXParseException ex) throws SAXException
→ well-formedness error
public void error(SAXParseException ex) throws SAXException
→ validation error
public void warning(SAXParseException ex) throws SAXException
→ minor error
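To see which of the three methods is called, the following sketch feeds a document that is not well-formed to a handler that records every report. The class name ErrorCollector and the test strings are assumptions for illustration; the exact message text depends on the parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class: records every problem the parser reports.
class ErrorCollector extends DefaultHandler {
    final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException ex) { problems.add("WARNING: " + ex.getMessage()); }

    @Override
    public void error(SAXParseException ex) { problems.add("ERROR: " + ex.getMessage()); }

    @Override
    public void fatalError(SAXParseException ex) { problems.add("FATAL ERROR: " + ex.getMessage()); }

    // Parses the given XML text and returns all reported problems.
    static List<String> check(String xml) throws IOException {
        ErrorCollector handler = new ErrorCollector();
        try {
            XMLReader parser = XMLReaderFactory.createXMLReader();
            parser.setContentHandler(handler);
            parser.setErrorHandler(handler);
            parser.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException ex) {
            // a parser usually stops after a fatal error; the handler has already recorded it
        }
        return handler.problems;
    }

    public static void main(String[] args) throws IOException {
        // the start tag <b> is never closed, so the document is not well-formed
        check("<a><b></a>").forEach(System.out::println);
    }
}
```

Validation errors, reported through error(...), only occur once validation has been turned on.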
Features
https://xerces.apache.org/xerces2-j/features.html
Used to configure parser with:
setFeature(java.lang.String name, boolean value)
Example
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class CourseApp {
    public static void main(String[] args) throws Exception {
        // create XMLReader
        XMLReader parser = XMLReaderFactory.createXMLReader();

        // install the content and error handler
        MyHandler handler = new MyHandler();
        parser.setContentHandler(handler);
        parser.setErrorHandler(handler);

        // turn on validation
        parser.setFeature("http://xml.org/sax/features/validation", true);

        // start parsing
        for (int i = 0; i < args.length; i++) {
            parser.parse(args[i]);
        }
    }
}
setFeature(...) can throw 2 exceptions:
SAXNotRecognizedException
- if the parser does not recognize the feature name
e.g. turning on validation in a non-validating parser that does not know the feature
SAXNotSupportedException
- if the parser recognizes the feature but cannot set it to the requested value
e.g. turning on validation (in a validating parser) when part of the document has already been parsed
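A minimal sketch of handling both exceptions separately follows. The class name FeatureProbe, the helper trySet and the feature name under example.org are hypothetical; the latter is deliberately not a real feature.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class: tries to set a feature and reports what happened.
class FeatureProbe {
    static String trySet(XMLReader parser, String feature, boolean value) {
        try {
            parser.setFeature(feature, value);
            return "ok";
        } catch (SAXNotRecognizedException ex) {
            return "not recognized"; // the parser does not know this feature name
        } catch (SAXNotSupportedException ex) {
            return "not supported";  // known feature, but this value cannot be set now
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        System.out.println(trySet(parser, "http://xml.org/sax/features/validation", true));
        // made-up feature name, used only to provoke SAXNotRecognizedException
        System.out.println(trySet(parser, "http://example.org/no-such-feature", true));
    }
}
```

Catching the two exceptions separately lets the application decide whether to continue without the feature or to abort.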
Example of a feature: Namespace awareness
http://xml.org/sax/features/namespaces
Reminder:
startElement(...)
is called at the beginning of every element:
public void startElement(String namespaceURI, String localName, String qualifiedName, Attributes atts) throws SAXException
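If the parser is namespace aware:
namespaceURI
holds the namespace URI that the element’s prefix is bound to (empty string if the element has no namespace)
localName
holds the element name without prefix
qualifiedName
optional: may be empty unless the namespace-prefixes feature is turned on
If the parser is not namespace aware:
namespaceURI
empty
localName
empty
qualifiedName
holds the element’s name (with prefix)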
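The difference can be tried out by parsing the same document with the namespaces feature switched on and off. This is a sketch under assumptions: the class name NamespaceDemo, the prefix c and the namespace URI http://example.org/c are invented for illustration, and the values a parser passes for the optional arguments may vary.

```java
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class: remembers the arguments of the first startElement call.
class NamespaceDemo extends DefaultHandler {
    String uri, localName, qName;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (this.qName == null) { // keep only the root element
            this.uri = uri;
            this.localName = localName;
            this.qName = qName;
        }
    }

    // Parses the given XML text with namespace processing on or off.
    static NamespaceDemo parse(String xml, boolean namespaceAware) throws Exception {
        NamespaceDemo handler = new NamespaceDemo();
        XMLReader parser = XMLReaderFactory.createXMLReader();
        parser.setFeature("http://xml.org/sax/features/namespaces", namespaceAware);
        // SAX requires qualified names to be reported when namespace processing is off
        parser.setFeature("http://xml.org/sax/features/namespace-prefixes", !namespaceAware);
        parser.setContentHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<c:course xmlns:c=\"http://example.org/c\">SSD</c:course>";
        for (boolean aware : new boolean[] {true, false}) {
            NamespaceDemo d = parse(xml, aware);
            System.out.printf("aware=%b uri='%s' localName='%s' qName='%s'%n",
                    aware, d.uri, d.localName, d.qName);
        }
    }
}
```

Handlers that must work in both modes usually test localName first and fall back to the qualified name when it is empty.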
Validation of document