SAX

https://docs.oracle.com/javase/7/docs/api/org/xml/sax/ContentHandler.html#endElement(java.lang.String, java.lang.String, java.lang.String)

XML Parsers

Standards for XML parsers

  • DOM

    = Document Object Model (tree-based)

    W3C standard

  • SAX

    = Simple API for XML (event-based)

    “effective” standard, very popular

    Versions for different programming languages (we only look at Java)

XML Parsers (General)

XML document → XML parser → App

XML parser: splits document into pieces and sends the “XML information set” to the application.

Event-based Parser

Events = callbacks sent to the application by the event-based parser (e.g. element start, element end)

Handlers = implemented by the application to deal with events

Used for large documents when the application does not need the whole document as an in-memory data structure.

  • Sequential access
  • Fast
  • Constant memory
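
The three properties above can be seen in a minimal sketch that counts elements while the parser streams through the input. The class name ElementCounter and its helper countElements are made up for illustration, and the parser is obtained through javax.xml.parsers.SAXParserFactory, which is an assumption of this sketch rather than something these notes prescribe. The handler keeps a single counter, so memory use does not grow with document size.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts elements without building a tree.
public class ElementCounter extends DefaultHandler {
    private int count = 0; // the only state kept while parsing

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++; // one event per start tag, delivered in document order
    }

    // Parses the given XML text and returns the number of elements seen.
    public static int countElements(String xml) throws Exception {
        ElementCounter handler = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><b/><c>text</c></a>")); // prints 4
    }
}
```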

Callbacks

We call the SAX parser and supply it with methods (callbacks), which it then calls back:

startDocument(...) beginning of parsing

startElement(...) beginning of element

  • in depth

    public void startElement(... , Attributes atts) throws SAXException


    Attributes is an interface with some useful methods:

    • getLength() - number of attributes
    • getLocalName(index) - attribute’s local name
    • getQName(index) - attribute’s qualified name
    • getValue(index) - attribute’s value
    • getType(index) - attribute’s type (CDATA, NMTOKEN, etc.)

characters(...) character data

endElement(...) end of element

endDocument(...) last method called by parser

processingInstruction(...) each processing instruction <?target data?>
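
To see the order in which these callbacks arrive, here is a small sketch that records every event in a list, including the attribute details available through the Attributes interface. The class name CallbackTrace, the helper trace and the sample document are invented for this illustration, and the parser comes from javax.xml.parsers.SAXParserFactory (an assumption of the sketch). Note that the XML declaration itself is not reported as a processing instruction.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: logs each callback in the order the parser fires it.
public class CallbackTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        StringBuilder sb = new StringBuilder("startElement: " + qName);
        // Walk the attribute list by index.
        for (int i = 0; i < atts.getLength(); i++) {
            sb.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i))
              .append(" (").append(atts.getType(i)).append(')');
        }
        events.add(sb.toString());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("characters: " + new String(ch, start, length));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement: " + qName);
    }

    @Override
    public void processingInstruction(String target, String data) {
        events.add("processingInstruction: " + target + " " + data);
    }

    // Parses the XML text and returns the recorded events.
    public static List<String> trace(String xml) throws Exception {
        CallbackTrace handler = new CallbackTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?><?render mode=fast?><course id=\"42\">SSD</course>";
        trace(xml).forEach(System.out::println);
    }
}
```

Without a DTD the attribute type is reported as CDATA, which is why the sample shows that type for id.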

Event handlers

ContentHandler (crucial) handles basic parsing callbacks, e.g. element starts

we can extend DefaultHandler, which implements stubs for all methods

then we only need to override the methods we care about

ErrorHandler (crucial) handles parsing errors

DTDHandler handles notation, unparsed entity declarations

EntityResolver customized handling for external entities
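
As a rough sketch of how the four handler roles are wired up: DefaultHandler implements all four interfaces, so one instance can be registered in every role. The class name HandlerSetup and the method registerAll are hypothetical, and the XMLReader is obtained via javax.xml.parsers.SAXParserFactory, which is an assumption of this sketch.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: registers a single handler object in all four roles.
public class HandlerSetup {
    // Returns how many of the four roles report our handler back.
    public static int registerAll() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        DefaultHandler handler = new DefaultHandler();

        reader.setContentHandler(handler); // element starts/ends, character data, ...
        reader.setErrorHandler(handler);   // warnings, errors, fatal errors
        reader.setDTDHandler(handler);     // notations, unparsed entity declarations
        reader.setEntityResolver(handler); // external entities

        int registered = 0;
        if (reader.getContentHandler() == handler) registered++;
        if (reader.getErrorHandler() == handler) registered++;
        if (reader.getDTDHandler() == handler) registered++;
        if (reader.getEntityResolver() == handler) registered++;
        return registered;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("roles registered: " + registerAll());
    }
}
```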

Content handling: Examples

Example 1) Simple SAX program

<?xml version="1.0"?>
<course>Semi-structured Data</course>
startElement: course
characters: Semi-structured Data
endElement: course

Program consists of 2 classes:

  • CourseApp
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    public class CourseApp {
        public static void main(String[] args) throws Exception {
            // create XMLReader (parser reads and calls callbacks in handler)
            XMLReader parser = XMLReaderFactory.createXMLReader();
            // install the content handler
            MyHandler handler = new MyHandler();
            parser.setContentHandler(handler);
            // start parsing
            for (int i = 0; i < args.length; i++) {
                parser.parse(args[i]);
            }
        }
    }
  • MyHandler

    contains handlers for 3 kinds of callbacks

    import org.xml.sax.*;
    import org.xml.sax.helpers.*; // DefaultHandler supplies empty stubs for the other methods

    public class MyHandler extends DefaultHandler {
        // SAX calls this method when it encounters a start tag
        public void startElement(String namespaceURI, String localName, String qualifiedName, Attributes atts) throws SAXException {
            System.out.println("startElement: " + qualifiedName);
        }

        // SAX calls this method to pass in character data
        public void characters(char[] text, int start, int length) throws SAXException {
            System.out.println("characters: " + new String(text, start, length));
        }

        // SAX calls this method when it encounters an end tag
        public void endElement(String namespaceURI, String localName, String qualifiedName) throws SAXException {
            System.out.println("endElement: " + qualifiedName);
        }
    }

    DefaultHandler (package org.xml.sax.helpers) implements all interface methods with empty bodies, so a handler that extends it only has to override the callbacks it needs.

Example 2)

  • Input
    <products>
      <product>
        <name>Product A</name> <!-- fst -->
        <value>100</value>
        <product>
          <name>Product B</name>
          <value>200</value>
          <product>
            <name>Product E</name>
            <value>600</value>
          </product>
        </product>
        <product>
          <name>Product C</name>
          <value>400</value>
          <product>
            <name>Product E</name>
            <value>600</value>
          </product>
        </product>
      </product>
      <product>
        <name>Product D</name> <!-- snd -->
        <value>300</value>
        <product>
          <name>Product C</name>
          <value>400</value>
          <product>
            <name> Product E </name>
            <value>600</value>
          </product>
        </product>
      </product>
    </products>
  • Output
    Total value of Product A: 1900
    Total value of Product D: 1300

Handler:

  • TopProducts
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;

    public class TopProducts extends DefaultHandler {
        private String eleText;  // text passed to the most recent characters() call
        private int level = 0;   // nesting depth of <product> elements
        private int value = 0;   // running total for the current top-level product

        // localName is only filled in by a namespace-aware parser (the XMLReader default)
        public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException {
            if ("product".equals(localName)) {
                level++;
            }
        }

        // SAX calls this method to pass in character data
        public void characters(char[] text, int start, int length) throws SAXException {
            eleText = new String(text, start, length);
        }

        public void endElement(String namespaceURI, String localName, String qName) throws SAXException {
            if ("name".equals(localName) && level == 1) {
                System.out.print("Total value of " + eleText + ": ");
            }
            if ("value".equals(localName)) {
                value += Integer.parseInt(eleText.trim());
            }
            if ("product".equals(localName)) {
                level--;
                if (level == 0) {
                    System.out.println(value);
                    value = 0;
                }
            }
        }
    }
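
One caveat with storing only the latest characters(...) chunk: SAX parsers are allowed to deliver the text of one element in several characters(...) calls. The following is a sketch of a more defensive variant that collects text in a StringBuilder and returns the totals as a list instead of printing them. The class name TopProductsBuffered and the helper totals are hypothetical. The parser comes from javax.xml.parsers.SAXParserFactory, which is not namespace aware by default, so this sketch compares against qName rather than localName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant of the handler above that buffers character data.
public class TopProductsBuffered extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> results = new ArrayList<>();
    private String topName; // name of the current top-level product
    private int level = 0;  // nesting depth of <product> elements
    private int value = 0;  // running total for the current top-level product

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // forget text that preceded this start tag
        if ("product".equals(qName)) {
            level++;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called more than once per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String content = text.toString().trim();
        text.setLength(0);
        if ("name".equals(qName) && level == 1) {
            topName = content;
        } else if ("value".equals(qName)) {
            value += Integer.parseInt(content);
        } else if ("product".equals(qName)) {
            level--;
            if (level == 0) {
                results.add("Total value of " + topName + ": " + value);
                value = 0;
            }
        }
    }

    // Parses the XML text and returns one line per top-level product.
    public static List<String> totals(String xml) throws Exception {
        TopProductsBuffered handler = new TopProductsBuffered();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<products><product><name>A</name><value>1</value>"
                + "<product><name>B</name><value>2</value></product></product>"
                + "<product><name>C</name><value>5</value></product></products>";
        totals(xml).forEach(System.out::println);
    }
}
```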

Error Handling

Program an ErrorHandler to handle parsing errors. Without a registered ErrorHandler, the errors reported by the parser are silently ignored, although parsing may still stop.

Example:

  • Installing error handler
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    public class CourseApp {
        public static void main(String[] args) throws Exception {
            // create XMLReader
            XMLReader parser = XMLReaderFactory.createXMLReader();
            // install the content and error handler
            MyHandler handler = new MyHandler();
            parser.setContentHandler(handler);
            parser.setErrorHandler(handler); // <------
            // start parsing
            for (int i = 0; i < args.length; i++) {
                parser.parse(args[i]);
            }
        }
    }
  • Implementing methods
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    public class MyHandler extends DefaultHandler {
        public void fatalError(SAXParseException ex) throws SAXException {
            printError("FATAL ERROR", ex);
        }

        public void error(SAXParseException ex) throws SAXException {
            printError("ERROR", ex);
        }

        public void warning(SAXParseException ex) throws SAXException {
            printError("WARNING", ex);
        }

        // prints severity, line, column and message of the reported problem
        private void printError(String err, SAXParseException ex) {
            System.out.printf("%s at %3d, %3d: %s \n", err, ex.getLineNumber(), ex.getColumnNumber(),
                    ex.getMessage());
        }
    }

ErrorHandler Methods

public void fatalError(SAXParseException ex) throws SAXException → well-formedness error

public void error(SAXParseException ex) throws SAXException → validation error

public void warning(SAXParseException ex) throws SAXException → minor error
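
The difference between the severities can be tried out with a short sketch that collects the reported problems in a list. The class name ErrorCollector, the helper check and the sample documents are invented for illustration. The parser is created via javax.xml.parsers.SAXParserFactory, and validation is switched on with its setValidating method, both assumptions of this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: records every problem the parser reports.
public class ErrorCollector extends DefaultHandler {
    private final List<String> messages = new ArrayList<>();

    @Override public void warning(SAXParseException ex) { messages.add("WARNING: " + ex.getMessage()); }
    @Override public void error(SAXParseException ex) { messages.add("ERROR: " + ex.getMessage()); }
    @Override public void fatalError(SAXParseException ex) { messages.add("FATAL ERROR: " + ex.getMessage()); }

    // Parses the XML text (optionally validating against its DTD) and returns the messages.
    public static List<String> check(String xml, boolean validate) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validate);
        ErrorCollector handler = new ErrorCollector();
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // The parser stops after a fatal error; the handler has already recorded it.
        }
        return handler.messages;
    }

    public static void main(String[] args) throws Exception {
        // Not well-formed: <b> is never closed, so fatalError is called.
        System.out.println(check("<a><b></a>", false));
        // Well-formed but invalid: the DTD requires a <b> child, so error is called.
        System.out.println(check("<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]><a/>", true));
    }
}
```

With validation switched off, the second document produces no messages at all, since it is well-formed.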

Features

https://xerces.apache.org/xerces2-j/features.html

Used to configure parser with: setFeature(java.lang.String name, boolean value)

  • Example
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    public class CourseApp {
        public static void main(String[] args) throws Exception {
            // create XMLReader
            XMLReader parser = XMLReaderFactory.createXMLReader();
            // install the content and error handler
            MyHandler handler = new MyHandler();
            parser.setContentHandler(handler);
            parser.setErrorHandler(handler);
            // turn on validation
            parser.setFeature("http://xml.org/sax/features/validation", true);
            // start parsing
            for (int i = 0; i < args.length; i++) {
                parser.parse(args[i]);
            }
        }
    }

setFeature can throw 2 exceptions:

SAXNotRecognizedException - if the parser does not recognize the feature name

e.g. turning on validation in a non-validating parser that does not know the validation feature

SAXNotSupportedException - if the parser recognizes the feature but cannot set it to the requested value at that time

e.g. turning on validation (in a validating parser) when part of the document has already been parsed
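
A minimal sketch of handling both exceptions: the helper below tries to set a feature and reports what happened. The class name FeatureProbe, the method trySetFeature and the made-up feature URI http://example.org/no-such-feature are all hypothetical, and the XMLReader is obtained through javax.xml.parsers.SAXParserFactory (an assumption of this sketch).

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical example class: probes whether a feature can be set on a fresh parser.
public class FeatureProbe {
    public static String trySetFeature(String name, boolean value) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        try {
            reader.setFeature(name, value);
            return "ok";
        } catch (SAXNotRecognizedException e) {
            return "not recognized"; // the parser does not know this feature name
        } catch (SAXNotSupportedException e) {
            return "not supported";  // known feature, but this value cannot be set right now
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(trySetFeature("http://xml.org/sax/features/namespaces", true));
        // Made-up feature name, used only to trigger the "not recognized" case.
        System.out.println(trySetFeature("http://example.org/no-such-feature", true));
    }
}
```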

Example of features: Namespace awareness

http://xml.org/sax/features/namespaces

  • Reminder: startElement(...) is called at the beginning of every element
    public void startElement(String namespaceURI, String localName, String qualifiedName, Attributes atts) throws SAXException

If parser is namespace aware:

namespaceURI holds the namespace URI that the element's prefix is bound to (not the prefix itself)

localName holds element name without prefix

qualifiedName may be empty (it is optional unless the namespace-prefixes feature is turned on)

If parser is not namespace aware:

namespaceURI empty

localName empty

qualifiedName holds element's name (with prefix)
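
The two cases can be compared side by side with a short sketch that parses the same document twice and captures the three name arguments of the first startElement(...) call. The class name NamespaceDemo, the helper firstElement and the namespace URI http://example.org/course are invented for this illustration. Namespace awareness is toggled through javax.xml.parsers.SAXParserFactory.setNamespaceAware, which is an assumption of this sketch; the notes themselves use the feature URI.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: shows what startElement receives in both modes.
public class NamespaceDemo {
    // Returns {namespaceURI, localName, qualifiedName} of the root element.
    public static String[] firstElement(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        final String[] result = new String[3];
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (result[0] == null) { // only record the first element
                    result[0] = uri;
                    result[1] = localName;
                    result[2] = qName;
                }
            }
        });
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<c:course xmlns:c=\"http://example.org/course\"/>";
        System.out.println(String.join(" | ", firstElement(xml, true)));
        System.out.println(String.join(" | ", firstElement(xml, false)));
    }
}
```

Many parsers still fill in the qualified name when namespace aware, so code should not rely on it being empty.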

Validation of document