IDevResource.com - XML Channel - Professional XML by Wrox Press

SAX 1.0: The Simple API for XML

(Reproduced with kind permission of Wrox Press: https://www.wrox.com)

The Rule-Based Design Pattern

An alternative way of structuring a SAX application, which again has the objective of separating functions and keeping the structure modular and simple, is a rule-based approach.

In general, rule-based programs use an "Event-Condition-Action" model: they contain a collection of rules of the form "if this event occurs under these conditions, perform this action". Rule-based programming can thus be seen as a natural extension of event-based programming.
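The Event-Condition-Action idea can be sketched independently of XML. The following is only an illustration, written in more recent Java than the SAX listings in this chapter; the names EcaSketch, Rule and Dispatcher are hypothetical and do not come from SAX or from the book's code. Each rule pairs a condition with an action, and the dispatcher offers every event to every rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class EcaSketch {

    /** One rule: a condition tested against each event, and an action run when it holds. */
    static final class Rule<E> {
        final Predicate<E> condition;
        final Consumer<E> action;

        Rule(Predicate<E> condition, Consumer<E> action) {
            this.condition = condition;
            this.action = action;
        }
    }

    /** Holds the rules and offers every incoming event to each of them in turn. */
    static final class Dispatcher<E> {
        private final List<Rule<E>> rules = new ArrayList<>();

        void addRule(Predicate<E> condition, Consumer<E> action) {
            rules.add(new Rule<>(condition, action));
        }

        void fire(E event) {
            for (Rule<E> rule : rules) {
                if (rule.condition.test(event)) {
                    rule.action.accept(event);
                }
            }
        }
    }

    /** Runs three string events through two rules and returns what the actions recorded. */
    public static List<String> demo() {
        List<String> log = new ArrayList<>();
        Dispatcher<String> dispatcher = new Dispatcher<>();
        dispatcher.addRule(e -> e.startsWith("start:"), e -> log.add("open " + e.substring(6)));
        dispatcher.addRule(e -> e.startsWith("end:"), e -> log.add("close " + e.substring(4)));
        dispatcher.fire("start:book");
        dispatcher.fire("text:ignored");   // no rule matches, so nothing happens
        dispatcher.fire("end:book");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints [open book, close book]
    }
}
```

An event that matches no condition is simply dropped, which is the same behaviour the SAX application in this section adopts for element types that have no handler.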

The processing model of XSL (discussed in Chapter 9) can be seen as an example of rule-based programming. Each XSL template constitutes one rule: the event is the processing of a node in the source document; the condition is the pattern that controls which template is activated, and the action is the body of the template. We can use the same concepts in a SAX application.

The diagram below illustrates the structure of a rule-based SAX application. The input from the XML parser is fed into a switch, which evaluates the events against the defined conditions and decides which actions to invoke. The actions are then passed to processing modules, each of which is designed to perform one specific task.

There are all sorts of ways conditions and actions could be implemented, but we'll describe a very simple implementation, where the condition is based only on element type.

Firstly, let's write the DocumentHandler. We'll call it Switcher because its job is to switch processing to a piece of code that handles the specific element type.

Switcher maintains a set of rules in a Hashtable, indexed by element type. The application can nominate an object of a class called ElementHandler to process a particular element type. When the parser reports an element start tag, the appropriate ElementHandler is looked up in the set of rules and called to process the start tag. At the same time, the ElementHandler is pushed onto a stack, so that the same ElementHandler can be used to process the end tag and any character data occurring immediately within this element. If no handler is registered for an element type, a null entry is pushed instead and the events for that element are ignored.

Here’s the Switcher code:

import org.xml.sax.*;
import java.util.*;

/**
  * Switcher is a DocumentHandler that directs events to an appropriate element
  * handler based on the element type.
  */
  
public class Switcher extends HandlerBase 
{

    private Hashtable rules = new Hashtable();
    private Stack stack = new Stack();

    /**
    * Define processing for an element type.
    */

    public void setElementHandler(String name, ElementHandler handler) 
    {
        rules.put(name, handler);
    }
    
    /**
    * Start of an element. Decide what handler to use, and call it.
    */
    
    public void startElement (String name, AttributeList atts) throws 
                                                          SAXException 
    {
        ElementHandler handler = (ElementHandler)rules.get(name);
        stack.push(handler);
        if (handler!=null) 
        {
            handler.startElement(name, atts);
        }
    }

    /**
    * End of an element.
    */

    public void endElement (String name) throws SAXException 
    {
        ElementHandler handler = (ElementHandler)stack.pop();
        if (handler!=null) 
        {
            handler.endElement(name);
        }     
    }

    /**
    * Character data.
    */
    
    public void characters (char[] ch, int start, int length) throws SAXException
    {
        ElementHandler handler = (ElementHandler)stack.peek();
        if (handler!=null) 
        {
            handler.characters(ch, start, length);
        }    
    }

}

An ElementHandler is rather like a DocumentHandler, but it only ever gets to process a subset of the events: element start and end, and character data. So although we could use a DocumentHandler here, we've defined a special class. This serves both as a definition of the interface and as a superclass for real element handlers: good Java coding practice might suggest using a separate interface class, but this will do for now.

import org.xml.sax.*;

/**
  * ElementHandler is a class that processes the start and end tags and
  * character data for one element type. This class itself does nothing;
  * the real processing should be defined in a subclass.
  */
  
public class ElementHandler {
   
    /**
    * Start of an element
    */
    
    public void startElement (String name, AttributeList atts) throws 
                                                           SAXException {}

    /**
    * End of an element
    */

    public void endElement (String name) throws SAXException {}
 
    /**
    * Character data
    */
    
    public void characters (char[] ch, int start, int length) throws 
                                                          SAXException {}

}
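The separate interface mentioned above could look something like the following sketch. The names ElementEvents, ElementAdapter and TextCollector are hypothetical, chosen only for this illustration: the interface states the contract, the adapter supplies the do-nothing defaults that ElementHandler supplies in the listing above, and a real handler overrides only what it needs.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.SAXException;

public class ElementHandlerInterfaceSketch {

    /** The contract only (hypothetical name, not part of SAX). */
    interface ElementEvents {
        void startElement(String name, AttributeList atts) throws SAXException;
        void endElement(String name) throws SAXException;
        void characters(char[] ch, int start, int length) throws SAXException;
    }

    /** Do-nothing base class, playing the role that ElementHandler plays above. */
    static class ElementAdapter implements ElementEvents {
        public void startElement(String name, AttributeList atts) throws SAXException {}
        public void endElement(String name) throws SAXException {}
        public void characters(char[] ch, int start, int length) throws SAXException {}
    }

    /** Example handler that overrides only the character-data callback. */
    static class TextCollector extends ElementAdapter {
        final StringBuffer text = new StringBuffer();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Feeds one element's worth of events to a TextCollector through the interface. */
    public static String demo() {
        TextCollector collector = new TextCollector();
        ElementEvents handler = collector;
        char[] data = "Moby Dick".toCharArray();
        try {
            handler.startElement("title", null);   // attributes are not used in this sketch
            handler.characters(data, 0, 4);        // "Moby"
            handler.characters(data, 4, 5);        // " Dick"
            handler.endElement("title");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return collector.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints Moby Dick
    }
}
```

With this split, a class that already has a superclass could still act as an element handler by implementing the interface directly.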

So far this is all completely general. We could use the Switcher and ElementHandler classes with any kind of document, to do any kind of processing. Now let's exploit them for a real application: we want to produce an HTML page showing selected data from our list of books.

Here's an application that does it. We'll start with the main control structure. This creates a Switcher and registers a number of ElementHandler objects to process particular elements in the input XML document. It then creates a Parser, nominates the Switcher as the DocumentHandler, and runs the parse.

import org.xml.sax.*;
import com.icl.saxon.ParserManager;

public class DisplayBookList
{

   public static void main (String args[]) throws Exception 
   {
      (new DisplayBookList()).go(args[0]);
   }

   public void go(String input) throws Exception 
   {
      Switcher s = new Switcher();
      s.setElementHandler("books", new BooklistHandler());
      s.setElementHandler("book", new BookHandler());
      s.setElementHandler("author", new AuthorHandler());
      s.setElementHandler("title", new TitleHandler());
      s.setElementHandler("price", new PriceHandler());
      s.setElementHandler("volume", new VolumeHandler());
      Parser p = ParserManager.makeParser();
      p.setDocumentHandler(s);
      p.parse(input);
   }

//...rest of code goes in here...
}
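ParserManager in the listing above comes from the SAXON package (com.icl.saxon). If that class is not to hand, one possible substitute, assuming a Java runtime that includes JAXP, is to ask SAXParserFactory for a parser and use its SAX 1.0 Parser view. The sketch below is an assumption about how you might wire this up, not the book's code; JaxpParserSketch and NameRecorder are hypothetical names, and the document is parsed from a string only to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

public class JaxpParserSketch {

    /** Minimal SAX 1.0 handler that records the name of every start tag. */
    static class NameRecorder extends HandlerBase {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String name, AttributeList atts) {
            names.add(name);
        }
    }

    /** Parses the XML text with a JAXP-supplied SAX 1.0 Parser and returns the element names seen. */
    public static List<String> elementNames(String xml) throws Exception {
        Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
        NameRecorder recorder = new NameRecorder();
        parser.setDocumentHandler(recorder);   // a Switcher could be registered here instead
        parser.parse(new InputSource(new StringReader(xml)));
        return recorder.names;
    }

    /** Convenience wrapper for a small sample document. */
    public static List<String> demo() {
        try {
            return elementNames("<books><book><title>Moby Dick</title></book></books>");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The SAX 1.0 classes used here (Parser, HandlerBase, AttributeList) are marked as deprecated in later Java releases, so a compiler may issue warnings.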

The actual element handlers can be defined as inner classes within the DisplayBookList class: this is useful because it enables them to share access to data.

The ElementHandler for the outermost element, "books", causes a skeletal HTML page to be created:

   private class BooklistHandler extends ElementHandler
   {

      public void startElement(String name, AttributeList atts) 
      {
         System.out.println("<html>");
         System.out.println("<head><title>Book List</title></head>");
         System.out.println("<body><h1>A List of Books</h1>");
         System.out.println("<table>");
         System.out.println("<tr><th>Author</th>");
         System.out.println("<th>Title</th><th>Price</th></tr>");
      }

      public void endElement(String name) 
      {
         System.out.println("</table></body></html>");
      }
   
   }

The ElementHandler for the repeated "book" element starts and ends a row in the generated HTML table, and initializes some variables to hold the data:

   private String author;
   private String title;
   private String price;
   private boolean inVolume;

   private class BookHandler extends ElementHandler 
   {

      public void startElement(String name, AttributeList atts) 
      {
         author = "";
         title = "";
         price = "";
         inVolume = false;
      }

      public void endElement(String name) 
      {
         System.out.println("<tr><td>" + author + "</td>");
         System.out.println("<td>" + title + "</td>");
         System.out.println("<td>" + price + "</td></tr>");
      }
   }

Finally, the element handlers for the fields within the <book> element update the local variables holding the data. We're being careless about performance here in the interests of clarity – it would be better to use StringBuffers rather than Strings for the variables.

   private class AuthorHandler extends ElementHandler
   {

      public void characters (char[] chars, int start, int len) 
      {
         author = author + new String(chars, start, len);
      }
   }

   private class TitleHandler extends ElementHandler 
   {

      public void characters (char[] chars, int start, int len) 
      {
         if (!inVolume) 
         {
            title = title + new String(chars, start, len);
         }
      }
   }

   private class PriceHandler extends ElementHandler 
   {

      public void characters (char[] chars, int start, int len) 
      {
         if (!inVolume) 
         {
            price = price + new String(chars, start, len);
         }
      }
   }   

   private class VolumeHandler extends ElementHandler 
   {

      public void startElement(String name, AttributeList atts) 
      {
         inVolume = true;
      }

      public void endElement(String name) 
      {
         inVolume = false;
      }
   }

The flag inVolume tracks whether the current element lies within a containing <volume> element, in which case the title and price handlers ignore its character data. Once you've put all this together (the full code can be found in the download for the book at https://www.wrox.com), you can run it on a sample XML file with a command like this:

>java DisplayBookList file:///c:/data/books2.xml

The following output should then appear:

<html>
<head><title>Book List</title></head>
<body><h1>A List of Books</h1>
<table>
<tr><th>Author</th>
<th>Title</th><th>Price</th></tr>
<tr><td>Nigel Rees</td>
<td>Sayings of the Century</td>
<td>8.95</td></tr>
<tr><td>Evelyn Waugh</td>
<td>Sword of Honour</td>
<td>12.99</td></tr>
<tr><td>Herman Melville</td>
<td>Moby Dick</td>
<td>8.99</td></tr>
<tr><td>J. R. R. Tolkien</td>
<td>The Lord of the Rings</td>
<td>22.99</td></tr>
</table></body></html>
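The earlier remark about StringBuffers can be made concrete with a small sketch. It is an illustration only: BufferedFieldSketch and its method names are hypothetical stand-ins for the title field and the BookHandler and TitleHandler callbacks. The buffer is cleared at the start of each book and appended to directly, so no intermediate String is created for each chunk of character data.

```java
public class BufferedFieldSketch {

    // Reused for every book; cleared rather than reallocated.
    private final StringBuffer title = new StringBuffer();

    /** Counterpart of BookHandler.startElement: reset the field. */
    void startBook() {
        title.setLength(0);
    }

    /** Counterpart of TitleHandler.characters: append the chunk straight from the parser's array. */
    void titleCharacters(char[] chars, int start, int len) {
        title.append(chars, start, len);
    }

    /** Counterpart of the title line in BookHandler.endElement: build the table cell. */
    String titleCell() {
        return "<td>" + title + "</td>";
    }

    /** Simulates a parser delivering one title in two chunks. */
    public static String demo() {
        BufferedFieldSketch sketch = new BufferedFieldSketch();
        sketch.startBook();
        char[] data = "Sword of Honour".toCharArray();
        sketch.titleCharacters(data, 0, 6);   // "Sword "
        sketch.titleCharacters(data, 6, 9);   // "of Honour"
        return sketch.titleCell();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints <td>Sword of Honour</td>
    }
}
```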

You can elaborate on this design pattern as much as you like. Possible enhancements include:

Providing element handlers with access to a stack containing details of their context
Selecting element handlers based on conditions other than just the element name
Using element handlers as part of a pipeline, by allowing them to fire events into another DocumentHandler
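As one way of picturing the first two enhancements in the list above, the sketch below keeps a stack of open element names and lets a rule be registered either under a plain element name or under a "parent/name" pattern, with the more specific pattern taking precedence. Everything here is hypothetical (the class name, the pattern syntax, the use of plain method calls in place of parser events); it is meant as a starting point, not a finished design.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class ContextSwitcherSketch {

    private final Map<String, Consumer<String>> rules = new HashMap<>();
    private final Deque<String> context = new ArrayDeque<>();   // open elements, outermost first

    /** Registers an action under either "name" or "parent/name". */
    void setRule(String pattern, Consumer<String> action) {
        rules.put(pattern, action);
    }

    /** Start of an element: prefer a "parent/name" rule, fall back to a "name" rule. */
    void startElement(String name) {
        String parent = context.peekLast();   // null when this is the outermost element
        context.addLast(name);
        Consumer<String> action = (parent == null) ? null : rules.get(parent + "/" + name);
        if (action == null) {
            action = rules.get(name);
        }
        if (action != null) {
            action.accept(String.join("/", context));   // the action receives the full path
        }
    }

    /** End of an element: drop it from the context. */
    void endElement(String name) {
        context.removeLast();
    }

    /** Simulates the events for one book containing a title and a volume with its own title. */
    public static List<String> demo() {
        List<String> log = new ArrayList<>();
        ContextSwitcherSketch switcher = new ContextSwitcherSketch();
        switcher.setRule("title", path -> log.add("show " + path));
        switcher.setRule("volume/title", path -> log.add("skip " + path));

        switcher.startElement("books");
        switcher.startElement("book");
        switcher.startElement("title");
        switcher.endElement("title");
        switcher.startElement("volume");
        switcher.startElement("title");
        switcher.endElement("title");
        switcher.endElement("volume");
        switcher.endElement("book");
        switcher.endElement("books");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints [show books/book/title, skip books/book/volume/title]
    }
}
```

With rules of this kind, the inVolume flag used in the book list application would no longer be needed, because the titles inside a volume are matched by their own rule.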

The advantage of this design pattern is that it avoids a great deal of if-then-else programming. It removes the need to change the DocumentHandler to add conditional logic every time a new element type is introduced. Instead all you need to do is to register another element handler.
