An XML Fragment Reader
July 16, 2003
While many potential uses of XML result in fragments of XML text, not complete documents, XML parsers require complete documents to do their jobs properly. I have been running an XML-based servlet to conduct online surveys. It records user responses by adding XML formatted data to a continuously growing cumulative file. I needed a way to analyze survey responses on the fly without going to the trouble of copying the file and adding the markup required to create a complete document.
The solution turns out to be simple and quite flexible. It lets you combine many pieces of XML-formatted text into a single character stream that feeds an XML parser. With the release of the Java SDK 1.4, XML parser classes joined the standard Java release, creating a standard API for parser access. Thus, in the org.xml.sax package, you'll find the InputSource class. An InputSource object can feed a character stream to either a SAX or a DOM parser. You can create an InputSource from a Reader, the basic Java class for streams of characters.
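As a minimal sketch of that relationship, the following wraps a StringReader in an InputSource and hands it first to a DOM parser and then to a SAX parser. The class name InputSourceDemo, its method names, and the sample XML are assumptions made for illustration; they are not part of the article's code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows one Reader-backed InputSource per parse.
public class InputSourceDemo {

    // Parse the text with a DOM parser and return the root element name.
    static String rootNameViaDom(String xml) {
        try {
            InputSource input = new InputSource(new StringReader(xml));
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(input);
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Parse the text with a SAX parser; the default handler ignores all
    // events, so this only reports whether the text was well formed.
    static boolean parsesViaSax(String xml) {
        try {
            InputSource input = new InputSource(new StringReader(xml));
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(input, new DefaultHandler());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = "<survey><answer>yes</answer></survey>"; // sample data
        System.out.println(rootNameViaDom(xml));
        System.out.println(parsesViaSax(xml));
    }
}
```

A Reader can only be consumed once, so each parse gets its own InputSource and its own Reader.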
A workable plan of attack is to create a class extending Reader that can supply characters to an InputSource from a sequence of character stream sources. In this example, I use String and File objects to supply character streams, but any object that can create a Reader, such as a URLConnection or a java.sql.Clob from a database, will work. The class I created is called XMLfragmentReader.
The XMLfragmentReader Class
The following shows the import statements, instance variables, and constructor for the XMLfragmentReader class. The constructor takes an Object array and a PrintStream that can be used to log events and errors. When the constructor is finished, the first Reader has been opened and is ready:

    package com.lanw.rutil ;

    import java.io.* ;
    import org.xml.sax.* ;
    import org.xml.sax.helpers.* ;
    import javax.xml.parsers.* ;
    import org.w3c.dom.* ;

    public class XMLfragmentReader extends java.io.Reader {
      boolean rdyflag = false ;
      Reader rdr ;        // current Reader
      Object[] sources ;
      int[] lineCounts ;  // per source
      char eol = '\n' ;
      String readerID ;
      int sourceN ;       // index of current Reader
      long charsRead ;    // in current Reader
      PrintStream log ;   // destination for event and error messages
      String lastErr ;

      public XMLfragmentReader( Object[] src, PrintStream ps )
          throws IOException {
        sources = src ;
        lineCounts = new int[ sources.length ];
        if( ps != null ) log = ps ;
        else log = System.out ;
        log.println("Created XMLfragmentReader with " +
            sources.length + " sources." );
        createReader( 0 );
      }

The createReader method that follows determines the type of the nth source object and creates the appropriate reader.

      private void createReader( int n ) throws IOException {
        Object src = sources[n] ;
        charsRead = 0 ;
        log.println("Creating reader for: " + src );
        if( src instanceof String ){
          rdr = new StringReader((String) src);
          rdyflag = true ;
          readerID = "InputString " + n ;
        }
        if( src instanceof File ){
          rdr = new BufferedReader(new FileReader( (File)src ));
          rdyflag = true ;
          readerID = ((File)src).getAbsolutePath() ;
        }
        // expand here with more source types
      }

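The "expand here" comment invites more source types. As one possible direction, the sketch below pulls the type dispatch into a standalone helper that also accepts an existing Reader or an InputStream. The class SourceReaders, its methods, and the choice of UTF-8 for byte streams are all assumptions for illustration, not part of the article's class.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: maps a source object to a Reader.
public class SourceReaders {

    static Reader open(Object src) throws IOException {
        if (src instanceof String) {
            return new StringReader((String) src);
        }
        if (src instanceof File) {
            // FileReader uses the platform default encoding here
            return new BufferedReader(new FileReader((File) src));
        }
        if (src instanceof Reader) {
            return (Reader) src;             // already a character stream
        }
        if (src instanceof InputStream) {
            // Assumption: byte streams are UTF-8 encoded
            return new InputStreamReader((InputStream) src, StandardCharsets.UTF_8);
        }
        throw new IllegalArgumentException(
                "Unsupported source type: " + src.getClass().getName());
    }

    // Convenience for experiments: read a whole source into a String.
    static String slurp(Object src) {
        try (Reader r = open(src)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(slurp("<a/>"));
        System.out.println(slurp(new StringReader("<b/>")));
    }
}
```

Throwing on an unknown type is a design choice of this sketch; the article's createReader instead leaves rdyflag false when it does not recognize a source.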
Because XMLfragmentReader extends the java.io.Reader abstract class, we must supply the following methods. There are two important points to notice here.
- Whenever a reading method gets an indication that the current source is exhausted, it calls nextReader to open the next one.
- When one or more characters have been read, we always check to see if an end of line character has been encountered. A line count is maintained for each source to help interpret any parsing errors.

      public boolean ready() throws IOException {
        return rdr.ready() ;
      }

      public void close() throws IOException {
        rdr.close() ;
        rdyflag = false ;
      }

      // Return a single character or -1 if all reader sources
      // are exhausted.
      public int read() throws IOException {
        int ch = rdr.read();
        // keep advancing while the current source is exhausted, so an
        // empty source in the middle does not end the whole stream
        while( ch == -1 && nextReader() ){
          ch = rdr.read();
        }
        // ch is still -1 here only if there is no next reader
        if( ch != -1 ){
          charsRead++ ;
          if( ch == eol ) { lineCounts[sourceN]++ ; }
        }
        return ch ;
      }

      public int read(char[] cbuf) throws IOException {
        return read( cbuf, 0, cbuf.length ) ;
      }

      public int read(char[] cbuf, int off, int len) throws IOException {
        int ct = rdr.read( cbuf, off, len );
        while( ct == -1 && nextReader() ){
          ct = rdr.read( cbuf, off, len );
        }
        // ct is still -1 here only if there is no next reader
        if( ct > 0 ){
          countLines( cbuf, off, ct );
          charsRead += ct ;  // count only characters actually read
        }
        return ct ;
      }

      public long skip(long n) throws IOException {
        return rdr.skip( n ) ;
      }

Every source after the first one is created by a call to the nextReader method, which follows. It returns true if there is another source that can be opened. The countLines method is used by the read method that reads into a buffer to keep track of the number of lines in the nth source:

      // return true if next reader created ok
      private boolean nextReader() throws IOException {
        close(); // sets rdyflag = false ;
        if( ++sourceN >= sources.length ) return false ;
        createReader( sourceN );
        return rdyflag ;
      }

      // note that len is the number actually read
      private void countLines( char[] cbuf, int off, int len ){
        for( int i = 0 ; i < len ; i++ ){
          if( cbuf[ off++ ] == eol ){
            lineCounts[ sourceN ]++ ;
          }
        }
      }

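To watch the source-switching and line-counting behavior in isolation, here is a stripped-down stand-in that accepts only strings. MiniChainReader and its helper methods are hypothetical names used for this sketch; it follows the same idea as the class in this article but is not the author's code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for XMLfragmentReader: string sources only.
public class MiniChainReader extends Reader {
    private final String[] sources;
    private final int[] lineCounts;  // newline characters seen per source
    private Reader current;          // null once every source is exhausted
    private int index;               // index of the current source

    public MiniChainReader(String... sources) {
        this.sources = sources;
        this.lineCounts = new int[sources.length];
        this.current = sources.length > 0 ? new StringReader(sources[0]) : null;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (current != null) {
            int ct = current.read(cbuf, off, len);
            if (ct > 0) {
                // count line ends in the characters just read
                for (int i = off; i < off + ct; i++) {
                    if (cbuf[i] == '\n') {
                        lineCounts[index]++;
                    }
                }
                return ct;
            }
            // current source is exhausted: open the next one, if any
            current.close();
            index++;
            current = index < sources.length ? new StringReader(sources[index]) : null;
        }
        return -1; // all sources exhausted
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }

    public int linesIn(int source) {
        return lineCounts[source];
    }

    // Drain any Reader into a String (small buffer to force several reads).
    static String readAll(Reader r) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[16];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MiniChainReader r = new MiniChainReader("<root>\n", "<a/>\n<b/>\n", "</root>");
        System.out.println(readAll(r));
        System.out.println("lines in source 1: " + r.linesIn(1));
    }
}
```

A single read call never spans two sources in this sketch; it returns what the current source has and lets the caller ask again, which keeps the per-source line counts simple.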
To facilitate debugging XML documents, a SAX parser keeps track of the number of lines it has read so it can report a line and column number where an error is detected. However, because we read multiple sources, this absolute line number is useless unless we can report the line number within a specific source. The following method returns a String that reports the source number and relative line number that corresponds to an absolute line number:

      // Method to convert an absolute line number as reported in
      // a SAXException to a source and relative line number
      public String reportRelativeLine( int absN ){
        int runningLines = 0 ;
        for( int i = 0 ; i < sources.length ; i++ ){
          runningLines += lineCounts[i] ;
          if( absN <= runningLines ){
            int startN = runningLines - lineCounts[i] ;
            return "Source number: " + i + " line: " + (absN - startN) ;
          }
        }
        return "Unable to locate line# " + absN ;
      }

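A small worked example may make the arithmetic easier to follow. The sketch below repeats the same running-total logic in a static method so it can be run on its own; the class name RelativeLineDemo and the line counts (2, 40, 25, and 1 lines for four sources) are invented for illustration.

```java
// Hypothetical demo of the running-total arithmetic in reportRelativeLine.
public class RelativeLineDemo {

    // Same logic as reportRelativeLine, with the line counts passed in.
    static String locate(int[] lineCounts, int absN) {
        int runningLines = 0;
        for (int i = 0; i < lineCounts.length; i++) {
            runningLines += lineCounts[i];
            if (absN <= runningLines) {
                // lines contributed by all earlier sources
                int startN = runningLines - lineCounts[i];
                return "Source number: " + i + " line: " + (absN - startN);
            }
        }
        return "Unable to locate line# " + absN;
    }

    public static void main(String[] args) {
        int[] counts = { 2, 40, 25, 1 };  // invented per-source line counts
        System.out.println(locate(counts, 2));    // Source number: 0 line: 2
        System.out.println(locate(counts, 3));    // Source number: 1 line: 1
        System.out.println(locate(counts, 50));   // Source number: 2 line: 8
        System.out.println(locate(counts, 100));  // Unable to locate line# 100
    }
}
```

For absolute line 50, the running total first reaches 50 at source 2 (2 + 40 + 25 = 67), the earlier sources contributed 42 lines, and so the relative line is 50 - 42 = 8.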
Example Uses
One use of XMLfragmentReader is to create an org.xml.sax.InputSource object that is in turn passed to a SAXParser. The other thing the parser needs is an event handler, typically created as an extension to the DefaultHandler class in the org.xml.sax.helpers package. The following method in the XMLfragmentReader class takes a DefaultHandler and parses the combined sources provided by the reader, calling the event handler methods in the handler. Note how the exception reporting uses the reportRelativeLine method:

      // returns null if no error, else a String with details
      public String parse( DefaultHandler handler ){
        SAXParser parser ;
        lastErr = null ; // clear any error left from an earlier call
        try {
          InputSource input = new InputSource( this );
          SAXParserFactory fac = SAXParserFactory.newInstance();
          parser = fac.newSAXParser() ; // default
          log.println("Start parse");
          parser.parse( input, handler );
          log.println("End parse");
        }catch(SAXParseException spe){
          StringBuffer sb = new StringBuffer( spe.toString() );
          sb.append("\nAbsolute Line number: " + spe.getLineNumber());
          sb.append("\nColumn number: " + spe.getColumnNumber() );
          sb.append("\n");
          sb.append( reportRelativeLine( spe.getLineNumber() ));
          lastErr = sb.toString();
        }catch(Exception e){
          StringWriter sw = new StringWriter();
          e.printStackTrace( new PrintWriter( sw ) );
          lastErr = sw.toString();
        }
        return lastErr ;
      }

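For readers who have not written a handler before, here is a minimal sketch of a DefaultHandler extension that counts elements by name. The ElementCounter class and the element names in the sample data (survey, response, q1) are assumptions for illustration. To stay self-contained it parses from a StringReader, but a handler like this could equally be passed to the parse method above.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tallies how often each element name occurs.
public class ElementCounter extends DefaultHandler {
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        // called once per start tag; qName is the name as written
        counts.merge(qName, 1, Integer::sum);
    }

    public Map<String, Integer> counts() {
        return counts;
    }

    // Parse a complete XML string and return the tally.
    static Map<String, Integer> count(String xml) {
        try {
            ElementCounter handler = new ElementCounter();
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.counts();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<survey>"
                + "<response><q1>a</q1></response>"
                + "<response><q1>b</q1></response>"
                + "</survey>";
        System.out.println(count(xml)); // {q1=2, response=2, survey=1}
    }
}
```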
The other way to work with an XML document is through creation of an org.w3c.dom.Document object. I have also included a simple method to do this in the XMLfragmentReader class:

      public Document build( ){
        DocumentBuilder builder = null ;
        Document doc = null ;
        try {
          InputSource input = new InputSource( this );
          DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
          builder = fac.newDocumentBuilder(); // default
          log.println("Start build");
          doc = builder.parse( input );
          log.println("End build");
          return doc ;
        }catch(SAXParseException spe){
          StringBuffer sb = new StringBuffer( spe.toString() );
          sb.append("\nAbsolute Line number: " + spe.getLineNumber());
          sb.append("\nColumn number: " + spe.getColumnNumber() );
          sb.append("\n");
          sb.append( reportRelativeLine(spe.getLineNumber()));
          lastErr = sb.toString();
        }catch(Exception e){
          StringWriter sw = new StringWriter();
          e.printStackTrace( new PrintWriter( sw ) );
          lastErr = sw.toString();
        }
        return null ;
      }
    }

Here is a simplified example of creating a document from fragments of XML text. Two strings are used to form the start and end of the document and two files representing survey results from two periods are used to create the contents:

    public Document example() throws IOException {
      Object[] src = new Object[4] ;
      src[0] = "<?xml version=\"1.0\"?>\r\n<root>\r\n" ;
      src[1] = new File( "c:\\XMLonTheFly\\Data\\test0117A.xml");
      src[2] = new File( "c:\\XMLonTheFly\\Data\\test0117B.xml");
      src[3] = "</root>\r\n" ;
      XMLfragmentReader fr = new XMLfragmentReader( src, System.out );
      Document dom = fr.build();
      if( dom == null ){
        System.out.println("Error: " + fr.lastErr );
      }
      return dom ;
    }
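
Once a Document is available, the standard DOM calls can summarize the combined data. The sketch below counts elements and reads the first value for a given tag. The class DomSummary, its methods, and the element names response and q1 are assumptions for illustration, since the article does not show the survey file format. It builds its Document from a string so that it runs without the survey files.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for inspecting a parsed survey document.
public class DomSummary {

    // Build a Document from a complete XML string.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Number of elements with the given tag name anywhere in the document.
    static int count(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }

    // Text of the first element with the given tag name, or null if none.
    static String firstText(Document doc, String tag) {
        NodeList list = doc.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse("<root>"
                + "<response><q1>yes</q1></response>"
                + "<response><q1>no</q1></response>"
                + "</root>");
        System.out.println("responses: " + count(doc, "response"));
        System.out.println("first q1: " + firstText(doc, "q1"));
    }
}
```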