Chapter 21

21: XML Parsers, Editors, and Utilities

It would be impossible to include a complete list of tools in any book, so this chapter describes some of the more commonly used tools to read XML into memory (parsers) and to create and edit XML documents by hand (editors), along with related utilities. For information on other tools, a good place to start looking is http://www.xml.com/pub/Guide/XML_Parsers/. In addition to being in common use, the tools described here have been chosen because they are likely to be useful if you are working with databases, whether on the World Wide Web or elsewhere.

Parsers: Tools that Read XML into Memory

There are two main models used by XML parsers: the event model and the tree model. In the event model, the start of an XML element simply calls a function you supply; the parser doesn't save anything in memory unless you do it yourself. Parsers using the tree model build up a complete data structure in memory, and either return it to you (XML::Parser in Perl can do that) or give you an API, usually based on the DOM, to manipulate it.

Use the event model for speed and low memory overhead. Use the tree model for compatibility with browsers or other DOM-aware applications, or for simplicity in programming.
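
The difference between the two models can be sketched in Python, whose standard library happens to wrap expat (xml.parsers.expat) and to provide a small DOM (xml.dom.minidom). This is only an illustration: the sample document and the function names count_elements_event and item_texts_tree are invented here, not taken from any of the tools described in this chapter.

```python
import xml.dom.minidom
import xml.parsers.expat

# Hypothetical sample document, used only for this illustration.
SAMPLE = "<order id='7'><item>tea</item><item>milk</item></order>"


def count_elements_event(text):
    """Event model: expat calls our handler at each start tag.

    Nothing is kept in memory except what the handler chooses to record.
    """
    counts = {}

    def start_element(name, attrs):
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(text, True)  # True marks this as the final chunk of input
    return counts


def item_texts_tree(text):
    """Tree model: build the whole document in memory, then query it via the DOM."""
    doc = xml.dom.minidom.parseString(text)
    return [node.firstChild.data for node in doc.getElementsByTagName("item")]


if __name__ == "__main__":
    print(count_elements_event(SAMPLE))
    print(item_texts_tree(SAMPLE))
```

The event version never holds more than one element name at a time, which is why it suits large inputs; the tree version is shorter to write once you need to navigate between elements.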

Certain XML parsers can work either way. The parsers are grouped by primary language here, although some, such as expat, have been incorporated into other languages and may be mentioned more than once.

C and C++ Parsers

The most widely used tools in C and C++ are those written by James Clark; they are available at http://www.jclark.com for free download. The license is very open, and the software can be used commercially.

SP

SP is Clark's SGML parser. As its name suggests, it handles the fuller SGML standard. Every valid XML document is also a valid SGML document, and since Clark was a member of the XML working group and is also on the ISO SGML committee, you should not be surprised to learn that SP handles XML.

The parser is most often used with a C command-line front end (on Windows and Unix), called nsgmls, which produces a regular and easy-to-parse text output. The format is an augmented ESIS (an industry standard for SGML) and is easy to process in Perl, awk, or other languages.
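
To give a feel for how simple this output is to process, here is a minimal Python sketch of a reader for it. It assumes the usual nsgmls conventions, in which the first character of each line is a command: A for an attribute of the element that follows, ( and ) for the start and end of an element, - for character data, and C for a conforming document. Only those five commands and the \n and \\ escapes are handled; the sample text and the names parse_esis and unescape are made up for the example.

```python
import re

# Hypothetical sample of nsgmls-style output, written by hand for this sketch.
SAMPLE_ESIS = """\
AID CDATA a1
(NOTE
(TITLE
-Hello\\nworld
)TITLE
)NOTE
C
"""


def unescape(data):
    """Decode the two escapes handled here: \\n (record end) and \\\\ (backslash)."""
    return re.sub(r"\\(\\|n)", lambda m: "\n" if m.group(1) == "n" else "\\", data)


def parse_esis(text):
    """Turn nsgmls-style lines into a list of event tuples."""
    events = []
    pending_attrs = {}  # attribute lines come before the element they belong to
    for line in text.splitlines():
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":
            parts = rest.split(" ", 2)
            name = parts[0]
            kind = parts[1] if len(parts) > 1 else ""
            value = unescape(parts[2]) if len(parts) > 2 else None
            pending_attrs[name] = (kind, value)
        elif code == "(":
            events.append(("start", rest, pending_attrs))
            pending_attrs = {}
        elif code == ")":
            events.append(("end", rest))
        elif code == "-":
            events.append(("data", unescape(rest)))
        elif code == "C":
            events.append(("conforming",))
        # Any other command character is ignored in this sketch.
    return events


if __name__ == "__main__":
    for event in parse_esis(SAMPLE_ESIS):
        print(event)
```

A real front end would also need the remaining commands and escapes described in the nsgmls documentation; this version is meant only to show the line-oriented shape of the format.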

There is a C++ API for SP, but it is not very clearly documented. SP is a fully validating parser, so it reads all of the declarations in both the internal document type declaration subset and any external DTD files as necessary. There is an enhanced version of SP called OpenSP at openjade.sourceforge.net, available under the same free license terms.

NOTE

SP can also use "PUBLIC" identifiers through a catalog mechanism, but although this is appropriate for SGML, it's a pretty bad idea in XML, since the required SYSTEM identifier overrides the PUBLIC identifier anyway.

expat

The expat parser is written in C, and is easier to work with if you are just using XML, not SGML. Unlike SP, expat is not a full validating parser; it won't read declarations in an external DTD. There is an overview of expat at http://www.xml.com/pub/1999/09/expat/ written by Clark Cooper, the current maintainer of the Perl XML::Parser package.

Building expat and SP on Unix

For expat, you should end up with a directory structure similar to that shown in Figure 21.1. Then edit the Makefile that's in the expat directory. The changed lines are in bold:

CC=gcc
# If you know what your system's byte order is, define XML_BYTE_ORDER:
# use -DXML_BYTE_ORDER=12 for little-endian byte order;
# use -DXML_BYTE_ORDER=21 for big-endian (network) byte order.
# -DXML_NS adds support for checking of lexical aspects of
# the XML namespaces spec
# -DXML_MIN_SIZE makes a smaller but slower parser
CFLAGS=-O2 -Ixmltok -Ixmlparse -DXML_NS
# Use one of the next two lines; unixfilemap is better if it works.
FILEMAP_OBJ=xmlwf/unixfilemap.o
#FILEMAP_OBJ=xmlwf/readfilemap.o
OBJS=xmltok/xmltok.o \
  xmltok/xmlrole.o \
  xmlwf/xmlwf.o \
  xmlwf/xmlfile.o \
  xmlwf/codepage.o \
  xmlparse/xmlparse.o \
  xmlparse/hashtable.o \
  $(FILEMAP_OBJ)
EXE=

all: xmlwf/xmlwf$(EXE)

xmlwf/xmlwf$(EXE): $(OBJS)
        $(CC) $(CFLAGS) -o $@ $(OBJS)

clean:
        rm -f $(OBJS) xmlwf/xmlwf$(EXE)

xmltok/nametab.h: gennmtab/gennmtab$(EXE)
        rm -f $@
        gennmtab/gennmtab$(EXE) >$@

gennmtab/gennmtab$(EXE): gennmtab/gennmtab.c
        $(CC) $(CFLAGS) -o $@ gennmtab/gennmtab.c

xmltok/xmltok.o: xmltok/nametab.h

.c.o:
        $(CC) $(CFLAGS) -c -o $@ $<

Be sure to note that the lines indented by eight characters start with a tab. If you change them to use spaces, you'll get an error about a "missing separator" or a syntax error from make.

Running make after editing the Makefile should produce a program called xmlwf in the xmlwf directory. This program can be run to determine whether an XML file is well-formed.
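
If you would rather run the same kind of check from a script, the following Python sketch does something similar using the expat binding in Python's standard library. It is not xmlwf itself; the function name check_well_formed and the message format are choices made for this example.

```python
import xml.parsers.expat


def check_well_formed(data):
    """Return (True, "") if data parses, else (False, a short error message).

    Like expat in general, this checks well-formedness only, not validity.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk of input
    except xml.parsers.expat.ExpatError as err:
        reason = xml.parsers.expat.ErrorString(err.code)
        return False, "line %d, column %d: %s" % (err.lineno, err.offset, reason)
    return True, ""


if __name__ == "__main__":
    print(check_well_formed("<a><b/></a>"))
    print(check_well_formed("<a><b></a>"))
```

The second call reports a failure because the b element is never closed before the end tag for a appears.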

Figure 21.1 Install structure for expat.

RXP

RXP, a freely available parser in C for Unix and Windows, can be found at http://www.cogsci.ed.ac.uk/~richard/rxp.html.

Gnome

The XML library for Gnome (http://xmlsoft.org/) supports DOM and SAX. The Gnome project offers one of the more popular desktop environments for the X Window System, so the software, which is all open source, is very widely used.

Apache

Apache (http://www.apache.org) is the most widely used Web server on the Internet, and is freely available. The XML support being built into it will be a major boost for XML. The Apache parser is at http://xml.apache.org.

Others

Look on this book's companion Web site to find pointers to other C parsers for XML.

Python

A Python DOM implementation and several other tools can be found at http://fourthought.com/4Suite/4DOM/.

Lars M. Garshol keeps a page called "Tools for Parsing XML with Python" at http://www.stud.ifi.uio.no/~larsga/download/python/xml/, which looks quite useful. In March 2000, it listed the following:

saxlib. The Python version of SAX, with drivers.

xmlproc. A validating XML parser.

PyPointers. An XPointer implementation.

dtddoc. A DTD documentation generator.
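
As an illustration of the SAX style of programming in Python, here is a short sketch. It uses the xml.sax package from Python's standard library, which is an assumption on my part and not necessarily the same code as the saxlib distribution listed above; the TitleCollector class, the collect_titles function, and the title element name are all invented for the example.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element as the parser reports events."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # a list while inside <title>, otherwise None

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def collect_titles(xml_bytes):
    """Parse a document held in memory and return its title texts in order."""
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles


if __name__ == "__main__":
    doc = b"<books><book><title>XML</title></book><book><title>SGML</title></book></books>"
    print(collect_titles(doc))
```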

Java

As noted previously, this is not primarily a Java book, mostly because there are already numerous Java books out there. The following links may be useful, however:

TeX

Yes, there's even an XML parser written in TeX macros. Though you probably don't want this, I've included it to show that XML parsers are becoming fairly widespread. If you do want it, it's on CTAN and is also listed on the various archives.

Browsers

The best-known browser is, of course, Netscape's open source Mozilla project at http://www.mozilla.org, whose gecko rendering engine uses XML extensively. Mozilla uses XML to describe the user interface entirely. Unfortunately, there is not yet any XSL support.

The JUMBO browser was written by Peter Murray-Rust for use in biochemistry. It's at www.vsms.nottingham.ac.uk/vsms/java/jumbo/ and has helped to motivate a lot of XML parser development. There is support for a "hyperglossary," as well as visualization of structures.

Citec has two commercial XML/SGML browsers at www.citec.fi/company/products/, both of which are well regarded; the older and more mature MultiDoc Pro series of products is based on a toolkit by Synex (www.synex.se) called Viewport. The "doczilla" products use gecko from www.mozilla.org.

Interleaf Panorama (formerly SoftQuad Panorama) may be available from http://www.interleaf.com; this was the first SGML viewer for the Web, in 1994, but there seems to have been relatively little development since 1997, when the product was bought from SoftQuad by Interleaf.

InDelv claims to have an open source browser that supports XSL at http://www.indelv.com.

Finally, no list of XML browsers would be respectable without mentioning that Microsoft Internet Explorer 5 includes XML and XSL support. Currently, IE5 is useful only on Windows (the Solaris version had a lot of problems when I tried it, including turning my root window bright purple by overwriting the default color map, and crashing OpenWindows on Solaris 2.6/SPARC). The Macintosh version is in beta at the time of this writing, and there doesn't seem to be a Linux version.

Transforming Data

The tools in this section transform non-XML text into XML, or XML into other things (including into more, but different, XML).
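
As a small illustration of the first kind of transformation, the Python sketch below turns tab-separated text, such as a table dumped from a database, into XML using the standard library's xml.etree.ElementTree. The function name rows_to_xml and the default element names are hypothetical, and the sketch assumes that the first line holds column names that are already legal XML element names.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(text, root_tag="rows", row_tag="row"):
    """Convert tab-separated text with a header line into an XML string.

    Assumes every column name in the header is a valid XML element name.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    root = ET.Element(root_tag)
    for line in lines[1:]:
        row = ET.SubElement(root, row_tag)
        for name, value in zip(header, line.split("\t")):
            # ElementTree escapes characters such as & and < when serializing.
            ET.SubElement(row, name).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(rows_to_xml("id\tname\n1\tTea & milk\n"))
```

Letting the library do the serializing, and not pasting strings together, means that special characters in the data are escaped for you.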

Formatting and Printing

FOP, an open source XSL formatter and renderer, is a project started by James Tauber, and can be found at http://xml.apache.org/fop/ in source form, along with documentation and a mailing list.

Jade and OpenJade were mentioned in the previous section, Transforming Data; there are now Windows NT binaries at http://www.sscd.de/openjade, too.

IBM offers LotusXSL from alphaWorks (http://www.alphaworks.ibm.com) as a free download. There are also XSL and XML editors at http://www.alphaWorks.ibm.com/tech/xsleditor.

You can find XML and XSL tutorials at http://zvon.vscht.cz/ZvonHTML/Zvon/zvonTutorials_en.html.

Editors

A useful list of editors is given at http://wdvl.internet.com/Software/XML/editors.html, and another at http://xmlsoftware.com/editors/.

Figure 21.2 XMetaL showing tags.

Figure 21.3 XMetaL without tags showing.

Figure 21.4 XMetaL's source view.

Figure 21.5 XED.

Figure 21.6 XML Spy.

Some Other XML Editors

Microsoft XML Notepad. http://www.microsoft.com/xml/notepad

sxml for emacs. http://www.inria.fr/koala/plh/sxml.html

Visual Markup. http://www.vtopia.com/products/markup/ (commercial, Windows only)

XML Authority. http://www.Extensibility.com

XMLWriter. http://www.XMLwriter.net/ (commercial, Windows only, uses Microsoft's XML parser)

XPublish. http://interaction.in-progress.com/xpublish/index (commercial, Macintosh)

Java component library. http://www.alphaWorks.ibm.com/tech/xsleditor (described under Transforming Data).