Creating XML with Genx
June 23, 2004
Genx is an easy-to-use C library for generating well-formed XML output. In addition to being well-formed, Genx writes all output in canonical form. It was created by Tim Bray with help from members of the xml-dev mailing list. Work on Genx was announced on xml-dev on 19 January 2004. Some of the benefits of Genx include size, efficiency, speed, and the integrity of its output. Genx is well documented; it's fairly easy to figure out what's going on just by looking at the well-commented source code.
This article will show you how to download, install, and compile Genx; then it will walk you through two example programs. The article assumes that you are familiar with XML and the C programming language, and that you have a C compiler and the make build utility available on your system. The example programs in this article have been tested under version beta5 of Genx.
Setting Up Genx
The first thing you have to do is download Genx. It comes in a tarball only. After you download it to a working directory, you need to extract the files. At a shell or command prompt, change directories to the working directory you've set up for Genx. If you are on a machine that runs a Unix operating system, decompress the Genx tarball (e.g., gzip -d genx.tgz), then extract the tar file genx.tar (e.g., tar xvf genx.tar). This creates a genx subdirectory that holds all the files extracted from the archive. (If you are on Windows without Cygwin, you can use a utility like WinZip to extract the GZIP archive.)
Compiling Genx
Genx comes with a Makefile for building the project. While in the genx subdirectory, just type make, and the process begins. The build will compile the needed files genx.c and charProps.c. genx.c includes the genx.h header file; charProps.c is where character properties are stored, and it is apparently used to test for legal characters in XML.
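To give a sense of what a "legal character" test involves, here is a minimal sketch of the Char production from the XML 1.0 specification. It is not taken from charProps.c and does not claim to mirror how Genx implements the check; the function name is_xml_char is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of Genx. Returns true if the Unicode
 * code point c matches the Char production of XML 1.0:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 */
bool is_xml_char(unsigned long c)
{
    if (c == 0x9 || c == 0xA || c == 0xD)
        return true;                 /* tab, line feed, carriage return */
    if (c >= 0x20 && c <= 0xD7FF)
        return true;                 /* most of the Basic Multilingual Plane */
    if (c >= 0xE000 && c <= 0xFFFD)
        return true;                 /* skips the surrogate block and FFFE/FFFF */
    return c >= 0x10000 && c <= 0x10FFFF;  /* supplementary planes */
}
```

A range test like this is enough for an illustration; a library that checks every character it writes may well prefer a lookup table for speed.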
The ar (archive) command is invoked to create an archive from the object files genx.o and charProps.o. The archive is called libgenx.a. The ranlib utility is also invoked to create an index for the archive. You will need to link against libgenx.a when you compile your own Genx programs. One other program, tgx.c, is also compiled and run. This program runs a number of tests on Genx and reports on what it finds so you know everything is working as intended.
A First Example
Several test programs are provided in the Genx package and are stored under the docs subdirectory. I have written a few additional sample programs that I'll highlight here. You can download these programs. Place this archive under your genx subdirectory and extract the contents there to the genx-examples subdirectory. Change directories to genx-examples and type make again (the example archive I've provided also has its own Makefile). After you invoke make in genx-examples, the example programs will be built and ready to go.
First, here is a simple C program called tick.c that uses functions from the Genx library:
#include <stdio.h>
#include "../genx.h"

int main()
{
  genxWriter w = genxNew(NULL, NULL, NULL);

  genxStartDocFile(w, stdout);
  genxStartElementLiteral(w, NULL, "time");
  genxAddAttributeLiteral(w, NULL, "timezone", "GMT");
  genxStartElementLiteral(w, NULL, "hour");
  genxAddText(w, "23");
  genxEndElement(w);
  genxStartElementLiteral(w, NULL, "minute");
  genxAddText(w, "14");
  genxEndElement(w);
  genxStartElementLiteral(w, NULL, "second");
  genxAddText(w, "52");
  genxEndElement(w);
  genxEndElement(w);
  genxEndDocument(w);
}
The second line of the program is an #include directive for the copy of the genx.h header file that is located in the directory above genx-examples, provided that Genx and the examples were installed as directed. You can also place a copy of genx.h in the location for system include files (on my Cygwin system, for example, the location is c:/cygwin/usr/include). If a copy of genx.h is in the system include location, you can change the #include directive to #include <genx.h>.
The first statement inside main creates a writer for the output of the program. The variable w is of type genxWriter, and it's initialized by the genxNew function. (Looks like a Java constructor, doesn't it?) genxWriter is a pointer to the struct genxWriter_rec, which stores all kinds of information about the document being built. The three arguments to the genxNew function are for memory allocation and deallocation. When all three arguments are set to NULL, we are basically instructing Genx to use its default memory handling (with malloc and free).
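If you would rather supply your own memory handling, the non-NULL case might look roughly like the sketch below. It assumes the callback shapes declared in genx.h are an allocator taking (userData, bytes) and a deallocator taking (userData, data); check your copy of the header before relying on this. The names alloc_stats, counting_alloc, and counting_free are hypothetical and exist only for illustration.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping structure, handed to Genx as userData. */
struct alloc_stats {
    long allocations;
    long frees;
};

/* Allocator callback (assumed shape): wraps malloc and counts successes. */
void *counting_alloc(void *userData, int bytes)
{
    struct alloc_stats *stats = userData;
    void *p;

    if (bytes < 0)
        return NULL;                 /* refuse nonsensical sizes */
    p = malloc((size_t) bytes);
    if (p != NULL)
        stats->allocations++;
    return p;
}

/* Deallocator callback (assumed shape): wraps free and counts releases. */
void counting_free(void *userData, void *data)
{
    struct alloc_stats *stats = userData;

    if (data != NULL) {
        stats->frees++;
        free(data);
    }
}

/* With genx.h included, the callbacks would be passed along these lines:
 *   struct alloc_stats stats = { 0, 0 };
 *   genxWriter w = genxNew(counting_alloc, counting_free, &stats);
 */
```

Comparing the two counters after the writer has been disposed of is one simple way to spot a leak while experimenting.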
Following this initialization of a writer is a series of function calls, each with a small job. Notice that the first or only argument to each of these functions is w, the writer structure. The call to genxStartDocFile starts the writing process. The second argument, stdout, indicates that the document will be written to standard output. (The document could otherwise be written to a file, as you will see in the next example.) At the end of the program is a call to genxEndDocument, which signals the end of the document and flushes it.
The program also contains four calls to genxStartElementLiteral, each of which is terminated by a call to genxEndElement. genxStartElementLiteral has three arguments. The first is the writer structure (w) explained previously, the next is a namespace name or URI (always NULL here), and the third is the element name, such as time or hour.
If you give an element a namespace URI in the second argument, Genx writes the namespace URI on the element with an xmlns attribute and automatically creates a prefix, which is used on any child elements that have the same namespace declared.
The text content for a given element, if any, is created with genxAddText, with the second argument containing the actual text, such as 23 or 14.
You can probably guess that genxAddAttributeLiteral writes an attribute on the element that is created immediately before it. It has four arguments. The first is the writer structure, and the second is a namespace URI, which is NULL if no namespace applies. The third argument is the attribute name and the fourth is the attribute value.
To run the program, just type tick at the prompt (it was compiled with make previously). The output of the program should look like this:
<time timezone="GMT"><hour>23</hour><minute>14</minute><second>52</second></time>
This output is canonicalized XML. Some obvious marks are no XML declaration, no whitespace between element tags, and double quotes rather than single quotes around attribute values. Now let's look at a Genx example that is a little more complex.
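Canonical form also prescribes how special characters are escaped. As a rough illustration, the sketch below applies the attribute-value escaping rules from the Canonical XML specification (&, <, the double quote, tab, line feed, and carriage return). It is not Genx's own code, and the function name escape_attr_value is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Genx. Copies src into dst, escaping it
 * as an attribute value per the Canonical XML rules. Returns 0 on success,
 * or -1 if dst (of size dstsize) is too small for the result. */
int escape_attr_value(const char *src, char *dst, size_t dstsize)
{
    size_t used = 0;

    if (dstsize == 0)
        return -1;                       /* no room even for the NUL */

    for (; *src != '\0'; src++) {
        char one[2] = { *src, '\0' };
        const char *rep = one;           /* default: copy the byte as-is */
        size_t len;

        switch (*src) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\t': rep = "&#x9;";  break;
        case '\n': rep = "&#xA;";  break;
        case '\r': rep = "&#xD;";  break;
        default:   break;
        }

        len = strlen(rep);
        if (used + len + 1 > dstsize)
            return -1;                   /* no room for rep plus the NUL */
        memcpy(dst + used, rep, len);
        used += len;
    }

    dst[used] = '\0';
    return 0;
}
```

A value such as GMT passes through unchanged, which is why the timezone attribute in the output above looks exactly like the string given in the program.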
Another Approach
In the next example we will explore a different approach for writing an XML document with Genx. The program tock.c declares elements, an attribute, and a namespace before it uses them, then writes elements and an attribute with different functions that are more efficient than their "literal" counterparts. It also writes its non-canonical output to a file. Here is the code:
#include <stdio.h>
#include "../genx.h"

int main()
{
  genxWriter w = genxNew(NULL, NULL, NULL);
  FILE *f = fopen("tock.xml", "w");
  genxElement time, hr, min, sec;
  genxAttribute tz;
  genxNamespace tm;
  genxStatus status;

  tm = genxDeclareNamespace(w, "http://www.wyeast.net/time", "tm", &status);
  time = genxDeclareElement(w, tm, "time", &status);
  tz = genxDeclareAttribute(w, NULL, "timezone", &status);
  hr = genxDeclareElement(w, tm, "hour", &status);
  min = genxDeclareElement(w, tm, "minute", &status);
  sec = genxDeclareElement(w, tm, "second", &status);

  genxAddText(w, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
  genxStartDocFile(w, f);
  genxPI(w, "xml-stylesheet", " href=\"tock.xsl\" type=\"text/xsl\" ");
  genxComment(w, " the current date ");
  genxAddText(w, "\n");
  genxStartElement(time);
  genxAddAttribute(tz, "GMT");
  genxAddText(w, "\n ");
  genxStartElement(hr);
  genxAddText(w, "23");
  genxEndElement(w);
  genxAddText(w, "\n ");
  genxStartElement(min);
  genxAddText(w, "14");
  genxEndElement(w);
  genxAddText(w, "\n ");
  genxStartElement(sec);
  genxAddText(w, "52");
  genxEndElement(w);
  genxAddText(w, "\n");
  genxEndElement(w);
  genxEndDocument(w);
}
The second line after main creates a FILE object by calling the fopen function with a filename (tock.xml) where the output is to be written and the mode string "w", which opens the file for writing (it is unrelated to the writer variable w). Following that, four elements (time, hr, min, and sec) are declared to be of type genxElement. The attribute tz is declared to be of type genxAttribute, and the namespace tm is declared with genxNamespace. status is of type genxStatus, an enum that helps keep track of the status of things, with values such as GENX_SUCCESS and GENX_BAD_NAME. The address of status, taken with the address-of operator &, is passed as the last argument of the functions that follow.
After the initial declarations, all these variables are initialized with an appropriate function: genxDeclareNamespace, genxDeclareElement, or genxDeclareAttribute. For example, the namespace variable tm is given a namespace name (http://www.wyeast.net/time) and a prefix (tm) with the genxDeclareNamespace function:
tm = genxDeclareNamespace(w, "http://www.wyeast.net/time", "tm", &status);
The genxAddText function inserts strings (an XML declaration, newline characters, and spaces) into the file output stream. The addition of the XML declaration is what makes the output non-canonical.
The functions genxPI and genxComment write an XML stylesheet processing instruction and a comment, respectively. Then the functions genxStartElement and genxAddAttribute begin writing the markup. These functions take a predeclared object rather than a literal name, and they perform better than their counterparts genxStartElementLiteral and genxAddAttributeLiteral. Other functions, such as genxAddText and genxEndElement, may be used with both variations of the element and attribute creation functions; genxAddText also serves for inserting whitespace between elements, and so on.
To run the program, type the word tock at a command or shell prompt. Genx will then create the file tock.xml, shown here:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="tock.xsl" type="text/xsl" ?>
<!-- the current date -->
<tm:time xmlns:tm="http://www.wyeast.net/time" timezone="GMT">
 <tm:hour>23</tm:hour>
 <tm:minute>14</tm:minute>
 <tm:second>52</tm:second>
</tm:time>
Just for fun, this non-canonical output can be transformed with the XSLT stylesheet tock.xsl and validated with the RELAX NG schema tock.rng. Both files are in the example archive.
Wrap Up
There are a number of other Genx functions that I have not touched on, such as the memory management functions genxGetAlloc and genxSetAlloc. My take is that Tim Bray is on the right track, and that if you use C and you need to generate XML output, you will no doubt find that Genx is an efficient tool.