Hacking Open Office
January 26, 2005
If you are using any word processor or editor in a group situation, such as a technical writing team or an office, then it will probably be in your interest to set up templates for authors to use to ensure consistency, reduce effort, and help automate conversion of documents between formats, such as building web pages from office documents. If you are also trying to store and manipulate content in XML but want to use a word processing environment for authoring, then well-crafted templates are even more important.
In this article, I'm going to explore some of the ways that OpenOffice.org's Writer application (I'm using version 1.1.2 on Linux and 1.1.3 on Windows XP) is open to customization and configuration. I'll walk through some of the techniques I used to set up the first templates I built with the application in my quest for an interoperable, XHTML-ready system of templates and styles which will work across Microsoft Word and Writer.
Here are four techniques you might like to use if you are maintaining templates: (a) using an unzip tool to rip open the Writer file format and get at the parts, (b) using XSLT to automate production of a large set of styles, (c) adding a keyboard-accessible menu to apply those styles, and (d) automatically generating a number of macros to help in (c). I will illustrate the techniques using my application, but they are easily adaptable to other situations.
But first, here's a bit of background. Tim Bray wrote earlier this year about the state of word processing applications for the web.
If everyone's going to write for the web (and it looks a lot of people are going to), we need the web equivalents of Word Perfect and Wordstar and Xywrite and Microsoft Word, and we need them right now.
Some discussion flowed around this, with some claiming that OpenOffice.org is an adequate solution right now (see Tim's addition to his page) and others speculating that a new application may be required. A wiki even appeared in which the issue could be discussed. I joined in the discussion and decided that OpenOffice.org and Word are both part of the solution until something better comes along, kicking off a project to create configuration layers for both Word and OpenOffice.org as general purpose XHTML editors for generic documents. In the course of the work, I have come to appreciate the open, XML-based goodies in OpenOffice.org, which is just as well, because I would not like to customize it using the Graphical User Interface (GUI) or deal with its macro language, although I look forward to decent Python scripting, which appears to be on the way.
OpenOffice.org is an open source office suite, which includes a pretty decent word processor, Writer. Like any decent word processor, it has a number of customization options, and like any software, it has its own set of strengths and weaknesses. It does have a customizable XSLT stylesheet that can be used to generate XHTML from any word processing document, but this produces far from ideal output unless you go to some lengths to customize it, as it is simply impossible to produce sensible mappings from word processing documents to XHTML in all cases. Templates are a necessity to enable authors to work with a set of styles that will map to XHTML. (Another major issue is that unless you actually run the XHTML export stylesheet manually after you have saved the document in the normal way and extracted the content, you do not even get access to the images in your document. So at this stage, I consider the XHTML export to be a work in progress.)
Hack 1: Unzipping and Manipulating Writer Files
Let's start with the basics: the file format. You can read about it in detail in a forthcoming O'Reilly title, which is available online in draft, OpenOffice.org XML Essentials: Using OpenOffice.org's XML Data Format. We're only concerned with the Writer application here, rather than spreadsheets and suchlike. We will be dealing with Writer documents (.sxw) and Writer templates (.stw), mostly the latter. These files are both actually ZIP files containing all your document data, with all the configuration and textual content in XML.
Three Ways to Unzip Files Using Windows
First hack, a quick exercise:
- Create a new OO.o text document and type something in it, like "Hello world".
- Save your new one-paragraph epic as test.sxw.
- Unzip the content. On a Unix-esque system (Windows users, see the sidebar), you can probably type this:

      unzip -d test test.sxw

  And you will be rewarded with some component files in a directory called "test":

      extracting: test/mimetype
      inflating: test/content.xml
      inflating: test/styles.xml
      extracting: test/meta.xml
      inflating: test/settings.xml
      inflating: test/META-INF/manifest.xml

- Open up the content.xml file in a text editor or an XML editor. This is where the, um, content of your document is kept.
- Ignore everything except the part you just created. You'll find it in a text:p element, which is what? A paragraph.
- Duplicate your "Hello world" paragraph.
- Save content.xml.
- Re-zip it back together as an OpenOffice.org document, possibly by changing into the test directory and typing

      zip -r ../newdoc.sxw *

  to give you a new document called newdoc.sxw.
- If you have been careful not to break the document, then you will have a new Writer document with "Hello world" in it twice.
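The same round trip can be scripted. What follows is a minimal sketch in Python (my choice of language, not something the exercise requires), using only the standard zipfile and re modules. The function name duplicate_first_paragraph and the regular expression are illustrative assumptions: the pattern treats the first non-empty text:p element as the paragraph to copy, and it would need more care for documents that nest paragraphs inside frames or notes.

```python
import re
import zipfile

# First text:p element that has content (self-closing <text:p .../> is skipped).
PARAGRAPH = re.compile(r"<text:p(?=[\s>])[^>]*(?<!/)>.*?</text:p>", re.DOTALL)


def duplicate_first_paragraph(src_path, dst_path):
    """Copy a Writer package, doubling the first paragraph in content.xml."""
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "content.xml":
                text = data.decode("utf-8")
                # Replace the first match with two copies of itself.
                text = PARAGRAPH.sub(lambda m: m.group(0) * 2, text, count=1)
                data = text.encode("utf-8")
            # Reusing the original ZipInfo keeps member order and compression type.
            dst.writestr(info, data)
```

Calling duplicate_first_paragraph("test.sxw", "newdoc.sxw") should leave every other member of the package untouched, which is the point of editing the text rather than re-serializing the XML.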
Now you're hacking OpenOffice.org. Why? You might like to automate some kinds of document processing, create documents, or in an extreme situation, make changes to a document when you don't have a copy of OpenOffice.org. Try that with a Word ".doc" file! (Actually, don't. See my previous article on how to turn Word documents into XML.)
Hack 2: Adding Styles to a Template
Next step is to do some real work, this time on a template. We're going to make a whole lot of styles. A style is a named set of formatting instructions, so you can make parts of your document look and function alike with the application of a single named label, rather than having to laboriously hand-format each part of the document. Instead of having to remember that all your headings are 18-point Helvetica, you assign a heading style to each and let the machine format them for you. This is (a) lazier, (b) easier to change when Helvetica goes out of fashion, (c) going to let you build a table of contents simply by harvesting anything labeled as a heading, (d) going to make generating XHTML easy, and (e) highly recommended.
So here's the spec for this application, where we want to transform Writer documents into XHTML. We need styles for headings, ordered and unordered lists with different flavors of numbering, block-quote styles for quoting blocks of text at different levels of indenting, and paragraphs that can be nested to continue a list item. Using these styles, we will be able to reliably create XHTML documents from both Microsoft Word and OpenOffice.org in a fairly consistent manner. Word processors are really only good at flat sequences of paragraphs, but we can use well-designed styles to create nesting for XHTML.
| Family | Type | Style names: level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
|---|---|---|---|---|---|---|
| Paragraph (p) | | p | | | | |
| Heading (h) | | h1 | h2 | h3 | h4 | h5 |
| Heading (h) | Numbered (#) | h1# | h2# | h3# | h4# | h5# |
| List item (li) | Numbered (#) | li1# | li2# | li3# | li4# | li5# |
| List item (li) | Bullet (*) | li1* | li2* | li3* | li4* | li5* |
| List item (li) | Uppercase Alpha (A) | li1A | li2A | li3A | li4A | li5A |
| List item (li) | Lowercase Alpha (a) | li1a | li2a | li3a | li4a | li5a |
| List item (li) | Lowercase Roman (i) | li1i | li2i | li3i | li4i | li5i |
| List item (li) | Uppercase Roman (I) | li1I | li2I | li3I | li4I | li5I |
| List item (li) | Continuing paragraph (p) | li1p | li2p | li3p | li4p | li5p |
| Blockquote (bq) | | bq1 | bq2 | bq3 | bq4 | bq5 |
| Definition List | Term (dt) | dt1 | dt2 | dt3 | dt4 | dt5 |
| Definition List | Description (dd) | dd1 | dd2 | dd3 | dd4 | dd5 |
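The naming scheme in the table is regular enough to enumerate mechanically: family, then level, then an optional type suffix. As a rough sketch (in Python, with the names LEVELLED and all_style_names invented for this illustration), the whole grid can be produced like this:

```python
# Families and type suffixes as laid out in the table; "" means no suffix.
LEVELLED = [
    ("h", ["", "#"]),
    ("li", ["#", "*", "A", "a", "i", "I", "p"]),
    ("bq", [""]),
    ("dt", [""]),
    ("dd", [""]),
]


def all_style_names(levels=5):
    """Return every style name in the table: family + level + suffix."""
    names = ["p"]  # the plain paragraph style has no level
    for family, suffixes in LEVELLED:
        for suffix in suffixes:
            for level in range(1, levels + 1):
                names.append(f"{family}{level}{suffix}")
    return names
```

A list like this is handy for checking that a generated template actually contains every style the scheme calls for.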
I will leave detailed discussion of how this mapping from list styles to XHTML will be done for another time, but I do provide a couple of examples here so you get the flavor. The items in brackets are the style names that you would use in the word processor. The example would look pretty much the same in OO.o as it does here in XHTML give or take a bit; check out the source of this page to see the HTML:
- (Style: li1*) A list bullet
- (li1*) And another
    1. (li2#) And a numbered item

       (li2p) With a follow-on paragraph
    2. (li2#) And another numbered item
- (li1*) And another list item introducing a quote:

    > (bq2) From somebody else.
(These style names have been chosen for their brevity, regularity, and the fact that they do not overlap with built-in or "standard" styles in either OO.o or Word, making the job of converting between formats simpler.)
That's a lot of styles to set up using the point'n'click method, way too much like work for me, so my approach was to create a blank template, open it up to see how it worked, and then use XSLT to hack the styles.xml inside a Writer template file (.stw) which contains, you guessed it, definitions of the styles for this template. I did create the heading and plain-paragraph styles by hand using the GUI, but the lists were too fiddly to do that way.
For this part of the exercise, we are going to be operating on a template rather than a document. To get a template:
- Open a blank document in OO.o.
- From the File menu, select Save As.
- From the "Save as type" drop-down, select "OpenOffice.org 1.0 Template".
- Type a name, template, and the result will be a new file called template.stw.
Unzip the template into a directory called template (unzip -d template template.stw).
To add styles, we want to transform styles.xml using a stylesheet which you can get here.
- Copy styles.xml to old-styles.xml.
- On my Fedora Core 2 Linux machine, the transformation is a matter of typing:

      xsltproc --novalid add-styles.xsl old-styles.xml > styles.xml

  See the sidebar for advice about how to run transforms using Windows.
Using XSLT from the Command Line on Windows
I will cover only the highlights of the XSLT template here.
The first thing we need to do is to add style definitions. We do this by finding the beginning of the place where the outline styles are defined, using a template with an appropriate match attribute, and slipping in some other styles first.
    <xsl:template match="text:outline-style">
      <!-- Add new paragraph styles here -->
      <xsl:call-template name="make-styles">
        <xsl:with-param name="family">li</xsl:with-param>
        <xsl:with-param name="type">*</xsl:with-param>
      </xsl:call-template>
This calls a named template, make-styles, which takes as parameters the family and type of style, as set out in the table above. This template is used recursively to generate five levels of style definition.
The recursion starts with a default level parameter of 5, and then it calls itself, passing $level - 1 to the level parameter, until it stops at $level = 0. The result is the same as a construct like a for-loop.
    <xsl:template name="make-styles">
      <xsl:param name="family" select="'li'"/>
      <xsl:param name="type" select="'*'"/>
      <xsl:param name="level">5</xsl:param>
      <xsl:param name="style-name" select="concat($family, $level, $type)"/>
      <xsl:choose>
        <xsl:when test="$level = 0">
          <!--We're done-->
        </xsl:when>
        <xsl:otherwise>
          <!--Recurse-->
          <xsl:call-template name="make-styles">
            <xsl:with-param name="level" select="$level - 1"/>
            <xsl:with-param name="type" select="$type"/>
            <xsl:with-param name="family" select="$family"/>
          </xsl:call-template>
Which is followed by the part that actually makes the style:
    <style:style style:name="{$style-name}" style:family="paragraph"
        style:parent-style-name="Default" style:list-style-name="{$style-name}">
      <xsl:choose>
        <xsl:when test="$family = 'dt'">
          <xsl:attribute name="style:next-style-name">
            <xsl:value-of select="concat('dd',$level)"/>
          </xsl:attribute>
          <style:properties text:space-before="{($level - 1)}cm"
              fo:margin-left="{($level - 1)}cm" fo:margin-right="0cm"
              fo:text-indent="0cm" fo:font-weight="bold"
              style:auto-text-indent="false"/>
        </xsl:when>
        <xsl:when test="$family = 'bq'">
          <style:properties text:space-before="{$level}cm"
              fo:margin-left="{$level}cm" fo:margin-right="0cm"
              fo:text-indent="0cm" fo:font-style="italic"
              style:auto-text-indent="false"/>
        </xsl:when>
        <xsl:when test="$type = 'p' or $family='dd'">
          <style:properties text:space-before="{($level)}cm"
              fo:margin-left="{($level)}cm" fo:margin-right="0cm"
              fo:text-indent="0cm" style:auto-text-indent="false"/>
        </xsl:when>
      </xsl:choose>
    </style:style>
This has an xsl:choose to select different formatting for different families of paragraph style. Bulleted and numbered styles don't get any formatting in this part, as their indenting and so on is set further down in the named template make-lists.
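For readers who find recursive XSLT hard to follow, here is a rough Python paraphrase of the make-styles logic excerpted above. It is a sketch, not a replacement for the stylesheet: the function names are mine, and the property dictionaries are deliberately simplified to the left margin and the one distinguishing attribute of each branch.

```python
def paragraph_properties(family, type_suffix, level):
    """Mirror the xsl:choose: return simplified formatting, or None."""
    if family == "dt":
        # Terms sit one level further left and are followed by a dd style.
        return {"margin-left": f"{level - 1}cm",
                "font-weight": "bold",
                "next-style": f"dd{level}"}
    if family == "bq":
        return {"margin-left": f"{level}cm", "font-style": "italic"}
    if type_suffix == "p" or family == "dd":
        return {"margin-left": f"{level}cm"}
    # Bulleted and numbered styles get their indenting from the list style.
    return None


def make_styles(family="li", type_suffix="*", level=5):
    """Recursive counterpart of the make-styles named template."""
    if level == 0:
        return []  # We're done
    # Recurse first, then emit this level, so styles come out as 1..5.
    styles = make_styles(family, type_suffix, level - 1)
    name = f"{family}{level}{type_suffix}"
    styles.append((name, paragraph_properties(family, type_suffix, level)))
    return styles
```

Because the recursive call comes before the style for the current level is emitted, the definitions appear in ascending order even though the counter runs downwards from 5.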
Hint: You can do a lot with OpenOffice.org via experimentation; use the GUI to set up some styles, save the document, and have a peek inside to see what happens. Then you can extract the relevant bits and use them in stylesheets or other code.
Writer not only has styles for paragraphs and sub-paragraph text-spans, but it has separate styles for lists. This can cause a few headaches, because the correspondence between the two is a bit fluid. You can link a paragraph style to a list style, but that does not prevent you from later choosing a different list style. And more problematically, each list can have multiple levels. (Yes, I have heard of conditional styles, and no, I don't think they will help in this case.)
For the project I'm presenting here, the two goals are to (a) inter-operate with Microsoft Word, via Word .doc files, which Writer is fairly good at reading and writing, and (b) create a template that can later be used to create good-quality XHTML. OO.o's list styles will cause problems for Word, which has a tighter mapping between list levels and paragraph styles and a looser way of combining them. There will also be trouble when creating XHTML. The problem is that in 'normal' use of OO.o, it is very easy to end up with paragraphs that are not formatted exclusively with styles. For example, if you want to mix unordered lists and blockquotes, then you could end up with a very complex set of interactions between list and paragraph styles and custom formatting that a stylesheet may not be able to reliably decode.
So, my approach is to try to work with a one-to-one mapping between paragraph styles and list styles. This is a compromise, but it means that authors can work with paragraph styles exclusively. This is achieved by creating a list for each of our paragraph styles that has bullets or numbering and then setting all the levels in that list to have the same formatting, so that it does not matter if they inadvertently get changed.
Working with Lists in Writer

When you have the insertion point inside a list, two things happen that you need to be aware of:

Finally, sometimes applying a paragraph style that is linked to a list style does not have the desired effect. In this case, you may need to click on the Numbering On/Off or Bullets On/Off buttons a couple of times to clear an existing list.
The final step in this hack is to re-constitute the template. Zip the contents of the directory back into a template:
    cd template
    zip -r ../new.stw *
Open the resulting new.stw using Writer, via File / Templates / Edit (not via File / Open, which will create a new instance document).
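If you would rather script the packaging step, the following Python sketch zips a directory back into a template. The function name zip_package is made up for this example. It also writes mimetype first and uncompressed, which mirrors what the unzip listing in Hack 1 showed ("extracting" rather than "inflating"); whether Writer 1.1 actually depends on that ordering is an assumption I have not verified.

```python
import os
import zipfile


def zip_package(src_dir, out_path):
    """Zip src_dir into an .sxw/.stw package with paths relative to src_dir."""
    members = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            full = os.path.join(root, name)
            arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
            members.append((arcname, full))
    # Put mimetype first, as in the unzip listing; the rest in sorted order.
    members.sort(key=lambda member: (member[0] != "mimetype", member[0]))
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as package:
        for arcname, full in members:
            # Assumption: store mimetype uncompressed, compress everything else.
            method = zipfile.ZIP_STORED if arcname == "mimetype" else zipfile.ZIP_DEFLATED
            package.write(full, arcname, compress_type=method)
```

Write the output somewhere outside the source directory, for example zip_package("template", "new.stw"), so that an earlier package is never swept up into a later one.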
An alternative technique you might like to consider is to import styles from your new template into an existing one, meaning you could maintain several templates containing discrete sets of styles (lists, headings, character styles). To import, use Format / Style / Load, and browse to a file. You can select which kinds of styles to import and whether to overwrite existing ones.
We have now covered two techniques for OpenOffice.org customization: unzipping documents and templates and adding styles by hacking the styles file.
Hack 3: Adding a Styles Menu with Keyboard Shortcuts
Now that we have a new template, it is possible to apply the new paragraph styles using the 'Stylist' (hit F11 to toggle it on and off), and largely ignore the list styles unless you get into trouble. But applying styles in OO.o is painful. There's no simple way to map styles to keystrokes, and even the Stylist does not let you use the keyboard to help select the style you want. The next stage is to show how you can add a new 'Style' menu to the application, with keyboard shortcuts.
The first problem is that there is no way to add a style to a menu. First we have to add a macro and then call that from the menu. And not just one macro: we need a macro for each and every style. Fortunately, we can automate this process. We will tackle the problem by starting with the menu, then using the menu to generate the required macros. This approach means that if you want to hand-code all or part of a menu, you can still use the stylesheet here to generate macros for each style mentioned in the menu.
This is what the new menu will look like:
A new styles menu with keyboard access via ALT key combinations.
OO.o has a configuration system for changing menus. It is very hard to use, and poorly implemented, so we will spend as little time in it as possible. All we need to do is make one small change to the main menu, and OO.o will save it as XML in the configuration directory, at which point you can grab it and hack it using XSLT, or manually add to it.
- Open Writer.
- From the Tools menu choose Configure.
- Click the Menu tab.
- Hit the New Menu button.
- Close the dialog box.
What you have just done is make a change to OO.o's configuration, which it will write out into a configuration directory. Where that is will depend on your operating system. To find out where:
- From the Tools menu, choose Options.
- Under OpenOffice.org, in the list of categories at the left, select Paths.
- Find and note down the path for User configuration.
- Close OO.o completely, including the quick start application it leaves in the Windows system tray.
- Find the user configuration directory you just wrote down, and there should be a file called writermenubar.xml.
I have supplied a sample stylesheet that works in a way that is very similar to the setup stylesheet covered earlier. It generates a hierarchical menu of each of the families of styles, adding them to the old menu bar and spitting out a new menu bar. Parts of this are hard-coded to provide the menu hierarchy, but there are recursive parts to handle the repetition involved in creating all those macro calls. Here is a fragment of the stylesheet that creates the menu for 'li' styles; there is a level parameter used here as in the previous stylesheet.
    <menu:menu menu:id="slot:{$level}" menu:label="Level ~{$level} - li{$level}">
      <menu:menupopup>
        <menu:menuitem menu:id="macro://./Standard.WPInteropStyles.li{$level}bull()"
            menu:label="Bullet {$level} - li{$level}~*"/>
        <menu:menuitem menu:id="macro://./Standard.WPInteropStyles.li{$level}num()"
            menu:label="Numbered {$level} - li{$level}~#"/>
        <menu:menuitem menu:id="macro://./Standard.WPInteropStyles.li{$level}p()"
            menu:label="Paragraph {$level} - li{$level}~p"/>
        ...
      </menu:menupopup>
    </menu:menu>
This stylesheet is designed to load, via document(), a data file (wp-interop-styles.xml) containing names for all the character or sub-paragraph text styles. I generated this list by grabbing all such element names from the XHTML recommendation and putting them into an XML data file. (To use this as-is, you will have to either set these styles up by hand, download the latest sample template from my web site, or add to the setup stylesheet covered earlier.)
The ~ character is used to indicate the appropriate keyboard shortcut.
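To make the pattern in that fragment concrete, here is a small Python sketch that builds the same three menu:id and menu:label pairs for a given level. The helper name li_menu_items and the LI_ITEMS table are invented for this illustration; only the macro URL prefix and the label layout are taken from the fragment above.

```python
MACRO_PREFIX = "macro://./Standard.WPInteropStyles."

# (label text, macro-name suffix, style suffix) for the three items shown above.
LI_ITEMS = [
    ("Bullet", "bull", "*"),
    ("Numbered", "num", "#"),
    ("Paragraph", "p", "p"),
]


def li_menu_items(level):
    """Return (menu:id, menu:label) pairs for the 'li' styles at one level."""
    items = []
    for text, macro_suffix, style_suffix in LI_ITEMS:
        menu_id = f"{MACRO_PREFIX}li{level}{macro_suffix}()"
        # The ~ marks the character that follows it as the keyboard shortcut.
        label = f"{text} {level} - li{level}~{style_suffix}"
        items.append((menu_id, label))
    return items
```

Note that the macro names spell out "bull" and "num" where the style names use * and #, since the fragment's macro names avoid those characters.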
- Rename writermenubar.xml to old-writermenubar.xml.
- Run the stylesheet:

      xsltproc --novalid generate-menubar.xsl old-writermenubar.xml > writermenubar.xml

  adjusting the paths to the various files as necessary.
Now we have a new menu for OO.o which will always be visible. If you start up Writer (remember to shut down OpenOffice.org completely first), you will be able to point and click to apply styles or use the keyboard, starting with ALT-S and then hitting the underlined characters to delve into the menus (at least you will once we install the macros needed to apply styles).
Hack 4: Generating Macros
The final stylesheet uses the menubar we just generated as its input and creates text output (not XML) that can be pasted into OO.o's macro editor.
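The make-macros.xsl stylesheet itself is not reproduced in this article, so the following Python sketch only illustrates the harvesting half of the idea: finding which macros the generated menubar refers to. The function name macro_names and the regular expression are my own assumptions, based on the menu:id values shown in Hack 3; generating the macro bodies is left to the stylesheet.

```python
import re

# Matches menu:id="macro://./Standard.WPInteropStyles.NAME()" and captures NAME.
MACRO_ID = re.compile(
    r'menu:id="macro://\./Standard\.WPInteropStyles\.(\w+)\(\)"'
)


def macro_names(menubar_xml):
    """Return the distinct macro names referenced in the menubar, in order."""
    seen = []
    for name in MACRO_ID.findall(menubar_xml):
        if name not in seen:
            seen.append(name)
    return seen
```

Fed the text of writermenubar.xml, this gives one name per menu entry that needs a macro, which is a quick way to check that the pasted macros cover every item in the menu.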
To install the macros generated by our stylesheet:
- Run the make-macros.xsl stylesheet:

      xsltproc --novalid make-macros.xsl writermenubar.xml > macros.txt

  This creates a main subroutine called SetStyle, which takes a style name as an argument and applies the style.
- Open the template, via File / Templates / Edit (not via File / Open, which will create a new instance document).
- From the Tools menu choose Macros, then Macro...

  You should now be looking at a tree control, showing the various documents and templates you have open.
- Click on new template / Standard to select the standard library of macros. This will probably be empty depending on your starting setup.
- Click New, to create a new module, and name it WPInteropStyles.

  You should now be looking at a macro-editor window.
- Paste the contents of macros.txt into the macro editor, replacing all the boilerplate code that's in there.
- Clickety-click your way out of the macro editor, and save the template.
A bit of detective work will show you where the macros live within the file format once you save. Hint: look in META-INF/manifest.xml to see where your macros are stored within the Writer file-package.
In this article, I have covered a few techniques that will be of interest to template maintainers working with OpenOffice.org Writer: how to crack open the file format, how to maintain large sets of styles, and how to customize menus and macros, all without using anything except standard tools: zip, an XSLT processor, and a text editor. All this can, of course, be further automated with a programming language of some kind, even a batch file. There are some changes coming in version 2 of OpenOffice.org, but all these techniques will be forwards compatible, although some things, like the location and name of the menu-bar files, look like they will change.