Entity Declarations, Attributes and Expansion
August 28, 1998
Entity Declarations
Entities must be declared before they can be used. They may be declared in the internal subset or, if your XML parser processes it, in the DTD (also known as the external subset). Note: if the same entity is declared more than once, only the first declaration applies. Because the internal subset is processed before the external subset, a declaration in the internal subset takes precedence over one in the external subset.
All entities are declared with the "ENTITY" declaration. The exact format of the declaration distinguishes between internal, external, and parameter entities.
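To see the "first declaration applies" rule in action, here is a minimal sketch using Python's standard xml.etree.ElementTree module (which wraps the non-validating expat parser). The document and the entity name "company" are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "company" is declared twice in the internal subset.
DUPLICATE_DOC = """<!DOCTYPE doc [
  <!ENTITY company "Yoyodyne Industries, Inc.">
  <!ENTITY company "Some Other Company">
]>
<doc>&company;</doc>"""


def root_text(document: str) -> str:
    """Parse the document and return the root element's text after expansion."""
    return ET.fromstring(document).text


# Only the first declaration is used; the second is ignored.
print(root_text(DUPLICATE_DOC))
```

The same rule is what lets an internal subset override a declaration that also appears in the external subset.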
Declaring Internal Entities
An internal entity declaration has the following form:
<!ENTITY entityname "replacement text">
You can use either double or single quotes to delimit the replacement text. The declaration of yoyo, mentioned earlier, would be:
<!ENTITY yoyo 'Yoyodyne Industries, Inc.'>
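As a quick check that the choice of quotes makes no difference, the following sketch parses a small document with Python's xml.etree.ElementTree. The "motto" entity and the memo element are invented for the example; only yoyo comes from the text above.

```python
import xml.etree.ElementTree as ET

# yoyo uses single quotes; the hypothetical "motto" entity uses double quotes
# so that its replacement text can itself contain single quotes.
MEMO_DOC = """<!DOCTYPE memo [
  <!ENTITY yoyo 'Yoyodyne Industries, Inc.'>
  <!ENTITY motto "Quality is 'job one'">
]>
<memo>&yoyo;: &motto;</memo>"""

# Both references are replaced by their replacement text during parsing.
memo_text = ET.fromstring(MEMO_DOC).text
print(memo_text)
```

Picking the quote character that does not occur in the replacement text saves you from having to escape anything.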
Declaring External Entities
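External entity declarations come in two forms. If the external entity contains XML text, the declaration supplies either a system identifier alone or a public identifier followed by a system identifier (the SYSTEM keyword is not used after PUBLIC):
<!ENTITY entityname SYSTEM "system-identifier">
<!ENTITY entityname PUBLIC "public-identifier" "system-identifier">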
The system identifier must point to an instance of a resource via a URI, most commonly a simple filename. The public identifier, if supplied, may be used by an XML system to generate an alternate URI (this provides a handy level of indirection on systems that support public identifiers).
An external entity that incorporates chap1.xml into your document might be declared like this:
<!ENTITY chap1 SYSTEM "chap1.xml">
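What "incorporating" chap1.xml means to a parser can be sketched with Python's xml.parsers.expat module. This is only an illustration under some assumptions: the FILES dictionary is a made-up stand-in for the file system, the chapter content is invented, and a real application would resolve the system identifier as a URI rather than a dictionary key.

```python
import xml.parsers.expat as expat

# Hypothetical stand-in for the file system: system identifier -> content.
FILES = {"chap1.xml": "<chapter>Chapter one text.</chapter>"}

BOOK_DOC = """<!DOCTYPE book [
  <!ENTITY chap1 SYSTEM "chap1.xml">
]>
<book>&chap1;</book>"""


def parse_with_external_entities(document, files):
    """Return a list of ("start", name) and ("text", data) events."""
    events = []
    parser = expat.ParserCreate()

    def on_external_ref(context, base, system_id, public_id):
        # Parse the entity's content with a child parser; the child inherits
        # the handlers, so its events land in the same list.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(files[system_id], True)
        return 1  # non-zero tells expat the reference was handled

    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(document, True)
    return events


events = parse_with_external_entities(BOOK_DOC, FILES)
print(events)
```

The chapter element shows up inside book exactly as if its text had been typed where the reference appears.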
Despite the growing trend to store everything in XML, there are some legacy systems that still store data in non-XML formats. Graphics are sometimes stored in odd formats like PNG and GIF, for example ;-).
External entities that refer to these files must declare that the data they contain is not XML. They accomplish this by naming the format of the external entity with a notation, introduced by the NDATA keyword:
<!ENTITY entityname SYSTEM "system-identifier" NDATA notation>
<!ENTITY entityname PUBLIC "public-identifier" "system-identifier" NDATA notation>
See the section called Entity Attributes for more detail. An external entity that refers to the GIF image pic01.gif might be declared like this:
<!ENTITY mypicture SYSTEM "pic01.gif" NDATA GIF>
Declaring Parameter Entities
Parameter entity declarations are identified by a % preceding the entity name:
<!ENTITY % pentityname1 "replacement text">
<!ENTITY % pentityname2 SYSTEM "URI">
Note the space following the % in the declaration. Parameter entities can be either internal or external, but they cannot refer to non-XML data (you can't have a parameter entity with a notation).
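Here is a hedged sketch of an internal parameter entity at work, again using Python's xml.parsers.expat. The parameter entity name "common" and the document are invented for the example. Note the assumption about the parser: expat does not process parameter entity references unless asked to, so the sketch turns that on explicitly.

```python
import xml.parsers.expat as expat

# The parameter entity "common" (a made-up name) holds a complete declaration.
# In the internal subset a parameter entity reference may appear only between
# declarations, which is where %common; is placed.
PE_DOC = """<!DOCTYPE doc [
  <!ENTITY % common "<!ENTITY yoyo 'Yoyodyne Industries, Inc.'>">
  %common;
]>
<doc>&yoyo;</doc>"""


def text_with_parameter_entities(document: str) -> str:
    """Parse with parameter entity processing enabled and return all text."""
    chunks = []
    parser = expat.ParserCreate()
    # expat's default is never to process parameter entity references.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)


pe_text = text_with_parameter_entities(PE_DOC)
print(pe_text)
```

Expanding %common; makes the yoyo declaration visible, so the later reference to yoyo in the document body can be resolved.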
Entity Attributes
External entities can be further classified as either "parsed" or "unparsed". Entities that refer to external files containing XML are called "parsed entities"; entities that refer to other types of data, identified by a notation, are "unparsed."
The parser inserts the replacement text of a parsed entity into the document wherever a reference to that entity occurs. It is an error to insert an entity reference to an unparsed entity directly into the flow of an XML document. Unparsed entities can only be used as attribute values on elements with ENTITY attributes.
Unparsed entities are used most frequently on XML elements that incorporate graphics into a document. Consider the following brief document:
<!DOCTYPE doc [
<!ELEMENT doc (para|graphic)+>
<!ELEMENT para (#PCDATA)>
<!ELEMENT graphic EMPTY>
<!ATTLIST graphic
          image ENTITY #REQUIRED
          alt   CDATA  #IMPLIED>
<!NOTATION GIF SYSTEM "CompuServe Graphics Interchange Format 87a">
<!ENTITY mypicture SYSTEM "normphoto.gif" NDATA GIF>
<!ENTITY norm "Norman Walsh">
]>
<doc>
<para>The following element incorporates the image declared as "mypicture":</para>
<graphic image="mypicture" alt="A picture of &norm;"/>
</doc>
You could also declare the image attribute as CDATA and simply type the filename, but the use of an entity offers a useful level of indirection.
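The indirection works roughly like this on the application side: the parser reports each unparsed entity declaration, and the application looks up the entity name it finds in the attribute. The sketch below uses Python's xml.parsers.expat; the function name resolve_graphics and the trimmed-down document are assumptions made for the example.

```python
import xml.parsers.expat as expat

GRAPHIC_DOC = """<!DOCTYPE doc [
  <!ELEMENT doc (para|graphic)+>
  <!ELEMENT graphic EMPTY>
  <!ATTLIST graphic image ENTITY #REQUIRED>
  <!NOTATION GIF SYSTEM "CompuServe Graphics Interchange Format 87a">
  <!ENTITY mypicture SYSTEM "normphoto.gif" NDATA GIF>
]>
<doc><graphic image="mypicture"/></doc>"""


def resolve_graphics(document):
    """Return (declared unparsed entities, system ids used by <graphic>)."""
    unparsed = {}  # entity name -> (system identifier, notation name)
    resolved = []  # one system identifier per <graphic>, in document order

    def on_unparsed_decl(name, base, system_id, public_id, notation):
        unparsed[name] = (system_id, notation)

    def on_start(tag, attrs):
        if tag == "graphic":
            # The attribute holds the entity's name, not a reference to it.
            system_id, _notation = unparsed[attrs["image"]]
            resolved.append(system_id)

    parser = expat.ParserCreate()
    parser.UnparsedEntityDeclHandler = on_unparsed_decl
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return unparsed, resolved


unparsed_entities, graphic_files = resolve_graphics(GRAPHIC_DOC)
print(unparsed_entities, graphic_files)
```

Renaming the image file then means changing one declaration rather than every graphic element that uses it.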
Entities in Attribute Values
There is a somewhat subtle distinction between entity attributes and entity references in attribute values. An "ordinary" (CDATA) attribute contains text. You can put internal entity references in that text, just as you can in any other content. An ENTITY attribute can only contain the name of an external, unparsed entity. In particular, note that it contains the name of the entity, not a reference to the entity.
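The difference is easy to see from what a parser hands back. In this small sketch (Python's xml.etree.ElementTree; the DTD declarations for the graphic element are left out for brevity, which a non-validating parser tolerates), the reference in alt is expanded while image simply carries a name.

```python
import xml.etree.ElementTree as ET

ATTR_DOC = """<!DOCTYPE doc [
  <!ENTITY norm "Norman Walsh">
]>
<doc><graphic image="mypicture" alt="A picture of &norm;"/></doc>"""

graphic = ET.fromstring(ATTR_DOC).find("graphic")

# alt contained an entity reference, which has been replaced by its text.
print(graphic.get("alt"))
# image contains an entity name; it is plain text as far as the parser goes.
print(graphic.get("image"))
```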
Entity Expansion
Section 4.4 and Appendix D of the XML Recommendation describe all the details of entity expansion. The key points are:
- Character references are expanded immediately. They behave exactly as if you had typed the literal character.
- Entity references in the replacement text of other entities are not expanded until the entity being declared is referenced. In other words, this is legal in the internal subset:
<!ENTITY foobar "&f;bar">
<!ENTITY f "foo">
because the entity reference "&f;" isn't expanded until "&foobar;" is expanded.
- Parsed entities are recognized in the body of your document, where unparsed entities are forbidden. Unparsed entities are allowed in entity attributes, where parsed entities are forbidden.
- Although you can put references to internal entities in attribute values, it is illegal to refer to an external entity in an attribute value.
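Two of these points can be checked with a short sketch in Python's xml.etree.ElementTree: the forward reference from foobar to f, and the ban on external entities in attribute values. The title attribute and the helper try_parse are made up for the example.

```python
import xml.etree.ElementTree as ET

# foobar refers to f before f is declared; &#65; is a character reference.
FORWARD_DOC = """<!DOCTYPE doc [
  <!ENTITY foobar "&f;bar">
  <!ENTITY f "foo">
]>
<doc>&foobar; &#65;</doc>"""

# An external entity referenced inside an attribute value is not allowed.
EXTERNAL_IN_ATTRIBUTE_DOC = """<!DOCTYPE doc [
  <!ENTITY chap1 SYSTEM "chap1.xml">
]>
<doc title="&chap1;"/>"""


def try_parse(document):
    """Return (root, None) on success or (None, message) on a parse error."""
    try:
        return ET.fromstring(document), None
    except ET.ParseError as err:
        return None, str(err)


forward_root, forward_error = try_parse(FORWARD_DOC)
attr_root, attr_error = try_parse(EXTERNAL_IN_ATTRIBUTE_DOC)

print(forward_root.text)  # the forward reference resolves without complaint
print(attr_error)         # the parser rejects the second document
```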
Caveats
A couple of significant caveats apply to the use of entities:
- Non-validating parsers are not required to read entities declared outside the document (in the external subset), or to retrieve external entities. In fact, a non-validating parser may leave references to such entities unexpanded.
- At this time (August, 1998), it's not clear to what extent mainstream web browsers will support entities.
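The first caveat is easy to run into in practice. As a sketch, Python's xml.etree.ElementTree (a non-validating parser) does not read the external subset, so an entity that is assumed to be declared only in the hypothetical file doc.dtd cannot be expanded, while the same entity declared in the internal subset can.

```python
import xml.etree.ElementTree as ET

# yoyo is assumed to be declared only in doc.dtd, which is never read.
EXTERNAL_ONLY_DOC = """<!DOCTYPE doc SYSTEM "doc.dtd">
<doc>&yoyo;</doc>"""

# The same document with yoyo declared in the internal subset.
INTERNAL_DOC = """<!DOCTYPE doc SYSTEM "doc.dtd" [
  <!ENTITY yoyo "Yoyodyne Industries, Inc.">
]>
<doc>&yoyo;</doc>"""


def expanded_text(document):
    """Return the root's text, or None if the parser could not expand it."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


print(expanded_text(EXTERNAL_ONLY_DOC))  # the reference cannot be resolved
print(expanded_text(INTERNAL_DOC))       # the internal declaration is used
```

If a document has to work with parsers like this one, declaring the entities it relies on in the internal subset is the safer choice.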
Conclusion
Entity references, while they can perhaps be a little tricky, offer a number of benefits:
- The ability to define commonly used text in a single location.
- The ability to break large documents up into workable modules.
- They offer one possible foundation for a reuse strategy.
XML Q&A covers a variety of topics, dictated by you, the viewer. Please send your questions, and suggestions for things you'd like to see covered, to xmlqna@xml.com.