Write Once, Publish Everywhere
August 16, 2000
It's 2:00 p.m. -- I am surfing the Web. Suddenly, I remember that I have a plane to catch at 5:00 p.m. Driven by an adrenaline rush, I log onto an XML server, read the project progress report, and finally decide to print it and read it later on the plane.
3:00 p.m. -- Gee ... time flies. I am stuck in traffic. Will I be late for the plane? I pick up my mobile phone and tell it to browse to our corporate XML server to grab some e-mail. "Acme Corp.," I say, and the phone replies by displaying the login screen. Keeping an eye on the traffic jam in front of me, I enter my user ID and password. Then I select the e-mail option, and the phone displays the message subject list on its WML mini-browser.
3:30 p.m. -- I am not far from the airport, but the traffic is moving slower than the guy I just saw walking along the street. I am getting more and more nervous about missing my flight. Again, I pick up my mobile phone and call Tellme (1-800-555-8355). The Tellme virtual host welcomes me with a joyful "Tellme." I reply immediately by saying "traffic," and I learn that the traffic jam is caused by an accident. I need to make a rapid decision. I say "MapQuest" to my mobile phone, and it brings me to the MapQuest site. I enter the info with my two thumbs, thinking that WAP phones really need some improvements in human factors. (I said the same thing in 1978 about my first microcomputer.) Be patient, my inner voice says, trying to bring some wisdom to the situation. But the adrenaline flows and reminds me that the urgency is not resolved. Is there an alternative route? MapQuest seems to suggest one. OK, I can take the next street and follow the instructions to the airport. Yes! No traffic jam there.
4:05 p.m. -- I am finally at the airport....
In just two hours, our hero used landline and wireless devices to find the information he needed. He browsed the Web with his fingers, thumbs, and his voice. He used, in fact, three different browsers to access the Web:
- A plain old HTML browser, running on a wired PC
- A WML mini-browser, incorporated in a mobile phone
- A VoiceXML browser, located somewhere on the West Coast
The Challenge
In the 20th century, the challenge was to create an electronic publishing infrastructure. As a by-product, we also gained a new application infrastructure.
In the 21st century, the challenge is to adapt to the new pervasive Internet. We are no longer restricted solely to PCs connected by a wire to the Web. As developers, we no longer have to deal with a world dominated solely by Windows. We have to adapt our content and applications to a new plethora of devices. This new wave comes, surprisingly, through the phone, and most particularly the mobile phone.
Mobile phones are now equipped with WML mini-browsers. The same mobile phone can also be used as a voice-browsing device. Voice browsing became feasible because of the tremendous improvements over recent years in voice recognition and voice synthesis technologies. The limiting form factor of the palm created a market for palm-top computers. It seems that smart phones and palm computers may collide in the same market space if phones become small computers and small computers become phones.
The computer of the future will understand what we say, talk to us, show us images and movies, and fit in a palm. The keyboard is not a natural extension of the human body. After all, we do not communicate naturally with a keyboard, but with our voice.
One might call it an evolution: it started with huge computing devices that filled entire rooms, and it is rapidly moving toward interactive devices that fit in a palm. But, for the immediate future, let's focus on the first step of this technology: delivering our content on an HTML browser, a WML browser, and finally on a VoiceXML browser.
Our Project
Our task is to create interactive applications using XML technologies. These applications should be accessible on three different devices: the telephone (VoiceXML), mobile phone mini browsers (WML), and finally, PC browsers (HTML).
To realize this vision, we will create abstract models and encode them in XML. Then, we must recognize the device and transform these models into the appropriate rendering format. The rules of the game are to use, as much as possible, XML technologies which are freely available, and also to restrict our scope to XML technologies documented by a public specification.
Each rendering device is limited by some form factors. So, before exploring the technologies used, let's first explore these devices' form factors.
The Form Factor
Each device imposes some limits on interaction. Often these limits are imposed by the form factor. For instance, a 21-inch computer screen can display more information than a phone's screen. Also, a phone comes with a limited set of keys, and is more adapted to vocal interaction than to text entry.
We can say that the phone's form factor makes it more adapted to aural interaction, and that the PC's form factor makes it more adapted to visual interaction.
There is also the possibility that palm-top computers equipped with small screens (but bigger than phone screens), and with a small and discreet headset, may merge the visual and aural worlds more effectively than today's devices. Who will do that? Maybe a phone company learning how to add the visual world to its aural device. Or a palm-top company learning how to add voice interaction to its visual world. Who knows?
Enough philosophy. Ready for the lab?
Welcome to Didier's Lab
In the Didier's Lab series, we'll experiment with XML technologies. We'll discuss new recommendations from the W3C and test them in real-life applications. In the coming weeks, we'll build an XML site one piece at a time. We'll try, as much as possible, to use the XML technologies as documented by the W3C and other working groups and consortiums.
This Week's Goal: The Login Process
This week, we focus mainly on the login operation for our multi-device portal site. The goal of the login process is to authenticate users. When a user is authenticated, the content displayed to the user can be personalized or, according to the user's permissions, restricted.
We aim to provide a login mechanism available on plain old phones, WAP devices, and HTML browsers. To do so, we need to create a device-independent model for expressing the login document. This model will be transformed into, respectively, a VoiceXML form, a WAP deck, and an HTML form.
First Task: Create an Abstract Model of the Login Form
Is there anything out there to help us encode such a form model? You bet: the XForms specifications are freely and publicly available at http://www.w3.org/MarkUp/Forms/. XForms are structured in three layers as illustrated below:
We'll use only the data model, the part that is defined in a working draft. The specification is available at http://www.w3.org/TR/xforms-datamodel/.
On devices running WML mini-browsers or HTML browsers we'll offer the ability to register and log in to the services offered by our XML prototype server. For devices running a VoiceXML browser, we'll only offer the login, mainly because it is not very convenient to register via voice-only. Thus, our users can register and access our server through a WML or HTML browser, and also access the service through a VoiceXML browser.
WML is not very efficient for registration. The best solution is probably to use the HTML browser for registration. For this week's experiment, the user will be presented with the choice to log in and register via both the WML and HTML browsers, and only to log in via a VoiceXML browser.
If the user can choose between the registration and login process, we then have to create a menu model. To encode the menu we'll use another XML technology defined in a public W3C specification: XLink. You can find the latest specification (a candidate recommendation) at http://www.w3.org/TR/xlink/.
As we all know, there's generally more than one solution for any particular problem. The one we will apply involves using the XInclude technology to adapt our documents for each particular device. This solution has the secondary advantage of reducing style sheet dependencies on the particular device profiles.
We'll show how the XInclude and XLink behavior-inheritance feature leads to an extensible framework. We'll also learn how XInclude's basic behavior can be expanded to support conditional inclusion. The login/registration abstract model will be dynamically constructed using an extension of the XInclude technology.
The basic rule is: If the user agent is a VoiceXML browser, then the model includes only the login form. If the browser is a WML browser, then the model includes a menu and two forms (a registration form and a login form). The same applies for an HTML browser.
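The rule above can be written down as a small lookup table. This is only an illustrative sketch in Python, not part of the XML server itself: the profile names ("vxml", "wml", "html") and the file names are the ones used in login.xml later in this article, while PARTS_BY_PROFILE and parts_for are hypothetical names chosen for the example.

```python
# Hypothetical lookup table: device profile -> documents that make up the model.
# The order follows the order of the parts in login.xml.
PARTS_BY_PROFILE = {
    "vxml": ["logIn_form.xml"],
    "wml": ["logIn_signIn_menu.xml", "signIn_form.xml", "logIn_form.xml"],
    "html": ["logIn_signIn_menu.xml", "signIn_form.xml", "logIn_form.xml"],
}


def parts_for(profile):
    """Return the documents included in the model for a device profile."""
    if profile not in PARTS_BY_PROFILE:
        raise ValueError(f"unknown device profile: {profile!r}")
    # Return a copy so callers cannot modify the table by accident.
    return list(PARTS_BY_PROFILE[profile])


print(parts_for("vxml"))  # ['logIn_form.xml']
```

In the article's design this table is not hard-coded; the same information is carried by the documents themselves, which is what keeps the style sheets free of device-specific logic.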
We pack the login and registration forms in the same document because a WML browser can store more than one card. A card is the equivalent of a mini web page.
Basically, we reduce the style sheet code complexity by giving the device adaptation task to the XInclude and XSLT engines. So far, so good. Now let's take a look at the main document. This is the XML server homepage.
The login.xml document has several inclusion references to the following documents:
- logIn_signIn_menu.xml
- logIn_form.xml
- signIn_form.xml
It looks like this:
<xdoc xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"
      xmlns:xlink="http://www.w3.org/TR/xlink">
  <xpart xinclude:href="logIn_signIn_menu.xml" xinclude:parse="xml">
    <filter>
      <device-profile format="wml html"/>
    </filter>
  </xpart>
  <xpart xinclude:href="signIn_form.xml" xinclude:parse="xml">
    <filter>
      <device-profile format="wml html"/>
    </filter>
  </xpart>
  <xpart xinclude:href="logIn_form.xml" xinclude:parse="xml">
    <filter>
      <device-profile format="wml html vxml"/>
    </filter>
  </xpart>
</xdoc>
The <xdoc> element is the document's root element. The document contains document parts, or <xpart> elements. Each <xpart> element is an XInclude element.
The XInclude characteristics are inherited by any element incorporating the XInclude attributes. XInclude and XLink features are both inherited characteristics. Any element can inherit the XInclude or XLink characteristics by incorporating the xinclude or the xlink attributes. Of course, not both at the same time.
In the login.xml document, the XInclude processor will replace any element having the xinclude:href attribute set to a URI. The document fragment (such as an element) located at this URI is included in the login.xml document. So, in this instance, the first document fragment to be included is the menu, encoded as an extended XLink element.
The XInclude processor finds the first occurrence of an XInclude-enabled element in the login.xml document. Its xinclude:href attribute points to the logIn_signIn_menu.xml location. The xinclude:parse attribute instructs the XInclude processor to parse the XML document before including it. (The external document fragment can be included as text if the xinclude:parse attribute is set to "text".) After the first XInclude element has been processed, the login.xml document looks like this:
<xdoc xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"
      xmlns:xlink="http://www.w3.org/TR/xlink">
  <netfolder xlink:type="extended" xlink:title="Didier's Lab experiment" id="menu">
    <resource xlink:type="locator" xlink:title="Sign in" xlink:href="#SignInForm"/>
    <resource xlink:type="locator" xlink:title="log in" xlink:href="#LogInForm"/>
  </netfolder>
  <xpart xinclude:href="signIn_form.xml" xinclude:parse="xml">
    <filter>
      <device-profile format="wml html"/>
    </filter>
  </xpart>
  <xpart xinclude:href="logIn_form.xml" xinclude:parse="xml">
    <filter>
      <device-profile format="wml html vxml"/>
    </filter>
  </xpart>
</xdoc>
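The replacement step just described can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. This is an illustration under simplifying assumptions, not the processor used on the XML server: include_parts and the load callback are hypothetical names, only direct children of the root are examined, only xinclude:parse="xml" is handled, and the included documents come from an in-memory dictionary rather than from real URIs.

```python
import xml.etree.ElementTree as ET

# Namespace URI used for the xinclude prefix in login.xml.
XINCLUDE_NS = "http://www.w3.org/1999/XML/xinclude"
HREF = f"{{{XINCLUDE_NS}}}href"
PARSE = f"{{{XINCLUDE_NS}}}parse"


def include_parts(root, load):
    """Replace each child of root that carries xinclude:href.

    load is a hypothetical callback that takes a URI and returns XML text.
    """
    for index, child in enumerate(list(root)):
        uri = child.get(HREF)
        if uri is None:
            continue  # not an XInclude-enabled element
        if child.get(PARSE, "xml") != "xml":
            continue  # text inclusion is left out of this sketch
        replacement = ET.fromstring(load(uri))
        replacement.tail = child.tail  # keep the surrounding whitespace
        root[index] = replacement      # one-for-one swap, indexes stay valid


# Stand-in for the documents on the server (contents abbreviated).
DOCS = {"logIn_signIn_menu.xml": '<netfolder id="menu"/>'}

root = ET.fromstring(
    f'<xdoc xmlns:xinclude="{XINCLUDE_NS}">'
    '<xpart xinclude:href="logIn_signIn_menu.xml" xinclude:parse="xml"/>'
    "</xdoc>"
)
include_parts(root, DOCS.__getitem__)
print([child.tag for child in root])  # ['netfolder']
```

Passing the loader in as a callback keeps the sketch independent of how documents are actually fetched.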
The xpart element adds some more functionality to the basic XInclude characteristics:
- The filter element is an XInclude add-on of our invention allowing us to set conditional inclusions. For instance, in the login.xml document, the device profile is used to trigger or inhibit the inclusion process. In the example above, the first xpart element will be replaced by the referred document fragment only if the user agent is a WML or HTML device. Otherwise, the inclusion process is not performed. Thus, with the filter element, the XInclude element can be an inclusion or an omission mechanism. The filter introduces some environmental dependencies to the inclusion process.
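The filter test itself is small enough to sketch in Python. The function name should_include is hypothetical, and one behavior is an assumption the article does not settle: an xpart without any filter is treated here as always included.

```python
import xml.etree.ElementTree as ET


def should_include(xpart, profile):
    """Return True if the xpart's filter admits the given device profile."""
    device = xpart.find("filter/device-profile")
    if device is None:
        # Assumption: no filter means unconditional inclusion.
        return True
    # The format attribute is a space-separated list of profile names.
    return profile in device.get("format", "").split()


xpart = ET.fromstring(
    '<xpart><filter><device-profile format="wml html"/></filter></xpart>'
)
print(should_include(xpart, "wml"))   # True
print(should_include(xpart, "vxml"))  # False
```

An inclusion processor would call a check like this before fetching the referenced fragment, so that a filtered-out part costs nothing.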
The added filter uses the device profile to decide if a particular document fragment is included. The device profile is determined when the XML server recognizes the user agent. The user agent can be recognized by the HTTP User-Agent header. But keep in mind that in the WAP/HDML world alone, you'll have to cope with about 575 different user agent strings. If you count the HTML, WML, HDML, and VoiceXML browsers, the XML server will have to recognize more than 600 different user agent identifiers. To limit the experiment's complexity, let's assume that the XML server identified the user agent capabilities and set the device profile accordingly!
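To make the recognition step concrete, here is a deliberately naive Python sketch that maps a User-Agent value to a device profile by substring matching. Everything in it is an assumption for illustration: the marker strings "ExampleVoiceBrowser" and "ExampleWapPhone" are invented placeholders rather than real user agent identifiers, and falling back to "html" for unknown agents is a design choice of this sketch only.

```python
# Hypothetical rules: (substring to look for, device profile).
# A real table would need hundreds of entries, as noted above.
PROFILE_RULES = [
    ("ExampleVoiceBrowser", "vxml"),
    ("ExampleWapPhone", "wml"),
]


def device_profile(user_agent, default="html"):
    """Map an HTTP User-Agent header value to a device profile name."""
    lowered = user_agent.lower()
    for marker, profile in PROFILE_RULES:
        if marker.lower() in lowered:
            return profile
    # Assumption: anything unrecognized is treated as an HTML browser.
    return default


print(device_profile("ExampleWapPhone/1.0"))  # wml
print(device_profile("SomeDesktopBrowser/4.0"))  # html
```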
Now that we have created a form model, and adapted this form model to the particular device connecting to the server, we need to transform the XML data model into:
- VoiceXML for plain old telephones
- WML for WML mini-browsers
- HTML for PC browsers
We'll cover this step in next week's article.