XML Portal Content Aggregation
May 15, 2000
Enterprise portals are intended to provide users with a single point of access to a variety of content aggregated from within the enterprise, as well as from business or trading partners and the Web. XML provides an excellent language for representing both structured and unstructured information. The feature of extensibility, the ability to arbitrarily define new markup languages, enables XML to represent an infinite number of data types. We are moving toward a time when users of popular application and database software will have the ability to save information as XML. Until this becomes the norm, organizations interested in aggregating disparate data using XML will be faced with purchasing or developing custom "connector" applications that convert legacy formats to XML: a step that is necessary for giving users the ability to "work" with the data.
As an alternative to converting non-XML content to XML, organizations can use XML messaging to transport content in its native format to a portal server. In this case, the XML message carries metadata along with the content. From the metadata, a portal server can index the content and give users the ability to retrieve the native data or document for viewing through the portal. Viewing would be handled by an information exchange application, such as Inso’s Outside In Server, which recognizes and displays over 225 data formats as HTML. Outside In is a popular application for portals aggregating disparate content that is either not easily converted to XML or is valuable as "read only" information.
XML messaging is also important for transporting XML data to and from the portal. Sequoia Software’s "EXTRA" schema, which was recently published on BizTalk.org, is an example of an XML message strategy designed to facilitate information integration with portal servers. The EXTRA schema, based on the BizTalk framework, provides an intelligent transport mechanism for routing information between the portal server and connected applications.
About EXTRA
The EXTRA schema allows the capabilities unique to a portal server, such as data entry, aggregation of data, document routing, and document access control, to be extended to external data sources such as legacy databases. XML messaging transports data from multiple external sources into the portal server. Once in the portal, that data is not only indexed and searchable, but is also available to the rest of the portal's functionality. This is all made possible by encapsulating information in XML messages and passing them between systems as BizTalk messages.
EXTRA messages consist of the BizTalk routing information, portal instructions, and the actual content. The BizTalk routing information identifies the applications sending and receiving the message. The EXTRA schema defines the Packet portion of the message, which contains instructions for the portal. This Packet contains a set of actions that can be performed in the portal, metadata about the content, and the content itself. The EXTRA schema provides definitions for saving, indexing, updating, and deleting documents (the content), as well as functions for updating metadata related to a specific document. The portal uses this metadata to index the content, so that users can later search for it inside the portal.
To give the portal the information it needs to store the content of the message, characteristics of the content, such as the source application, a description of the content, and the format of the content, can be stored in this part of the message. The application sending the message into the portal provides the values for these characteristics. The content format can be XML, text, or a binary file format such as Microsoft Word or Excel. The BizTalk specification recommends Base64 encoding for message content so that binary formats, like a Microsoft Word document, can be transferred using XML messaging.
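As a rough illustration of the sending side, the following Python sketch wraps binary content in a Packet. It is not Sequoia's implementation. The element and attribute names (`packet`, `save`, `indexfield`, `content`, `encoded`) and the namespace URI are taken from the sample messages later in this article, while the function `build_packet` and its parameters are hypothetical names introduced here. The BizTalk envelope and routing header are omitted.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the article's sample messages.
EXTRA_NS = "http://schemas.biztalk.org/sequoiasoftware_com/myaxudtv.xml"


def build_packet(raw, datasource, doctype, filetype, index_fields):
    """Return a <packet> string carrying `raw` bytes as Base64 text.

    Hypothetical helper: the layout mirrors the article's samples,
    but this is only a sketch of what a sending application might do.
    """
    # Writing xmlns as a plain attribute makes ElementTree emit a
    # default namespace declaration on the root element.
    packet = ET.Element("packet", {"xmlns": EXTRA_NS})
    save = ET.SubElement(packet, "save", {
        "datasource": datasource,
        "doctype": doctype,
        "filetype": filetype,
    })
    # Metadata the portal can index and later search on.
    for name, value in index_fields.items():
        field = ET.SubElement(save, "indexfield", {"name": name})
        field.text = value
    # Base64 turns arbitrary bytes into plain ASCII text that is safe
    # to place inside an XML element.
    content = ET.SubElement(save, "content", {"encoded": "yes"})
    content.text = base64.b64encode(raw).decode("ascii")
    return ET.tostring(packet, encoding="unicode")


# Stand-in bytes; a real sender would read them from a file.
packet_xml = build_packet(
    b"\x00\x01binary", "LegacyDB", "Patient_Data", "TIFF",
    {"LastName": "Pickett"},
)
```

Because the payload is ordinary text after encoding, the same message structure works regardless of whether the original content was a Word document, a spreadsheet, or an image.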
The EXTRA schema makes it possible to interact with both structured and unstructured data from inside of the portal server. No matter how the data is stored or where it is extracted from, XML messaging provides the format for getting information into the portal.
Use Case — Unstructured Data
A portal interface is ideal for providing users access to content from legacy applications with difficult-to-use or unfamiliar Web interfaces. To make this possible, the portal must first be populated with data from the legacy application, which can be facilitated using BizTalk and EXTRA schemas. A sample message is shown below. The BizTalk tags provide the routing information, and the EXTRA tags are inside the <body> tag. The <save> tag defines the original data source, the document type, and the file type of the content. The metadata is populated in this section. In this example, there are multiple <indexfield> tags that define the data that can be used to search on this content. The <content> tag in this example message contains Base64-encoded data. In this case, the content is a scanned document stored in TIFF format. The sending application encodes the document and adds it to the XML message, and the portal server decodes the content before storing it in the portal. This gives the user the ability to search on this document and view it within the portal. For a TIFF file, the Outside In Server from Inso could render the document inside the browser.
<?xml version="1.0"?>
<biztalk_1 xmlns="urn:biztalk-org:biztalk:biztalk_1">
  <header>
    BizTalk Routing Information …
  </header>
  <body>
    <packet xmlns="http://schemas.biztalk.org/sequoiasoftware_com/myaxudtv.xml">
      <save id="41" datasource="LegacyDB" doctype="Patient_Data"
            filetype="TIFF" mode="1">
        <key name="Client_ID">12328282</key>
        <key name="LastName">Pickett</key>
        <indexfield name="LastName">Pickett</indexfield>
        <indexfield name="SSN">213-22-1111</indexfield>
        <content encoded="yes">
          <![CDATA[VGhpcyBpcyBzYW1wbBiZSB1c2...]]>
        </content>
      </save>
    </packet>
  </body>
</biztalk_1>
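The portal-side handling of such a message can be sketched in Python as follows. This is an illustration under stated assumptions, not the portal server's actual code. The function `read_save` is a hypothetical name, the header text is a placeholder, and the payload is a short stand-in string rather than a real TIFF, because the sample above truncates its Base64 data.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace URI copied from the sample message.
EXTRA_NS = "http://schemas.biztalk.org/sequoiasoftware_com/myaxudtv.xml"
NS = {"x": EXTRA_NS}

# Stand-in payload (assumption): any bytes would do here.
payload = base64.b64encode(b"stand-in for TIFF bytes").decode("ascii")

MESSAGE = f"""<?xml version="1.0"?>
<biztalk_1 xmlns="urn:biztalk-org:biztalk:biztalk_1">
  <header>routing information omitted</header>
  <body>
    <packet xmlns="{EXTRA_NS}">
      <save id="41" datasource="LegacyDB" doctype="Patient_Data"
            filetype="TIFF" mode="1">
        <key name="Client_ID">12328282</key>
        <indexfield name="LastName">Pickett</indexfield>
        <content encoded="yes"><![CDATA[{payload}]]></content>
      </save>
    </packet>
  </body>
</biztalk_1>"""


def read_save(message):
    """Return (save attributes, index fields, decoded content bytes)."""
    root = ET.fromstring(message)
    save = root.find(".//x:save", NS)
    # Collect the searchable metadata as a name -> value mapping.
    fields = {f.get("name"): f.text
              for f in save.findall("x:indexfield", NS)}
    content = save.find("x:content", NS)
    text = (content.text or "").strip()
    # Decode only when the sender marked the content as encoded.
    if content.get("encoded") == "yes":
        raw = base64.b64decode(text)
    else:
        raw = text.encode("utf-8")
    return dict(save.attrib), fields, raw


attrs, fields, raw = read_save(MESSAGE)
```

After this step the index fields can be handed to whatever search index the portal uses, and the decoded bytes can be stored under the document type and file type given in the `<save>` attributes.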
Use Case — Structured
A February 29, 2000 report by Ron Exler of the Robert Francis Group, XML Catches Fire, notes, "Efforts to establish XML ‘glossaries’ for vertical industries are proceeding at a breakneck pace. It appears XML is emerging as a key component of many business-to-business e-commerce undertakings. Importantly, many of these standards groups are driven not by vendors but by end users. Such efforts are more likely to succeed in defining appropriate industry definitions in a timely manner than vendor-driven groups." With more and more data formatted as XML, BizTalk messaging becomes an effective and attractive method for transferring data between sending (host) applications and the receiving application (the portal server). This scenario will be especially relevant as companies use the portal to support e-business processes with trading partners, where the portal will digest and present content required to support material and informational transactions.
The sample message below shows a record from a database saved as an XML file, with the XML structure stored in the <content> tag. With XML data, the portal can take advantage of the index fields for structured searches using an XML query language (XQL). The portal user can create a more detailed search using tag and value pairs to find the data they need. An additional attribute in the <save> tag, mode, allows for different relationships to be defined between the portal and the external database. For example, these modes have been implemented in Sequoia Software's XML Portal Server to allow one-way or bi-directional channels to be established and maintained between the portal and the external data source. Thus, data can persist in the portal or an interactive relationship can be established, allowing the portal to update the external data source directly.
<?xml version="1.0"?>
<biztalk_1 xmlns="urn:biztalk-org:biztalk:biztalk_1">
  <header>
    BizTalk Routing Information …
  </header>
  <body>
    <packet xmlns="http://schemas.biztalk.org/sequoiasoftware_com/myaxudtv.xml">
      <save id="41" datasource="Database" doctype="Patient_Data"
            filetype="XML" mode="1">
        <key name="Client_ID">12328282</key>
        <key name="LastName">Pickett</key>
        <indexfield name="LastName">Pickett</indexfield>
        <indexfield name="SSN">213-22-1111</indexfield>
        <content encoded="no">
          <![CDATA[
          <client_record>
            <client_id>12328282</client_id>
            <firstName>William</firstName>
            <lastName>Pickett</lastName>
            <address>
              <street>5457 Twin Knolls Rd.</street>
              <city>Columbia</city>
              <state>MD</state>
              <zip>21045</zip>
            </address>
            <phone>
              <work>410 666-7777</work>
            </phone>
            <date>01/06/1999</date>
          </client_record>]]>
        </content>
      </save>
    </packet>
  </body>
</biztalk_1>
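To make the idea of a tag and value search concrete, here is a minimal Python sketch. It does not use XQL. Instead, a plain element lookup from the standard library stands in for the query language, and the function name `matches` is hypothetical. The record is an abridged copy of the one embedded in the sample message above.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the record carried in the <content> tag.
RECORD = """<client_record>
  <client_id>12328282</client_id>
  <firstName>William</firstName>
  <lastName>Pickett</lastName>
  <address>
    <city>Columbia</city>
    <state>MD</state>
  </address>
</client_record>"""


def matches(record_xml, criteria):
    """Return True when every tag in `criteria` occurs somewhere in
    the record with exactly the given text value."""
    root = ET.fromstring(record_xml)
    for tag, value in criteria.items():
        # root.iter(tag) visits all elements with that tag at any depth.
        found = any((el.text or "").strip() == value
                    for el in root.iter(tag))
        if not found:
            return False
    return True


# A search combining two tag and value pairs.
hit = matches(RECORD, {"lastName": "Pickett", "state": "MD"})
```

A real query language would also let the user constrain where in the document a tag appears. This sketch matches a tag at any depth, which is enough to show why structured content supports more precise searches than a flat set of index fields.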
Conclusion
At Sequoia, we have developed the EXTRA schema to provide the necessary elements for bringing both structured and unstructured content into a portal server and, when desired, for establishing and maintaining dynamic relationships between the portal server and external data sources. Our portal server has been designed not only to leverage XML for content aggregation and management (indexing, rules processing, etc.), but also to exchange data with disparate distributed applications using XML messaging. XML messaging provides an effective method of overcoming the challenges of enterprise integration for exchanging data in XML or in other formats. More importantly, it also levels the playing field for businesses needing to exchange information in support of business-to-business e-commerce.