Displaying XML Documents on the Web

By John Prince
Lonestar Chapter

eXtensible Markup Language (XML) is one of the current buzzwords in the Technical Communication industry. Many feel that XML will make as much of an impact on the Internet as HTML did. Articles have been published to provide overviews of, and pique interest in, XML. However, because XML's power also means complexity, it's difficult to explain everything you need to know about XML in one article. Instead of focusing only on the advantages of XML, this article gets you started on the road to actually learning XML by teaching you how to write a simple XML document, display it in a browser, and format it for viewing on the Internet.

The first thing you should know is that XML is not an object-oriented programming language like Java or C++, nor is it a scripting language like JavaScript or VBScript. In fact, XML contains no procedural or functional capabilities. It's a meta-markup language—a system for defining languages. In short, you create custom "tags" and then structure those tags according to certain principles.

XML's real power is that it separates content from presentation, which allows you to write one XML source document to display and use in a variety of ways. While HTML presents visual and auditory elements, XML describes content. For example, in HTML the heading 2 tag only determines how to display text, whereas an XML tag identifies what a piece of information is. After XML delivers data to the desktop, the data is parsed and can be edited and/or presented in multiple views without return trips to the server. XML can also describe data in a variety of applications, which allows for a language- and platform-neutral way to exchange data.

Why should you learn XML? This question is similar to one that was asked about five years ago in our industry, "Why should I learn HTML?" As e-commerce continues to grow, so will XML. A wide variety of XML-based languages are already in use. These include Channel Definition Format (CDF), Resource Description Framework (RDF), and Chemical Markup Language (CML). As e-commerce companies begin using XML to exchange information from server to server, Technical Communicators will be needed to work with XML in some form.

Creating an XML Document and Displaying it in a Browser

While there are several specialized XML editors available (XML Notepad, XML Spy, and Exml), an XML editor is really not needed. You can write an XML document in any text editor (such as Notepad or WordPad). Figure 1 is an example of XML markup concerning the recent STC conference in Orlando:

<?xml version="1.0" standalone="yes"?>
<STC_CONFERENCE>
  <CONFERENCE_TITLE>
   Renaissance Communicators - A Vision of Our Future
  </CONFERENCE_TITLE>
  <LOCATION>
     <CITY>Orlando</CITY>
     <STATE>Florida</STATE>
  </LOCATION>
  <CONFERENCE_DATES>
       <MONTH_NAME>May</MONTH_NAME>
       <START_DATE>22</START_DATE>
       <END_DATE>25</END_DATE>
       <YEAR>2000</YEAR>
  </CONFERENCE_DATES>
</STC_CONFERENCE>

The first line is the XML declaration. It resembles a processing instruction and states that the document is an XML file and that it's a standalone document (it doesn't need to read other documents to function). The <STC_CONFERENCE> tag is the root element. All of the subsequent tags are either its child or descendant elements. By looking at this data, you can see that it has a hierarchical structure. The <LOCATION> element contains child elements that provide information about the location of the conference, such as the city and state. The <CONFERENCE_DATES> element contains child elements that describe when the conference begins and ends, and so on.

After you incorporate the markup, the file must be saved as an .xml file. You can then open the document in a browser that supports XML documents (such as IE5), as seen in Figure 2:


Figure 2 - XML Source in Browser

Clicking a parent element expands or collapses the element to show or hide its child element(s). If the browser does not display the document, it means the document cannot be parsed because it isn't well-formed (correctly marked up according to XML standards).
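
To make the idea of well-formedness concrete, the two fragments below contrast markup that a parser would reject with a corrected version. They reuse the <LOCATION> element from the conference example; the broken variant is invented here purely for illustration and is not taken from the original figures. Note that these are two separate fragments, not one document.

```xml
<!-- Fragment 1, NOT well-formed: CITY is never closed, and the
     end tag "State" does not match the start tag "STATE"
     (XML names are case-sensitive) -->
<LOCATION>
  <CITY>Orlando
  <STATE>Florida</State>
</LOCATION>

<!-- Fragment 2, well-formed: every start tag has a matching
     end tag, and the elements are properly nested -->
<LOCATION>
  <CITY>Orlando</CITY>
  <STATE>Florida</STATE>
</LOCATION>
```

If you paste the first fragment into your document in place of the existing <LOCATION> element, a conforming browser should report an error instead of showing the tree.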

Although the document is properly structured, it's not aesthetically pleasing. So how do you format it so that it looks nice? With style sheets, which contain the rules for displaying an XML document (or instance) in a browser. What's beneficial about this is that you can present an XML document in several different ways just by changing the style sheet. Or, you can create many different style sheets to display a single XML document for specific purposes (print, screen display, PDAs, etc.).

You can do this by creating a Cascading Style Sheet (CSS), or by using eXtensible Stylesheet Language (XSL).

Using Cascading Style Sheets

As with XML, you can create a CSS file in any text editor. Figure 3 displays simple syntax for a CSS file that we'll use to display part of the XML document created in the preceding section:


Figure 3 - CSS Source Code

CSS associates particular formatting with the elements in the XML document. In most cases, when you assign a style rule to a parent element for display, such as the <LOCATION> element, all of its child elements use the associated format. Rules can also be assigned to a single child element, such as the <CITY> element. After you enter the rules in the text editor, save the file as a .css file. However, the browser must know which style sheet to apply to the document. To tell it, attach the style sheet to the XML document by opening the XML instance and inserting the following directly under the XML declaration:

<?xml-stylesheet type="text/css" href="filename.css"?>

This is a processing instruction that instructs the browser to apply the style sheet. The type attribute is the MIME type of the style sheet. In this case its value is text/css. The href attribute's value declares where the style sheet is located. After saving the changes to the .xml file (and refreshing your browser if necessary), the document is displayed as seen in Figure 4:


Figure 4 - XML+CSS
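
For reference, here is how the conference document would look with the style sheet attached. This is only a sketch: the file name conference.css is a hypothetical stand-in for whatever name you gave your own .css file, and it is assumed to sit in the same folder as the .xml file.

```xml
<?xml version="1.0" standalone="yes"?>
<!-- The style sheet instruction goes directly under the XML declaration
     and before the root element. "conference.css" is a hypothetical name. -->
<?xml-stylesheet type="text/css" href="conference.css"?>
<STC_CONFERENCE>
  <CONFERENCE_TITLE>
   Renaissance Communicators - A Vision of Our Future
  </CONFERENCE_TITLE>
  <LOCATION>
     <CITY>Orlando</CITY>
     <STATE>Florida</STATE>
  </LOCATION>
  <CONFERENCE_DATES>
       <MONTH_NAME>May</MONTH_NAME>
       <START_DATE>22</START_DATE>
       <END_DATE>25</END_DATE>
       <YEAR>2000</YEAR>
  </CONFERENCE_DATES>
</STC_CONFERENCE>
```

Nothing inside the root element changes. Swapping the href value for a different style sheet is all it takes to present the same content another way.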

One advantage of CSS is that it's fairly easy to learn and can format content for hardcopy and the Web as nicely as any layout application on the market. However, because CSS is not an XML language, you'll have to locate other resources to learn its syntax. You'll need to understand things like classes, pseudo-elements, display properties, etc., to really make your XML documents sing with CSS. Another advantage of using CSS is that major browsers support it. Even though CSS works as well with XML as it does with HTML, it doesn't allow you to really use XML to its full capacity. There is a style language for XML documents that's superior to CSS. It's called XSL.

Using XSL

XSL is an XML application that is composed of two languages: XSL transformations (XSL-T) and XSL formatting objects (XSL-FO). While they do complement each other, it's important to note that they can function independently. XSL is extremely powerful because it allows you to include programming statements to dictate how a document is rendered. As with CSS style sheets, you'll need to provide an XSL style sheet declaration in your XML document.

XSL Transformations

XSL-T gives you the ability to define rules for how one XML document is transformed into another XML document. One of the most common uses of this is to transform an XML document into well-formed HTML for rendering in a browser. The transformations are similar to server-side includes in HTML, but the actual transformation of the XML document and XSL style sheet takes place on the client desktop instead of the server. The XSL style sheet works like a template, selecting elements and/or the values of element attributes from the XML instance. The rules you define in the XSL style sheet determine what happens to the XML data.

For example, Figure 5 is basic XSL code to display the XML instance in a browser.


Figure 5 - XML+XSL-T

There are myriad ways that this could have been presented. However, as with anything new, it's best to begin with something simple. When using XSL, you'll need to think of your input data as individual nodes. When scripting against your source document, if you don't have a clear visual representation of your data, you might find it difficult (if not impossible) to transform your data. Therefore, you should consider creating a tree diagram of your XML instances to use as a reference when marking up your XSL style sheet.
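
As a rough illustration of the template idea described in this section, the sketch below turns the conference document into a small HTML page. It is not the style sheet from the original figure. It is written against the XSLT 1.0 Recommendation and its http://www.w3.org/1999/XSL/Transform namespace, which is an assumption on my part: IE5 as first shipped implemented an earlier working-draft dialect, so you may need a newer XSL processor to run it. The file name conference.xsl mentioned afterward is hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- One rule for the root element: build the whole HTML page from it -->
  <xsl:template match="/STC_CONFERENCE">
    <html>
      <head>
        <title>STC Conference</title>
      </head>
      <body>
        <!-- normalize-space() trims the line breaks around the title text -->
        <h1><xsl:value-of select="normalize-space(CONFERENCE_TITLE)"/></h1>
        <!-- Paths are relative to the matched STC_CONFERENCE node -->
        <p>
          <xsl:value-of select="LOCATION/CITY"/>
          <xsl:text>, </xsl:text>
          <xsl:value-of select="LOCATION/STATE"/>
        </p>
        <p>
          <xsl:value-of select="CONFERENCE_DATES/MONTH_NAME"/>
          <xsl:text> </xsl:text>
          <xsl:value-of select="CONFERENCE_DATES/START_DATE"/>
          <xsl:text>-</xsl:text>
          <xsl:value-of select="CONFERENCE_DATES/END_DATE"/>
          <xsl:text>, </xsl:text>
          <xsl:value-of select="CONFERENCE_DATES/YEAR"/>
        </p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

To try it, save the sketch as conference.xsl and attach it the same way as a CSS file, but with type="text/xsl" in the style sheet instruction. Each select path walks the node tree from the root element down, which is why a tree diagram of your document is a handy reference while you write the rules.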

XSL Formatting Objects

At the moment, XSL-FO does not format instances directly within a browser. However, it is a good idea to at least grasp its underlying concept. If you are determined to use XSL-FO, there is a Java program called FOP that does support formatting objects. This program converts formatting object documents into .PDF documents (which you can display on the web). You can download FOP from http://www.jtauber.com/fop/.

After reading the earlier section, "Using Cascading Style Sheets", you have a basic understanding of how style sheets can be used to define formatting rules for a complete set of tags. XSL-FO also allows you to define rules for your tags and to do more with the layout than you can with XML+CSS, including cross-references, page numbers, etc.

XSL formatting objects are an XML vocabulary that you use to place your XML elements in a document. As with XSL-T, an XSL-FO document must be well-formed. There are over 50 XSL-FO elements and close to 200 formatting properties that can be used for these elements. The formatting properties for the objects are (by no accident) similar to the formatting properties of CSS. This formatting model is based on areas (similar to the box model used for CSS) that can contain other formatting objects, space, or text.

The formatting objects for XSL use the http://www.w3.org/XSL/Format/1.0 namespace. To begin your XSL-FO style sheet, use the following declaration:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"
 xmlns:fo="http://www.w3.org/XSL/Format/1.0"
 result-ns="fo">

Although XSL-T and XSL-FO can function independently, you would be losing the separation of content and presentation by using only XSL-FO. Consequently, to take full advantage of XSL-FO, you would write your XSL style sheet to use XSL-T to transform your XML instances into XSL-FO vocabulary.

Figure 6 is an XSL style sheet that does just that.


Figure 6 - XML+XSL-FO/T
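
To show the general shape of the XSL-T-to-XSL-FO approach, here is a minimal sketch that lays the conference data out on a single page. It is not the style sheet from the original figure. It assumes the namespaces and element names from the final W3C Recommendations (http://www.w3.org/1999/XSL/Transform and http://www.w3.org/1999/XSL/Format), which differ from the working-draft namespaces in the declaration shown earlier, so older tools may expect the draft forms instead. The page size and font settings are arbitrary choices for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- XSL-T part: match the root element of the conference document -->
  <xsl:template match="/STC_CONFERENCE">
    <!-- XSL-FO part: everything in the fo: namespace is copied to the result -->
    <fo:root>
      <!-- Page geometry: a single page master, assumed here to be US Letter -->
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page"
                               page-width="8.5in" page-height="11in"
                               margin-top="1in" margin-bottom="1in"
                               margin-left="1in" margin-right="1in">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>

      <!-- Content: blocks flowed into the body region of that page -->
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <fo:block font-size="18pt" font-weight="bold" space-after="12pt">
            <xsl:value-of select="normalize-space(CONFERENCE_TITLE)"/>
          </fo:block>
          <fo:block font-size="12pt">
            <xsl:value-of select="LOCATION/CITY"/>
            <xsl:text>, </xsl:text>
            <xsl:value-of select="LOCATION/STATE"/>
          </fo:block>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

</xsl:stylesheet>
```

The XML instance itself stays free of layout information. The style sheet pulls values out of it with XSL-T and wraps them in formatting objects, and a formatter such as FOP can then turn the resulting formatting object document into a PDF.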

Conclusion

XML can be used in a variety of ways. One of the most common uses of XML in our industry is to display documents on the web. By converting existing documentation to XML, or creating new documentation in XML, you'll have the power to manage and single-source your documentation in a whole new way. XML+CSS suffices if you are interested in only managing and displaying your documents on the web. XML+XSL is the better choice if you want to transform your documents into other XML languages for different applications.

Assignment

If, like me, you learn best by doing things yourself, here is a little assignment for you:

Look at the source code for this document and you'll notice it was written with XHTML+CSS. If you are up to the challenge, convert it to XML+CSS and then XML+XSL.

If you need help, or have general questions and/or comments, please feel free to contact me at: jprince@e-talkcorp.com.


Copyright © 2000 John Richard Prince