Extensible HyperText Markup Language

Serving XHTML As XML

XHTML was created by the HTML Working Group at W3C, whose mission is to "fulfill the promise of XML by applying XHTML to a wide variety of platforms." It is therefore XML that is driving the development of XHTML, and developers need to understand the connection between these two technologies.

Web Browsers Won't Render Invalid XHTML Markup

When a browser opens a Web page and the media type for the page is set to application/xhtml+xml, the browser will process the page as XML and will enforce the rules of XML. If there are markup errors on a Web page, the browser will not render the Web page. Instead, most Web browsers will display an XML parsing error message such as the one seen in the screen shot below.

Screen shot of a Web browser displaying an XML parsing error.
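
The fail-on-first-error behavior described above can be reproduced outside a browser. The following is a minimal sketch, not what any browser actually runs: it uses the XML parser from Python's standard library, and the helper name check_well_formed and the two sample strings are assumptions made for this illustration. Like a browser's XML parser, it checks only whether the markup is well-formed, not whether it validates against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup):
    """Return None if markup parses as XML, else (message, line, column).

    Hypothetical helper for illustration; the parser stops at the first
    error, just as a browser does for application/xhtml+xml.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return (str(err), line, column)
    return None


good = "<p>The quick brown fox.</p>"
bad = "<p>The quick brown fox.<br></p>"  # <br> is never closed

print(check_well_formed(good))
print(check_well_formed(bad))
```

The first call reports no problem, while the second returns the parser's message together with the line and column at which parsing stopped.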

The browser's refusal to render the Web page can seem harsh, considering that we have become accustomed to using browsers that make adjustments for invalid markup. So why do browsers stop processing documents when they encounter errors in XHTML documents that are processed as XML? The answer is simple. Computers (i.e. software such as Web browsers) are not mind readers. Well, that's not entirely true, because when Web browsers process HTML documents that contain errors, they do in fact try to guess the Web developer's intentions. Sometimes their guess is right, and sometimes it is wrong. Different browsers can also make different guesses.

Web browsers weren't always so tolerant of markup errors. Early HTML Web browsers stopped processing Web pages when they encountered markup errors. However, in the late 1990s, in a race to gain market share, Web browsers began to tolerate markup errors and make what they felt were appropriate adjustments for the errors. Today, this practice continues for HTML documents, because HTML is a legacy markup language and its functionality is fixed. By contrast, invalid markup is not tolerated for XML-based documents, because the functionality of XML is extensible and could therefore be very sensitive to errors. For example, XHTML (which, remember, is an XML language) is a host language that can contain other markup languages, such as MathML, which describes mathematical formulas. Imagine that a markup error occurs in MathML on a Web page and that the browser tries to fix it (as it would with HTML). The repercussions could be serious. What if the mathematical information that contains the error relates to a vaccine, or helps calculate the maximum load that a bridge can tolerate? Do we want to take the chance that the browser will fix the markup correctly?

Beyond disaster scenarios involving lethal vaccines and collapsed bridges, let's not forget that errors are not tolerated in other domains either. When computer code contains errors, we don't expect C++ applications to compile, PHP scripts to run, or JavaScript functions to execute. The same is true for data formats like ZIP, Word and PDF. XML's low tolerance of errors should therefore be seen in the same light, with the added convenience of an error feedback mechanism.

Errors Happen - Fix Them!

Everyone makes mistakes; it's no big deal. The trick is to find and fix them quickly. When XHTML is processed as XML, the Web browser stops processing the document at the first error and displays a message with information about the error. Sometimes these XML parsing error messages can be quite cryptic, so if you can't determine where the error lies, consider using the W3C Markup Validation Service ("validator" for short).

The validator is located at http://validator.w3.org and will check your markup for errors. There are three ways to submit markup:

Validate by URL
Enter a URL to a Web page. Don't forget to include http:// before the name of your Web site. For example: http://xhtml.com/en/xhtml/reference/
Validate by File Upload
If you are working with a static Web page on your computer, you can upload it using this option.
Validate by Direct Input
This option lets you copy and paste markup from your Web page into the validator.

Below is a description of the four most common XML markup errors:

1. Incorrect Nesting Of Elements

Elements that contain other elements must be nested correctly. Inner elements (also called child elements) must be closed before outer elements (also called parent elements) can be closed. The following example shows incorrect markup where a parent element (p) is closed before a child element (em) is closed:

  <p>The quick brown fox jumps over the <em>lazy dog.</p></em>

Compare this to the correct version:

  <p>The quick brown fox jumps over the <em>lazy dog</em>.</p>

2. All Elements Must Be Closed

Empty elements must have a / before the closing >. It is good practice to put a space before the /. For example:

  <p><img src="smith.jpg" alt="Headshot of James Smith" /></p>

Non-empty elements must also be closed. In the above example, the opening <p> element is closed by </p>.

3. Attribute Values Must Be In Quotes

Attribute values, even numeric values, must be in quotes, as is 100% in the example below:

  <table width="100%">
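
The three rules covered so far (nesting, closing, and quoting) can be checked mechanically. The sketch below is an illustration that assumes Python's standard-library XML parser as a stand-in for a browser; the is_well_formed helper and the samples table are hypothetical names, and a closing </table> tag has been added so that each sample is a complete fragment.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Each rule is shown as a broken fragment followed by its corrected form.
samples = {
    "mis-nested": "<p>The quick brown fox jumps over the <em>lazy dog.</p></em>",
    "nested": "<p>The quick brown fox jumps over the <em>lazy dog</em>.</p>",
    "unclosed img": '<p><img src="smith.jpg" alt="Headshot of James Smith"></p>',
    "closed img": '<p><img src="smith.jpg" alt="Headshot of James Smith" /></p>',
    "unquoted attribute": "<table width=100%></table>",
    "quoted attribute": '<table width="100%"></table>',
}

for label, markup in samples.items():
    print(label, "->", "ok" if is_well_formed(markup) else "parse error")
```

Only the corrected fragment in each pair is accepted by the parser.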

4. Markup Characters Used In Text Or Attribute Values Must Be Escaped

Markup characters are characters that are used to delimit elements, attributes and special character references. The four markup characters are: <, >, & and ". When these characters are used in text or attribute values, they must be escaped in the following manner:

  • < becomes &lt;
  • > becomes &gt;
  • & becomes &amp;
  • " becomes &quot;

For example, the following URL contains a markup character &:

  http://xhtml.com/en/?css=no&layout=yes

When writing this URL in XHTML, the & must be escaped:

  <p><a href="http://xhtml.com/en/?css=no&amp;layout=yes">Turn off CSS</a></p>
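
When markup is generated by a program, the escaping is usually delegated to a library rather than done by hand. The following is a minimal sketch, assuming Python and its standard xml.sax.saxutils module; the variable names are chosen for this illustration only.

```python
from xml.sax.saxutils import escape, quoteattr

url = "http://xhtml.com/en/?css=no&layout=yes"

# quoteattr() escapes &, < and > and wraps the value in quotes,
# so the result can be placed directly after href=.
# escape() does the same replacements for text content, without quotes.
link = "<p><a href=%s>%s</a></p>" % (quoteattr(url), escape("Turn off CSS"))
print(link)

# escape() leaves double quotes alone unless asked to replace them.
text = escape('a < b & "c"', {'"': "&quot;"})
print(text)
```

The first line printed matches the hand-escaped link shown above, and the second shows all four markup characters replaced by their escaped forms.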

Setting Media Type For Web Pages

Media types determine if Web browsers treat XHTML as HTML or XML. If the media type for an XHTML Web page is set to text/html, browsers will parse the Web page as though it were HTML. If the media type is set to application/xhtml+xml, browsers will parse the Web page as XML.

Although most modern browsers such as Firefox, Opera and Safari support the application/xhtml+xml media type, Internet Explorer is lagging behind. The IE development team has indicated that they plan to support this media type in the future, but until they do, most developers serve XHTML 1.0 as HTML.

Some developers use a technique called "content negotiation": browsers that accept application/xhtml+xml, such as Firefox, Opera and Safari, are served XHTML as XML, while browsers such as IE are served the same XHTML as HTML. The site xhtml.com uses this technique. You can implement content negotiation on your own Web site if it uses a server-side scripting language such as PHP or ASP. Below are examples of how to set this up in different scripting languages.

PHP

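  // Read the request headers, defaulting to empty strings when a header is absent.
  $accept = $_SERVER["HTTP_ACCEPT"] ?? "";
  $ua = $_SERVER["HTTP_USER_AGENT"] ?? "";
  // Serve XML to clients that advertise support for it, and to the W3C validator.
  if (stripos($accept, "application/xhtml+xml") !== false || stripos($ua, "W3C_Validator") !== false) {
      header("Content-Type: application/xhtml+xml");
  }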

Active Server Pages

  Dim strAccept, strUA
  strAccept = Request.ServerVariables("HTTP_ACCEPT").Item
  strUA = Request.ServerVariables("HTTP_USER_AGENT").Item
  If InStr(1, strAccept, "application/xhtml+xml") > 0 Or InStr(1, strUA, "W3C_Validator") > 0 Then
      Response.ContentType = "application/xhtml+xml"
  End If

C# In ASP.NET

  string accept = Request.ServerVariables["HTTP_ACCEPT"];
  string ua = Request.ServerVariables["HTTP_USER_AGENT"];
  if (accept != null && ua != null) {
      if (accept.IndexOf("application/xhtml+xml") >= 0 || ua.IndexOf("W3C_Validator") >= 0) {
          Response.ContentType = "application/xhtml+xml";
      }
  }
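
The decision made in the examples above can also be written as a function that is independent of any particular framework. The sketch below is an illustration in Python, not part of the original examples: the function name choose_media_type is hypothetical, and it adds one refinement that the substring checks above do not have, namely treating a quality value of q=0 on application/xhtml+xml as a refusal.

```python
def choose_media_type(accept, user_agent):
    """Pick a Content-Type for an XHTML page from two request header values.

    Hypothetical helper: either argument may be None when the header is absent.
    """
    accept = accept or ""
    user_agent = user_agent or ""

    # The W3C validator is served XML regardless of its Accept header.
    if "W3C_Validator" in user_agent:
        return "application/xhtml+xml"

    # Walk the comma-separated Accept entries, e.g. "text/html;q=0.9".
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unreadable q value as a refusal
        if q > 0:
            return "application/xhtml+xml"

    # Fall back to HTML for every other client.
    return "text/html"


print(choose_media_type("text/html,application/xhtml+xml,*/*;q=0.8", "Mozilla/5.0"))
print(choose_media_type("text/html, */*", "MSIE"))
```

The returned string would then be sent as the Content-Type response header by whatever mechanism the server framework provides.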

Frequently Asked Questions

Should All XHTML Web Pages Be Served As XML?

XHTML 1.0 was designed to be forward- and backward-compatible, so you may choose to serve it either as HTML or as XML. Subsequent versions of XHTML must be served as XML.