Harmless XHTML

Author: Ted Shaneyfelt

Abstract

Widespread misleading reports have spread fear of XHTML worldwide. The effort expended to propagate these reports could be better exerted in promoting education and adoption of best practices. Best practices avoid the dreaded pitfalls, while advancing technology.

Context

As Ian Hickson has pointed out, an article promoting the idea that sending XHTML as text/html is harmful has been circulating since a post on his own web log as early as September 2002. This document is a response to his posting as it stands at the time of this writing. Its organization follows that of his document, presenting responses to concerns in the order in which they appear there.

Executive Summary

You may use XHTML safely even if it will not be delivered with the application/xhtml+xml MIME type. You do not need to revert to SGML-based HTML4 and abandon well-formed XML in order to be safe. The problems that are often raised against XHTML delivered as text/html are debunked below.

Although IE6 does not support application/xhtml+xml, it is possible to follow good practices to work around the problem until Microsoft addresses it.

Why Using text/html is Not Bad

Claims that text/html is inappropriate for XHTML documents are commonly misunderstood, and the misunderstanding leads to stagnation in technology adoption. There are good reasons to adopt an XML-based language like XHTML, mainly the ability to parse without name-based discrimination. XSLT, for example, allows one document to combine content live from a number of other XML-based documents.

My understanding is that the claim that text/html is a bad delivery mechanism for XHTML is directed mainly at webmasters. If you are a webmaster and you know that your documents are XHTML, you should serve them as application/xhtml+xml, and browsers should support them appropriately. In practice, however, IE6 does not support application/xhtml+xml at all, so you are caught in a no-win situation. Rather than follow one vendor's path away from modern standards, let's move forward with the newer and more capable standards.
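One way a webmaster might escape the no-win situation is to choose the MIME type per request from the browser's Accept header. The following is only a minimal sketch of that idea, not a description of any particular server's configuration. The function name choose_content_type is hypothetical, and the logic assumes that a browser able to handle application/xhtml+xml lists it explicitly in Accept, while one that cannot (as reported above for IE6) does not.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a MIME type for an XHTML 1.0 page from a request's Accept header.

    Hypothetical helper for illustration: returns application/xhtml+xml only
    when the client lists it explicitly with a non-zero q value, and falls
    back to text/html otherwise.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # a media range without a q parameter has quality 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q value as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"


if __name__ == "__main__":
    print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
    print(choose_content_type("image/gif, image/jpeg, */*"))
```

A server that varies its response this way should also send a Vary: Accept header so that caches keep the two variants apart. Authors who cannot control the server at all are in a different position, and for them the good practices discussed below matter more.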

It seems to me that the casual reader typically misunderstands the claim by reading it from the author's perspective. As an author, if you don't know for sure that your page will be delivered as application/xhtml+xml, you are led to believe that you should not write XHTML at all. This misunderstanding has so permeated thinking in recent years that tool developers, even on open source projects, follow the path away from modern XHTML in favor of HTML4.

I have seen the illustration of an author who attempts to write XHTML and publishes it as text/html without first learning best practices, then later finds problems with the code when publishing as application/xhtml+xml. This illustration is better read as a call to educate authors and to develop better tools that promote best practices than as an argument against XHTML. I expect that developers resisting XHTML altogether because of this misunderstanding have caused more harm, in the form of stagnation, than pointing out the potential problems has brought benefit.

Good Practices to Alleviate Specific Potential Problems

The issues that have been pointed out that affect documents switched between text/html and application/xhtml+xml can be overcome by good practices.

Copy and Paste

Misuse of copy and paste by authors is not a valid reason to abandon the XHTML standard or to prefer HTML 4 over it.

Why Using XHTML with Good Practices is Good Even When Sent as text/html

Arguments have been made against sending XHTML as text/html that don't apply to validated XHTML pages. The obvious solution is to validate the web pages as XHTML.
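As a rough illustration of the first step of such checking, the sketch below tests whether a fragment is well-formed XML using Python's standard xml.etree.ElementTree parser. It is only a partial stand-in: it does not validate against the XHTML DTD, which requires a real validator such as the W3C Markup Validation Service. The function name check_well_formed and the sample fragments are made up for this example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup: str):
    """Return (True, "") if markup parses as XML, else (False, parser message).

    Illustrative helper: this checks well-formedness only, not validity
    against the XHTML 1.0 DTD.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


if __name__ == "__main__":
    # An empty element written the XHTML way parses cleanly.
    print(check_well_formed("<p>Line one<br />Line two</p>"))
    # The HTML4 habit of leaving <br> unclosed is caught by the XML parser.
    print(check_well_formed("<p>Line one<br>Line two</p>"))
```

Running a check like this before publishing catches the class of errors that would otherwise surface only when the page is later served as application/xhtml+xml.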

XHTML 1.0 Documents and HTML Compatibility

Making XHTML 1.0 documents work with HTML 4.01 UAs is addressed in RFC 2854 and also by the World Wide Web Consortium in Appendix C of the XHTML 1.0 standard. A key step in the process of switching from obsolete HTML to XHTML is providing a migration path. The proposed practices do provide a migration path that serves existing HTML4 UAs and future XHTML UAs. Whether or not the resulting document is compatible with the HTML 4.0 standard and SGML is not the point. Rather, the suggested practice is compatible with HTML 4.0 UAs. It allows existing UAs to process XHTML documents as though they were HTML documents. In the practical world in which we live, this is an important concern, and it is adequate for migration to XHTML 1.0.

How to Write XHTML Documents that Render Properly as text/html

Advantages of XHTML

Although it is preferable to serve application/xhtml+xml rather than text/html to those UAs that can accept that MIME type, some of the benefits still apply to documents served as text/html. Perhaps the most important benefit is the use of tools with XHTML. XSLT can be successfully applied to documents served as text/html. It is an important tool that currently works with the most popular UAs to combine XML data from various sources with templates to display a resulting web page. This is in practice today and has been working with Mozilla and IE for years. Even IE doesn't seem to have any problems receiving XHTML files as text/html and processing them as XML data in transforms.
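The underlying point, that an XHTML page is usable as XML data regardless of the MIME type it was served with, can be sketched outside the browser as well. The example below does not use XSLT, since Python's standard library has no XSLT processor. It swaps in the generic xml.etree.ElementTree parser to pull the title and links out of a small page. The page content and the names PAGE, page_title, and extract_links are invented for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A small sample page, made up for this example. The parser does not fetch
# the DTD named in the DOCTYPE, so no network access is needed.
PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Harmless XHTML</title></head>
<body>
<h1>Harmless XHTML</h1>
<p>See <a href="http://www.hixie.ch/advocacy/xhtml">Ian's document</a>.</p>
<p>Line one<br />Line two</p>
</body>
</html>"""


def page_title(markup: str) -> str:
    """Return the text of the page's <title> element."""
    root = ET.fromstring(markup)
    return root.findtext(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title", default="")


def extract_links(markup: str):
    """Return (href, link text) pairs for every <a> element in the page."""
    root = ET.fromstring(markup)
    return [
        (a.get("href"), "".join(a.itertext()))
        for a in root.iter(f"{{{XHTML_NS}}}a")
    ]


if __name__ == "__main__":
    print(page_title(PAGE))
    print(extract_links(PAGE))
```

No HTML-specific parser or tag-soup recovery is involved here. The same generic XML tooling would work on any well-formed page, which is the benefit that HTML 4 cannot offer.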

Conclusion

There are few disadvantages in using XHTML. There is no need to throw out the standard with the bad practices. Adopt and prefer the XHTML standard, develop good practices, throw out only the bad practices, and migrate away from obsolete technology. Serving HTML documents as text/html is appropriate, and XHTML can be thought of as an XML encoding of HTML. Where pitfalls are encountered, use best practices to overcome them. Tool developers should encourage and assist in developing best practices rather than shying away from XHTML. In theory and in practice, standards are important. In practice, if you're not ready to switch to XHTML, even nonstandard WFHTML is preferable to HTML 4.0. WFHTML sacrifices compliance with valid HTML in order to comply with the better standard of well-formed XML, while not necessarily meeting the standard of XHTML. But XHTML is preferable to both HTML4 and WFHTML. Valid XHTML that is written with best practices should be the preferred format for web pages, and it should be the default format for new HTML documents.

The problems with IE have known workarounds. Some groups (mozillaquest, for example) may have no desire to expend any effort to make their sites render correctly on IE. This is fine. When Microsoft makes a better browser, the problem will resolve itself, and until then, their browser will simply not render those sites correctly, and those webmasters who want their sites to be available to all can configure their servers accordingly.

I suppose most readers of Ian's document don't read his disclaimer in Appendix B.

Further Reading

See http://www.hixie.ch/advocacy/xhtml. Don't skip Appendix B. Ian is correct in pointing out that XHTML that follows Appendix C of XHTML 1.0 can be served properly without problems. This is what I recommend, but I would add that authors without control of the MIME type should not be afraid of XHTML. IE's bug is a bug unworthy of panic, and certainly unworthy of impeding progress in adopting XML-based standards like XHTML.

Acknowledgments

Thanks to Ian Hickson for providing the motivation for writing this article. The organization and much of the content of this document is based on his writings.

