Conversation With X/HTML 5 Team


A new version of HTML is in the works, called X/HTML 5. We were invited to post a series of questions to the X/HTML 5 team on their public mailing list. The responses, republished below, came from Ian Hickson, editor of the X/HTML 5 specification.

What Is X/HTML 5?

Web Applications 1.0, more commonly referred to as X/HTML 5, is a new version of HTML that is vying to replace HTML 4 and XHTML 1. The X/HTML 5 specification is being developed by the Web Hypertext Application Technology Working Group (WHATWG).

Ian Hickson Bio

Few people know the Web and Web technology like Ian Hickson. Ian learned the inner workings of Web browsers when he worked for Opera Software and spent his free time validating bugs in Mozilla's Bugzilla. Even with a full work schedule, you can find Ian answering questions on many Web technology forums and mailing lists under the alias "Hixie". Ian currently works for Google, where in 2005 he conducted the most comprehensive analysis of how markup is used on the Web. As of 2007, Ian is editor of the X/HTML 5 spec and a member of the CSS Working Group at W3C.

Why do we need X/HTML 5? When did this need become apparent?
Ian Hickson

HTML started as a document language for scientists to share their work. It evolved over time; for example the img element was added, forms were added, WYSIWYG features were added, and then removed in favor of CSS, and so forth. In the last few years, the Web has evolved yet again, reaching a much more dynamic state than it had before (pundits refer to this as "Web 2.0"). The HTML 5 effort is about maintaining and evolving HTML to address the needs of 21st century Web authors.
X/HTML 5 is currently in Working Draft stage. What is the tentative timetable for moving X/HTML 5 through the standards approval process towards Recommendation stage?

We're trying a new spec design model with HTML 5, where certain parts of the spec can be considered "done" before others. This is because we have parts of the spec that are very mature, with multiple implementations, test suites, and active use, and we have others that are very new, and very much in flux.

Right now this isn't very obvious; members of the community are working on a system that will annotate the spec live, though, so you can see what parts are stable and what parts are not. Other members of the community have also worked on a page that lets you pick what changes to the spec you want to see, so that you can, say, see only changes to stable parts of the spec that affect Firefox.

All this makes it hard to give a real timetable. Some parts of the spec are already done and actively used — for example Yahoo! Pipes makes use of the canvas feature of HTML 5. Historically, the point at which specifications have been branded Recommendations has been somewhat arbitrary. For example, HTML 4, which reached the Recommendation stage in the late 90s, is still being developed and fixed. Not all parts of HTML 4 are implemented in browsers, some parts of HTML 4 are known to be wrong but have no errata, and so forth. A big part of the work on HTML 5 is actually just fixing HTML 4 problems.

HTML 5 is being developed completely in the open, by the way. Anyone can take part. See for links to our forums, IRC channel, blog, FAQ, wiki, mailing lists, etc.
X/HTML 5 introduces new markup constructs such as sectioning elements, enhancements to the input element, a construct for dialogs, a way to mark up figures, and much more. Can you briefly describe these new constructs and the reason they were added?

We're trying to use a much more scientific process with the development of HTML 5 than is usually used for new specifications. So, for example, many of the new sectioning elements (for marking up navigation blocks, articles, sections, footers, headers, and so forth) were based on a study of several billion documents done by Google, where we saw that these were the sectioning elements most used by authors.

Some of the other features, like the scoped stylesheet feature that allows style elements to be put in the document itself with the content, in such a way that only that content is styled, were added based on feedback from authors.

In fact, everything in the spec is subject to feedback from Web designers and Web browser implementers. Since we know the spec will be useless without the buy-in of both those groups, a lot of effort is being spent on trying to collect their opinions.

One of the other new features in the draft is datagrid, which is a tree view/list view control with built-in support for AJAX-backed data stores, so you can do something like the typical Webmail view of all your tens of thousands of e-mails, but instead of only showing 20 at a time, you can just scroll through all of them, without having to actually download them all until they're needed.

We also have client-side storage APIs (implemented by Firefox, I believe), offline indicators so you can write applications that detect when they're going offline, drag-and-drop APIs compatible with those implemented by IE and Safari, and various networking APIs for safe cross-domain and cross-frame communication and for two-way client-server communication. Of course there's no way to know which will actually survive; we've already cut several features because they just weren't important enough.
One of the biggest problems with HTML is that content authors can get away with writing "tag soup".
Is it really a problem? Or is it the reason the Web is so wildly successful? Would the Web have taken off in the same way if it worked like most other systems, showing error messages whenever something was the least bit wrong?
[Because content authors can get away with writing "tag soup"], most content authors don't feel the need to write markup to specification. When markup is not written to specification, CSS may not get applied correctly, JavaScript may not execute, and some user-agents may not be able to process content as the author intended.

Having draconian error handling — the term we use for just not allowing errors instead of having silent error recovery like HTML does — is not the only solution for getting consistent behavior between browsers. The approach that we have taken with HTML 5 is to define what any document means, even if it is invalid — down to the last detail, so that every browser will handle every document in an equivalent way, whether the document is conformant or not. (It's the same technique CSS uses.)
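
As an illustration (not part of Ian's reply), the difference between the two error-handling styles can be sketched with Python's standard library. The XML parser stands in for draconian handling and `html.parser` for silent recovery. The helper names `TagLogger`, `accepts_draconian`, and `tokens_lenient` are made up for this sketch, and `html.parser` is only a tokenizer: it does not implement the HTML 5 parsing rules.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Deliberately misnested fragment: </b> arrives before the inner <i> is closed.
MARKUP = "<p><b>Hello <i>Cruel </b>World!</i></p>"


class TagLogger(HTMLParser):
    """Hypothetical helper: records tags in source order and never rejects input."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def accepts_draconian(markup):
    """XML-style handling: any well-formedness error rejects the whole document."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def tokens_lenient(markup):
    """HTML-style handling: keep going and report whatever tags were seen."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


if __name__ == "__main__":
    print(accepts_draconian(MARKUP))  # the XML parser refuses the fragment
    print(tokens_lenient(MARKUP))     # the lenient tokenizer reports every tag
```

The lenient path only shows that nothing is rejected. What the resulting document tree should look like is exactly the part that, per the answer above, HTML 5 sets out to define.
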
Why not put an end to "tag soup" by requiring user-agents to only accept markup written to specification?

There are literally dozens if not hundreds of billions of documents already on the Web. A study of a sample of several billion of those documents with a test implementation of the HTML 5 Parser specification that I did at Google put a very conservative estimate of the fraction of those pages with markup errors at more than 78%. When I tweaked it a bit to look at a few more errors, the number was 93%. And those are only core syntax errors — it didn't count misuse of HTML, like putting a p element inside an ol element.

If we required browsers to refuse those documents, then you couldn't browse over 90% of the Web.

But consider: if one browser showed error messages on half the Web, and another browser showed no errors and instead showed the Web roughly as the author intended, which browser would the average person use?

If we want to make HTML 5 successful, we have to make sure the browser vendors pay attention to it. Any requirements that make their market share go down relative to browsers that aren't following the spec will immediately be ignored.
X/HTML 5 has a construct for adding additional semantics to existing elements using predefined class names. Predefined class names could be the most controversial part of X/HTML 5, because the implementation overloads the class attribute. XHTML 2 provides similar functionality using the role attribute. Which approach is better and why?

The proposal to have predefined class names is still very much up in the air; we're mostly waiting for author and implementation feedback to see if it is workable. Currently the HTML 5 spec leaves a number of things unanswered (like what happens if two classes on an element are contradictory), so it's definitely not finished.

I haven't been able to work out what the role attribute does or how it is supposed to be implemented. It has a spec, but that spec is really unclear. So I can't really comment on it.
The font element is a terrible construct, primarily because content creators using authoring tools use the font element instead of semantic markup. The X/HTML 5 spec supports the font element when content is authored using WYSIWYG editors. What is the rationale for this? Why would WYSIWYG editors get an exemption? And is this exemption going to make the Web less accessible?

Again, the whole font element and WYSIWYG section is up in the air; the current text is just a straw man, and we've received lots of good feedback on it which will need to be taken into consideration.

The main reason that WYSIWYG editors would be given an exception is that the state of the art in user interface today doesn't have a good solution for making a semantic editor usable by the average person. We could require editors to do this, but since nobody knows how to do it, it would be a stupid requirement. Again, we have to compromise on perfection to actually address real-world needs and constraints.
Is it due to a flaw in HTML that it is difficult to build authoring tools, such as WYSIWYG editors, that generate markup rich in semantics, embody best-practices and that can be easily used by non-technical people?
No, I think it's just something that is fundamentally hard. People think visually. Getting a Web designer to think in terms of (e.g.) headers instead of font sizes is a problem that WYSIWYG implementers and UI researchers simply haven't solved yet. Personally I don't think it's a lost cause, but we're just not there yet.
Since much of the content on the Web is created using such authoring tools, can we ever achieve a semantically rich and accessible Web?

There will always be a continuum of sites from the unusable to the very accessible. As with all fields of human endeavor, there will always be the highly competent Web designers who understand fundamentally how to build device-independent sites that cater to all kinds of users, and there will always be the inexperienced and ignorant Web designers who think only in terms of their own personal experience, targeting a specific browser on a specific computer without taking into account any other potential user experience.

Probably the best we can do is design the language to make "the right thing" easier, and invest more heavily in education. In this regard HTML is in the same boat as more important subjects; I imagine that as we improve the quality of education in general, understanding of the importance of accessibility and related topics will improve as well.
The XHTML 5 spec says that "generally speaking, authors are discouraged from trying to use XML on the Web". Why write an XML spec like XHTML 5 and then discourage authors from using it? Why not just drop support for XML (XHTML 5)?
Some people are going to use XML with HTML 5 whatever we do. It's a simple thing to do: XML is a metalanguage for describing tree structures and HTML 5 is a tree structure, so it's obvious that XML can be used to describe HTML 5. The problem is that if we don't specify it, then everyone who thinks it is obvious and goes ahead and does it will do it in a slightly different way, and we'll have an interoperability nightmare. So instead we bite the bullet and define how it must work if people do it.
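
As an illustration (not part of Ian's reply), the idea of one tree with two syntaxes can be sketched with Python's `xml.etree.ElementTree`, which can write the same in-memory tree as either XML or HTML. The function name `serialize_both` is made up for this sketch, and ElementTree's HTML output mode predates HTML 5, so it only stands in for the general idea.

```python
import xml.etree.ElementTree as ET


def serialize_both(element):
    """Return (xml_text, html_text) for the same in-memory tree."""
    as_xml = ET.tostring(element, encoding="unicode", method="xml")
    as_html = ET.tostring(element, encoding="unicode", method="html")
    return as_xml, as_html


def build_paragraph():
    """Build the tree p > ["Hello", br, "World"] directly, with no parsing involved."""
    p = ET.Element("p")
    p.text = "Hello"
    br = ET.SubElement(p, "br")
    br.tail = "World"
    return p


if __name__ == "__main__":
    xml_text, html_text = serialize_both(build_paragraph())
    print(xml_text)   # XML syntax: the empty element is self-closed
    print(html_text)  # HTML syntax: the void element has no end tag
```

The tree is identical in both cases and only the surface syntax differs, which is why leaving the XML form unspecified would invite each implementer to pick slightly different rules.
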
The chair of the HTML Working Group at W3C, Steven Pemberton, said "HTML is a mess!" and "rather than being designed, HTML just grew, by different people just adding stuff to it".
He's right! This continues to this day. We're trying to bring some level of sanity to the process, though!
Since HTML is poorly designed, why is it worth preserving? Or is HTML fixable? If so, how does X/HTML 5 fix it?

The original reason I got involved in this work is that I realized that the human race has written literally billions of electronic documents, but without ever actually saying how they should be processed. If, in a thousand years, someone found a trove of HTML documents and decided they would write an HTML browser to view them, they couldn't do it! Even with the existing HTML specs — HTML 4, SGML, DOM 2 HTML, etc. — a perfect implementation couldn't render the vast majority of documents as they were originally intended.

Every major Web browser vendor today spends at least half the resources allocated to developing the browser on reverse-engineering the other browsers to work out how things should work. For example, if you have:

  <p> <b> Hello <i> Cruel </b> World! </i> </p>

...and then you run some JavaScript on that to show what elements there are and what their parents are, what should happen? It turns out that, before HTML 5, there were no specs that defined this!
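
As an illustration (not part of Ian's reply), here is a deliberately simplified tree builder in Python that answers the "what elements are there and what are their parents" question for markup like this. It is a toy, not the HTML 5 parsing algorithm. It assumes that a formatting element cut short by another element's end tag is reopened before the next piece of content, and it ignores void elements, optional end tags, and block-level special cases. The names `ToyTreeBuilder`, `element_parents`, and the `#root` placeholder are made up for this sketch; a browser would report a real parent such as `body` for the outermost element.

```python
from html.parser import HTMLParser

# Subset of inline formatting tags that this toy reopens after misnesting.
FORMATTING = {"a", "b", "big", "code", "em", "font", "i", "s", "small", "strong", "tt", "u"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


class ToyTreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#root")   # hypothetical placeholder for the fragment root
        self.stack = [self.root]    # currently open elements, outermost first
        self.pending = []           # formatting tags waiting to be reopened
        self.elements = []          # every element created, in creation order

    def _open(self, tag):
        node = Node(tag, self.stack[-1])
        self.stack.append(node)
        self.elements.append(node)

    def _reopen_pending(self):
        for tag in self.pending:
            self._open(tag)
        self.pending = []

    def handle_starttag(self, tag, attrs):
        self._reopen_pending()
        self._open(tag)

    def handle_data(self, data):
        self._reopen_pending()
        self.stack[-1].children.append(data)

    def handle_endtag(self, tag):
        if tag in self.pending:            # closes an element that was never reopened
            self.pending.remove(tag)
            return
        if tag not in [n.tag for n in self.stack[1:]]:
            return                         # stray end tag: ignore it
        cut_short = []
        while self.stack[-1].tag != tag:   # implicitly close anything still open inside
            cut_short.append(self.stack.pop().tag)
        self.stack.pop()
        reopen = [t for t in reversed(cut_short) if t in FORMATTING]
        self.pending = reopen + self.pending


def element_parents(markup):
    """Return (tag, parent tag) pairs for every element the toy builder creates."""
    builder = ToyTreeBuilder()
    builder.feed(markup)
    builder.close()
    return [(node.tag, node.parent.tag) for node in builder.elements]


if __name__ == "__main__":
    print(element_parents("<p> <b> Hello <i> Cruel </b> World! </i> </p>"))
```

Under these assumptions the fragment yields two `i` elements: one inside `b` holding "Cruel", and a second one directly inside `p` holding "World!". Other recovery strategies are equally imaginable, which is why a single written rule matters.
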

I decided that for the sake of our future generations we should document exactly how to process today's documents, so that when they look back, they can still reimplement HTML browsers and get our data back, even if they no longer have access to Microsoft Internet Explorer's source code. (Since IE is proprietary, it is unlikely that its source code will survive far into the future. We may have more luck with some of the other browsers, whose sources are more or less freely available.)

Once I got into actually documenting HTML for the future, I came to see that the effort could also have more immediate benefits: for example, today's browser vendors would be able to use one spec instead of reverse-engineering each other, and we could add new features for authors.
Supporters of X/HTML 5 call XHTML 2 radical. History has shown us that radical technological change is often controversial, but in the end is the best choice. For example, in the last 40 years, the technology for delivering music has changed radically, from vinyl, to cassette, to CD, to purely digital. Why should the Web shy away from a radical technological change?

If we look at music we see several key factors. The move from vinyl to cassette came with radical new benefits like shock resistance and the read-write nature of the media. Also, notice how cassette tape didn't replace vinyl; they co-existed with different audiences for a long time. The introduction of digital optical media (CDs) also introduced radical new benefits: significantly higher quality and lossless reproduction. CDs replaced vinyl, because they did the same thing, but radically better. However, tapes continued to be used for years, since CD-RW was not widely available straight away. We're now seeing a move from CD collections to audio stored on digital magnetic media (like iPods), but if you look carefully you'll see that a very clear migration path exists from audio CDs and tapes to portable media players: CDs can be ripped and stored on the new media, and the new media can play to tape adapters that plug into car stereos like old cassettes.

So we see that successful radical change requires two key features:

  • Radical new benefits to offset the pain of change
  • Backwards-compatibility with the old technology

I'm sure there will come a day where new technologies exist which do indeed shift the Web radically to new worlds, but I don't think we've seen them yet.

In a way, though, HTML 5 itself is a radical change. Not for what it specifies, but for the way it's been created. It's the first really open, collaborative community that has taken on a task of this magnitude (nothing less than rewriting the book on the core Web language) — hopefully, future technologies will also follow this model, instead of the more traditional behind-closed-doors, corporate-sponsored approach that most other technologies have used.
In the minds of most people, HTML is dead and X/HTML 5 is perceived as an attempt to resurrect it. Given this perception, how can you succeed in marketing HTML to consumers (those who build Web sites)?
According to some other research we did for the HTML 5 effort, over 95% of the Web today is HTML, with the rest being mostly a smattering of PDF, Word, and plain text. Under what definition of the word could it be considered "dead"? Web designers never stopped writing HTML pages. I don't think we'll have any difficulty convincing them to continue doing so.


  • For readability, "HTML 4.x/XHTML 1.x" has been shortened to read "HTML 4 and XHTML 1".

Further Reading