what i want from a new markup spec

So it has come to pass that the W3C has decided to take the WHATWG's HTML5 on board; it will form the basis of the W3C's own HTML5. The goal is to have a public draft by June - yes, this year. Given that the spec now has to endure the full W3C process, we'll see how that goes.

Anyway, this got me thinking: what do I really want from a new markup specification? I've talked about this before, but I realised that there's a difference between what I want and what I actually hope for :)

Ultimately it comes down to quite a small subset of the overall picture - the things I genuinely wish for in daily life. There are a few elements I'd like to see created or simply supported consistently by browsers.

basics

These are the basics, the minimum additions to fill in some blanks left by HTML 4.01.

  • An extensible, contextual heading/section system
  • A way to associate a CAPTION (or LABEL) with images and lists
  • Footnotes (which are really endnotes on the web)

It's a short list, since in reality the lack of decent CSS support affects my daily work far more than the limitations of markup. Frankly, most developers out there still haven't mastered the semantics of HTML 4.01, so it's not as if adding more elements will stop people making tag soup.

Meanwhile, semantics geeks like me will keep searching for the secrets of semantic alchemy with compounds and microformats. Where the markup is deficient we have ways of adding more meaning.

This isn't an addition to a spec as such, but I'd like to see real support for OBJECT so that (amongst other things) we can replace images with the detailed explanatory content that complex graphics require. Since certain popular browsers can't cope with this element, we still essentially don't have it.

headings

On the topic of headings, HTML5 does not do what I want, since it still relies on H1-H6. I gather the HEADER element is meant to do some kind of section marking, but frankly, on a first reading, it doesn't make a heck of a lot of sense. It certainly doesn't offer any obvious practical benefit.

XHTML2's H and SECTION system is exactly what I want. I regularly wish I could write a code fragment with a heading without having to know the heading rank. With the H/SECTION system, I could just define the fragment as a section and know that the heading rank will be sorted out in situ.

If you maintain a small, stable site, headings may not have ever been an issue. But if you have ever maintained the code base for a very large site, you're probably nodding your head ;)

Even for a small blog, headings are a problem. In your average blog the top two heading ranks are probably handled by the site template and CMS, but subheadings within posts have to be typed directly into the content as heading tags. So you're probably inserting H3 tags right into your posts. Too bad if you later want to change the post pages to make the post title the H1 - then you'd have a jump from H1 to H3. You either have to stick with the original structure or live with what I consider an invalid heading rank jump.
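To make the H1-to-H3 problem concrete, here's a minimal sketch of a checker that flags heading rank jumps, using Python's standard `html.parser` module. The `find_heading_jumps` helper and its rule (going more than one rank deeper counts as a jump; going back up doesn't) are illustrative assumptions, not a rule taken from any spec.

```python
from html.parser import HTMLParser


class HeadingJumpChecker(HTMLParser):
    """Walks markup in document order and records skipped heading ranks."""

    def __init__(self):
        super().__init__()
        self.last_rank = 0  # 0 means no heading has been seen yet
        self.jumps = []     # (from_rank, to_rank) pairs that skip a level

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so H3 arrives as "h3".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            rank = int(tag[1])
            # Going more than one level deeper is a jump; going back up is fine.
            if self.last_rank and rank > self.last_rank + 1:
                self.jumps.append((self.last_rank, rank))
            self.last_rank = rank


def find_heading_jumps(markup):
    checker = HeadingJumpChecker()
    checker.feed(markup)
    checker.close()
    return checker.jumps


# Original template: the site name is the H1, the post title the H2.
old_layout = "<h1>Site name</h1><h2>Post title</h2><h3>A subheading</h3>"
# New template: the post title becomes the H1, but the post still says H3.
new_layout = "<h1>Post title</h1><h3>A subheading</h3>"

print(find_heading_jumps(old_layout))  # []
print(find_heading_jumps(new_layout))  # [(1, 3)]
```

The post content is identical in both runs; only the template around it changed, which is exactly the maintenance trap described above.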

Consider the same blog, with H/SECTION... you can adjust the structure around the post as much as you like - it doesn't matter. The sections and corresponding heading ranks take care of themselves.

Headings aren't glamorous. They're not uber-funky AJAX-friendly form inputs which will sparkle in the sun and inspire dancing in the street. They are bread and butter elements which we use every day. HTML has never made them easy to work with, so like it or not they would be a killer app for a markup spec.

exclusions

In addition to what I do want, I think it's important to consider what a spec excludes. It's high time for specs to stop weakly deprecating things and flat-out remove them. I'd kill off the semantically neutral and visual-design-based elements - FONT, B, I, S, U etc... and definitely no get-out-of-responsibility-free cards for WYSIWYG editors!

The spec should simply have them treated and rendered the same as SPAN. They're all semantically meaningless and can be replaced either with CSS or with semantically meaningful elements.

I should note that, by my reading, WHATWG's HTML5 deals with B and I by creating semantic meaning for them. While that approach has some merit, I doubt the majority of developers will alter their usage according to the new semantics, so those elements will just be used incorrectly for new reasons. If everyone out there were to adopt the new semantics, I'd probably support the approach :)

wish list

These are things I want, but on balance they're not the first things I'd argue to have included. That's what the basics list is for :)

  • A dedicated caption or group label for sets of radio buttons - FIELDSET and LEGEND don't really work for long descriptions.
  • A drag-and-drop form input which is also keyboard accessible - keystroke/click to pick and keystroke/click to drop. Drag and drop is a useful paradigm but the possible solutions at the moment are not much good for keyboard or screen reader users.
  • An element to enclose extra info for assistive technology users, something a little like NOSCRIPT. Having to use CSS tricks to hide assistive content creates a clash between content and style, not to mention putting your content at risk of being blacklisted by Google. An element named something like ASSIST could be ignored by search engines and enabled by assistive tech like screen readers. [Note - this is a pretty sketchy idea, and no doubt there are all sorts of practical issues. I'm not saying it's perfect. It's just that we need some legitimate way to give extra info to users who need it, without getting blacklisted by Google. A dedicated element might be the way to go - although proper support for OBJECT would help an awful lot with accessibility, it still won't solve the search engine problem.]

Another short list. I wouldn't say no to specific elements for navigation, but I don't think they would really fix problems. Once the accessibility basics are covered, navigation becomes a usability question - if your navigation is hard to distinguish from content, that's more a usability problem than a markup problem.

HTML5 has elements for navigation, document content, header, footer etc... I'm not a huge fan of the naming system, but I can see the potential benefits. Still, such elements aren't really priorities for me. I'm still going to give users skip links, and Google has no plans to reward semantics anyway. If - and it's an if - screen readers were to make use of these new semantic elements, then I'd probably use them. But screen readers lag behind and users often can't afford the latest versions, so we'll be using skip links for some time yet.

all i want for christmas...

So basically what I want from a new spec is a few basics that were missed the last time around. I'm not actually hanging out for bells and whistles, although HTML 5 seems full of them and no doubt we'll happily use them.

Has reality lowered my expectations? Perhaps. Will I be glad of some kind of update - something, anything - after all these years? Almost certainly. Remember, it has been more than seven years since XHTML 1.0 became a W3C Recommendation. That's 70 web years - a long time between updates.

After all that time it seems that most developers had lost faith in the W3C. Taking on HTML 5 seems like the only rational way forward and it was probably the only thing the W3C could do to regain a little bit of relevance in the world of markup. The browser makers certainly seemed to have jumped ship to WHATWG's HTML 5, or were quietly preparing to do so.

When I first heard of the WHATWG I thought it was unnecessary - maybe even a little irresponsible - to break away from the W3C. Many years later I'm glad they did.

So anyway, with a June deadline, here's hoping we have a new HTML spec in time for Christmas. Santa... I'll be a good boy, I promise.


thoughts about html

So, there's a coordinated call for feedback on the WHATWG's activities. There's a lot to cover in the call to action, so I'll just start with some thoughts about HTML...

I haven't so much read the WHATWG HTML 5 and Forms 2 specs "properly" as skimmed them. Forgive me, they are big specs with draft status from an as-yet unrecognised group. I don't read W3C specs for fun either ;) So this is mostly off the top of my head; excuse me if something is already covered and I've missed it.

Headings and sections

I rather like the XHTML 2 version of headings and sections, as opposed to HTML 5's current system which seems to inherit all the problems of HTML 4 and none of the advantages of XHTML 2.

  • Why limit things to just six heading levels?
  • Why not declare hn as an extensible set of headings?
  • Why use numbered headings if you're using sections? Just set a heading for each section and let nesting take care of the rest.

I'm not a fan of the W3C's specific example, though, since I feel that each section should start immediately with a heading; I'd like to see the paragraphs that sit before each section's heading removed. But otherwise this system seems simple and elegant to me (although maybe I'm just weird - I'm aware that's a possibility!):

<body>
<h>This is a top level heading</h>
<p>....</p>
<section>
    <p>....</p>
    <h>This is a second-level heading</h>
    <p>....</p>
    <h>This is another second-level heading</h>
    <p>....</p>
</section>
<section>
    <p>....</p>
    <h>This is another second-level heading</h>
    <p>....</p>
    <section>
        <h>This is a third-level heading</h>
        <p>....</p>
    </section>
</section>
</body>
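The appeal of this model is that rank falls out of nesting. Here's a simplified sketch of that idea, again using Python's `html.parser`: each `<h>` gets a rank of one more than the number of `<section>` elements open around it. This is an illustrative assumption about how a processor might compute ranks, not the XHTML 2 spec's own processing rules, and `SectionOutliner` and `outline_headings` are hypothetical names.

```python
from html.parser import HTMLParser


class SectionOutliner(HTMLParser):
    """Gives each <h> a rank of (number of open <section> elements) + 1."""

    def __init__(self):
        super().__init__()
        self.depth = 0           # <section> elements currently open
        self.in_heading = False  # True while inside an <h> element
        self.outline = []        # (rank, heading text) in document order

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.depth += 1
        elif tag == "h":
            self.in_heading = True
            self.outline.append((self.depth + 1, ""))

    def handle_endtag(self, tag):
        if tag == "section":
            self.depth -= 1
        elif tag == "h":
            self.in_heading = False

    def handle_data(self, data):
        # Text can arrive in several chunks, so append to the current heading.
        if self.in_heading:
            rank, text = self.outline[-1]
            self.outline[-1] = (rank, text + data)


def outline_headings(markup):
    outliner = SectionOutliner()
    outliner.feed(markup)
    outliner.close()
    return [(rank, text.strip()) for rank, text in outliner.outline]


doc = """
<body>
<h>This is a top level heading</h>
<section>
    <h>This is a second-level heading</h>
    <section>
        <h>This is a third-level heading</h>
    </section>
</section>
</body>
"""
for rank, text in outline_headings(doc):
    print(rank, text)
```

Nothing caps the rank at six, so a heading inside six nested sections simply gets rank 7. Mapping those ranks back onto H1-H6 for browsers that only understand the old elements would be a separate, and equally hypothetical, step.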

In anticipation of the argument "documents shouldn't be so big they need more than six levels", I'll simply suggest you go and convince all the world's lawyers and legislators then get back to me :) Besides, it's entirely possible to have more than six levels in a short document that would not be suitable for presentation in multiple web pages.

Better lists

I think <ol>, <ul> and <dl> should all have a <caption> element or some way to explicitly associate a heading. We're grouping information together, after all, so it makes sense to be able to state explicitly what the grouping is about. It's one of the really useful things you can do with tables.

I also think ordered lists need more sophisticated numbering - we should not have to resort to CSS or use invalid code! For example, we should be able to start an <ol> from, say, 11, because 1-10 were on another page. I'm specifically thinking of search results, which are commonly split across multiple pages, yet each page should not restart the list count. Currently it's only valid to set the value of each <li>, which is absurd - so the HTML 5 spec's .
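For the search results case, the arithmetic is simple: page p with n results per page starts at (p - 1) * n + 1. A minimal sketch, assuming an `<ol>` that accepts a `start` attribute; `render_results_page` and its parameters are hypothetical names for illustration.

```python
from html import escape


def render_results_page(results, page, per_page=10):
    """Render one page of results as an <ol> that continues the count.

    `results` is the full list of result titles; `page` is 1-based.
    """
    first = (page - 1) * per_page           # index of the first result shown
    items = results[first:first + per_page]
    lines = [f'<ol start="{first + 1}">']   # page 2 at 10 per page -> 11
    for title in items:
        lines.append(f"  <li>{escape(title)}</li>")
    lines.append("</ol>")
    return "\n".join(lines)


results = [f"Result {n}" for n in range(1, 26)]
print(render_results_page(results, page=2))
```

Calling it with `page=2` produces a list whose opening tag is `<ol start="11">`, so results 11 to 20 are numbered 11 to 20 without touching each `<li>`.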

Labels for radio button groups

I don't think HTML 4.01 provides a satisfactory method of labelling/captioning a group of radio buttons. Each radio button gets a label; but really the group needs something to describe the purpose of the set of inputs.

You can use a <fieldset> + <legend> combination for short descriptions, but it feels like a hack (not to mention the practicalities of hacking CSS to get browsers to display long legends!).

Captions for images

I'm not quite sure how this could be approached, but I think a visible caption for images would make sense. Hidden text could then be more akin to longdesc than alt. The <object> element provides an excellent model for alternate content, but not for a caption.

The cite attribute

While this is OK, I do wonder at the requirement for a URI. How do I choose a URI to cite Shakespeare, for example? What single URI makes sense? Plus, long experience shows us that URIs don't live forever - who remembers to check their cite URIs?

So why not an attribute for the name of the person and an attribute for the title of the work they are being quoted from? Sure, there's potential for ambiguity, but don't try to tell me a URI could not lead to a document which talks about ten John Smiths.

<p creator="covenant" work="we want revolution" cite="http://www.google.com.au/search?&q=%22we+want+revolution%22+covenant+lyrics">we want revolution<br />
constant evolution<br />
start your engines blow your fuses<br />
burn the bridges for the future<br />
this is our solution</p>

The <cite> element

<cite> doesn't make any sense to me either, since there's no explicit association with a quote. Take the example from the HTML 5 draft:

<p><q>This is correct!</q>, said <cite>Ian</cite>.</p>

So long as there is only one Q/CITE pair in the entire document, we're OK. After that, we're just guessing - and while a human might guess fairly well, an indexing system has no grasp of human context. So perhaps a for attribute is in order:

<p><q id="ians-assertation">This is correct!</q>, said <cite for="ians-assertation">Ian</cite>.</p>
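A `for` attribute would give an indexer something concrete to follow. Here's a minimal sketch of that pairing with Python's `html.parser`; note that `for` on `<cite>` is the hypothetical attribute proposed above, and `QuoteAttributor` and `attribute_quotes` are illustrative helpers, not an existing API.

```python
from html.parser import HTMLParser


class QuoteAttributor(HTMLParser):
    """Collects <q id> text and <cite for> text so they can be paired up.

    NOTE: `for` on <cite> is the hypothetical attribute proposed above;
    it is not part of HTML 4.01 or the HTML 5 draft.
    """

    def __init__(self):
        super().__init__()
        self.quotes = {}     # q id -> quoted text
        self.sources = {}    # id named in cite's for -> cited name
        self.current = None  # (store, key) whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "q" and "id" in attrs:
            self.current = (self.quotes, attrs["id"])
            self.quotes[attrs["id"]] = ""
        elif tag == "cite" and "for" in attrs:
            self.current = (self.sources, attrs["for"])
            self.sources[attrs["for"]] = ""

    def handle_endtag(self, tag):
        if tag in ("q", "cite"):
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            store, key = self.current
            store[key] += data


def attribute_quotes(markup):
    parser = QuoteAttributor()
    parser.feed(markup)
    parser.close()
    # Pairs are resolved after parsing, so a <cite> may come before its <q>.
    return [(parser.quotes[key], who)
            for key, who in parser.sources.items() if key in parser.quotes]


page = (
    '<p><q id="ians-assertation">This is correct!</q>, said '
    '<cite for="ians-assertation">Ian</cite>.</p>'
    '<p><q id="reply">That is not correct.</q>, said '
    '<cite for="reply">John Smith</cite>.</p>'
)
print(attribute_quotes(page))
# [('This is correct!', 'Ian'), ('That is not correct.', 'John Smith')]
```

Because pairs are resolved after the whole document is parsed, it doesn't matter whether the `<cite>` comes before or after its `<q>`, and any number of quote/citation pairs can share a page without guesswork.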

The <iframe> element

Why keep <iframe> in HTML 5 when the spec also includes <object>? Straight question. From a quick read, <object> seems to take care of everything that <iframe> can offer.

More...

The HTML 5 spec includes quite a few all-new elements such as <nav>, <x>, <m> and <progress>. Some are relatively logical, but others like <progress> just seem very odd to me. A progress bar is not a permanent content item; it's a temporary state. However, I'll save real discussion of these elements for another day.

So what do you think? Join the discussion!


about

Web development and standards, as seen by Ben Buchanan.
