exposing microformat content in the browser

It has been noted, over at the relaunched Webmonkey, that microformats support seems to have dropped out of Firefox 3.

What has actually happened is that FF3 has an API for microformatted content but no UI to display it. The sticking point was how to alert the user to the data and then how to let them access it.

The short story is that even with Firefox 3, you'll need to install an add-on like Operator to take advantage of microformats data on the web. The user interface is missing because, as Kaply says, "there was never any agreement as to how to expose (microformats)".

Mozilla and the Firefox developers variously considered a sidebar or a toolbar, but decided that both would take up too much screen real estate.

Is this really such a difficult question? Why not just display the microformats logo next to the RSS logo in the address bar?

It's extensible - after clicking you would get a list of available microformatted items, just like you get a list of available feeds. It follows an existing paradigm set up by the RSS logo, specifically that the data on screen is available in another format. It takes up a tiny amount of screen real estate.

Opera already adds a logo in this manner when the content is available as a widget:

[Image: Opera toolbar showing RSS and Widget icons]

It's hardly a stretch to imagine a Microformats icon as well (ignoring the fact that I'm no icon designer :)):

[Image: Opera toolbar showing RSS and Widget icons, plus added microformat icon]

It feels pretty natural and you're already used to the RSS icon appearing in that location. Obviously there's an upper limit on how many logos you'd want, but that issue applies to the RSS icon too.

The security and maintenance issues of how to process the data do remain, of course. How do you update the processing routines, for instance? But even that seems like a minor issue when you consider how often Firefox updates get pushed out.

Updates seem like even less of an issue when you consider the frequency of new microformats being released - i.e. not very often. Seriously, plenty get discussed but the list of actual "specification" grade microformats has barely changed in the past 18 months. In fact, off the top of my head I don't think it has changed at all in that time.

So, my suggestion to browser makers could be summarised like this:

  • When microformat content is identified in the page, display a microformat icon in the same way the browser displays the RSS icon.
  • Only support those microformats designated "specifications". Or even just support hCard and hCalendar, which are the ones most likely to be useful to the user in a browsing context.
  • When the specs change, include the parsing changes in your next update.
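
Just to show how little the detection half of this involves, here's a rough sketch in Python. It's purely illustrative (no browser is built this way), and it assumes that spotting the hCard and hCalendar root class names, vcard and vevent, is enough to decide whether to show the icon; actually parsing the data is a separate job.

```python
from html.parser import HTMLParser

# Assumption: only hCard and hCalendar are detected, via their root class names.
ROOT_CLASSES = {"vcard": "hCard", "vevent": "hCalendar"}


class MicroformatDetector(HTMLParser):
    """Counts microformat root elements found in a page."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                for cls in value.split():
                    fmt = ROOT_CLASSES.get(cls)
                    if fmt:
                        self.found[fmt] = self.found.get(fmt, 0) + 1


def detect_microformats(html):
    """Return counts such as {"hCard": 2}; an empty dict means no icon."""
    detector = MicroformatDetector()
    detector.feed(html)
    detector.close()
    return detector.found


page = """
<div class="vcard"><span class="fn">Jane Example</span></div>
<div class="vevent"><span class="summary">Launch party</span></div>
<div class="vcard"><span class="fn">John Example</span></div>
"""

found = detect_microformats(page)
if found:
    print("show microformats icon:", found)
```

The counts would be enough to populate the drop-down list of available items, much like the list of feeds behind the RSS icon.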

It's just a thought. At any rate, the lack of UI to access the Microformat API in Firefox just means that nothing changes for the time being. People who want to use Microformats use something like the Operator extension. Some time in the future the UI issue will no doubt get resolved one way or another.

what i want from a new markup spec

So it has come to pass that the W3C has decided to take the WHATWG's HTML5 on board. It will form the basis of the W3C's HTML5. The goal is to have a public draft by June - yes, this year. Given that the spec now has to endure the full W3C process, we'll see how that goes.

Anyway, this got me to thinking: what do I really want from a new markup specification? I've talked about this before but I realised that there's a difference between what I want and what I actually hope for :)

Ultimately it comes down to quite a small subset of the overall picture - the things I genuinely wish for in daily life. There are a few elements I'd like to see created or simply supported consistently by browsers.

basics

These are the basics, the minimum additions to fill in some blanks left by HTML 4.01.

  • An extensible, contextual heading/section system
  • A way to associate a CAPTION (or LABEL) with images and lists
  • Footnotes (which are really endnotes on the web)

It's a short list, since the reality is that the lack of decent CSS support impacts on my daily life far more than the limitations of markup. Frankly most developers out there still haven't mastered the semantics of HTML 4.01 so it's not like adding more elements will stop people making tag soup.

Meanwhile, semantics geeks like me will keep searching for the secrets of semantic alchemy with compounds and microformats. Where the markup is deficient we have ways of adding more meaning.

Although this is not an addition to a spec... I'd like to see real support for OBJECT so (amongst other things) we can replace images with the complex explanatory content required for complex graphics. Since certain popular browsers can't cope with this element, we still essentially don't have it.

headings

On the topic of headings, HTML5 does not do what I want since it still relies on H1-H6. I gather the HEADER element is meant to do some kind of section marking but frankly on a first reading it doesn't make a heck of a lot of sense. It certainly doesn't introduce any obvious practical benefit.

XHTML2's H and SECTION system is exactly what I want. I regularly wish I could write a code fragment with a heading, without having to know the heading rank. With the H/SECTION system, I could just define the fragment as a section and know that the heading rank will be sorted out in-situ.

If you maintain a small, stable site, headings may not have ever been an issue. But if you have ever maintained the code base for a very large site, you're probably nodding your head ;)

Even for a small blog headings are a problem. In your average blog the top two heading ranks are probably handled by the site template and CMS; but subheadings in actual posts have to be written in directly with heading tags. So you're probably inserting H3 tags right into your content. Too bad if you later want to change the post pages to have the post title as the H1 - then you'd have a jump from H1 to H3. You either have to stick with the original structure; or you have what I consider an invalid heading rank jump.

Consider the same blog, with H/SECTION... you can adjust the structure around the post as much as you like - it doesn't matter. The sections and corresponding heading ranks take care of themselves.
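
To make the H/SECTION idea a bit more concrete, here's a minimal sketch (Python, using the standard library's HTMLParser) of how a rank could be worked out in-situ from section nesting. It's my own illustration, not the XHTML2 processing model: the base_depth parameter is a hypothetical stand-in for however many sections the site template has already opened, and ranks are simply capped at 6 to map back onto H1-H6.

```python
from html.parser import HTMLParser


class SectionOutliner(HTMLParser):
    """Ranks each <h> by how many <section> elements enclose it."""

    def __init__(self, base_depth=0):
        super().__init__()
        self.depth = base_depth  # sections already opened by the surrounding template
        self.in_heading = False
        self.headings = []  # [rank, text] pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.depth += 1
        elif tag == "h":
            self.in_heading = True
            # Rank follows nesting, capped at 6 so it maps back onto H1-H6.
            self.headings.append([min(max(self.depth, 1), 6), ""])

    def handle_endtag(self, tag):
        if tag == "section":
            self.depth -= 1
        elif tag == "h":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.headings[-1][1] += data


def outline(markup, base_depth=0):
    parser = SectionOutliner(base_depth)
    parser.feed(markup)
    parser.close()
    return [(rank, text.strip()) for rank, text in parser.headings]


post = "<section><h>Post title</h><section><h>A subheading</h></section></section>"
print(outline(post))                # [(1, 'Post title'), (2, 'A subheading')]
print(outline(post, base_depth=1))  # [(2, 'Post title'), (3, 'A subheading')]
```

The post fragment itself never changes; only base_depth does, which is exactly the property you want when the template moves the post title up or down the outline.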

Headings aren't glamorous. They're not uber-funky AJAX-friendly form inputs which will sparkle in the sun and inspire dancing in the street. They are bread and butter elements which we use every day. HTML has never made them easy to work with, so like it or not they would be a killer app for a markup spec.

exclusions

In addition to what I do want, I think it's important to think about what a spec excludes. I think it's high time for specs to stop weakly deprecating things and flat-out remove them. I'd kill off the semantically neutral and visual-design-based elements - FONT, B, I, S, U etc... and definitely no get-out-of-responsibility-free cards for WYSIWYG editors!

The spec should just have them treated and rendered the same as SPAN. They're all semantically meaningless and can be replaced either with CSS or semantically-meaningful elements.

I should note that by my reading, WHATWG's HTML5 deals with B and I by creating semantic meaning for them. While that approach has some merit, I doubt the majority of developers will alter their usage according to the new semantics, so those elements' usage will just be incorrect for new reasons. If everyone out there were to adopt the new semantics, I'd probably support the approach :)

wish list

These are things I want, but in the balance of things they're not the first things I'd argue to have included. That's the basics list :)

  • A dedicated caption or group label for sets of radio buttons - FIELDSET and LEGEND don't really work for long descriptions.
  • A drag-and-drop form input which is also keyboard accessible - keystroke/click to pick and keystroke/click to drop. Drag and drop is a useful paradigm but the possible solutions at the moment are not much good for keyboard or screen reader users.
  • An element to enclose extra info for assistive technology users, something a little like NOSCRIPT. Having to use CSS tricks to hide assistive content creates a clash between content and style, not to mention putting your content at risk of Google blacklisting. An element named something like ASSIST could be ignored by search engines and enabled by assistive tech like screen readers. [Note - this is a pretty sketchy idea, no doubt there are all sorts of practical issues. I'm not saying it's perfect. It's just that we need some legit way to give extra info to users who need it, without getting blacklisted from Google. A dedicated element might be the way to go - although proper support for OBJECT would help an awful lot with accessibility, it still won't help with the search engine issues.]

Another short list. I wouldn't say no to specific elements for navigation, but I don't think they would really fix problems. Accessibility basics give way to usability issues - if your navigation is hard to distinguish from content, it's more of a usability issue than a markup issue.

HTML5 has elements for navigation, document content, header, footer etc... I'm not a huge fan of the naming system but I can see the potential benefits. Still, such elements aren't really priorities for me. I'm still going to give users skip links and Google has no plan to reward semantics anyway. If - and it's an if - screen readers were to make use of these new semantic elements then I'd probably use them. But screen readers lag behind and users often can't afford the latest versions, so we'll still be using skip links.

all i want for christmas...

So basically what I want from a new spec is a few basics that were missed the last time around. I'm not actually hanging out for bells and whistles, although HTML 5 seems full of them and no doubt we'll happily use them.

Has reality lowered my expectations? Perhaps. Will I be glad of some kind of update - something, anything - after all these years? Almost certainly. Remember it has been more than seven years since XHTML 1.0 became a recommendation. That's 70 web years - a long time between updates.

After all that time it seems that most developers had lost faith in the W3C. Taking on HTML 5 seems like the only rational way forward and it was probably the only thing the W3C could do to regain a little bit of relevance in the world of markup. The browser makers certainly seemed to have jumped ship to WHATWG's HTML 5, or were quietly preparing to do so.

When I first heard of the WHATWG I thought it was unnecessary - maybe even a little irresponsible - to break away from the W3C. Many years later I'm glad they did.

So anyway with a June deadline, here's hoping we have a new HTML spec in time for Christmas. Santa... I'll be a good boy, I promise.

wd06: John Allsopp - Microformats

[Semi liveblogged - patchy network today]

How do we get information?

Example: trying to find out what movie is good to go see.

  • Who do we trust to review a movie?
  • Can we trust centralised systems like IMDb?
  • Who owns the reviews?
  • Can we verify that the reviews are real?
  • Can we verify that they haven’t been modified since the author created them?

We want to tap into the wisdom of many people.

  • Try Google? No, too many results and no reviews in the first few pages anyway.
  • Try aggregators? Blog searches? Not really working.

So we could write a review search engine, but how to identify a review? How do you get a consistent review scale?

People are really good at getting information from a surrounding context, but software is really bad at it.

Do we wait for the W3C to deliver all this and more with XHTML2? Do we invent new XML languages?

Microformats can provide this structure and allow search tools to access and index the info.

Microformats:

  1. simple
  2. html based
  3. data formats
  4. based on existing standards
  5. based on current developer practice

...their purpose is to bring richer semantics to today’s web:

  1. it doesn’t break browsers
  2. it doesn’t break pages

“Great technology needs to be adopted. But it’s chicken and egg. So we do have a chicken – no, wait, which one does come first?”

They are in use – Technorati is a really big, well-known user of microformats but there are many more out there.

Get the Tails extension for Firefox and check out the Web Directions sites.

John finished with an example of using hCard, to show how easy it is.
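
For anyone who hasn't seen hCard before, here's a small sketch of what a parser sees. The card below is a made-up placeholder (not the example John used), and the Python parser only handles a fragment holding a single hCard with a handful of properties, taking url from the link's href as hCard does; a real parser copes with far more.

```python
from html.parser import HTMLParser

# Placeholder hCard: the person, company, number and URL are made up.
HCARD = """
<div class="vcard">
  <a class="url fn" href="http://example.com/">Jane Example</a>
  <span class="org">Example Pty Ltd</span>
  <span class="tel">+61 7 0000 0000</span>
</div>
"""

PROPERTIES = {"fn", "org", "tel", "url"}
VOID = {"br", "img", "hr", "input", "meta", "link"}  # never get an end tag


class HCardParser(HTMLParser):
    """Collects a few properties from a fragment holding ONE hCard."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self.stack = []  # property names carried by each open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        names = set((attrs.get("class") or "").split()) & PROPERTIES
        # On a link, hCard takes url from the href rather than the text.
        if "url" in names and attrs.get("href"):
            self.props["url"] = attrs["href"]
            names.discard("url")
        if tag not in VOID:
            self.stack.append(names)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text counts towards every property on an enclosing element.
        for names in self.stack:
            for name in names:
                self.props[name] = self.props.get(name, "") + data


def parse_hcard(fragment):
    parser = HCardParser()
    parser.feed(fragment)
    parser.close()
    return {name: " ".join(value.split()) for name, value in parser.props.items()}


print(parse_hcard(HCARD))
```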

“I’m always really scared when I type URLs in directly on screen… I’m not sure what you’ll see in my history!” – JA

“Sure I’m a geek but that’s COOL”

What are you waiting for? They’re out there, get involved. Use them!

John just won “first person to pimp their book”…

Q: What’s the process for getting from idea to microformat?

There’s a detailed process on the microformats wiki; it starts with logical questions like making sure there’s a problem to be solved.

meta-data is dead. long live xhtml.

Lately I've encountered quite a few people who still feel meta-data is useful (even critical) for web pages. They were unwilling to consider the idea that a pure ROI evaluation might be worthwhile before spending time embedding meta-data in traditional <meta> tags.

Actually I think it's time for web publishers to focus their efforts elsewhere.

invisible meta-data is dead

Invisible meta-data doesn't work. - Tantek Çelik at WE05

Invisible meta-data has failed for search engines, due to sustained abuse by unethical or just misguided web developers. Being hidden from the user's view means there is no real accountability - few people check a website's meta-data to see if it is accurate.

As the web became more popular, people started to use search more heavily; then when people started making money from websites competition started to get fierce for the top rankings on search results. When simple relevance wasn't enough, people started looking for ways to work - and then exploit - the system.

It started out with a few extra keywords, maybe a few misspellings, a few variations for good measure. Then it moved on to adding less relevant but more popular keywords... eventually sites were adding massive amounts of entirely spurious meta-data.

The search engines fought back; and now search engines barely use meta-data at all when ranking pages. Specific search tools still make heavy use of meta-data, but general web search engines do not.

most meta-data is just bad

Intentional abuse aside, a great deal of meta-data is counter-productive simply because it is bad. This low-quality information cannot produce a high-quality result when used for any purpose - a problem which has quite possibly been misunderstood from the dawn of computing:

On two occasions, I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able to rightly apprehend the kind of confusion of ideas that could provoke such a question. - Charles Babbage

To put it in more modern terms: rubbish in, rubbish out.

One fundamental problem with meta-data is that it requires a reasonable grasp of indexing to produce even passable meta-data. For a very large web site, you really need a trained indexer with an extensive controlled vocabulary (...and they're not afraid to use it, punk).

Your average web publisher is not a librarian or indexer. The chances they can produce a good meta-data set are slim to nil. To make matters worse, it is not quick or easy to properly train someone in the skills required to do so.

In a distributed publishing system, this can lead to complete chaos. Meta-data can be completely unrelated to the contents of the page. Administrative meta-data such as publish dates can be horribly out of date.

In fact, unless you have highly-trained indexers creating the meta-data for a large website, you are probably better off having no meta-data at all.

semantic markup to the rescue

A well-formed XHTML document actually contains a great deal of meta-data. Much of it is visible and a lot is self-defining - by creating the document, you create the meta-data. Then to really put the icing on the cake, search engines do use it.

From the evidence I've seen so far, the most powerful items of meta-data in XHTML are the <title> and <h1> elements - key pieces of microcontent which don't receive the attention they're due. They should define the contents of the document, providing top-level keywords for the content.

After <title> and <h1> come the sub-headings; emphasised/strong-emphasised text (<em> and <strong>); the language statement (eg. xml:lang="en"); and finally the body text. Body text does count, since it should contain the most relevant, accurate keywords anyway.
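
As a rough illustration of how much of this meta-data is just sitting in the markup, here's a Python sketch that pulls the title, H1, sub-headings, emphasis and language statement out of a document. The grouping into fields (and treating only H2/H3 as sub-headings) is my assumption for illustration, not how any search engine actually weights things.

```python
from html.parser import HTMLParser

# Which elements feed which field: an illustrative grouping, not a ranking
# formula any search engine is known to use.
FIELDS = {"title": "title", "h1": "h1", "h2": "subheadings", "h3": "subheadings",
          "em": "emphasis", "strong": "emphasis"}


class NaturalMetadata(HTMLParser):
    def __init__(self):
        super().__init__()
        self.result = {"lang": None, "title": [], "h1": [],
                       "subheadings": [], "emphasis": []}
        self.collecting = []  # (tag, field, text parts) for open elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            # XHTML's xml:lang, falling back to plain lang.
            self.result["lang"] = attrs.get("xml:lang") or attrs.get("lang")
        if tag in FIELDS:
            self.collecting.append((tag, FIELDS[tag], []))

    def handle_endtag(self, tag):
        if self.collecting and self.collecting[-1][0] == tag:
            _, field, parts = self.collecting.pop()
            self.result[field].append(" ".join("".join(parts).split()))

    def handle_data(self, data):
        for _, _, parts in self.collecting:
            parts.append(data)


def natural_metadata(document):
    parser = NaturalMetadata()
    parser.feed(document)
    parser.close()
    return parser.result


doc = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head><title>Meta-data is dead</title></head>
<body>
<h1>Meta-data is dead. Long live XHTML.</h1>
<p>Search engines <em>do</em> use the visible stuff.</p>
<h2>Semantic markup to the rescue</h2>
</body></html>"""

print(natural_metadata(doc))
```

None of it needed a single <meta> tag; it's all content the reader can see and check.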

tag and release

These days we can really cap things off by adding the human keywords which might be associated with the content. Call it tagging, visible meta-data, folksonomies... the general principle is the same. Tagging gives you the opportunity to categorise your content in a useful way (limitations of the rel="tag" microformat aside). Not only can search engines read tags; but your users can read them and (where your system supports it) they can even use those tags to seek out further information on the topic.

Tags let you add the extra terms that people used to throw into <meta> tags - the related terms, regardless of their existence in the body text. Combine this with misspelled words being picked up by good search tools (eg. Google's "Did you mean...?" suggestions) and you no longer need to seed your documents with meta-data that could get you blacklisted. Why keyword bomb your content when users are being redirected to the correct term?

trust in the natural meta-data

The sum total of this natural meta-data gives search engines the ability to index and rank pages according to the content they actually contain (including visible tags to catch related terms), rather than meta-data that someone has hidden in the file. This way, you cannot have spurious meta-data without creating spurious content... which even the most casual user is likely to question, or the most time-poor developer should remember to update.

Meta-data as we knew it is dead. Long live XHTML.

limitations of rel="tag" microformat

I've been investigating the rel-tag microformat - tags, basically. I've realised there are two fairly obvious problems with the spec (I imagine I'm not the first to pick up on these, of course).

My scenario: an existing application hosts articles which are to be tagged. The intended tag space (target of the tag link) is to be a page listing all documents in the application with that tag (so users can read more on the same site).

So, what are the limitations of the spec (and some ideas to alleviate the problems)?

URLs

Tag URLs must end with a directory named /tagname. Nothing else will do, to meet the spec. In the real world, this is too strict a requirement - in the scenario I am investigating, I don't control the URLs produced by the application nor did I choose the server environment.

The app produces tag listings using an argument, something like ?tag=tagname at the end of the URL. The environment does not allow me to set up a rewrite/.htaccess to forward the spec-format URL to the app-format URLs. So I'm out of luck there.

So I can just use Technorati tag links, right? Well, that depends on whether the client goes for it, which they probably won't since internal grouping is a higher priority. Plus, in context the user's natural expectation would be to click the tag link and find a listing of posts with that tag on the host site. If we send users off to another site we're losing a key benefit of using tags in the first place.

The rub is that we're not doing anything which goes against the spirit of the spec, we're just not meeting the very specific file system structure it's asking for. Microcontent driving the entire server's directory layout? I'm not sure our admin guys are going to buy that.

I can see why URL parsing is a way to push for some level of tag/link relevancy; but I don't really see why http://www.server.com/scriptname.ext?tag=tagname should not be acceptable as a tag URL (not saying there's no reason, just that it's not immediately obvious to me...). The last portion of the URL is still the tag; and plenty of applications out there use the syntax for categories.

Perhaps the motivation is to have a single way of extracting the tag from the URL. Supporting two forms is a little more tricky, but surely it's not impossible to cater to both a path segment and an argument value? eg. look for ?tag=tagname at the end of the string; if ?tag= is not found, look for /tag/tagname

If that's too vague/complex, then a class attribute could be used to specify argument format or path format. eg. class="tag-argument" and class="tag-path" to differentiate between the two formats.
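
Here's a rough Python sketch of both ideas together: fall back from ?tag= to the path, or obey a tag-argument / tag-path class hint when one is given. To be clear, the argument form and both class names are proposals from this post, not part of rel-tag; the function name is hypothetical, and stripping a trailing slash is my assumption.

```python
from urllib.parse import parse_qs, unquote_plus, urlsplit


def tag_from_url(href, classes=()):
    """Hypothetical extractor; classes is e.g. link.get("class", "").split()."""
    parts = urlsplit(href)
    # Proposed extension: ?tag=tagname (parse_qs decodes '+' and '%20').
    query_tag = parse_qs(parts.query).get("tag", [None])[-1]
    # rel-tag's own rule: the last path segment is the tag.
    segment = parts.path.rstrip("/").rsplit("/", 1)[-1]
    path_tag = unquote_plus(segment) if segment else None

    if "tag-argument" in classes:
        return query_tag
    if "tag-path" in classes:
        return path_tag
    # No hint: prefer ?tag=, otherwise fall back to the path.
    return query_tag if query_tag else path_tag


print(tag_from_url("http://www.server.com/scriptname.ext?tag=web+standards"))
print(tag_from_url("http://technorati.com/tag/microformats"))
```

The fallback keeps every spec-format tag URL working unchanged, which is the main thing; the class hint just removes the guesswork for anyone who'd rather be explicit.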

visible meta-data?

The spec states: Making tag hyperlinks visible has the additional benefit of making it more obvious to readers if a page is abusing tag links, and thus providing more peer pressure for better behavior.

Well, no, actually it's not visible at all. The meta-data is taken from the URL (the tag is extracted from the href contents), which is not visible. The visible link text can be anything at all, which means you can create <a href="http://technorati.com/tag/nice+worksafe+stuff" rel="tag">nasty nsfw stuff</a> and a tag index will think the post is about nice worksafe stuff. People are in for a surprise if they click through to the actual page and find it's quite accurately nasty nsfw stuff.

The visible meta-data was accurate, but the actual meta-data was not. Oddly enough, this abuse of the spec conforms to the spec. A more robust form of accuracy enforcement would be to require the visible link text to match the tag from the URL. The differences in spaces/encoded spaces shouldn't be too hard to cover since there are only two allowable encoding forms.
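
A sketch of that stricter check, in Python. The function name is hypothetical, and the whitespace collapsing and case-insensitive comparison are my assumptions rather than anything the spec says; unquote_plus decodes both + and %20 to a space, which covers the two encoding forms.

```python
from urllib.parse import unquote_plus, urlsplit


def _normalise(text):
    # Collapse runs of whitespace and ignore case (an assumption, not in the spec).
    return " ".join(text.split()).casefold()


def tag_link_is_honest(href, link_text):
    """Hypothetical rule: visible link text must match the tag in the URL."""
    segment = urlsplit(href).path.rstrip("/").rsplit("/", 1)[-1]
    url_tag = unquote_plus(segment)  # '+' and '%20' both become a space
    return _normalise(url_tag) == _normalise(link_text)


print(tag_link_is_honest("http://technorati.com/tag/nice+worksafe+stuff",
                         "nasty nsfw stuff"))
```

An indexer applying this would simply ignore (or flag) any rel="tag" link that fails the check.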

summary

Under the current draft, tags and relevant tagspaces are potentially hard to create; and it's still easy to abuse the system. Humans can still be tricked and so can the machines. It's a great spec if you happen to have a compliant directory structure, but if your site doesn't match then you either recreate your entire system... or, more likely, you sadly advise the client that tags aren't happening.

griffith phonebook adds hCard and vCard

Griffith University staff phonebook listings now include hCards in the markup and an option to save vCards. Check out "Find a staff member" at www.griffith.edu.au/find/.

It's the most comprehensive usage of hCard that I've seen so far - most hCards I've seen were personal contacts, which tend to include fewer fields. The vCard format has a few shortcomings, which obviously flow through to hCard. For example, there's no way to type a URL; so you can only enter one URL and some vCard readers default it to 'personal'.
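
For the save-as-vCard side, the conversion from hCard-style fields is mostly string formatting. This Python sketch is not the Griffith implementation: the dict keys and placeholder values are hypothetical, it emits vCard 3.0 (RFC 2426) with CRLF line endings, and it skips line folding. Note it can only emit the single, untyped URL described above.

```python
def escape_text(value):
    """Escape a vCard 3.0 text value: backslash, comma, semicolon, newline."""
    return (value.replace("\\", "\\\\").replace(",", "\\,")
                 .replace(";", "\\;").replace("\n", "\\n"))


def to_vcard(card):
    """Serialise hypothetical hCard-style fields as a vCard 3.0 string."""
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        "FN:" + escape_text(card["fn"]),
        # N is required in 3.0: family;given;additional;prefixes;suffixes
        "N:%s;%s;;;" % (escape_text(card.get("family", "")),
                        escape_text(card.get("given", ""))),
    ]
    if "org" in card:
        lines.append("ORG:" + escape_text(card["org"]))
    if "tel" in card:
        lines.append("TEL;TYPE=WORK,VOICE:" + card["tel"])
    if "email" in card:
        lines.append("EMAIL;TYPE=INTERNET:" + card["email"])
    if "url" in card:
        # One URL, with no way to say whether it is work or personal.
        lines.append("URL:" + card["url"])
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"  # CRLF line endings; no line folding


card = {"fn": "Jane Example", "given": "Jane", "family": "Example",
        "org": "Example Pty Ltd", "tel": "+61 7 0000 0000",
        "email": "jane@example.com", "url": "http://example.com/"}
print(to_vcard(card))
```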

The implementation is now listed at hcard - new examples (microformats.org).

[Disclosure: I work for Griffith and this was a project of mine, together with Colin Morris. We proposed the hCard/vCard implementation as part of an internal innovation grant scheme but it got funded outright before shortlisting was completed :)]

[Disclaimer: The University has nothing to do with this page; and vice-versa.]

about

Web development and standards, as seen by Ben Buchanan.
