Lately I've encountered quite a few people who still feel meta-data is useful (even critical) for web pages, yet who are unwilling to consider that a simple ROI evaluation might be worthwhile before spending time embedding meta-data in traditional <meta> tags.
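
For anyone who hasn't seen them: the tags in question sit in the document's <head>, invisible to readers. A hypothetical example (the keyword and description values are invented for illustration):

    <meta name="keywords" content="metadata, indexing, search engines" />
    <meta name="description" content="An article about meta-data on web pages." />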

Actually I think it's time for web publishers to focus their efforts elsewhere.

invisible meta-data is dead

Invisible meta-data doesn't work. - Tantek Çelik at WE05

Invisible meta-data has failed for search engines due to sustained abuse by unethical or simply misguided web developers. Being hidden from the user's view means there is no real accountability - few people check a website's meta-data to see if it is accurate.

As the web became more popular, people used search more and more heavily; and once websites started making money, competition for the top rankings on search results grew fierce. When simple relevance wasn't enough, people looked for ways to work - and then exploit - the system.

It started out with a few extra keywords, maybe a few misspellings, a few variations for good measure. Then it moved on to adding less relevant but more popular keywords... eventually sites were adding massive amounts of entirely spurious meta-data.
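
By the end, a stuffed keyword list might look something like this (a hypothetical example - notice how few of the terms need have anything to do with the actual page):

    <meta name="keywords" content="free, cheap, best, download, mp3,
        travel, insurance, loans, casino, celebrity, news, free, free" />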

The search engines fought back, and now they barely use meta-data at all when ranking pages. Specialised search tools still make heavy use of it, but general web search engines do not.

most meta-data is just bad

Intentional abuse aside, a great deal of meta-data is counter-productive simply because it is bad. Low-quality information cannot produce a high-quality result, no matter what it is used for - a problem which has quite possibly been misunderstood since the dawn of computing:

On two occasions, I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - Charles Babbage

To put it in more modern terms: rubbish in, rubbish out.

One fundamental problem is that producing even passable meta-data requires a reasonable grasp of indexing. For a very large web site, you really need a trained indexer with an extensive controlled vocabulary (...and they're not afraid to use it, punk).

Your average web publisher is not a librarian or indexer. The chances they can produce a good meta-data set are slim to none. To make matters worse, it is not quick or easy to train someone properly in the skills required.

In a distributed publishing system, this can lead to chaos. Descriptive meta-data can be entirely unrelated to the contents of the page, and administrative meta-data such as publish dates can be horribly out of date.

In fact, unless you have highly trained indexers creating the meta-data for a large website, you are probably better off having no meta-data at all.

semantic markup to the rescue

A well-formed XHTML document actually contains a great deal of meta-data. Much of it is visible and a lot of it is self-defining - by creating the document, you create the meta-data. To put the icing on the cake, search engines actually do use it.

From the evidence I've seen so far, the most powerful items of meta-data in XHTML are the <title> and <h1> elements - key pieces of microcontent which don't receive the attention they're due. They should define the contents of the document, providing top-level keywords for the content.

After <title> and <h1> come the sub-headings; emphasised and strongly-emphasised text (<em> and <strong>); the language statement (eg. xml:lang="en"); and finally the body text. Body text does count, since it should contain the most relevant, accurate keywords anyway.
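
To pull those pieces together, here is a minimal sketch of a well-formed XHTML document (the content is invented) where every element mentioned above doubles as visible meta-data:

    <?xml version="1.0" encoding="utf-8"?>
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
      <head>
        <title>Invisible meta-data is dead</title>
      </head>
      <body>
        <h1>Invisible meta-data is dead</h1>
        <h2>Why search engines stopped trusting it</h2>
        <p>Search engines now rank pages on their <strong>visible content</strong>,
        not on keywords <em>hidden</em> in the file.</p>
      </body>
    </html>

Every keyword a search engine could want is right there in the content, where readers keep it honest.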

tag and release

These days we can really cap things off by adding the human keywords which might be associated with the content. Call it tagging, visible meta-data, folksonomies... the general principle is the same. Tagging gives you the opportunity to categorise your content in a useful way (limitations of the rel="tag" microformat aside). Not only can search engines read tags, but your users can read them too; and (where your system supports it) they can even use those tags to seek out further information on the topic.
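
For the curious, a tag in the rel="tag" microformat is just a visible link; the tag value is taken from the last segment of the URL, not from the link text (this sketch uses the Technorati tag space as the destination):

    <a href="http://technorati.com/tag/metadata" rel="tag">metadata</a>
    <a href="http://technorati.com/tag/xhtml" rel="tag">XHTML</a>

That reliance on the URL rather than the link text is one of the limitations mentioned above.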

Tags let you add the extra terms that people used to throw into <meta> tags - the related terms, regardless of whether they appear in the body text. Combine this with misspelled words being picked up by good search tools (eg. Google's "Did you mean...?" suggestions) and you no longer need to seed your documents with meta-data that could get you blacklisted. Why keyword-bomb your content when users are being redirected to the correct term?

trust in the natural meta-data

The sum total of this natural meta-data gives search engines the ability to index and rank pages according to the content they actually contain (including visible tags to catch related terms), rather than meta-data that someone has hidden in the file. This way, you cannot have spurious meta-data without creating spurious content - which even the most casual user is likely to question, and even the most time-poor developer should remember to update.

Meta-data as we knew it is dead. Long live XHTML.