
questions for google

There's an SEO conference coming up shortly which will feature several Google employees. Russ is calling for your input on questions you'd like asked, and Scott has further thoughts.

I've already commented fairly heavily at both sites, so I guess this is a meta post :) For reference, my questions (in no particular order):

  • Since 301 redirections get you bombed (and longevity is a big factor in pagerank, so new URLs are effectively bombed), is there a way to move a site without losing your pagerank?
  • Will Google ever produce valid pages for their own sites? Many standardistas have produced proof of concept versions of Google search for example - standards compliant AND lightweight. Why not use them?
  • Do they think Flash will ever be seriously searchable, in a useful manner? Do they think it will be possible? Would they rank Flash content higher or lower than text content?
  • Does Google give equal weight to ABBR contents versus spelled-out terms?
  • Does Google give additional weight to tags/tagged pages? (...which leads to the next point...)
  • Will Google be indexing/weighting microformatted content? What is Google’s view of microformats and their potential benefits to search? If they did support microformats would that also suggest they’d need to pay more attention to semantics?
  • I’d also question their views on whether validation is a "signal of quality". In short, if a page validates surely that is an indication that the author/developer has paid close attention to the construction of the site… which would be a signal of quality in my book!

Then from my comments on Standardzilla:

Google: so few websites validate that it isn’t a signal of quality

...incredibly bad logic there. If a site validates at this point in time, it indicates that someone has paid serious attention to the quality of the page. Surely a signal of quality! Maybe they don’t want to open that door since they’d then be admitting that their own pages suck.

Their interest in accessibility is minimal at best. Accessible search is treated as a bit of a curiosity, as far as I can tell. A neat toy produced by someone’s 20%, but that’s about it.

The thing I’ve come to realise about Google is that they do not consider inaction to be "doing evil". Despite the tremendous influence they have, they don’t use it to "do good". Personally I think their inaction is a form of doing evil, but that’s just me.

Do you have questions of your own? Head over to "Our chance to ask Google" at Max Design and make yourself heard!


meta-data is dead. long live xhtml.

Lately I've encountered quite a few people who still feel meta-data is useful (even critical) for web pages. They were unwilling to consider the idea that a pure ROI evaluation might be worthwhile before spending time embedding meta-data in traditional <meta> tags.

Actually I think it's time for web publishers to focus their efforts elsewhere.

invisible meta-data is dead

Invisible meta-data doesn't work. - Tantek Çelik at WE05

Invisible meta-data has failed for search engines, due to sustained abuse by unethical or just misguided web developers. Being hidden from the user's view means there is no real accountability - few people check a website's meta-data to see if it is accurate.

As the web became more popular, people started to use search more heavily. Then, when people started making money from websites, competition for the top rankings in search results got fierce. When simple relevance wasn't enough, people started looking for ways to work - and then exploit - the system.

It started out with a few extra keywords, maybe a few misspellings, a few variations for good measure. Then it moved on to adding less relevant but more popular keywords... eventually sites were adding massive amounts of entirely spurious meta-data.

The search engines fought back, and now they barely use meta-data at all when ranking pages. Specific search tools still make heavy use of meta-data, but general web search engines do not.

most meta-data is just bad

Intentional abuse aside, a great deal of meta-data is counter-productive simply because it is bad. Low-quality information cannot produce a high-quality result, whatever it is used for - a problem which has quite possibly been misunderstood from the dawn of computing:

On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - Charles Babbage

To put it in more modern terms: rubbish in, rubbish out.

One fundamental problem is that producing even passable meta-data requires a reasonable grasp of indexing. For a very large web site, you really need a trained indexer with an extensive controlled vocabulary (...and they're not afraid to use it, punk).

Your average web publisher is not a librarian or indexer. The chances they can produce a good meta-data set are slim to nil. To make matters worse, it is not quick or easy to properly train someone in the skills required to do so.

In a distributed publishing system, this can lead to complete chaos. Meta-data can be completely unrelated to the contents of the page. Administrative meta-data such as publish dates can be horribly out of date.

In fact, unless you have highly-trained indexers creating the meta-data for a large website, you are probably better off having no meta-data at all.
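To make the "unrelated to the contents of the page" problem concrete, here's a rough sketch in Python of the kind of check almost nobody runs: pull the keywords out of the <meta> tag and see whether they appear anywhere in the text a user would actually read. The names (KeywordAudit, unsupported_keywords) and the sample page are made up for illustration, and the plain substring match is a deliberately naive assumption - it is not how any search engine works.

```python
from html.parser import HTMLParser


class KeywordAudit(HTMLParser):
    """Collect <meta name="keywords"> terms and the text a reader can see."""

    SKIP = {"script", "style"}  # content that is never shown to the reader

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.visible_text = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            # Keywords are conventionally comma-separated.
            self.keywords += [k.strip().lower() for k in content.split(",") if k.strip()]
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.visible_text.append(data.lower())


def unsupported_keywords(markup):
    """Return the meta keywords that never appear in the title or body text."""
    audit = KeywordAudit()
    audit.feed(markup)
    audit.close()
    # Collapse whitespace so multi-word keywords can match across line breaks.
    text = " ".join(" ".join(audit.visible_text).split())
    # Naive assumption: a keyword is "supported" if it occurs as a substring.
    return [k for k in audit.keywords if k not in text]


# Hypothetical page, invented for this example.
PAGE = (
    '<html><head><title>XHTML headings</title>'
    '<meta name="keywords" content="xhtml, headings, cheap flights" /></head>'
    '<body><h1>Using headings well</h1></body></html>'
)

if __name__ == "__main__":
    print(unsupported_keywords(PAGE))
```

Anything this returns is meta-data the page itself doesn't back up, which is exactly the stuff users never see and publishers never maintain.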

semantic markup to the rescue

A well-formed XHTML document actually contains a great deal of meta-data. Much of it is visible and a lot is self-defining - by creating the document, you create the meta-data. Then to really put the icing on the cake, search engines do use it.

From the evidence I've seen so far, the most powerful items of meta-data in XHTML are the <title> and <h1> elements - key pieces of microcontent which don't receive the attention they're due. They should define the contents of the document, providing top-level keywords for the content.

After <title> and <h1> come the sub-headings; emphasised/strong-emphasised text (<em> and <strong>); the language statement (eg. xml:lang="en"); and finally the body text. Body text does count, since it should contain the most relevant, accurate keywords anyway.
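As a rough illustration of how much of this natural meta-data falls out of the markup for free, here's a minimal Python sketch that reads the items listed above straight from a document. It assumes well-formed markup (tags properly nested and closed), leaves the body text out for brevity, and every name in it (NaturalMetaData, extract_natural_metadata, the sample document) is invented for the example. It says nothing about how a real search engine weights any of this.

```python
from html.parser import HTMLParser


class NaturalMetaData(HTMLParser):
    """Collect title, headings, emphasis and language from a document."""

    SUBHEADINGS = {"h2", "h3", "h4", "h5", "h6"}
    EMPHASIS = {"em", "strong"}
    CAPTURED = {"title", "h1"} | SUBHEADINGS | EMPHASIS

    def __init__(self):
        super().__init__()
        self.found = {"title": None, "h1": [], "subheadings": [],
                      "emphasis": [], "lang": None}
        self._open = []  # stack of [tag, text so far] for captured elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            # Prefer the XHTML language statement, fall back to plain lang.
            self.found["lang"] = attrs.get("xml:lang") or attrs.get("lang")
        if tag in self.CAPTURED:
            self._open.append([tag, ""])

    def handle_data(self, data):
        # Text belongs to every captured element it is nested inside,
        # so <h2>Some <em>text</em></h2> feeds both the h2 and the em.
        for entry in self._open:
            entry[1] += data

    def handle_endtag(self, tag):
        # Assumes well-formed nesting; a mismatched end tag is ignored.
        if not self._open or self._open[-1][0] != tag:
            return
        _, text = self._open.pop()
        text = " ".join(text.split())
        if tag == "title":
            self.found["title"] = text
        elif tag == "h1":
            self.found["h1"].append(text)
        elif tag in self.EMPHASIS:
            self.found["emphasis"].append(text)
        else:
            self.found["subheadings"].append(text)


def extract_natural_metadata(markup):
    parser = NaturalMetaData()
    parser.feed(markup)
    parser.close()
    return parser.found


# Hypothetical document, invented for this example.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head><title>Meta-data is dead</title></head>
<body>
<h1>Long live XHTML</h1>
<h2>Semantic markup to the rescue</h2>
<p>Search engines <strong>do</strong> use <em>visible</em> structure.</p>
</body></html>"""

if __name__ == "__main__":
    for key, value in extract_natural_metadata(SAMPLE).items():
        print(key, "->", value)
```

Nothing here had to be typed into a separate meta-data field: writing the document produced all of it.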

tag and release

These days we can really cap things off by adding the human keywords which might be associated with the content. Call it tagging, visible meta-data, folksonomies... the general principle is the same. Tagging gives you the opportunity to categorise your content in a useful way (limitations of the rel="tag" microformat aside). Not only can search engines read tags; but your users can read them and (where your system supports it) they can even use those tags to seek out further information on the topic.

Tags let you add the extra terms that people used to throw into <meta> tags - the related terms, regardless of their existence in the body text. Combine this with misspelled words being picked up by good search tools (eg. Google's "Did you mean...?" suggestions) and you no longer need to seed your documents with meta-data that could get you blacklisted. Why keyword bomb your content when users are being redirected to the correct term?
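Because tags are ordinary visible links, reading them back out is trivial. Here's a small Python sketch that collects rel="tag" links from a page. Under the rel-tag microformat the tag is the last segment of the link's URL path rather than the link text, so that's what this picks up. The class and function names and the example.com URLs are hypothetical, invented purely for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote


class RelTagCollector(HTMLParser):
    """Collect tags from <a rel="tag" href="..."> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # rel is a space-separated list, so "tag nofollow" still counts.
        rel_values = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if "tag" not in rel_values or not href:
            return
        # The tag is the last path segment of the href, not the link text.
        path = urlparse(href).path.rstrip("/")
        last_segment = path.rsplit("/", 1)[-1]
        if last_segment:
            self.tags.append(unquote(last_segment))


def extract_rel_tags(markup):
    collector = RelTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# Hypothetical markup, invented for this example.
SNIPPET = (
    '<p>Labels: '
    '<a href="http://example.com/tag/web%20standards" rel="tag">standards</a>, '
    '<a href="/tags/xhtml/" rel="tag nofollow">XHTML</a> '
    '<a href="/about">about</a></p>'
)

if __name__ == "__main__":
    print(extract_rel_tags(SNIPPET))
```

Note that the first link reads "standards" on the page but carries the tag "web standards" in its URL, which is one of the quirks of rel="tag".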

trust in the natural meta-data

The sum total of this natural meta-data gives search engines the ability to index and rank pages according to the content they actually contain (including visible tags to catch related terms), rather than meta-data that someone has hidden in the file. This way, you cannot have spurious meta-data without creating spurious content... which even the most casual user is likely to question, or the most time-poor developer should remember to update.

Meta-data as we knew it is dead. Long live XHTML.


about

Web development and standards, as seen by Ben Buchanan.
