limitations of rel="tag" microformat

I've been investigating the rel-tag microformat - tags, basically. I've realised there are two fairly obvious problems with the spec (I imagine I'm not the first to pick up on these, of course).

My scenario: an existing application hosts articles which are to be tagged. The intended tag space (target of the tag link) is to be a page listing all documents in the application with that tag (so users can read more on the same site).

So, what are the limitations of the spec (and some ideas to alleviate the problems)?

URLs

Tag URLs must end with a directory named /tagname. Nothing else will do, to meet the spec. In the real world, this is too strict a requirement - in the scenario I am investigating, I don't control the URLs produced by the application nor did I choose the server environment.

The app produces tag listings using an argument, something like ?tag=tagname at the end of the URL. The environment does not allow me to set up a rewrite/.htaccess to forward the spec-format URL to the app-format URLs. So I'm out of luck there.

So I can just use Technorati tag links, right? Well, depends if the client goes for that, which they probably won't since internal grouping is a higher priority. Plus, in context the user's natural expectation would be to click the tag link and find a listing of posts with that tag on the host site. If we send users off to another site we're losing a key benefit of using tags in the first place.

The rub is that we're not doing anything which goes against the spirit of the spec, we're just not meeting the very specific file system structure it's asking for. Microcontent driving the entire server's directory layout? I'm not sure our admin guys are going to buy that.

I can see why URL parsing is a way to push for some level of tag/link relevancy; but I don't really see why http://www.server.com/scriptname.ext?tag=tagname should not be acceptable as a tag URL (not saying there's no reason, just that it's not immediately obvious to me...). The last portion of the URL is still the tag; and plenty of applications out there use the syntax for categories.

Perhaps the motivation is to make a single way of extracting the tag from the URL. This is a little more tricky, but surely it's not impossible to cater to both path segments and argument value? eg. look for ?tag=tagname at the end of the string; if ?tag= is not found, look for /tag/tagname
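As a rough illustration of that fallback, here is a minimal Python sketch. The function name extract_tag is hypothetical, the argument name "tag" is taken from the app URL format described above, and treating the last path segment as the tag is my reading of the spec rather than a quote from it. It is a sketch of the idea, not how any tag indexer actually works.

```python
from urllib.parse import urlsplit, parse_qs, unquote_plus


def extract_tag(href):
    """Return the tag named by a tag URL, or None if none can be found.

    Hypothetical helper: tries the ?tag=tagname argument first, then
    falls back to the last path segment (the path format).
    """
    parts = urlsplit(href)

    # Argument format: ?tag=tagname (parse_qs decodes both + and %20).
    values = parse_qs(parts.query).get("tag")
    if values:
        return values[-1]

    # Path format: the last non-empty path segment, e.g. /tag/tagname.
    segments = [s for s in parts.path.split("/") if s]
    if segments:
        return unquote_plus(segments[-1])

    return None
```

With this, extract_tag("http://www.server.com/scriptname.ext?tag=tagname") and extract_tag("http://technorati.com/tag/tagname") both return "tagname", so one parser could cover both URL styles.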

If that's too vague/complex, then a class attribute could be used to specify argument format or path format. eg. class="tag-argument" and class="tag-path" to differentiate between the two formats.

visible meta-data?

The spec states: Making tag hyperlinks visible has the additional benefit of making it more obvious to readers if a page is abusing tag links, and thus providing more peer pressure for better behavior.

Well, no, actually it's not visible at all. The meta-data is taken from the URL (the tag is extracted from the href contents), which is not visible. The visible link text can be anything at all, which means you can create <a href="http://technorati.com/tag/nice+worksafe+stuff" rel="tag">nasty nsfw stuff</a> ...a tag index will think the post is about nice worksafe stuff. People are in for a surprise if they click through to the actual page and find it's quite accurately nasty nsfw stuff.

The visible meta-data was accurate, but the actual meta-data was not. Oddly enough, this abuse of the spec conforms to the spec. A more robust form of accuracy enforcement would be to require the visible link text to match the tag from the URL. The differences in spaces/encoded spaces shouldn't be too hard to cover since there are only two allowable encoding forms.
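To show how small that check could be, here is a hedged Python sketch. The function name link_text_matches_tag is hypothetical, and I'm assuming the two encoded-space forms are + and %20 and that the comparison should ignore case and repeated whitespace; none of that comes from the spec itself.

```python
from urllib.parse import urlsplit, unquote_plus


def _normalise(text):
    # Assumption: ignore case and collapse runs of whitespace.
    return " ".join(text.split()).casefold()


def link_text_matches_tag(href, link_text):
    """Return True if the visible link text equals the tag in the URL.

    Hypothetical check: the tag is taken from the last path segment,
    with + and %20 both decoded to a space.
    """
    segments = [s for s in urlsplit(href).path.split("/") if s]
    if not segments:
        return False
    tag = unquote_plus(segments[-1])
    return _normalise(tag) == _normalise(link_text)
```

Run against the example above, the href http://technorati.com/tag/nice+worksafe+stuff passes with the link text "nice worksafe stuff" and fails with "nasty nsfw stuff", which is exactly the mismatch an indexer would want to reject.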

summary

Under the current draft, tags and relevant tagspaces are potentially hard to create; and it's still easy to abuse the system. Humans can still be tricked and so can the machines. It's a great spec if you happen to have a compliant directory structure, but if your site doesn't match then you either recreate your entire system... or, more likely, you sadly advise the client that tags aren't happening.


Comments

  1. Blogger AN, June 20, 2006 2:38 pm: 

    FYI, you can append a link to Technorati with a reference to the site that limits the results to those that can be found on the specified site.

    For example:
    http://www.technorati.com/tag/MySubjectTag?from=http://mysite.com

    I haven't tested this yet with a rel="tag" link to see if it correctly registers the tag.

  2. Blogger 200ok, June 26, 2006 1:02 am: 

    AN - great tip, will have to give that a whirl and see how it goes :)

  3. Blogger Kevin Marks, January 05, 2007 6:34 pm: 

    That's right AN, the spec says that, and Technorati supports it too (it used to have a bug preventing that, but I fixed it).
    Ben, on the wider point, misleading your users with link text is bad, but it is clearly you that is doing that; the link that is indexed by the tag collator is what you actually linked to. If you are going to goatse your users, that is your lookout really.

  4. Anonymous Mike Schinkel, January 08, 2007 3:32 am: 

    Ben:

    I so completely agree with your points. Yes, it is far too constricting in its allowed URL format. Further, I find rel=tag very counter-intuitive which means it is likely to often be wrongly applied. It would make far more sense to me for the tag to be the visible text with a link being an optional addition, i.e.

    <span class="tag">foo</span>

    and

    <a href="http://www.wikipedia.org/wiki/foo"
    class="tag">foo</a>



    I think rel=tag might well have been designed to get more links (and hence Google-juice?) pointing to Technorati than it was to provide the best solution for the given need. I don't say that with animosity; after all Technorati is a business and businesses that drive standards processes typically do so for their own business's benefit. Just trying to recognize the circumstances at play...

    FWIW.

about

Web development and standards, as seen by Ben Buchanan.
