Limitations of the rel="tag" microformat
I've been investigating the rel-tag microformat (tags, basically), and I've realised there are two fairly obvious problems with the spec. I imagine I'm not the first to pick up on these, of course.
My scenario: an existing application hosts articles which are to be tagged. The intended tag space (target of the tag link) is to be a page listing all documents in the application with that tag (so users can read more on the same site).
So, what are the limitations of the spec (and some ideas to alleviate the problems)?
Tag URLs must end with a directory (that is, a final path segment) named /tagname. Nothing else will do if you want to meet the spec. In the real world this is too strict a requirement: in the scenario I am investigating, I don't control the URLs produced by the application, nor did I choose the server environment.
The app produces tag listings using a query argument, something like ?tag=tagname at the end of the URL. The environment does not allow me to set up a rewrite rule (e.g. in .htaccess) to forward spec-format URLs to the app-format URLs, so I'm out of luck there.
So I can just use Technorati tag links, right? Well, that depends on whether the client goes for it, which they probably won't, since internal grouping is a higher priority. Plus, in context the user's natural expectation would be to click the tag link and find a listing of posts with that tag on the host site. If we send users off to another site, we lose a key benefit of using tags in the first place.
The rub is that we're not doing anything that goes against the spirit of the spec; we're just not meeting the very specific file system structure it asks for. Microcontent driving the entire server's directory layout? I'm not sure our admin guys are going to buy that.
I can see why URL parsing is a way to push for some level of tag/link relevancy, but I don't really see why http://www.server.com/scriptname.ext?tag=tagname should not be acceptable as a tag URL (I'm not saying there's no reason, just that it's not immediately obvious to me). The last portion of the URL is still the tag, and plenty of applications out there already use that syntax for categories.
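As a rough illustration, here is a minimal Python sketch of how a parser that follows the spec's path rule might read the tag from an href. The function name spec_tag_from_href is hypothetical, and treating a trailing slash as ignorable is my assumption rather than something taken from the spec. Fed the application's URL format, it returns the script name instead of the tag.

```python
from urllib.parse import unquote_plus, urlsplit


def spec_tag_from_href(href: str) -> str:
    """Tag as a path-only rel-tag parser reads it: the last URL path segment."""
    path = urlsplit(href).path  # the ?query and #fragment are discarded here
    # Assumption (not from the spec): a trailing slash is ignored,
    # so /tag/tech/ still yields "tech".
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    # "+" and "%20" both decode to a space.
    return unquote_plus(last_segment)


print(spec_tag_from_href("http://technorati.com/tag/nice+worksafe+stuff"))
# nice worksafe stuff
print(spec_tag_from_href("http://www.server.com/scriptname.ext?tag=tagname"))
# scriptname.ext
```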
Perhaps the motivation is to have a single way of extracting the tag from the URL. Supporting two forms is a little trickier, but surely it's not impossible to cater to both a path segment and an argument value? For example: look for ?tag=tagname at the end of the string; if ?tag= is not found, look for the last path segment (/tagname), as the spec does now.
If that's too vague or complex, a class attribute could be used to specify argument format or path format, e.g. class="tag-path", to differentiate between the two formats.
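Neither idea needs much code. Below is a minimal Python sketch, assuming links are read with the standard library's html.parser: a class="tag-path" link uses the spec's path rule, a hypothetical class="tag-arg" marker selects the argument form, and an unmarked link tries ?tag= first before falling back to the last path segment. The names tag_from_link and RelTagCollector, the tag-arg class and the "tag" parameter name are illustrative assumptions, not part of any spec.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import parse_qs, unquote_plus, urlsplit


def _path_tag(parts) -> Optional[str]:
    # Current spec: the tag is the last path segment ("+" / "%20" become spaces).
    last_segment = parts.path.rstrip("/").rsplit("/", 1)[-1]
    return unquote_plus(last_segment) or None


def _arg_tag(parts, param: str) -> Optional[str]:
    # Argument form: ...?tag=tagname. parse_qs decodes "+" and "%20" and
    # drops blank values, so an empty "?tag=" counts as "not found".
    values = parse_qs(parts.query).get(param)
    return values[-1] if values else None


def tag_from_link(href: str, classes: List[str], param: str = "tag") -> Optional[str]:
    parts = urlsplit(href)
    if "tag-path" in classes:
        return _path_tag(parts)
    if "tag-arg" in classes:  # hypothetical marker for the argument format
        return _arg_tag(parts, param)
    # No marker: try ?tag= first, then fall back to the last path segment.
    return _arg_tag(parts, param) or _path_tag(parts)


class RelTagCollector(HTMLParser):
    """Collect the tag named by every <a rel="tag"> link in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # rel and class are space-separated lists; valueless attributes give None.
        rels = (attr.get("rel") or "").split()
        href = attr.get("href")
        if "tag" not in rels or not href:
            return
        found = tag_from_link(href, (attr.get("class") or "").split())
        if found:
            self.tags.append(found)


collector = RelTagCollector()
collector.feed(
    '<a href="http://www.server.com/scriptname.ext?tag=tagname" rel="tag">tagname</a>'
    '<a href="/tags/microformats" rel="tag" class="tag-path">microformats</a>'
    '<a href="/list.php?tag=web+standards" rel="tag" class="tag-arg">web standards</a>'
)
collector.close()
print(collector.tags)  # ['tagname', 'microformats', 'web standards']
```

Defaulting unmarked links to "argument first, then path" keeps existing spec-style URLs working, since a path-only URL has no ?tag= to find.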
The spec states:
Making tag hyperlinks visible has the additional benefit of making it more obvious to readers if a page is abusing tag links, and thus providing more peer pressure for better behavior.
Well, no, actually it's not visible at all. The meta-data is taken from the URL (the tag is extracted from the href contents), which is not visible. The visible link text can be anything at all, which means you can create a link like
<a href="http://technorati.com/tag/nice+worksafe+stuff" rel="tag">nasty nsfw stuff</a>
and a tag index will think the post is about nice worksafe stuff. People browsing that tag are in for a surprise when they click through to the actual page and find it's quite accurately nasty nsfw stuff.
The visible meta-data was accurate, but the actual meta-data was not. Oddly enough, this abuse of the spec conforms to the spec. A more robust form of accuracy enforcement would be to require the visible link text to match the tag from the URL. Differences between spaces and encoded spaces shouldn't be too hard to cover, since there are only two allowable encoding forms (+ and %20).
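The proposed check is small. Here is a minimal Python sketch: decode the last path segment, treating + and %20 as spaces, and compare it with the visible link text. The function link_text_matches_tag is hypothetical, and ignoring case and extra whitespace in the comparison is my assumption; the check as proposed only says the text must match.

```python
from urllib.parse import unquote_plus, urlsplit


def _normalise(text: str) -> str:
    # Assumption: the comparison ignores case and runs of whitespace.
    return " ".join(text.split()).casefold()


def link_text_matches_tag(href: str, link_text: str) -> bool:
    """True if the visible link text names the same tag as the href."""
    last_segment = urlsplit(href).path.rstrip("/").rsplit("/", 1)[-1]
    url_tag = unquote_plus(last_segment)  # "+" and "%20" both become spaces
    return _normalise(url_tag) == _normalise(link_text)


href = "http://technorati.com/tag/nice+worksafe+stuff"
print(link_text_matches_tag(href, "nasty nsfw stuff"))     # False
print(link_text_matches_tag(href, "nice worksafe stuff"))  # True
```

Run against the example above, the nasty nsfw stuff link fails the check, so an index could flag or ignore it instead of filing it under nice worksafe stuff.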
Under the current draft, tags and relevant tagspaces are potentially hard to create, and it's still easy to abuse the system: humans can still be tricked, and so can the machines. It's a great spec if you happen to have a compliant directory structure, but if your site doesn't match, then you either recreate your entire system... or, more likely, you sadly advise the client that tags aren't happening.