In short... character set encoding and special character display remain a pain in the butt.

Update: this t-shirt says it all (Mac version also available).



  1. Anonymous, July 28, 2005 3:08 a.m.: 

    It is true that there is still a requirement for some black magic in getting character encodings right. The things I have seen go wrong most often:

    JAWS is apparently not able to handle UTF-8. So if I want to write my mail in Spanish to people who are using older systems, I either have to drop accents (which causes genuine ambiguity in my text) or switch that mail to ISO-8859-1. If I want to write in Hungarian, Greek, Russian, etc., that fallback doesn't work, since ISO-8859-1 can't represent those languages. So I set my environment up in UTF-8 by default and live with swapping for some email.
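A rough sketch of that "swap for some email" habit, assuming Python: try the narrowest charset that can represent the message and only fall back to UTF-8 when nothing smaller works. The function name `pick_mail_charset` and the candidate order are illustrative assumptions, not anything a particular mail client does.

```python
def pick_mail_charset(text, candidates=("us-ascii", "iso-8859-1", "utf-8")):
    """Return the first charset in `candidates` that can encode `text` losslessly.

    Hypothetical helper: the candidate order is an assumption chosen so that
    older systems get the narrowest encoding that still preserves accents.
    """
    for charset in candidates:
        try:
            text.encode(charset)  # raises if any character is unrepresentable
        except UnicodeEncodeError:
            continue
        return charset
    raise ValueError("no candidate charset can represent this text")


if __name__ == "__main__":
    for sample in ("hello", "año", "Привет", "őszinte"):
        print(sample, "->", pick_mail_charset(sample))
```

Spanish text fits in ISO-8859-1, while the Russian and Hungarian samples contain characters outside it and end up as UTF-8.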

    IRC servers accept what is sent, and don't have a formal way of negotiating character sets or even saying what they accept. Same problem as mail, effectively.

    The big one I have found is servers not being set up correctly to know what they are serving. There is a tutorial on configuring character-encoding in Apache that W3C's internationalisation group produced as part of their extensive collection of materials on character encoding. I find most of these documents are actually far more friendly and readable than the average W3C document. But there are some genuine issues out there.
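To make the "know what you are serving" point concrete, here is a minimal sketch, assuming Python and a hypothetical helper named `build_response`: the bytes are produced with one charset and the `Content-Type` header declares that same charset, so the two cannot drift apart. It is not the Apache configuration the tutorial describes, only the same idea in miniature.

```python
def build_response(text, charset="utf-8"):
    """Encode `text` and return (headers, body) that agree on the charset.

    Hypothetical helper for illustration; a real server would set the
    header through its own configuration or framework.
    """
    body = text.encode(charset)
    headers = {
        "Content-Type": f"text/html; charset={charset}",
        "Content-Length": str(len(body)),  # length in bytes, not characters
    }
    return headers, body


if __name__ == "__main__":
    headers, body = build_response("<p>año</p>")
    print(headers)
    print(body)
```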

    What a smart system would do is auto-detect the encoding (most modern systems do in fact manage this at least to some extent). But yes, there is more work to be done...
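One very simple form of auto-detection, sketched in Python under stated assumptions: treat the bytes as UTF-8 if they decode cleanly, otherwise fall back to a single legacy encoding. The name `guess_decode` and the ISO-8859-1 fallback are assumptions for illustration; real detectors use statistical heuristics and can still guess wrong.

```python
def guess_decode(data, fallback="iso-8859-1"):
    """Return (text, encoding_used) for a byte string of unknown encoding.

    Heuristic sketch: text in a legacy 8-bit encoding that contains accented
    characters is rarely also valid UTF-8, so a clean UTF-8 decode is taken
    as strong evidence. Anything else is decoded with `fallback`, which is
    an assumption about where the data came from.
    """
    try:
        # "utf-8-sig" also strips a leading byte order mark if one is present
        return data.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(fallback), fallback


if __name__ == "__main__":
    print(guess_decode("año".encode("utf-8")))
    print(guess_decode("año".encode("iso-8859-1")))
```

The fallback never fails, because every byte is valid in ISO-8859-1, which is also why this approach cannot tell ISO-8859-1 apart from other 8-bit encodings.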

    -- chaals