dashing into trouble - why html comments break in firefox

[Or, When Minutiae Attack!]

There's a perennial question that pops up on email lists and from other developers. The question goes something like this:

"This page works fine in everything except Firefox and I can't tell why... it's showing an HTML comment as raw code..."

the problem

This problem usually boils down to a quirk in the way Firefox handles HTML comments. Most browsers treat only --> as the end of an HTML comment. Firefox, however, also treats any instance of -- as closing the comment.

So, if you have a comment with two or more adjacent hyphens, you're in trouble. Both of these are out:

<!-- --------------- Blah --------------- -->

<!--
<p>
Blah blah blah -- then blah.</p>
-->

Firefox will display the comment as raw code, instead of hiding the comment and its contents. The late Netscape Navigator did this too - in fact I first saw the problem in NN4, back in 2000.
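
To make the difference concrete, here is a rough Python sketch of the two ways of finding the end of a comment. It is a simplified model and not Firefox's actual parser: the function names are made up for illustration, and the "strict" version assumes the SGML-style rule described in the comments below this post, where each -- switches between being inside and outside a comment.

```python
def ends_lenient(markup: str) -> int:
    """Index just past the comment when only '-->' closes it, or -1."""
    if not markup.startswith("<!--"):
        return -1
    close = markup.find("-->", 4)
    return -1 if close == -1 else close + 3


def ends_strict(markup: str) -> int:
    """Index just past the comment declaration under SGML-style rules.

    Each '--' toggles between inside and outside a comment. Outside a
    comment, only whitespace may appear before the closing '>'.
    Returns -1 if the declaration is malformed under these rules.
    """
    if not markup.startswith("<!--"):
        return -1
    i = 4
    in_comment = True
    while i < len(markup):
        if markup.startswith("--", i):
            in_comment = not in_comment
            i += 2
        elif in_comment:
            i += 1  # ordinary comment text, including a lone hyphen
        elif markup[i] == ">":
            return i + 1
        elif markup[i].isspace():
            i += 1
        else:
            return -1  # stray text between comments
    return -1


samples = (
    "<!-- ok -->",
    "<!-- foo -- -- bar -->",
    "<!-- Blah blah -- then blah. -->",
)
for sample in samples:
    lenient_ok = ends_lenient(sample) == len(sample)
    strict_ok = ends_strict(sample) == len(sample)
    print(f"{sample!r}: lenient={lenient_ok} strict={strict_ok}")
```

In this model the first two samples are accepted by both functions, while the third is accepted only by the lenient one. A real browser has to decide what to do with the malformed case, and showing the remainder as text is one possible outcome.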

the solution

The solution is simple, even if it's not always convenient: don't put adjacent hyphens inside an HTML comment. That's fine and dandy unless you have content authors who have a habit of using two hyphens instead of an em dash!

In any case, you either need to remove the adjacent hyphens, or if it's appropriate you can convert them to more correct characters.
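
If the content passes through a script before it is commented out, the clean-up can be automated. The following is a minimal sketch, assuming it is acceptable to swap every run of two or more hyphens for an em dash; the helper names make_comment_safe and wrap_in_comment are hypothetical, and a different replacement may suit your content better.

```python
import re


def make_comment_safe(text: str, replacement: str = "\u2014") -> str:
    """Replace each run of two or more hyphens with `replacement`.

    The default replacement is an em dash (U+2014). The replacement
    itself should not contain hyphens, or the problem comes back.
    """
    return re.sub(r"-{2,}", replacement, text)


def wrap_in_comment(text: str) -> str:
    """Wrap text in an HTML comment, padded with spaces so the
    comment text never ends with a hyphen."""
    return "<!-- " + make_comment_safe(text) + " -->"


print(wrap_in_comment("Blah blah blah -- then blah."))
```

Single hyphens are left alone, so ordinary hyphenated words survive unchanged.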

If your page's content includes double hyphens and you're not allowed to modify it, then you're not going to be able to comment blocks of it out. Yes, that is indeed annoying.

is firefox wrong?

Technically, very technically, Firefox is right. The HTML 4 specification defines "--" as the comment delimiter, while "<!" and ">" are the markup declaration delimiters. From the spec:

White space is not permitted between the markup declaration open delimiter("<!") and the comment open delimiter ("--"), but is permitted between the comment close delimiter ("--") and the markup declaration close delimiter (">"). A common error is to include a string of hyphens ("---") within a comment. Authors should avoid putting two or more adjacent hyphens inside comments.

Information that appears between comments has no special meaning (e.g., character references are not interpreted as such).

Note that comments are markup.

...so, Firefox is not "wrong". It's just following the spec to the letter (or hyphen, as the case may be). The other browsers have gone with a more human-friendly interpretation of the spec.

so are all the other browsers wrong?

I don't think the other browsers are definitively wrong either. They still comply with a different interpretation. Personally I read the specification the same way: "any instance of -- followed by > or whitespace and > is a closing comment".

According to this interpretation, "-->" and "--  >" are valid closing comments, but "-- blah" is not. It's also a reasonably logical approach to say that since a closing comment should be "-->", then the browser should ignore anything which is not "-->". That's pretty much what a comment is there for, after all - to ignore stuff.
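
That reading is easy to express as a pattern. Here is a small sketch in Python, assuming the rule is exactly "two hyphens, optional whitespace, then >"; the name comment_body is invented for this example, and Python's \s matches a slightly wider set of whitespace characters than HTML defines.

```python
import re

# "--" followed by optional whitespace and then ">"
CLOSE = re.compile(r"--\s*>")


def comment_body(markup: str):
    """Return the text of a comment at the start of `markup`, or None
    if the markup does not start with a comment that gets closed."""
    if not markup.startswith("<!--"):
        return None
    match = CLOSE.search(markup, 4)
    return None if match is None else markup[4:match.start()]


print(repr(comment_body("<!-- a -- blah -->")))
print(repr(comment_body("<!-- a --  > rest")))
print(repr(comment_body("<!-- a -- blah")))
```

Under this rule the "-- blah" in the first sample stays part of the comment text, the second sample closes at "--  >", and the third never closes at all.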

is the spec wrong?

Well the spec can't really be "wrong" I guess. It is what it is. But in this case I think the specification is a bit illogical:

  • I don't see the sense in random exceptions to rules - why specify vagueness? "this is the closing comment tag, except when it's not".
  • Why allow whitespace in the closing comment, when it's not allowed in the opening comment? If whitespace were prohibited, the only valid interpretation would have been the complete "-->" and the whole problem would go away.
  • The "common error" note just confuses the issue. It does not say why including adjacent hyphens is an error, nor does it define any specific error handling method.
  • The double-hyphen approach can't be a technical requirement for producing a rendering engine, since the other browsers are able to restrict closing comments specifically to "-->".
  • It's impractical to expect that commented content will never contain multiple adjacent hyphens. Sure, we can live without hyphens in comments; but we shouldn't dictate content based on markup (no matter how small a detail it is).
  • And finally... it's irritating, so I'm going to be cranky at the spec ;)

Besides that, as the HTML5 spec notes, HTML has always been implemented by browsers as a language in its own right. There's no need to slavishly follow anything else. The more human interpretation could just as easily have been specified.

But then, comments appear to be a blind spot in W3C specs - how I curse the lack of a single-line comment in CSS... but I digress.

will html5 clear it up?

No. HTML5 actually makes things a little harder to remember, by documenting (hat tip zcorpan) the restriction that a comment can't end with a hyphen either:

Comments must start with the four character sequence U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS (<!--). Following this sequence, the comment may have text, with the additional restriction that the text must not contain two consecutive U+002D HYPHEN-MINUS (-) characters, nor end with a U+002D HYPHEN-MINUS (-) character. Finally, the comment must be ended by the three character sequence U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN (-->).

What's the saying about laws and sausages?
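
For what it's worth, the authoring rule quoted above is simple to check mechanically. This is a minimal sketch that covers only the restrictions in that quoted passage and nothing else from the HTML5 draft; the function name is_valid_html5_comment is made up for illustration.

```python
def is_valid_html5_comment(markup: str) -> bool:
    """Check a complete comment against the quoted authoring rule:
    starts with '<!--', ends with '-->', and the text in between
    neither contains '--' nor ends with '-'."""
    # The shortest possible comment is '<!---->' (seven characters);
    # the length check stops '<!-->' from matching both ends at once.
    if len(markup) < 7:
        return False
    if not (markup.startswith("<!--") and markup.endswith("-->")):
        return False
    text = markup[4:-3]
    return "--" not in text and not text.endswith("-")


for sample in ("<!-- ok -->", "<!-- a -- b -->", "<!-- a --->"):
    print(f"{sample!r}: {is_valid_html5_comment(sample)}")
```

Only the first sample passes: the second contains two consecutive hyphens, and the text of the third ends with a hyphen.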

conclusion

Perhaps there's some deep-seated syntactical reason for the double hyphen approach. The HTML4 spec's warning about "a common error" hints at some underlying logic without actually revealing it. Maybe it's related to the formatting of DTD comments. Maybe it was just a Netscape quirk which got turned into a standard. Maybe it's totally random.

It seems unlikely that Mozilla is going to change this detail, especially as it's not a "bug" to follow the specifications; besides, Firefox 3 Beta 2 still has the quirk.

Still, Firefox does correct omitted background colours by defaulting to white - so they're not above "helpful" rendering tricks. So maybe there's some hope on that front.

But in the meantime, no matter what we think we just have to live with it. It's yet another bit of web development minutiae to file away in your head, for the day you see it happen.


Comments

  1. Blogger Colin Morris, January 09, 2008 12:03 AM:

    Interesting, as my reading of the spec talking about 'whitespace' makes me think that any non-space between -- and > means that character string shouldn't be interpreted as a closing comment tag. -- > or --\t> is fine, but -- hi bob > shouldn't.

    Silly nerds.

  2. Blogger zcorpan, January 09, 2008 4:52 AM:

    The dashes thing is because of how comments work in SGML. This is an SGML feature -- not something HTML4 specifies. The HTML4 DTD uses this feature quite extensively, e.g.:

    <!ATTLIST MAP
    %attrs; -- %coreattrs, %i18n, %events --
    name CDATA #REQUIRED -- for reference by usemap --
    >

    For comment declarations you can only have whitespace between the actual comments, so having multiple comments in a comment declaration is not as useful as other declarations.

    <!-- foo -- -- bar -->

    You can thus have subsequent hyphens, so long as you use the right amount of them...

    As you point out, this is highly confusing and has been dropped in HTML5. Firefox is actually wrong per HTML5 and Mozilla have said that they will change their implementation.

    > HTML5 actually makes things a little harder to remember, by adding the restriction that a comment can't end with a hyphen either:

    False. SGML/HTML4 has this restriction as well (and XML, for that matter).

  3. Blogger 200ok, January 09, 2008 10:16 PM:

    @colin:
    Pretty much how I read it, yeah :)

    @zcorpan:
    > As you point out, this is highly confusing and has been dropped in HTML5.

    Hmm.. the HTML5 spec as currently published does clarify the closing comment tag, but it still prohibits double hyphens inside the comment. Is there a change in the pipeline or am I missing something here?

    > Firefox is actually wrong per HTML5 and Mozilla have said that they will change their implementation.

    Cool :)

    > False. SGML/HTML4 has this restriction as well

    Hmm, well the HTML4 spec doesn't actually say so in the relevant section. So I'll tweak the post to say HTML5 *documents* the restriction.

  4. Blogger zcorpan, January 10, 2008 12:39 AM:

    Yes, HTML5 forbids authors to use double hyphens in comments, otherwise you would run into trouble in Firefox... But it requires browsers to only treat --> as the end of a comment instead of doing what Firefox does.

    HTML4 says that only whitespace is allowed between the -- and the >, and a single hyphen is not whitespace, therefore it is not allowed.

    More on this: http://www.howtocreate.co.uk/SGMLComments.html

  5. Anonymous indnajns, May 23, 2008 9:28 PM:

    I hope no one is under the delusion that Firefox is somehow "upholding the standard" more so than IE. I came up against this problem and have to say that Firefox picks and chooses which "bad comments" it will choose. One line with extra hyphens will comment out "properly" while the next line looking exactly the same won't and shows as text. At least IE is consistent in its execution. As the spec says, (at least HTML4, couldn't follow the HTML5 spec verbiage) "Information that appears between comments has no special meaning ..." The spec itself is contradictory.