djhayman, 10 days ago

Believe it or not, all the browsers seem to be adhering to the spec.

Here is the HTML5 spec: http://www.w3.org/TR/2014/REC-html5-20141028/syntax.html

The important part is "8.2.5.4 The rules for parsing tokens in HTML content" here: http://www.w3.org/TR/2014/REC-html5-20141028/syntax.html#parsing-main-inhtml

So during parsing, the engine starts off in the "initial" insertion mode, and eventually gets to the "in body" insertion mode once it encounters normal HTML element tags: http://www.w3.org/TR/2014/REC-html5-20141028/syntax.html#parsing-main-inbody

The step that we are concerned with in the "in body" insertion mode is:

'A start tag whose tag name is one of: "b", "big", "code", "em", "font", "i", "s", "small", "strike", "strong", "tt", "u"'

It says:

  • Reconstruct the active formatting elements, if any.
  • Insert an HTML element for the token. Push onto the list of active formatting elements that element.
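The two steps above can be sketched as a toy tree builder. This is a minimal model under stated assumptions, not how any browser implements it: `Element`, `TreeBuilder` and `formatting_start_tag` are hypothetical names, list markers are not modelled, and the push here is a plain append without the three-element limit from the "push" step.

```python
from dataclasses import dataclass, field

# Tag names from the "in body" rule quoted above.
FORMATTING_TAGS = {"b", "big", "code", "em", "font", "i", "s",
                   "small", "strike", "strong", "tt", "u"}


@dataclass(eq=False)  # identity comparison: two <b> elements are distinct nodes
class Element:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


class TreeBuilder:
    """Hypothetical toy model; list markers are not modelled."""

    def __init__(self):
        self.root = Element("body")
        self.open_elements = [self.root]   # stack of open elements
        self.active_formatting = []        # list of active formatting elements

    def insert(self, tag, attrs=None):
        # "Insert an HTML element": append to the current node, push on the stack.
        el = Element(tag, dict(attrs or {}))
        self.open_elements[-1].children.append(el)
        self.open_elements.append(el)
        return el

    def reconstruct(self):
        # Re-open formatting elements that were popped but are still "active".
        entries = self.active_formatting
        if not entries or entries[-1] in self.open_elements:
            return
        # Rewind to the earliest entry after the last one still on the stack.
        i = len(entries) - 1
        while i > 0 and entries[i - 1] not in self.open_elements:
            i -= 1
        # Advance/create: insert a fresh copy and replace the list entry with it.
        for j in range(i, len(entries)):
            entries[j] = self.insert(entries[j].tag, entries[j].attrs)

    def formatting_start_tag(self, tag, attrs=None):
        assert tag in FORMATTING_TAGS
        self.reconstruct()                    # step 1
        el = self.insert(tag, attrs)          # step 2: insert ...
        self.active_formatting.append(el)     # ... and push (no limit here)
        return el
```

If the formatting elements have already been popped off the stack of open elements, the next `formatting_start_tag` call first re-creates them as fresh copies, which is how one source tag can end up as several elements in the DOM.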

The step "Push onto the list of active formatting elements" links here: http://www.w3.org/TR/2014/REC-html5-20141028/syntax.html#push-onto-the-list-of-active-formatting-elements

It says:

'If there are already three elements in the list of active formatting elements after the last list marker, if any, or anywhere in the list if there are no list markers, that have the same tag name, namespace, and attributes as element, then remove the earliest such element from the list of active formatting elements. For these purposes, the attributes must be compared as they were when the elements were created by the parser; two elements have the same attributes if all their parsed attributes can be paired such that the two attributes in each pair have identical names, namespaces, and values (the order of the attributes does not matter).'
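The clause can be sketched as a standalone function. Assumptions: `Element`, `MARKER`, `same_kind` and `push_active_formatting` are hypothetical names; attributes are kept in a dict, so their order is irrelevant as the spec requires, but attribute namespaces are ignored.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical stand-in for a list marker (the spec inserts these for
# elements such as td, th and caption).
MARKER = object()


class Element:
    def __init__(self, tag, attrs=None, namespace=HTML_NS):
        self.tag = tag
        self.namespace = namespace
        # Attributes as parsed at creation time; dict equality ignores order.
        # Attribute namespaces are ignored in this sketch.
        self.attrs = dict(attrs or {})


def same_kind(a, b):
    """Same tag name, namespace and attributes (names and values)."""
    return a.tag == b.tag and a.namespace == b.namespace and a.attrs == b.attrs


def push_active_formatting(active, element):
    """Sketch of "push onto the list of active formatting elements"."""
    # Only entries after the last marker count, or the whole list if there is none.
    start = 0
    for i in range(len(active) - 1, -1, -1):
        if active[i] is MARKER:
            start = i + 1
            break
    matches = [i for i in range(start, len(active))
               if same_kind(active[i], element)]
    # The "Noah's Ark" clause: with three matches already present, drop the
    # earliest one from the list (the element itself stays in the DOM).
    if len(matches) >= 3:
        del active[matches[0]]
    active.append(element)
```

Pushing four attribute-free `b` elements in a row leaves only the last three in the list, while a `b` with different attributes, or one pushed after a marker, does not count toward the limit.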

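So, according to the HTML5 spec, if you nest more than three of the same "active formatting elements" (same tag name, namespace, and attributes), pushing the fourth one makes the parser remove the oldest matching entry from the list of active formatting elements. The element itself stays in the document; the parser simply stops tracking it. This causes a mismatch when the parser later encounters the closing tag, so the tag ends up outside. (You can see this in the developer tools: it looks like , when it should be .)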
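One observable consequence can be traced with a hypothetical input, `<p><b><b><b><b>x</p>y`, which is not necessarily the markup from the screencast. When `</p>` closes the paragraph, the `<b>` elements are popped off the stack of open elements but stay in the list of active formatting elements; the text `y` then triggers reconstruction, and because the clause already dropped the outermost `<b>` from the list, only three are reopened. This is a sketch with hypothetical names (`Node`, `MiniParser`, `run_example`) that models only the clause and reconstruction: no tokenizer, no markers, no adoption agency algorithm.

```python
class Node:
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or {})
        self.children = []  # Node objects or text strings


def serialize(node):
    # Attributes are not serialized in this sketch.
    if isinstance(node, str):
        return node
    inner = "".join(serialize(c) for c in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


class MiniParser:
    """Hypothetical model: only what the <p><b>... example needs."""

    def __init__(self):
        self.body = Node("body")
        self.stack = [self.body]  # stack of open elements
        self.active = []          # list of active formatting elements (no markers)

    def _insert(self, tag, attrs=None):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)
        return node

    def _push_active(self, node):
        # Noah's Ark clause: keep at most three matching entries.
        same = [e for e in self.active
                if e.tag == node.tag and e.attrs == node.attrs]
        if len(same) >= 3:
            self.active.remove(same[0])  # identity match: the earliest one
        self.active.append(node)

    def _reconstruct(self):
        if not self.active or self.active[-1] in self.stack:
            return
        i = len(self.active) - 1
        while i > 0 and self.active[i - 1] not in self.stack:
            i -= 1
        for j in range(i, len(self.active)):
            old = self.active[j]
            self.active[j] = self._insert(old.tag, old.attrs)

    def start_p(self):
        self._insert("p")  # closing an already-open p is skipped here

    def start_b(self, attrs=None):
        self._reconstruct()
        self._push_active(self._insert("b", attrs))

    def end_p(self):
        # "Close a p element": pop until a p has been popped (assumes one is open).
        while self.stack.pop().tag != "p":
            pass

    def text(self, data):
        self._reconstruct()
        self.stack[-1].children.append(data)


def run_example(count=4):
    """Trace <p>, count x <b>, "x", </p>, "y" and serialize the body's children."""
    parser = MiniParser()
    parser.start_p()
    for _ in range(count):
        parser.start_b()
    parser.text("x")
    parser.end_p()
    parser.text("y")
    return "".join(serialize(c) for c in parser.body.children)


print(run_example(4))
# prints: <p><b><b><b><b>x</b></b></b></b></p><b><b><b>y</b></b></b>
```

With three or fewer `<b>` tags every one is reopened around `y`; from the fourth on, `y` is still wrapped in only three.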

Deeply nested phrasing content is parsed incorrectly in every major HTML parser except Firefox. Who is right?

While developing Scrimba, we bumped into a really weird parsing issue that can be reproduced in every major HTML parser out there. This leads us to believe that it is in fact the correct behaviour, even though it seems so obviously wrong. Firefox is the only browser that parses the above HTML "correctly". Press play to follow along with a more thorough demonstration of this peculiar issue.

Can anyone out there explain what is happening here? If it is in fact part of the spec, what is the reasoning behind it? Feel free to jump into the screencast above at any time and run the examples yourself.