@waldyrious
Last active March 17, 2024 10:00
Minimal XHTML5 document
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width" />
<title>Minimal XHTML5 document</title>
<link rel="stylesheet" href="mystyle.css" />
<script src="myscript.js"></script>
</head>
<body>
<p>
This is a
<a href="https://mathiasbynens.be/notes/xhtml5">minimal</a>
<a href="https://blog.whatwg.org/xhtml5-in-a-nutshell">XHTML5</a>
<a href="https://www.w3.org/TR/html-polyglot/">document</a>.
</p>
</body>
</html>
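
Because an XHTML5 document must be well-formed XML, any XML parser can serve as a quick sanity check. The sketch below assumes Python's standard `xml.etree.ElementTree` (any XML parser would do). It parses the document above, inlined with a shorter paragraph, and confirms that the root element is `html` in the XHTML namespace. The helper name `check_xhtml` is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The minimal document above, inlined (with a shorter paragraph) so the
# sketch is self-contained.
MINIMAL_DOC = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width" />
<title>Minimal XHTML5 document</title>
<link rel="stylesheet" href="mystyle.css" />
<script src="myscript.js"></script>
</head>
<body>
<p>This is a minimal XHTML5 document.</p>
</body>
</html>
"""


def check_xhtml(text: str) -> ET.Element:
    """Parse text as XML and require an <html> root in the XHTML namespace.

    Raises ET.ParseError if the markup is not well-formed XML, and
    ValueError if the root element is anything else.
    """
    root = ET.fromstring(text)
    if root.tag != f"{{{XHTML_NS}}}html":
        raise ValueError(f"unexpected root element: {root.tag}")
    return root


root = check_xhtml(MINIMAL_DOC)
# ElementTree spells namespaced names as {namespace}localname.
title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
print(title.text)  # prints: Minimal XHTML5 document

# An HTML-style void element without the closing slash breaks well-formedness.
html_style = MINIMAL_DOC.replace('<meta charset="utf-8" />', '<meta charset="utf-8">')
try:
    check_xhtml(html_style)
except ET.ParseError as err:
    print("not well-formed:", err)
```

The self-closing `/>` on `meta` and `link` is what keeps the document parseable as XML. An HTML parser would accept either form.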

waldyrious commented Jan 16, 2021

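Note: instead of named entities, use the corresponding numeric character references, e.g. &nbsp; becomes &#160; (or &#xA0; in hexadecimal). Without a DTD, an XML parser knows no HTML entity names, so &nbsp; is a fatal "undefined entity" error.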
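
Converting an existing file by hand is error-prone, so here is a minimal Python sketch that rewrites named references into numeric ones. The function name `named_to_numeric` is hypothetical, not an existing tool. It assumes the names of interest are in Python's `html.entities.name2codepoint` table (the HTML 4 entity set) and deliberately leaves the five XML-predefined entities and unknown names untouched. Because it is a plain regex pass, it would also rewrite text inside comments or CDATA sections.

```python
import re
from html.entities import name2codepoint

# XML predefines only these five; every other named entity needs a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text: str, hexadecimal: bool = False) -> str:
    """Replace HTML named entities such as &nbsp; with numeric references.

    XML-predefined entities and names missing from name2codepoint are kept.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        codepoint = name2codepoint[name]
        return f"&#x{codepoint:X};" if hexadecimal else f"&#{codepoint};"

    return ENTITY_RE.sub(replace, text)


print(named_to_numeric("Fish&nbsp;&amp;&nbsp;chips &copy; 2021"))
# prints: Fish&#160;&amp;&#160;chips &#169; 2021
print(named_to_numeric("a&nbsp;b", hexadecimal=True))
# prints: a&#xA0;b
```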

Also, don't forget to escape & as &amp; in attribute values such as URLs with query strings, e.g. href="page?a=1&amp;b=2".
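
A sketch of the escaping step in Python, using the standard library's `xml.sax.saxutils.quoteattr`. The URL is a made-up example. The sketch also shows that the XML parser restores the literal `&` when the attribute is read back, and that an unescaped `&` is a well-formedness error.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

url = "https://example.com/search?q=xhtml&lang=en"  # hypothetical URL

# quoteattr escapes &, < and > and wraps the value in a safe quote character.
attr = quoteattr(url)
print(attr)  # prints: "https://example.com/search?q=xhtml&amp;lang=en"

markup = f'<a xmlns="http://www.w3.org/1999/xhtml" href={attr}>search</a>'
# When parsing, &amp; in the attribute value becomes a literal & again.
print(ET.fromstring(markup).get("href") == url)  # prints: True

# Left unescaped, the bare & makes the attribute value not well-formed XML.
try:
    ET.fromstring(f'<a href="{url}">search</a>')
except ET.ParseError as err:
    print("not well-formed:", err)
```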

Note: XML's five predefined entities can be used normally: &amp;, &lt;, &gt;, &quot; and &apos;.
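
To see that these need no DTD, a minimal Python check can parse DTD-less fragments that use the predefined entities and numeric references. It assumes only the standard `xml.etree.ElementTree` parser, and the helper name `parse_text` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_text(fragment: str) -> str:
    """Parse a DTD-less XML fragment and return its root element's text."""
    return ET.fromstring(fragment).text


# The five predefined entities resolve without any DOCTYPE.
print(parse_text("<p>&amp; &lt; &gt; &quot; &apos;</p>"))  # prints: & < > " '

# Decimal and hexadecimal character references both yield U+00A0.
print(repr(parse_text("<p>a&#160;b&#xA0;c</p>")))  # prints: 'a\xa0b\xa0c'
```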

If named entities are desired, they can be enabled by explicitly declaring an XHTML 1.0 DOCTYPE (both the Strict and the Transitional DTDs work):

<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

(Note that browsers don't actually fetch the DTDs; they have a hardcoded list of entities for each DOCTYPE, which is enabled only when that exact DOCTYPE string is present.)
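
The behaviour described above can be imitated with Python's `xml.parsers.expat`. This is a rough analogy, not a description of any browser's internals. Expat does not fetch the external DTD either. When a document declares one, an undeclared entity such as `&nbsp;` is reported to `SkippedEntityHandler` instead of aborting the parse. The sketch fills it in from a built-in table, `html.entities.name2codepoint`, which stands in for a browser's hardcoded list. With `<!DOCTYPE html>` there is no external subset, so the same reference is a fatal error. The helper name `text_content` is hypothetical.

```python
import xml.parsers.expat
from html.entities import name2codepoint

# The XHTML 1.0 Strict DOCTYPE shown above, on a single line.
XHTML1_STRICT = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)


def text_content(doc: str) -> str:
    """Collect character data, resolving skipped entities from a fixed table.

    The external DTD is never loaded. Expat reports undeclared general
    entities to SkippedEntityHandler only when an external subset is declared.
    """
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def on_skipped(name, is_parameter_entity):
        # Substitute known HTML 4 entity names; silently drop anything else.
        if not is_parameter_entity and name in name2codepoint:
            chunks.append(chr(name2codepoint[name]))

    parser.SkippedEntityHandler = on_skipped
    parser.Parse(doc, True)
    return "".join(chunks)


body = '<html xmlns="http://www.w3.org/1999/xhtml"><p>a&nbsp;b</p></html>'
print(repr(text_content(XHTML1_STRICT + body)))  # prints: 'a\xa0b'

# With the HTML5 DOCTYPE there is no external subset, so &nbsp; is fatal.
try:
    text_content("<!DOCTYPE html>" + body)
except xml.parsers.expat.ExpatError as err:
    print("not well-formed:", err)
```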


waldyrious commented Jan 17, 2021

Useful links:

  • Mathias Bynens: The XML serialization of HTML5, aka ‘XHTML5’
  • The WHATWG Blog: XHTML5 in a nutshell
  • W3C: Polyglot Markup: A robust profile of the HTML5 vocabulary (no longer maintained)
  • W3C: XHTML 1.0: The Extensible HyperText Markup Language — A Reformulation of HTML 4 in XML 1.0
  • WHATWG: HTML Standard § Introduction § HTML vs XML syntax

    This specification defines an abstract language for describing documents and applications. [...] [Two] concrete syntaxes [for] this abstract language [...] are defined in this specification.

    The first such concrete syntax is the HTML syntax. [...] If a document is transmitted with the text/html MIME type, then it will be processed as an HTML document by web browsers.

    The second concrete syntax is XML. When a document is transmitted with an XML MIME type, such as application/xhtml+xml, then it is treated as an XML document by web browsers, to be parsed by an XML processor. Authors are reminded that [...] even minor syntax errors will prevent a document labeled as XML from being rendered fully, whereas they would be ignored in the HTML syntax.

  • WHATWG: HTML Standard § Introduction § History

    [In 1998] the W3C membership decided to stop evolving HTML and instead begin work on an XML-based equivalent, called XHTML. This effort started with a reformulation of HTML4 in XML, known as XHTML 1.0, which added no new features except the new serialization, and which was completed in 2000. After XHTML 1.0, the W3C's focus turned to making it easier for other working groups to extend XHTML, under the banner of XHTML Modularization. In parallel with this, the W3C also worked on a new language that was not compatible with the earlier HTML and XHTML languages, calling it XHTML2.
    [...]
    The scope of the HTML5 specification [includes] what had previously been specified in three separate documents: HTML4, XHTML1, and DOM2 HTML.

  • WHATWG Wiki: HTML vs. XHTML
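
The practical difference described in the quoted passage can be sketched with two Python standard-library parsers: an HTML parser ignores a syntax error, while an XML parser treats it as fatal. Note that `html.parser` is only a lenient tokenizer, not the HTML Standard's full tree-building algorithm. The sketch therefore shows tolerance of the input, not how a browser would repair it. The class name `TagLogger` is hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Markup with a minor error: <b> is never closed.
SLOPPY = "<p>Some <b>bold text</p>"


class TagLogger(HTMLParser):
    """Record start and end tags; HTMLParser does not reject malformed input."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


logger = TagLogger()
logger.feed(SLOPPY)
logger.close()
print(logger.events)  # prints: [('start', 'p'), ('start', 'b'), ('end', 'p')]

# An XML parser, by contrast, stops at the first well-formedness error.
try:
    ET.fromstring(SLOPPY)
except ET.ParseError as err:
    print("XML parser:", err)
```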


waldyrious commented Aug 20, 2021

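To make browsers actually parse the document as XHTML, either give the file the .xhtml extension (which leads the browser to treat a local file as XHTML, and which common web server configurations map to application/xhtml+xml), or configure the server to send the appropriate Content-Type header, for example in PHP: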

<?php header('Content-Type: application/xhtml+xml;charset=UTF-8'); ?>

I don't get why the xmlns attribute in the <html> element isn't sufficient for browsers. 🤷
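
The same header can be set from any backend, not just PHP. The following is a minimal sketch that assumes a plain Python WSGI setup (PEP 3333) rather than any particular framework. The page body is a placeholder and the names `PAGE` and `app` are hypothetical.

```python
# Placeholder page body; in practice this would be read from index.xhtml.
PAGE = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head><meta charset="utf-8" /><title>Minimal XHTML5 document</title></head>
<body><p>Served as XHTML.</p></body>
</html>
"""

# Equivalent to the PHP header() call: this MIME type selects the XML parser.
XHTML_CONTENT_TYPE = "application/xhtml+xml; charset=UTF-8"


def app(environ, start_response):
    """Minimal WSGI application that labels its response as XHTML."""
    body = PAGE.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", XHTML_CONTENT_TYPE),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` from the standard library serves it on port 8000.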


waldyrious commented Jan 10, 2023

For local/CI validation, xmllint can be used, though it requires specifying a DTD:

▶ xmllint --noout --valid index.xhtml
index.xhtml:2: validity error : Validation failed: no DTD found !

▶ xmllint --noout --dtdvalid "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" index.xhtml
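
Where installing xmllint in CI is inconvenient, a well-formedness-only check can be sketched in Python with the standard library. This is a partial substitute, not an equivalent. It does not validate against a DTD, so it would not catch DTD validity errors like the ones xmllint reports. It only confirms that each file parses as XML, which is what an XML-parsing browser requires. The names `check_well_formed` and `main` are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def check_well_formed(paths):
    """Return "path: error" strings for files that fail to parse as XML.

    Only well-formedness is checked; there is no DTD validation here.
    """
    errors = []
    for path in paths:
        try:
            ET.parse(path)
        except (ET.ParseError, OSError) as err:
            errors.append(f"{path}: {err}")
    return errors


def main(argv):
    """Print problems to stderr and return a process exit status for CI."""
    problems = check_well_formed(argv)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

Hooked up with `sys.exit(main(sys.argv[1:]))` under an `if __name__ == "__main__":` guard, it could be run as, for example, `python check_xhtml.py index.xhtml` (the script name is illustrative). It exits with status 1 if any file fails to parse.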

I found that using <meta charset="utf-8" /> as above results in the following output by xmllint:

▶ xmllint --noout --dtdvalid "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" index.xhtml
index.xhtml:4: element meta: validity error : Element meta does not carry attribute content
index.xhtml:4: element meta: validity error : No declaration for attribute charset of element meta
Document index.xhtml does not validate against http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

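Using <meta http-equiv="content-type" content="text/html; charset=UTF-8" /> does pass validation, since the XHTML 1.0 DTDs predate the charset attribute (which was introduced with HTML5). But both forms are spec-compliant, so they should be accepted (and indeed the major web browsers do accept them).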
