XHTML
What is XHTML?
XHTML is a recasting of HTML in XML. After XML became a recommendation, the question arose: if we have XML, do we still need HTML since now anybody can design their own Web-delivered language? To answer this question, we held a two-day workshop in San Francisco in May 1998, and we came to the conclusion that, yes, there is still a need for HTML. There are large numbers of people who are happy with HTML and don't want to have to design their own language. Furthermore, there are millions of documents out there in HTML, and there are implicit semantics in HTML documents that can be useful (for instance, for search engines that can give more priority to text in <H1> elements).
Some online references for XHTML
New features in XHTML 2.0 draft
- <blockcode>, an analogue of the venerable <blockquote>, is added for programmers. It can carry a class attribute, which may be used to indicate the type of code contained in the block.
- The XHTML 2.0 draft contains a provision for a caption element, which may reside within either table or object elements. A first-class caption construct was added as a generic way to mark up a caption for images.
- The citation element <cite> has also returned. <cite> takes a cite attribute, which should be the "source" of the citation.
- XHTML 2.0 has restored the style attribute.
- XHTML 2.0's section 6.4 Edit Collection adds back some support for Web content editing. The collection, according to the new draft, "allows elements to carry information indicating how, when and why content has changed." Particular XHTML 2.0 elements (including inline elements like <span>) can have an edit attribute, which can have one of four permissible values: inserted, deleted, changed, moved. One of these values, deleted, carries with it a "default presentation" which, in CSS terms, is display: none.
XHTML Document Structure
See XHTML Part 3: What's New at the O'Reilly Network.
In August 2001, W3C published A tutorial on XHTML Modules and Markup Languages. This tutorial explains how to create XHTML Family modules and markup languages, based on Modularization of XHTML.
XHTML documents must reference one of the three XHTML DTDs: Strict, Transitional, or Frameset. The XHTML 1.0 DTDs correspond to the three HTML 4 DTDs. XHTML 1.0 became a W3C Recommendation in January 2000. You can convert your HTML documents to XHTML with Tidy; see O'Reilly & Associates' XHTML conversion using Tidy. There is a good tutorial at W3Schools XHTML School.
Here is an overview of some of the new rules; for a more complete breakdown, read the O'Reilly & Associates XHTML (Extensible Hypertext Markup Language) article by Peter Wiggen.
- The strict XHTML doctype declaration:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
is used if all formatting is in Cascading Style Sheets (CSS). That is, <font> and <table> tags are not used to control how the browser displays the documents.
- The transitional doctype declaration:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
is used when you need to use presentational markup in your document. Most of us will be using the transitional DTD for quite some time, because we don't want to limit our audience to users with browsers that support CSS.
- The frameset declaration:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
is used when your documents have frames.
- The root element of the document must be <html> and must designate the XHTML namespace. Include the namespace attribute xmlns in the opening <html> tag. The namespace attribute defines which namespace the document uses:
<html xmlns="http://www.w3.org/1999/xhtml">
(this is the namespace fixed by the XHTML 1.0 Recommendation; the older draft-era form, http://www.w3.org/TR/xhtml1, is no longer used)
- Empty elements must be terminated: <br /> <hr /> <img src="image.gif" />
- Attribute value pairs cannot be minimized: (<option value="somevalue" selected="selected">, <input type="radio" ... checked="checked" />, <dl compact="compact">, etc.)
Some HTML user agents are unable to interpret boolean attributes when these appear in their full (non-minimized) form, as required by XML 1.0. Note this problem doesn't affect user agents compliant with HTML 4. The following attributes are involved: compact, nowrap, ismap, declare, noshade, checked, disabled, readonly, multiple, selected, noresize, defer.
- Use external style sheets if your style sheet uses < or & or ]]> or --. Use external scripts if your script uses < or & or ]]> or --. Note that XML parsers are permitted to silently remove the contents of comments. Therefore, the historical practice of "hiding" scripts and style sheets within comments to make the documents backward compatible is likely to not work as expected in XML-based implementations. When a <script> or <style> element does contain these characters inline, its content must be wrapped in a CDATA section, with the CDATA markers hidden from the script engine behind script comments (an easy alternative to the CDATA wrapper is to use external script and style sheet documents). An example:
<script type="text/javascript">
// <![CDATA[
document.write("<h2>Table of Factorials</h2>");
for (var i = 1, fact = 1; i < 10; i++, fact *= i) {
  document.write(i + "! = " + fact);
  document.write("<br />");
}
// Code courtesy of JavaScript: The Definitive Guide
// ]]>
</script>
- In XML, URIs that end with fragment identifiers of the form "#foo" do not refer to elements with an attribute name="foo"; rather, they refer to elements with an attribute defined to be of type ID, e.g., the id attribute in HTML 4. Many existing HTML clients don't support the use of ID-type attributes in this way, so identical values may be supplied for both of these attributes to ensure maximum forward and backward compatibility (e.g., <a id="foo" name="foo">...</a>).
Further, since the set of legal values for attributes of type ID is much smaller than for those of type CDATA, the type of the name attribute has been changed to NMTOKEN. This attribute is constrained such that it can only have the same values as type ID, or as the Name production in XML 1.0 Section 2.5, production 5. Unfortunately, this constraint cannot be expressed in the XHTML 1.0 DTDs. Because of this change, care must be taken when converting existing HTML documents. The values of these attributes must be unique within the document, valid, and any references to these fragment identifiers (both internal and external) must be updated should the values be changed during conversion.
Finally, note that XHTML 1.0 has deprecated the name attribute of the a, applet, form, frame, iframe, img, and map elements, and it will be removed from XHTML in subsequent versions.
- Using ampersands in attribute values: when an attribute value contains an ampersand, it must be expressed as the character entity reference &amp; (for example, href="search.cgi?a=1&amp;b=2").
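Several of the rules above (terminated empty elements, non-minimized attributes, escaped ampersands) are ordinary XML well-formedness requirements, so any XML parser can check them. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree; the helper name is_well_formed and the sample fragments are illustrative assumptions, not part of any XHTML tool. It checks well-formedness only and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise.

    Hypothetical helper for illustration: it checks well-formedness only,
    not validity against one of the XHTML DTDs.
    """
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# For each rule: (HTML-style markup, XHTML-style markup).
# The fragments are made-up samples that mirror the rules listed above.
EXAMPLES = {
    "empty elements must be terminated": (
        "<p>one<br>two</p>",
        "<p>one<br />two</p>",
    ),
    "attributes cannot be minimized": (
        '<p><input type="radio" checked /></p>',
        '<p><input type="radio" checked="checked" /></p>',
    ),
    "ampersands in attribute values": (
        '<p><a href="search.cgi?a=1&b=2">go</a></p>',
        '<p><a href="search.cgi?a=1&amp;b=2">go</a></p>',
    ),
}

if __name__ == "__main__":
    for rule, (html_style, xhtml_style) in EXAMPLES.items():
        print(rule)
        print("  HTML style well-formed: ", is_well_formed(html_style))
        print("  XHTML style well-formed:", is_well_formed(xhtml_style))
```

A check like this catches syntax slips before a page reaches a validator, but a document can be well-formed and still invalid (for example, by using an element the chosen DTD does not allow).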
Clean up your Web pages with HTML TIDY: http://www.w3.org/People/Raggett/tidy/ . This can also be run online, and a corrected, formatted file is returned.
Validate XHTML code at the W3C validator (remember to put the correct DOCTYPE declaration at the top of the document!). It is also worth putting a link to http://validator.w3.org/check/referer on your Web page.
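Before submitting a page to the validator, it can help to confirm which of the three XHTML 1.0 DTDs the page actually declares. The sketch below is one possible way to do that in Python; the function name xhtml_flavor, the regular expression, and the sample document are assumptions made for illustration, and the check reads only the public identifier of the DOCTYPE declaration.

```python
import re

# Public identifiers of the three XHTML 1.0 DTDs named in this section.
XHTML_DTDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "Transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "Frameset",
}

# Matches <!DOCTYPE name PUBLIC "public-id" and captures the public id.
# Whitespace, including a line break, may separate the parts.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)


def xhtml_flavor(document: str):
    """Return 'Strict', 'Transitional', 'Frameset', or None.

    Hypothetical helper: None means no DOCTYPE was found or the
    public identifier is not one of the XHTML 1.0 DTDs.
    """
    match = DOCTYPE_RE.search(document)
    if match is None:
        return None
    return XHTML_DTDS.get(match.group(1))


# Made-up sample page used only to exercise the function.
SAMPLE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    "<html><head><title>t</title></head><body></body></html>"
)

if __name__ == "__main__":
    print(xhtml_flavor(SAMPLE))
```

This does not replace the validator: it only reports which DTD the page claims to follow, which is the setting the validator will then check the markup against.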
