Join Bill Weinman for an in-depth discussion in this video overview, part of XHTML Essential Training.
Now let's take a quick look at how the language of XHTML works, just a general overview. Open your Examples folder, go to the Simple folder inside of it, and bring up the plain.xhtml file in your text editor. This is a very simple, minimalist XHTML document. It is technically the minimum that you have to have in XHTML in order for it to be legal. Unfortunately, it does not validate as it is at the validator at the World Wide Web Consortium, which you can find at validator.w3.org.
Why don't we just bring that up and show what it does. Validator... oh, I've already got it in my list. We have a local file here, and that is in our, let's see here, that's on the Desktop in Examples/Simple, and this is plain. It's this file here, right. Put that in just like it is and it says it has trouble: "Not able to extract a character encoding." By itself it does not validate, even though it's technically legal XHTML.
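The transcript does not show the contents of the plain file, so the following is only a guess at what such a minimal document might look like, based on what the later file is said to add. The title and paragraph text are made up for illustration.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- Hypothetical reconstruction: no XML declaration and no xmlns attribute. -->
<html>
  <head>
    <title>Plain</title>
  </head>
  <body>
    <p>This is a plain document.</p>
  </body>
</html>
```

Nothing in a file like this states a character encoding, which fits the message the validator reports.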
We have another document here that has a little bit more. This one's called simple.xhtml.html. It has a few other things in it that make it validate: it has the XML version here and it has the XML namespace here. While those things are not technically required by the specification, they do make it validate. And we can show that in the validator.
This is a really useful tool, by the way. You might want to bookmark this one. This one comes up and says, "This page is valid XHTML 1.0 Transitional." You can put a little button on your Web page that says that it's valid. It's not a bad thing at all. Just make sure when you write a document that you put it through the validator and make sure that it's all legal, and it'll work in all browsers. You'll notice that we named the file .html and not .xhtml in our file directory, and there's a good reason for this. If you name a file with .xhtml, many current browsers will bring up an XML parser instead of an HTML parser, and it won't show what you mean for it to show. Just as an example, let's take this document that we have right here and rename it with .xhtml. You'll notice that Windows complains that this is going to change how it works within the system.
Oh, it's open. I have to close it. Now it will let me. Now let's go ahead and bring that into, let's say, Internet Explorer. We'll just drag it in there. Notice that this is not what you intended when you wrote the file. So we take the same file and rename it back to .html, and there it is. Now it does what you expect, and it does in both browsers. The reason is that many programs, especially under the Windows operating system, decide how to handle a file based on the file name extension. Importantly, this also applies to Web servers: when you name your file on a Web server, the server will often decide exactly what type of file it is based on the part of the file name that comes after the last dot, which is called the file name extension.
So when the document is named with .html in the file name extension (that's the part that comes after the last dot), the computer program that's looking at the file, like the Web browser, like Internet Explorer on this computer here, will use the HTML parser. If the file name extension is .xhtml, it will use the XML parser, which is what you saw there. It just sort of shows the structure of the document, which is not really useful in the Web presentation sense. So for this reason, I recommend that you name your files .html and not .xhtml. Let's take a look at the structure of the document.
At the beginning of the file is this line here, which is an XML declaration. It's recommended by the XML spec and not required by the XHTML spec, but our file didn't validate without it, and it's just a good idea to put it there. So I always start with that. The DOCTYPE declaration is required by the XHTML spec. It doesn't require this second line part, but that helps the Web browsers to use a more strict form of the rendering engine in the parser. Most Web browsers use that as a sort of signal to say, "This is a well-formed document and I'm going to treat it as such." The HTML container, which starts here and ends there, is required. And this attribute, right there, the xmlns attribute, which stands for XML namespace, is required. It tells the parser which namespace is being used in the XML language, the namespace being XHTML. Very technical. Basically, just put it in there; it's required and it's a good idea. And there is the head container, which starts here and ends there.
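As a rough sketch of the opening lines described above, assuming the XHTML 1.0 Transitional DOCTYPE that the validator reported: the encoding value is an assumption, since the lesson mentions only the XML version, and the skeleton is left empty to show only the prolog and the HTML container, so it is not a valid document by itself.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the encoding value above is an assumption, not taken from the lesson file. -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <!-- The head and body containers go here. -->
</html>
```

The second line of the DOCTYPE is the address of the DTD, and the xmlns value is the XHTML namespace.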
Just like in HTML, that's the part where you put the title and meta tags, and things like that. Things that have to do with the document but aren't actually displayed in the body of the document. Unlike HTML, the head container is required in XHTML. From here on out all of the elements we have are required. Well, paragraphs aren't required, but a document doesn't mean much without them. The Title element is also required, and it's also a container. It has a begin tag and end tag, and the title goes in there. It's a real good idea to always include a descriptive title in your documents.
It tends to display in the Title bar of the Web browser. It just can't hurt. Finally, here's the body container which contains the body of the document which is the part that's actually displayed. Here we've specified a background color of white. We could have put the word white in there. It would have worked the same. But I tend to use the hexadecimal codes, and I'll talk more about that in a later lesson when I talk about the body. Then there's the paragraph container which contains the paragraph. Now unlike HTML, in XHTML the end tag is required.
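Putting the pieces together, a complete document along the lines described might look like the sketch below. The title and paragraph text are placeholders, and the encoding in the XML declaration is an assumption; the white background is written as the hexadecimal code mentioned above.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <!-- Placeholder title; it typically shows in the browser's title bar. -->
    <title>A Simple XHTML Document</title>
  </head>
  <!-- #ffffff is the hexadecimal code for white; bgcolor="white" would look the same. -->
  <body bgcolor="#ffffff">
    <p>This is a simple XHTML document.</p>
  </body>
</html>
```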
All tags have to be terminated in XHTML. A paragraph, or anything else that's a container, gets terminated like that, with an end tag. Something else, like, say, a horizontal rule, gets a shortcut terminator, which looks like that, with just a slash at the end of the tag. This space before the slash is not required but it is traditional, and virtually every single XML page that I've ever seen has this space, so it's just not a bad idea to put it there. Some broken parsers might mistakenly require it. So it's just not a bad idea to follow tradition where tradition exists. Let's save this and see the horizontal rule pop up there.
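The two kinds of termination described above can be sketched as a fragment that would sit inside the body of a document. The paragraph text is illustrative only.

```html
<!-- Fragment for the body of an XHTML document. -->
<p>A container has a begin tag and an end tag.</p>
<!-- An empty element takes the shortcut terminator, with the traditional space before the slash. -->
<hr />
<p>The horizontal rule above has no content, so it needs no separate end tag.</p>
```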
And so with tags that don't have an end tag, that are just stand-alone, this is the correct way to do it, with the shortcut end tag. That is the same syntactically as if I had written it like this, with a separate begin tag and end tag. (Typing.) It's just a shortcut for that, and we'll see that that also works in the browser. It shows up the same way. So that's tradition, like that. So those are the elements of an XHTML page. Now let's take a brief look at what the major differences are between XHTML and HTML. We talked about the required end tags. There are no elements that you can write without a properly terminated tag; for elements that don't have any content in them, you can use the shortcut terminator, but they always have to be terminated. That end P is not optional like it is in HTML. The other interesting difference is that all tags must be lowercase in XHTML.
Uppercase is not allowed. Mixed case is not allowed. All tags must be all lowercase, all the time. Like any good rule, there is an exception, and that's the DOCTYPE, which is commonly written in capitals and is perfectly acceptable to the validator that way. It even comes out of Tidy in capitals. I don't understand why there's an exception, but there it is. But for the rest of the elements, for everything that's part of the document itself (and you can arguably say that since the DOCTYPE is outside of the HTML container, it's not necessarily part of the actual XHTML document), the tags themselves have to be in lowercase. Another difference between XHTML and HTML is that all attribute values must be in quotation marks.
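The lowercase rule can be illustrated with a short body fragment. The rejected forms appear only inside a comment so that the fragment itself stays legal.

```html
<!-- Fragment for the body of an XHTML document. -->
<p>Element names are written entirely in lowercase.</p>
<hr />
<!-- Not acceptable in XHTML: <P>uppercase</P> or <Hr /> in mixed case. -->
```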
In HTML it was possible to say something like, "align=right," and you didn't have to put right in quotation marks, because the rule with HTML was that if the value only had letters, numbers, or underscores, then it didn't need quotes. Well, again, to make the language simpler, more direct, more specific, the designers of XHTML have decided that all attribute values must be quoted with either single or double quotes. I tend to use double quotes when I'm writing HTML directly. Under rare circumstances, when I'm writing a program that writes XHTML, I may use single quotes. But I tend to always use the double quotes just for consistency. So by putting that attribute there, we'll see that it right aligns. And we'll talk more about the P tag in particular in the lesson that's about the P tag.
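A small fragment showing the quoting rule, assuming a Transitional document where the align attribute is permitted on paragraphs. The paragraph text is illustrative.

```html
<!-- Fragment for the body of an XHTML document. -->
<p align="right">Double quotes around the attribute value.</p>
<p align='right'>Single quotes are also accepted.</p>
<!-- Allowed in HTML but not in XHTML: <p align=right> with no quotes. -->
```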
So those are the basic differences between XHTML and HTML, and those are the required elements of a well-formed XHTML document.