From the course: XML Essential Training

What is XPath? - XML Tutorial

What is XPath?

- [Facilitator] A little earlier in the course I talked about some complementary technologies for working with XML. One of those is XPath, which is what we're going to cover in this section. XPath is a W3C standard for accessing data in XML content. It's a pretty fundamental part of some other XML technologies, like XSLT, which we'll look at a little later, XQuery, and some others. Essentially, XPath provides a way to extract information from XML content using path notation. You can think of this as being analogous to the way that directory paths work in a file system. You can learn a lot more about XPath and how it works at this URL on the W3C's website. What I'm going to do in the rest of this section is just give some practical examples of how XPath works, and then we'll take it for a spin.

Probably the best way to see what I mean by defining a path into a document is to take a look at a few real examples. Suppose we had an HTML document that was represented in tree form like this. Now, suppose we wanted to get a reference to this title tag right here. We could define a path into the XML content by writing something like this. A slash, then HTML, then another slash, then head, and then another slash, and then title: /html/head/title. And that essentially means, starting at the HTML tag, go down to the head tag and then go down to the title tag.

All right, suppose we wanted to do something a little more advanced. Suppose we wanted to get references to these three paragraph tags. Now remember, in file systems, file names have to be unique within a given directory. But in XML, you can have multiple tags of the same name under the same parent tag. So, in this case, we would write slash HTML, and then slash body to get down to the body tag, and then slash p: /html/body/p. That would give us all three paragraph tags. If we only wanted, say, this paragraph tag, then we would modify the path expression by putting what's called a predicate expression at the end of the path.
In this case, the predicate is a bracket with the number one. Now, that might be a little confusing if you're used to other programming languages where arrays are indexed using zero as a base. XPath is one-based. So, that would give me the first paragraph tag.

So, let's review some important XPath concepts and then we'll take it out for a drive, because actually using XPath is probably the best way to learn it. XPath has a very compact syntax and it's pretty quick to learn. The path expression is a series of what are called location steps. So, for example, when I wrote slash HTML and then slash body, each one of those is a location step in the document.

There's something called the context node. That's where the path evaluation starts from, and in more advanced scenarios using XPath, you can change the context node to be whatever you want. Now, I'm not going to get into that here. But you should just be aware that it's possible, and that it's essentially the node we start evaluating the path from.

There's also the notion of an axis. The axis is the relation between the context node and the nodes that are selected by the path. So, you start at the context node, you end up at the selected nodes, and the direction you travel to get there, such as down to the children or up to the parent, is called the axis.

Another term to be familiar with, and I've already used it, is predicates. I showed an example in the previous slide when I put that little bracket with the number one on the paragraph tag to narrow down my selection. Predicates are further refinements to the selection process. You can think of them almost as a kind of filtering method.

All right, let's take a look at some path examples, and then we'll take XPath out for a spin. The first is the slash character. That path expression simply says, select the root of the document. The next example, slash rootTag, selects the root tag, but only if it happens to be named, in this example, "rootTag."
The next example with two forward slashes says, select all the elements named "tagName," regardless of where they are found in the document. So, if I had elements named tagName scattered throughout the file, this would get all of them. It's like a global search. The text function selects the text content of the current node. The at symbol selects the attribute of the current node with the given attribute name. Finally, two periods means, select the parent of the current node.

So, now let's see some more advanced examples. For example, I can write slash doc, slash chapter with a predicate of five, which selects the fifth chapter under the doc tag. If I wanted to get the last paragraph element in the document, and I didn't know how many there were, I could use the last function to do so as part of a predicate. The next example, slash body, slash p, with a predicate expression of at class equals a, selects the paragraph tags under the body element as long as they have a class attribute that's equal to a. Then finally, that last example, the two forward slashes, and then p, and then at class, followed by at style, says, select all the paragraph tags in the document as long as they have both the class and the style attribute, regardless of what their values are.

So, this should give you a sense of how powerful XPath can be, while still remaining relatively simple to write. But again, probably the best way to learn this is to actually take it out and use it. So, let's do that in the next video.
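
For readers following along without the slides, here is a minimal sketch of the path expressions described above, using Python's standard-library `xml.etree.ElementTree`. This is not the tool used in the course. The sample document and all variable names are invented for illustration. ElementTree implements only a subset of XPath, so two adjustments are assumptions of this sketch: paths are written relative to the `<html>` element (which plays the role of the context node) rather than starting with a leading slash, and the "both class and style" test is written as two chained predicates, which is one way to express it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this sketch.
SAMPLE = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <p class="a" style="bold">first</p>
    <p class="b">second</p>
    <p class="a">third</p>
  </body>
</html>
"""

# The parsed <html> element acts as the context node for every path below.
root = ET.fromstring(SAMPLE)

# /html/head/title, written relative to <html>.
title = root.find("head/title")

# /html/body/p: same-named siblings are allowed, so this matches all three.
paragraphs = root.findall("body/p")

# Predicate [1]: XPath positions are one-based, so this is the first <p>.
first_p = root.find("body/p[1]")

# Predicate [last()]: the last <p>, without knowing how many there are.
last_p = root.find("body/p[last()]")

# Predicate on an attribute value: <p> elements whose class equals "a".
class_a = root.findall("body/p[@class='a']")

# Two slashes: every <p> anywhere below the context node.
everywhere = root.findall(".//p")

# <p> elements that carry both a class and a style attribute, any values.
both = root.findall(".//p[@class][@style]")

# Two periods: step from a <p> up to its parent, the <body> element.
parent = root.find("body/p/..")

# ElementTree has no text() or bare @name steps; .text and .get() stand in.
print(title.text)
print([p.text for p in class_a])
print(first_p.get("class"))
```

A full XPath 1.0 engine would also accept the absolute forms with a leading slash, `text()`, and attribute steps directly; the relative forms here are a limitation of the standard-library module, not of XPath itself.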
