- [Instructor] Sometimes when you're processing structured data like XML or HTML, you don't want to build a parser that just runs through the document one line at a time. What you'll need to do is have the entire document in memory, and this usually happens when you want to edit the document's contents and manipulate it at will. You don't want to have to run through the file line-by-line every time. So what you'll need to do is operate on the document's DOM. In this example, we'll see how to use the xml.dom.minidom module that Python provides to load an XML file and then operate on the document while it's in memory.
So let's begin by opening the xmlparsing_start file. And the first thing I'm going to do is import the module that lets me operate on an XML DOM. So I'll write import xml.dom.minidom. So the XML file I'm going to be parsing, is this one over here named samplexml.xml. So if you open it up and look at it, you can see it's a pretty standard XML file. It's just got some basic information about a person in it, happens to be me, and you can see there's my name, where I live, some various skills I have, that kind of stuff.
Again, just a simple XML file for parsing. So in my main function, I'm going to use the parse function on the XML miniDOM to load and parse the XML file. So I'll write the code for that. And I'll write xml.dom.minidom.parse. And I'm going to give it the name of the file that I want to parse, and that is samplexml.xml. So this will parse the XML file, and create an in-memory DOM object that I can manipulate.
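As a rough sketch of the loading step just described: the contents of samplexml.xml aren't shown here, so the XML below is a hypothetical stand-in (the element names other than person and skill are assumptions), written to a temporary file so the example runs on its own. The load_document helper is also just an illustrative name.

```python
import os
import tempfile
import xml.dom.minidom

# Hypothetical stand-in for samplexml.xml; the real file's contents are
# not shown, so these elements and values are assumptions.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<person>
  <firstname>Jane</firstname>
  <lastname>Doe</lastname>
  <skill name="Python"/>
</person>
"""


def load_document(path):
    """Parse the XML file at `path` into an in-memory DOM document."""
    return xml.dom.minidom.parse(path)


# Write the sample to a temporary directory so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "samplexml.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE_XML)
    doc = load_document(path)

# The whole document now lives in memory, independent of the file.
print(type(doc).__name__)
```

In the course's own setup, the file sits next to the script, so passing just the file name to parse is enough.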
And because the name of the file that I want to parse happens to be in the same directory as the code, I don't have to do any path manipulation. Once we've parsed the document, let's print out the node name of the root of the document, along with a tag name of the document's first child element. So I'll write print doc.nodeName, and print doc.firstChild.tagName. Now, if these property names don't look familiar to you, they are standard names that are used in the document object model, things like nodeName and firstChild and tagName.
These are all standard properties of DOM elements. Let's just run what we have. So I'll go to the debugger, and open the output window. And sure enough, you can see that the node name of the document is #document. And that's what the W3C says it should be. And the first child tag is person. And let's just make sure that's correct. Yep, sure enough, tag name is person. So we're off to a good start. Now, let's get a list of XML tags from the document, and print each one.
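Here is a minimal sketch of those two print statements. It parses a small inline XML string (an assumed stand-in for the sample file) with parseString so it can run without any file on disk.

```python
import xml.dom.minidom

# Assumed stand-in for the sample file: a person root with a few children.
SAMPLE_XML = "<person><firstname>Jane</firstname><skill name='Python'/></person>"

doc = xml.dom.minidom.parseString(SAMPLE_XML)

# The document node itself always reports the name "#document".
print(doc.nodeName)

# Here the first child of the document is the root element.
print(doc.firstChild.tagName)

# documentElement is another way to reach the root element, and it still
# works if a comment happens to come before the root tag.
print(doc.documentElement.tagName)
```

With this input, the three lines printed are #document, person, and person.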
So I'm going to use the DOM standard function called getElementsByTagName to do this. So I'll get a list of all the skills. And that's going to be doc.getElementsByTagName, and the tag I want is the skill element. So I'll just simply print this out. First, I'll print out the length of the skills list that we got back right here. And then, for skill in skills, I'll simply print out each one.
I'm going to use the getAttribute function to do this. And I'm going to get the name attribute. And you can see that on each skill, there's a name attribute. Alright, so let's go ahead and save that. And let's run it. And sure enough there, we've got the four skills, that's the length of the list, and we can see all of the skills listed right here. Again, everything seems to be working. So now that we've got the document in memory, let's create a new XML tag, and add it into the document.
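The skill listing can be sketched roughly like this. The four skill names are made up for illustration, since the actual values in the sample file aren't shown.

```python
import xml.dom.minidom

# Assumed sample data: four skill elements, each with a name attribute.
# The specific skill names are hypothetical.
SAMPLE_XML = """<person>
  <skill name="JavaScript"/>
  <skill name="Python"/>
  <skill name="C#"/>
  <skill name="HTML"/>
</person>"""

doc = xml.dom.minidom.parseString(SAMPLE_XML)

# Collect every skill element anywhere in the document.
skills = doc.getElementsByTagName("skill")
print(len(skills), "skills:")

# Read the name attribute off each element.
for skill in skills:
    print(skill.getAttribute("name"))
```

Note that getAttribute returns an empty string when the attribute is missing, so a skill tag without a name would print a blank line rather than raise an error.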
And we can do this because, remember, we've got it in memory, so we can operate on it. So I'll write newSkill equals doc, and now I'm going to use the createElement function, which, again, is a W3C standard function. So I'll create a new skill tag, and I'll set its name attribute to be the value jQuery. And then, I'll tell the first child to append the new skill I just created.
So I'm going to appendChild, and that's the new skill tag. Alright, so let's take a look at the XML one more time. So I'm going to create a new skill tag and I'm going to append it to the end of this list. And then, we're going to print out the existing list of skills, and then print out the new list after it's been added. So let me go back up here, and let me copy these lines of code. So we'll print out the list before and after.
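The create-and-append step, with the before and after printouts, might look something like the sketch below. The starting skill names are assumed, and add_skill and skill_names are hypothetical helper names used only to keep the example tidy.

```python
import xml.dom.minidom

# Assumed sample data; the real file's skill names are not shown.
SAMPLE_XML = """<person>
  <skill name="JavaScript"/>
  <skill name="Python"/>
</person>"""


def skill_names(doc):
    """Return the name attribute of every skill element in the document."""
    return [s.getAttribute("name") for s in doc.getElementsByTagName("skill")]


def add_skill(doc, name):
    """Create a new skill element and append it to the document's first child."""
    new_skill = doc.createElement("skill")
    new_skill.setAttribute("name", name)
    doc.firstChild.appendChild(new_skill)
    return new_skill


doc = xml.dom.minidom.parseString(SAMPLE_XML)

before = skill_names(doc)
print(len(before), "skills before:", before)

add_skill(doc, "jQuery")

# Query again after the change to pick up the new element.
after = skill_names(doc)
print(len(after), "skills after:", after)
```

The change only exists in memory at this point. Writing it back out would be a separate step, for example with the document's toxml method.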