Explore NavigableString objects.
- [Instructor] Let's explore some NavigableString objects. The first thing we need to do is import Beautiful Soup, so we'll say from bs4 import BeautifulSoup. Then we need to create a BeautifulSoup object. We'll call the BeautifulSoup constructor and pass in a tag. We used this tag in the last demonstration, so I'm just going to copy and paste it in and run that. Now we have a soup object. Moving on to navigable strings, let's create an object called tag and set it equal to soup.b. Then we'll call the type function on it to verify that we actually have a Tag object.
Yep, Python is returning that we have a Tag object here, so that's good. To verify the name of the tag, we can simply say tag.name, and we can see that the name of the tag is b. If you just want to isolate the string object from within this tag object, all you have to do is say tag.string, and it will return the string that's inside the tag. So let's try that here: tag.string returns the string that's inside the tag named b.
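The steps so far can be sketched roughly as follows. The transcript doesn't show the tag that was pasted in, so the markup here (a b tag with a class attribute wrapping the text "Product description") is only an assumed stand-in, and the built-in html.parser is chosen just to avoid extra dependencies.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: the actual tag from the earlier demonstration is not shown.
soup = BeautifulSoup('<b class="description">Product description</b>', 'html.parser')

# soup.b returns the first <b> element in the parse tree.
tag = soup.b

print(type(tag))    # the Tag class from bs4
print(tag.name)     # the tag's name
print(tag.string)   # the single string contained in the tag
```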
Now let's call the type function on tag.string. We can see here that we have what's called a NavigableString. Okay, so let's set a variable called nav_string and say that nav_string is equal to tag.string. This is our NavigableString object, so let's print it out and see what we get.
We get our product description; that's the string inside of our b tag. If you want to replace the string object, you can just call the replace_with method of the NavigableString and pass in the replacement string. So here, let's write the name of our object, nav_string, call the replace_with method on it, pass in the string Null, and then reprint tag.string. As you can see, the product description string inside the tag has been replaced, and the tag now reads Null.
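Here's a minimal sketch of that replacement, again using an assumed b tag because the original markup isn't shown. The variable name nav_string and the replacement text 'Null' follow the narration as best it can be reconstructed.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical markup standing in for the tag used in the demonstration.
soup = BeautifulSoup('<b class="description">Product description</b>', 'html.parser')
tag = soup.b

nav_string = tag.string
print(type(nav_string))   # the NavigableString class from bs4
print(nav_string)         # the text inside the <b> tag

# replace_with swaps this string out of the parse tree for the new text.
nav_string.replace_with('Null')

# Re-read the string from the tag: nav_string itself still holds the old,
# now detached, text, so we look at tag.string to see the change.
print(tag.string)
print(tag)
```

Note that the replacement happens in the tree, which is why the result is checked through tag.string rather than through the old nav_string variable.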
Now I want to show you how to work with NavigableString objects. Let's go back to our product description markup and convert that to a parse tree again. I'm going to copy and paste that in from our last demonstration. So, we'll create an HTML document here, and then from that we'll create a BeautifulSoup object that contains all of this HTML, and run it. If there are one or more string objects within a parse tree, you can easily isolate them. One way to do that is by using the stripped_strings generator to return all of the strings within the object.
Strings consisting entirely of whitespace are ignored, and whitespace at the beginning and the end of each string is removed. So, in this example, the stripped_strings generator passes through each string object in the parse tree, strips the whitespace, and then we print the printable representation of each string. Run that and we get these strings returned.
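As a rough illustration of stripped_strings, the sketch below uses a small made-up HTML document, since the course's product description markup isn't reproduced in the transcript. Only the title text, Best Books, comes from the narration; the rest is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical document; only the title "Best Books" appears in the transcript.
html_doc = """
<html><head><title>Best Books</title></head>
<body>
<p class="title"><b>An Example Book</b></p>
<p class="description">
    A short description with padding around it.
</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# stripped_strings yields every string in the tree, skipping whitespace-only
# strings and trimming leading and trailing whitespace from the rest.
for string in soup.stripped_strings:
    print(repr(string))

strings = list(soup.stripped_strings)
```

Collecting the generator into a list, as in the last line, is just a convenience for inspecting the result afterward.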
Notice the u at the start of each line of output. That's the prefix Python 2 puts on Unicode strings when it prints their representation; it isn't part of the text itself, and you won't see it in Python 3. The last thing I want to show you in this segment is how to access parent tag objects within a parse tree. Let's create a new object called title_tag, and we'll set it equal to the title tag from within the parse tree, soup.title, and print it out. This returns the title tag and the text it contains.
Now, if we want to access the parent of the title tag, all we have to do is say title_tag.parent, and that will return the title element's parent, in this case the head. So we'll try that out. We'll say title_tag.parent, and here we get our head tag. Now let's print out the NavigableString object of the title tag. We'll say title_tag, that's our object, and then .string to get its NavigableString. Run that and we get back Best Books.
That's the string object contained within the title tag. Lastly, we can retrieve the parent of that NavigableString. Let's do that now: we'll say title_tag.string.parent. The parent of the NavigableString Best Books is the title tag, which is, of course, self-evident, right? So remember that you use these NavigableString objects to retrieve chunks of text that are stored within tags.
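The parent lookups described here might look like the following sketch. The document is a made-up minimal page; only the title text, Best Books, is taken from the narration.

```python
from bs4 import BeautifulSoup

# Hypothetical minimal page; the real markup from the course is not shown.
html_doc = (
    '<html><head><title>Best Books</title></head>'
    '<body><p>Some body text.</p></body></html>'
)
soup = BeautifulSoup(html_doc, 'html.parser')

title_tag = soup.title
print(title_tag)                 # the <title> element and its text

# .parent moves one level up the tree: from <title> to <head>.
print(title_tag.parent)

# The NavigableString inside the title tag...
print(title_tag.string)

# ...and its parent, which is the title tag itself.
print(title_tag.string.parent)
```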
Now that you understand how to work with objects within Beautiful Soup, let's move on to parsing data.