From the course: Web Scraping with Python


Solution: Using CNN's sitemap


(upbeat electronic music) - [Instructor] So hopefully you picked up on the hint in the challenge setup. The client wants some confirmation that we scraped all of the article data in an orderly way. This means you want to look for some sort of complete, comprehensive article listing to take advantage of, like a sitemap, right? So, remember, robots.txt is the best place to find these sitemaps, and often the place you want to start for any scraping project. From there, we find the sitemap index, but the sitemaps listed in it are kind of out of order, there are some repetitions, and it's not super clear what's going on here. One thing that should be obvious, though, is that if you take the article sitemap, there we go, its URL contains the year and the month. So this is October 2020. And what we can do is actually just change the year in here. So we can go back to 2015. There we go. Articles…
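The two steps described above, reading the sitemap entries out of robots.txt and then stepping through the article sitemaps one year and month at a time, can be sketched roughly as follows. This is a minimal illustration rather than the instructor's code: the function names are made up for this example, and `TEMPLATE` is a hypothetical placeholder, since the transcript does not spell out the actual CNN sitemap URL. Check the site's real robots.txt for the correct pattern before using it.

```python
def sitemaps_from_robots(robots_text):
    """Return the URLs listed on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_text.splitlines():
        # Split on the first colon only, so the URL's own colons stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def month_range(start, end):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


# Hypothetical URL pattern (an assumption, not the real CNN path).
TEMPLATE = "https://example.com/sitemap/article/{year}/{month:02d}.xml"


def monthly_sitemap_urls(start, end, template=TEMPLATE):
    """Build one sitemap URL per month between start and end, inclusive."""
    return [template.format(year=y, month=m) for y, m in month_range(start, end)]
```

Calling `monthly_sitemap_urls((2015, 1), (2020, 10))` would give one URL per month in that span, in chronological order. Because each month is visited exactly once, the resulting list also works as a simple checklist for confirming to the client that nothing was skipped.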
