From the course: Web Scraping with Python
Solution: Using CNN's sitemap - Python Tutorial
- [Instructor] So, hopefully you picked up on the hint in the challenge setup. The client wants some confirmation that we scraped all of the article data in an orderly way. This means you want to look for some sort of complete, comprehensive article listing to take advantage of, like a sitemap, right? Remember, robots.txt is the best place to find these sitemaps, and often the place you want to start for any scraping project. From there, we find the sitemap index, but the sitemaps listed in it are kind of out of order, there are some repetitions, and it's not super clear what's going on. One thing that should be obvious, though, is that if you take the article sitemap, there we go, this is the year and this is the month. So this is October 2020. And what we can do is just change the year in here. So we can go back to 2015. There we go. Articles…
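The two steps described here (reading the sitemap entries out of robots.txt, then changing the year and month in an article sitemap URL) could be sketched roughly as follows. This is a minimal illustration, not the instructor's solution: the function names are made up for this example, and the URL template is a hypothetical placeholder on example.com, since the transcript does not spell out the real CNN URL or whether the month is zero-padded.

```python
def sitemap_urls_from_robots(robots_text):
    """Return the URLs found on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_text.splitlines():
        # Drop trailing comments, then split "field: value" on the first colon only
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def monthly_sitemap_urls(template, start_year, end_year):
    """Fill a URL template with every year/month pair in the given range.

    The template is an assumption: it must contain {year} and {month}
    placeholders that match how the target site lays out its sitemaps.
    """
    return [
        template.format(year=year, month=month)
        for year in range(start_year, end_year + 1)
        for month in range(1, 13)
    ]


if __name__ == "__main__":
    # Sample robots.txt content (invented for this sketch, not CNN's real file)
    sample_robots = (
        "User-agent: *\n"
        "Disallow: /private\n"
        "Sitemap: https://example.com/sitemaps/index.xml\n"
    )
    print(sitemap_urls_from_robots(sample_robots))

    # Hypothetical URL layout; replace with the pattern seen in the real sitemap
    template = "https://example.com/sitemaps/article-{year}-{month:02d}.xml"
    for url in monthly_sitemap_urls(template, 2015, 2015)[:3]:
        print(url)
```

Generating the full list of monthly URLs up front also gives the client the orderly record they asked for: each month either returned a sitemap or it did not.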