From the course: Web Scraping with Python


Solution: Scraping news sites


(upbeat music) - [Instructor] Like I said before, I decided to scrape news articles from Associated Press, CNN, and Yahoo News. To be honest, I got a little lucky with these sites. I did scope out a few different sources and picked ones that seemed moderately easy to scrape, but sometimes you really don't know what you're going to get until you actually build it. So everything went pretty smoothly, all things considered. I created a NewsArticle item, and that contains the title, description, date, author, full text, all that stuff. CNN was probably the most straightforward site to scrape. The only tricky thing I had to do was something that we already covered in chapter one, and that's use the metadata to get information. I was able to get really clean versions of the description and date published from the metadata in the header. The author's name was also there, but still required a little cleaning, which I did in the…
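
To make the metadata approach concrete, here is a minimal sketch using only the Python standard library. It is not the instructor's code: the `NewsArticle` fields follow the ones named in the transcript, but the meta tag names (`og:title`, `description`, `article:published_time`, `author`), the sample HTML, and the "By " byline cleanup are all assumptions for illustration. Real sites may use different tags and need different cleaning.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class NewsArticle:
    """Hypothetical container with the fields named in the transcript."""
    title: str = ""
    description: str = ""
    date: str = ""
    author: str = ""
    text: str = ""


class MetaTagParser(HTMLParser):
    """Collects <meta> tags into a dict keyed by their property/name attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("property") or attrs.get("name")
        content = attrs.get("content")
        if key and content is not None:
            # Keep the first value seen for each key.
            self.meta.setdefault(key, content.strip())


def article_from_metadata(html: str) -> NewsArticle:
    """Build a NewsArticle from header metadata (tag names are assumed)."""
    parser = MetaTagParser()
    parser.feed(html)
    meta = parser.meta

    # Assumed cleaning step: strip a leading "By " from the byline.
    author = meta.get("author", "")
    if author.lower().startswith("by "):
        author = author[3:].strip()

    return NewsArticle(
        title=meta.get("og:title", ""),
        description=meta.get("description", ""),
        date=meta.get("article:published_time", ""),
        author=author,
    )


# Made-up page used only to exercise the sketch.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example headline">
<meta name="description" content="A short summary of the story.">
<meta property="article:published_time" content="2021-03-04T12:00:00Z">
<meta name="author" content="By Jane Doe">
</head><body><p>Body text.</p></body></html>
"""

article = article_from_metadata(SAMPLE_HTML)
print(article)
```

The full article text is left empty here, since it would come from the page body rather than the header metadata.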
