From the course: Advanced NLP with Python for Machine Learning
Reading text data into Python
- [Instructor] To illustrate how to read in semi-structured text data, we're going to use a dataset from the extremely useful UCI Machine Learning Repository. This dataset has also been used for Kaggle competitions. It's a collection of text messages, each labeled as either spam or ham. We'll be using the same dataset for the duration of this course, and it's all contained in your exercise files, so you won't need to download it. To start off, we're going to import pandas, and then we'll use the read_csv function to read the CSV into a DataFrame. We'll quickly notice that this file is not well-formatted. In fact, to even read it in, we need to indicate that it uses encoding='latin-1'. So let's go ahead and read this in and take a look at the first five rows. We can see that the first column is our label, which is either spam or not spam, and not spam is labeled as ham here.…
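The read-in step described in the transcript can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the instructor's exact code: the transcript does not give the file name or the column names, so the file `sample_messages.csv`, the `label` and `text` headers, and the two sample rows are all made-up stand-ins written to a temporary directory so the example runs on its own.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in rows, NOT taken from the course dataset. The pound sign
# is stored as the single byte 0xA3 in Latin-1, which is not valid UTF-8.
sample_csv = (
    "label,text\n"
    "ham,See you at lunch?\n"
    "spam,You have won \u00a3500! Reply now to claim\n"
)

# Hypothetical file name; the real file ships with the course exercise files.
path = os.path.join(tempfile.mkdtemp(), "sample_messages.csv")
with open(path, "w", encoding="latin-1", newline="") as f:
    f.write(sample_csv)

# Tell pandas how the bytes are encoded. Without this argument pandas
# assumes UTF-8, which would be expected to fail on the 0xA3 byte.
messages = pd.read_csv(path, encoding="latin-1")

# Take a look at the first five rows (this stand-in file only has two).
print(messages.head())
```

With the real exercise file, only the path passed to `pd.read_csv` would change; the column names in that file may differ from the ones assumed here.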