From the course: Advanced NLP with Python for Machine Learning

Reading text data into Python

- [Instructor] To illustrate how to read in semi-structured text data, we're going to use a dataset from the extremely useful UCI Machine Learning Repository. This dataset has also been used for Kaggle competitions. It is a collection of text messages, each labeled as either spam or ham. We'll use the same dataset for the duration of this course, and it's included in your exercise files, so you won't need to download it. To start off, we'll import pandas and then use the read_csv function to read the CSV into a DataFrame. We'll quickly notice that this file is not well-formatted. In fact, to even read it in, we need to indicate that it uses encoding='latin-1'. So let's go ahead and read this in and take a look at the first five rows. We can see that the first column is our label, which is either spam or not spam, and not spam is labeled as ham here.…
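The read step described above can be sketched as follows. This is a minimal illustration, not the course's own notebook: the file name `sample_messages.csv`, the column names `label` and `text`, and the two messages are all hypothetical stand-ins for the exercise file. The sketch writes a tiny Latin-1 encoded CSV to a temporary folder so it can run on its own, then shows why the encoding argument matters.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the exercise file: two labeled messages.
# The pound sign is stored as byte 0xA3 in Latin-1, which is not valid UTF-8.
sample = 'label,text\nham,"Ok, see you at 5"\nspam,"WIN £1000 cash now"\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_messages.csv")  # hypothetical file name
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(sample)

    # pandas assumes UTF-8 by default, so reading without an encoding fails here.
    try:
        pd.read_csv(path)
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

    # Naming the encoding explicitly lets the file load.
    messages = pd.read_csv(path, encoding="latin-1")

# Look at the first five rows (only two exist in this sample).
print(messages.head())
```

With the real exercise file, only the `pd.read_csv(..., encoding="latin-1")` call and `head()` would be needed; the temporary file exists purely to keep the sketch self-contained.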
