From the course: NLP with Python for Machine Learning Essential Training

Exploring the dataset

- [Instructor] In the last lesson, we learned how to read in our dataset the difficult way, to arm ourselves with some text manipulation tools, and then we learned the easy way, using pandas' read_csv function. Now we're going to read that data back in the easy way. So again, we're importing pandas and storing it as pd, and then we're calling pd.read_csv, where all we have to do is pass in the name of the dataset, so that's the SMS spam collection .tsv file. And then we have to explicitly tell it what the separator is. So we tell it \t, which indicates tab-separated. And then lastly we tell it that it shouldn't expect a header. If we don't include this, it'll just take the first row as the column names. Then we're storing that as full corpus. Now, as we saw in the last lesson, there won't be any column names. So let's go ahead and actually tell it what to name our columns. So let's take the columns attribute of full corpus, which just stores the column names, and then we…
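The steps described so far can be sketched roughly as follows. This is a minimal illustration, not the instructor's exact notebook: the variable name `full_corpus`, the two sample rows, and the column names `label` and `body_text` are all assumptions, since the transcript cuts off before the names are given. An in-memory buffer stands in for the .tsv file so the sketch runs on its own.

```python
import io

import pandas as pd

# Stand-in for the tab-separated SMS file mentioned in the transcript.
# These two rows are invented for illustration, not taken from the dataset.
sample_tsv = io.StringIO(
    "ham\tSee you at lunch tomorrow\n"
    "spam\tYou have won a free prize\n"
)

# sep="\t" marks the data as tab-separated; header=None stops pandas from
# treating the first data row as the column names.
full_corpus = pd.read_csv(sample_tsv, sep="\t", header=None)

# With header=None, pandas labels the columns 0 and 1 by default.
default_columns = list(full_corpus.columns)

# Hypothetical column names: the transcript is truncated before naming them.
full_corpus.columns = ["label", "body_text"]

print(full_corpus.head())
```

To read the real file instead, the path to the .tsv file would be passed as the first argument in place of `sample_tsv`.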
