From the course: Advanced NLP with Python for Machine Learning
Prep the data for modeling - Python Tutorial
Prep the data for modeling
- [Narrator] As a recap, we now know four different ways to capture the information in text data and then fit a model on top of it. We reviewed TF-IDF, and then we learned about Word2Vec, Doc2Vec, and recurrent neural networks. In this chapter, we're going to compare how well these techniques classify the text messages in our dataset as spam or ham. To expedite this process, we're going to clean and split our data once and save the training and test sets as their own datasets, so we don't have to repeat that process in each video. This also ensures that each model trains and is evaluated on exactly the same data. So let's start by reading in our data, converting the spam/ham label to a numeric, binary label, and cleaning our data. Now let's split our data into training and test sets. I want to note that we're just using a single holdout test set for the duration of this course, rather than a test set and a validation set…
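The steps the narrator describes (read the data, encode the label, clean the text, split once, save the splits) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the course's exercise code: the sample messages, the cleaning rules (lowercasing, stripping punctuation, tokenizing), the 80/20 split, the seed, and the helper names `encode_label`, `clean_text`, `holdout_split`, and `save_rows` are all assumptions. In practice, pandas and scikit-learn's `train_test_split` would do the same job.

```python
import csv
import random
import re
import string
import tempfile
from pathlib import Path

# Hypothetical sample standing in for the spam/ham dataset (not the course's data).
RAW_MESSAGES = [
    ("ham", "Are we still meeting for lunch today?"),
    ("spam", "WINNER!! Claim your FREE prize now, call 0800-555-0199"),
    ("ham", "I'll call you later tonight."),
    ("spam", "Urgent: your account has been selected for a cash reward!"),
    ("ham", "Can you pick up some milk on the way home?"),
    ("spam", "Free entry in a weekly competition, text WIN to 80085"),
    ("ham", "Happy birthday! Hope you have a great day."),
    ("ham", "Running ten minutes late, sorry."),
    ("spam", "Congratulations, you have won a holiday. Reply YES to claim"),
    ("ham", "Did you finish the report?"),
]


def encode_label(label):
    """Map the text label to a binary target: spam -> 1, ham -> 0."""
    return 1 if label == "spam" else 0


def clean_text(text):
    """Lowercase, strip punctuation, and split into tokens (assumed cleaning steps)."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.findall(r"\w+", no_punct)


def holdout_split(rows, test_size=0.2, seed=42):
    """Shuffle with a fixed seed and hold out a single test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def save_rows(rows, path):
    """Write (label, cleaned text) rows to CSV so later steps reload identical data."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["label", "clean_text"])
        for label, tokens in rows:
            writer.writerow([label, " ".join(tokens)])


# Encode labels and clean every message once, up front.
prepared = [(encode_label(label), clean_text(text)) for label, text in RAW_MESSAGES]

# Split once and persist both sets; every model then reads the same files.
train_rows, test_rows = holdout_split(prepared, test_size=0.2, seed=42)
out_dir = Path(tempfile.mkdtemp())  # hypothetical output location
save_rows(train_rows, out_dir / "train.csv")
save_rows(test_rows, out_dir / "test.csv")
print(len(train_rows), "training rows;", len(test_rows), "test rows")
```

Fixing the seed and writing the splits to disk is what makes the later comparison fair: each model loads the same training and test rows instead of reshuffling the data on every run.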
Contents
- Prep the data for modeling (2m 52s)
- Build a model on TF-IDF vectors (6m 34s)
- Build a model on word2vec embeddings (6m 41s)
- Build a model on doc2vec embeddings (3m 59s)
- Build an RNN model (5m 11s)
- Compare all methods using key performance metrics (4m 16s)
- Key takeaways for advanced NLP modeling techniques (3m 6s)