From the course: NLP with Python for Machine Learning Essential Training

Implementation: Removing punctuation

- [Instructor] We've talked about a few different concepts throughout this chapter. In the last lesson, we put those together at a conceptual level, laying out what the full machine learning pipeline looks like. In this lesson, we're going to actually write the code to handle the cleaning portion of this machine learning program, or the pre-processing, as it's typically referred to. There are four steps that you'll see in a lot of text-cleaning pipelines: removing punctuation, tokenization, removing stop words, and lemmatizing or stemming. We're going to focus on the first three steps in this chapter, then we'll cover lemmatizing and stemming in the next chapter, as those are a little more advanced and not implemented in every pipeline. So, the beginning of this script will ingest some raw text, and by the end, we'll have text that's all cleaned up and prepared for the vectorization step laid out in the last lesson. It's worth noting that there are custom packages to do most of…
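The first three cleaning steps named above can be sketched with only the Python standard library. This is a minimal sketch under stated assumptions, not the course's own code: "punctuation" is taken to mean whatever appears in Python's `string.punctuation`, tokens are produced by lowercasing and splitting on runs of non-word characters with `re.split`, and `STOPWORDS` is a small hypothetical set standing in for a real stop-word list (NLTK ships one, for example). The helper names `remove_punct`, `tokenize`, `remove_stopwords`, and `clean_text` are illustrative.

```python
import re
import string

# Hypothetical, tiny stop-word set for illustration only; a real pipeline
# would typically load a fuller list (for example NLTK's English stop words).
STOPWORDS = {"i", "a", "the", "to", "is", "and", "you", "it"}


def remove_punct(text: str) -> str:
    """Drop every character that appears in string.punctuation."""
    return "".join(char for char in text if char not in string.punctuation)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on runs of non-word characters."""
    # Filtering out empty strings handles leading/trailing separators.
    return [token for token in re.split(r"\W+", text.lower()) if token]


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Keep only tokens that are not in the (hypothetical) stop-word set."""
    return [token for token in tokens if token not in STOPWORDS]


def clean_text(text: str) -> list[str]:
    """Apply the three steps in order: punctuation, tokens, stop words."""
    return remove_stopwords(tokenize(remove_punct(text)))


if __name__ == "__main__":
    raw = "Hello, world! NLP isn't hard... is it?"
    print(remove_punct(raw))  # Hello world NLP isnt hard is it
    print(clean_text(raw))    # ['hello', 'world', 'nlp', 'isnt', 'hard']
```

Because punctuation is removed before tokenizing, a contraction such as "isn't" collapses to "isnt" rather than splitting into two tokens; whether that is desirable depends on what the vectorization step expects.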