In this video, learn how to create a TF-IDF matrix from a text corpus using Python.
- [Instructor] In this video, we will look at code examples for building a TF-IDF matrix. NLTK does not provide a simple function for building a TF-IDF matrix, so for this purpose we will use the scikit-learn library in Python. From scikit-learn, we import the TfidfVectorizer class. We create a simple corpus as a list of sentences. We are keeping the corpus simple and small so we can view and understand the TF-IDF array easily. Next, we initialize the TF-IDF vectorizer. We also provide a stop-word setting so the vectorizer automatically removes stop-words from the corpus before building the TF-IDF array. To create the TF-IDF array, we simply call the fit_transform method. Once this is complete, we print all the feature names, that is, the words from which the array was built. Next, we print the dimensions of the array. And finally, we print the array itself. Let us execute this code and review the results. We first see the list of tokens from the corpus. There are only seven tokens and the stop-words …
- Text mining today
- Reading text files using Python
- Cleansing text data
- Building n-gram databases for text predictions
- Preparing TF-IDF matrices for machine learning
- Scaling text processing for performance
Skill Level Intermediate
Processing Text with R Essential Training, with Kumaran Ponnambalam (55m 57s, Intermediate)
1. Text Mining
2. Reading Text
3. Text Cleansing and Extraction
4. Advanced Text Processing
5. Best Practices