In this video, learn some best practices for scaling text processing for performance.
- [Instructor] How do we process large quantities of text data in a scalable manner? Big data is revolutionizing the way we process text. We should take advantage of big data technologies to process text. First, use technologies that allow for parallel access and storage of data. Technologies like Kafka, HDFS, and MongoDB support a number of nodes and channels to allow for parallel access, movement, and storage of data. Process each document independently with a map function. Activities like cleansing and tokenization can be done this way. This allows multiple nodes to process documents in parallel and hence speeds up the pipeline. Use reduce functions late in the processing cycle, after all filtering and cleansing is done. Reduce functions like aggregations create choke points, so we want them to work on data sets that are as small as possible. …
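The map-then-reduce pattern described in the transcript can be sketched in plain Python. This is a minimal illustration, not the course's own code: the function names, the tiny `STOP_WORDS` set, and the sample documents are all hypothetical, and a local thread pool stands in for the multiple nodes that a cluster framework would provide.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOP_WORDS = {"the", "a", "is", "of", "and", "to"}


def clean_and_tokenize(document):
    """Map step: cleanse and tokenize ONE document, with no shared state."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def merge_counts(total, tokens):
    """Reduce step: fold one document's tokens into the running counts."""
    total.update(tokens)
    return total


def word_counts(documents):
    # Map: each document is processed independently, so the work can be
    # spread across workers (threads here, nodes in a real cluster).
    with ThreadPoolExecutor(max_workers=4) as pool:
        token_lists = list(pool.map(clean_and_tokenize, documents))
    # Reduce: the aggregation runs last, on data that has already been
    # filtered and cleansed, so the choke point sees as little as possible.
    return reduce(merge_counts, token_lists, Counter())


docs = ["The cat sat.", "A cat is a CAT!", "Dogs and cats."]
counts = word_counts(docs)
```

Because `clean_and_tokenize` depends only on its own input, the thread pool could be swapped for a process pool or a distributed map without changing the function itself. Only `merge_counts` needs to see every document's output, which is why it is kept to the very end.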
- Text mining today
- Reading text files using Python
- Cleansing text data
- Building n-gram databases for text predictions
- Preparing TF-IDF matrices for machine learning
- Scaling text processing for performance
Skill Level Intermediate
1. Text Mining
2. Reading Text
3. Text Cleansing and Extraction
4. Advanced Text Processing
5. Best Practices