Learn what natural language processing (NLP) is by first looking into its subfields and its relevance to data science. Explore the software tools available to help with NLP tasks as a data scientist.
- [Voiceover] Natural language processing, or NLP, refers to a collection of techniques that let a computer make sense of its interactions with people in a natural language. NLP is a broad discipline within computer science that draws on artificial intelligence, computational linguistics, and human-computer interaction, or HCI.
Some NLP subfields are particularly relevant to a data scientist, among them tokenization, parsing, sentence segmentation, and named entity recognition. Tokenization isolates each token (a word or punctuation mark) in a text, and parsing conducts a grammatical analysis of those tokens. Sentence segmentation separates one sentence from the next. Named entity recognition identifies which tokens are proper names and what type of name each one is, such as a person, an organization, or a place.
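To make these subfields concrete, here is a minimal sketch in Python that uses only the standard library. The function names (split_sentences, tokenize, find_name_candidates) are hypothetical and chosen for illustration. The regular expressions are deliberately naive, and the last function is only a toy stand-in for named entity recognition: it groups capitalized tokens and does not assign entity types the way a real NLP library would.

```python
import re

def split_sentences(text):
    """Naive sentence segmentation: split on whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Naive tokenization: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def find_name_candidates(tokens):
    """Toy stand-in for named entity recognition (an assumption, not a real NER model).

    Groups consecutive capitalized tokens into candidate names. A lone
    capitalized token at the start of a sentence is skipped, because it is
    usually just an ordinary capitalized first word.
    """
    candidates, run, run_start = [], [], 0
    for i, tok in enumerate(tokens + [""]):  # empty sentinel flushes the last run
        if tok[:1].isupper():
            if not run:
                run_start = i
            run.append(tok)
        else:
            if run and not (run_start == 0 and len(run) == 1):
                candidates.append(" ".join(run))
            run = []
    return candidates

if __name__ == "__main__":
    text = "Jungwoo Ryoo teaches at Penn State. He reviews the history of data science!"
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        print(tokens, find_name_candidates(tokens))
```

In practice a dedicated NLP library would handle abbreviations, contractions, and entity types far more reliably than these heuristics.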
A significant portion of the data you deal with as a data scientist is unstructured. That is, it comes not from a database but from sources such as social media sites, text documents, pictures, and so on. Therefore, one of the biggest challenges for a data scientist is to sort through this unstructured data and pre-process it so that data mining and analytics tools can take over and extract the knowledge being sought.
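As a rough illustration of that pre-processing step, the sketch below cleans one raw social-media-style string into a list of lowercase word tokens that a downstream analytics tool could count or index. The function name preprocess, the choice of cleaning steps, and the sample string are all assumptions made for this example; a real pipeline would pick its steps based on the data source.

```python
import html
import re

def preprocess(raw):
    """Turn one raw text snippet into a list of lowercase word tokens."""
    text = html.unescape(raw)                  # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)       # drop leftover HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[@#](\w+)", r"\1", text)   # keep mention/hashtag words, drop the symbol
    text = text.lower()
    return re.findall(r"[a-z0-9']+", text)     # keep simple ASCII word tokens only

if __name__ == "__main__":
    raw = "<p>Loving the new #NLP course by @PennState &amp; friends! https://example.com/x</p>"
    print(preprocess(raw))
```

The tags are removed before the URLs so that a closing tag directly after a link is not swallowed by the URL pattern. Note that the final pattern keeps only ASCII letters and digits, which is a simplification that would discard text in other scripts.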
Luckily for data scientists, well-developed NLP tools are already available for programming languages such as Python. Some of these tools are also built into operating systems such as Unix and Linux.
Jungwoo Ryoo is a professor of information science and technology at Penn State. Here he reviews the history of data science and its subfields, explores the marketplaces for these fields, and reveals the five main skill areas: data mining, machine learning, natural language processing (NLP), statistics, and visualization. This leads to a discussion of the five biggest career opportunities, the six leading industry-recognized certifications available, and the most exciting emerging technologies. Along the way, Jungwoo discusses the importance of ethics and professional development, and provides pointers to online resources for learning more.
- A history of data science
- Why data analytics is important
- How data science is used in fraud detection, disease control, network security, and other fields
- Data science skills
- Data science roles
- Data science certifications
- The future of data science
Skill Level Beginner