Learn what natural language processing (NLP) is by looking at its subfields and its relevance to data science. Explore the software tools available to help with NLP tasks as a data scientist.
- [Voiceover] Natural language processing, or NLP, refers to a collection of techniques that let a computer make sense of its interactions with a human being through a natural language. NLP is a broad discipline in computer science and involves topics such as artificial intelligence, computational linguistics, and human-computer interaction, or HCI.
There are NLP subfields that are particularly relevant to a data scientist. Tokenization, parsing, sentence segmentation, and named entity recognition are some of them. Tokenization isolates each token (a word or symbol) in a text, and parsing conducts a grammatical analysis of those tokens. Sentence segmentation separates one sentence from another in a text. Named entity recognition identifies which tokens are proper names and what type of entity each one names, such as a person, place, or organization.
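As a rough illustration of the subfields just described, here is a minimal Python sketch that uses only the standard library. The function names (`segment_sentences`, `tokenize`, `naive_named_entities`) and the sample sentence are hypothetical, and the rules are deliberately naive: real NLP libraries handle abbreviations, quotations, and entity types far more carefully.

```python
import re

def segment_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace.

    Naive assumption: every such mark ends a sentence, so an
    abbreviation like "Dr." would be split incorrectly.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def naive_named_entities(tokens):
    """Flag capitalized tokens that are not the first token.

    This is only a stand-in for named entity recognition: it finds
    candidate proper names but does not classify their type.
    """
    return [t for i, t in enumerate(tokens) if i > 0 and t[0].isupper()]

text = "Jungwoo teaches at Penn State. He studies data science!"
for sentence in segment_sentences(text):
    tokens = tokenize(sentence)
    print(tokens, naive_named_entities(tokens))
```

The first-token rule is a crude way to avoid flagging ordinary sentence-initial words, which also means a proper name at the start of a sentence is missed.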
A significant portion of the data you deal with as a data scientist is unstructured. That is, it is text extracted not from a database but from sources such as social media sites, text documents, pictures, and so on. Therefore, one of the biggest challenges for a data scientist is to sort through this unstructured data and preprocess it so that data mining and analytics tools can take over and extract the knowledge being sought.
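To make the preprocessing step more concrete, the following sketch cleans a few made-up social media posts and counts the remaining terms so that a downstream analytics tool could consume them. The helper names (`preprocess`, `term_counts`), the sample posts, and the choice of cleaning rules are assumptions for illustration, not a prescribed pipeline.

```python
import re
from collections import Counter

def preprocess(post):
    """Turn one raw post into a list of lowercase word tokens."""
    post = re.sub(r"https?://\S+", " ", post)  # drop URLs
    post = re.sub(r"[@#]\w+", " ", post)       # drop mentions and hashtags
    return re.findall(r"[a-z']+", post.lower())

def term_counts(posts):
    """Count how often each cleaned term appears across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(preprocess(post))
    return counts

# Hypothetical sample data standing in for unstructured text.
posts = [
    "Loving the new phone! https://example.com @vendor #happy",
    "The phone battery is great",
]
print(term_counts(posts).most_common(3))
```

The resulting counts are structured data: each term maps to a number, which is the kind of input that data mining and analytics tools typically expect.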
Luckily for data scientists, there are already well-developed NLP tools available as libraries for programming languages such as Python. Some text-processing tools are also built into operating systems such as Unix or Linux.
Author: Jungwoo Ryoo
Released: 1/26/2018
Jungwoo Ryoo is a professor of information science and technology at Penn State. Here he reviews the history of data science and its subfields, explores the marketplaces for these fields, and reveals the five main skill areas: data mining, machine learning, natural language processing (NLP), statistics, and visualization. This leads to a discussion of the five biggest career opportunities, the six leading industry-recognized certifications available, and the most exciting emerging technologies. Along the way, Jungwoo discusses the importance of ethics and professional development, and provides pointers to online resources for learning more.
- A history of data science
- Why data analytics is important
- How data science is used in fraud detection, disease control, network security, and other fields
- Data science skills
- Data science roles
- Data science certifications
- The future of data science
Skill Level: Beginner
Video: Natural language processing (NLP)