Learn how data mining and analytics work by analyzing different aspects of these two fascinating disciplines of data science. Jungwoo covers core areas of data mining and analytics, such as text retrieval, classification, prediction, and clustering.
- Data mining and analytics involve a myriad of data manipulation techniques. Text retrieval is one of the most well-known data mining techniques. It builds on many foundational concepts and methods developed in natural language processing, or NLP. Classification constructs a model that assigns each data object to a specific category. In a classification model, the classes, each with its own label, are discrete in nature.
For instance, a classification model can categorize people into groups of trustworthy and untrustworthy users of an online banking system. Prediction builds a model that produces continuous or ordered values that form a trend. For instance, a prediction model can provide estimated mean time to failure, or MTTF, values for a computer. Clustering is a process of grouping similar data objects into a class.
Clustering helps reveal features that distinguish one class of data objects from another, leading to new discoveries about a data set. Uses of clustering analysis range from pattern recognition and image processing to market research. For example, clustering can reveal people with similar purchasing behaviors. As you might have noticed already, the difference between classification and clustering is that classification starts with predefined labels, while in clustering the labels are created after the fact.
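The three ideas can be contrasted in a few lines of plain Python. This is only a minimal sketch, not material from the course: the function names, the single numeric feature per example (failed logins, machine age, purchase amount), and all data values are hypothetical. Nearest-centroid classification, a least-squares line, and a two-group k-means loop each stand in for the much wider family of methods the transcript describes.

```python
from statistics import mean

# --- Classification: labels are predefined, output is a discrete class ---
# Hypothetical training data: (failed_logins, label)
labeled_users = [(0, "trustworthy"), (1, "trustworthy"),
                 (8, "untrustworthy"), (9, "untrustworthy")]

def train_centroids(samples):
    """Return the mean feature value for each predefined label."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def classify(value, centroids):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))

# --- Prediction: output is a continuous value ---
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# --- Clustering: no labels up front; groups emerge from the data ---
def two_means(values, iterations=10):
    """Split 1-D values into two groups with a basic k-means loop (k = 2)."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Keep the old center if a group happens to be empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups

centroids = train_centroids(labeled_users)
label = classify(7, centroids)  # picks one of the two predefined labels

# Hypothetical MTTF data: machine age in years vs. observed MTTF in hours.
slope, intercept = fit_line([1, 2, 3, 4], [900, 800, 700, 600])
mttf_estimate = slope * 5 + intercept  # a continuous estimate for year 5

# Hypothetical purchase amounts; the two groups only get names afterwards.
clusters = two_means([12, 15, 14, 90, 95, 88])
```

Note that `classify` can only ever return a label it was trained with, while `two_means` returns unnamed groups that an analyst would label later (for example, "small spenders" and "big spenders"), which mirrors the distinction drawn above.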
Jungwoo Ryoo is a professor of information science and technology at Penn State. Here he reviews the history of data science and analytics, explores which markets are using big data the most, and reveals the five main skill areas: data mining, machine learning, natural language processing (NLP), statistics, and visualization. This leads to a discussion of the five biggest career opportunities, the four leading industry-recognized certifications available, and the most exciting emerging technologies. Along the way, Jungwoo discusses the importance of ethics and professional development, and provides pointers to online resources for learning more.
- A history of data science
- Why analytics is important
- How data science is used in social media, climate research, and more
- Data science skills
- Data science certifications
- The future of big data