Learn the components that make up the field of data science. Jungwoo defines data science, shows examples of data science fields, and explains our big transformation into the data science industry.
- [Voiceover] Data science is a highly comprehensive term that encompasses a multitude of disciplines and concepts including big data, machine learning, data mining and data analytics. Big data is especially relevant to data science these days. Think of the sheer amount of data becoming available to various organizations and individuals today. As a result of this trend, data science has to increasingly deal with big data.
Essentially, big data refers to a data set whose nature, including its volume, variety, and velocity, defies conventional ways of processing and requires extraordinary treatment. Therefore, big data is a relative term. It is a moving target. One terabyte may be considered big today, but it may not be in the near future as storage and processing technologies become cheaper and faster.
Machine learning frees humans from the mundane task of trying numerous possible ways of solving a problem to isolate the best solution. The relevance of machine learning to data science stems from the fact that humans are not good at repetitive work and are bound to make mistakes when it comes to handling data. This is especially true when the repetition is driven by the size, complexity, and speed of the data, as in the case of big data.
Because of the large scale of big data, data science today can no longer rely on humans to obtain meaningful insight from it, and is beginning to depend heavily on algorithms that in turn drive computers, hence the name machine learning. Data mining is one of the aspects of data science. It is the process of discovering a pattern in a data set. At the beginning of a mining process, you don't know what you're looking for.
You employ various algorithms, such as those used in machine learning, to unearth a previously unknown pattern or relationship. Therefore, data mining uses machine learning as a tool in its search for new knowledge without any preconceived notions or hypotheses. The data set being used for data mining often reaches the realm of big data.
Unlike data mining, data analytics starts with a specific hypothesis. That is, its purpose is to test the hypothesis. For example, a hypothesis used in data analytics could be that social media content, such as tweets, can predict an individual's risk of heart disease.
Jungwoo Ryoo is a professor of information science and technology at Penn State. Here he reviews the history of data science and analytics, explores which markets are using big data the most, and reveals the five main skill areas: data mining, machine learning, natural language processing (NLP), statistics, and visualization. This leads to a discussion of the five biggest career opportunities, the four leading industry-recognized certifications available, and the most exciting emerging technologies. Along the way, Jungwoo discusses the importance of ethics and professional development, and provides pointers to online resources for learning more.
- A history of data science
- Why analytics is important
- How data science is used in social media, climate research, and more
- Data science skills
- Data science certifications
- The future of big data