Learn the various foundational technologies that make data science possible, such as cloud computing, machine learning, data mining and visualization.
- [Voiceover] There are a number of underlying technologies that make data science a reality. These include data infrastructure, data management, and visualization technologies. Data infrastructure technologies support how data is shared, processed, and consumed. One of the most popular data infrastructure technologies data scientists use today is distributed computing, and cloud computing in particular.
Several key underlying technologies enable cloud computing. Virtualization is one of them; distributed file sharing is another. Prominent storage technologies in this space include redundant array of independent disks (RAID) and the Hadoop Distributed File System (HDFS). Data management is handled by database management systems (DBMS).
Data science requires highly scalable, reliable, and efficient ways to store, manage, and process data, which is why DBMS plays a critical role in data science. As big data becomes mainstream, unstructured data is also becoming more prevalent. In fact, the majority of business-related data is unstructured. It consists of word processing documents, presentations, log files, and so on.
However, a significant portion of our data is still stored in conventional relational DBMS, in a structured format. As a result, the new generation of data science professionals has to be versatile enough to deal with both unstructured and structured data sets. Knowledge of SQL is still invaluable in the context of data management.
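As a rough illustration of working with both kinds of data, here is a minimal Python sketch. It assumes a small in-memory SQLite table standing in for a relational DBMS and a few made-up log lines standing in for unstructured data. The table name, columns, log format, and sample values are all hypothetical and are not taken from the course.

```python
import re
import sqlite3

# Structured data: rows in a relational table, queried with SQL.
# The "orders" table and its contents are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 14.5)],
)
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
).fetchall()
conn.close()
totals = dict(rows)  # maps each customer to their total order amount

# Unstructured data: free-form log lines with no fixed schema.
# The log format below is an assumption; real logs vary widely.
log_lines = [
    "2024-01-05 10:02:11 ERROR payment failed for alice",
    "2024-01-05 10:02:15 INFO login ok for bob",
    "2024-01-05 10:03:40 ERROR timeout for bob",
]
pattern = re.compile(r"^\S+ \S+ (?P<level>[A-Z]+) (?P<message>.*)$")

errors = []
for line in log_lines:
    match = pattern.match(line)
    # Keep only the message text of lines logged at ERROR level.
    if match and match.group("level") == "ERROR":
        errors.append(match.group("message"))

print(totals)
print(errors)
```

The SQL query relies on the schema to do the grouping, while the log lines need a pattern imposed on them before anything can be extracted.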
Once data analysis is over, the newly acquired insight needs to be conveyed to the leadership and the rest of the organization. No matter how significant the discoveries are, if data scientists fail to communicate them effectively, especially in the context of the organization's strategic goals, their impact will be minimal. This completely defeats the purpose of the data science efforts made in support of the organization.
Jungwoo Ryoo is a professor of information science and technology at Penn State. Here he reviews the history of data science and analytics, explores which markets are using big data the most, and reveals the five main skill areas: data mining, machine learning, natural language processing (NLP), statistics, and visualization. This leads to a discussion of the five biggest career opportunities, the four leading industry-recognized certifications available, and the most exciting emerging technologies. Along the way, Jungwoo discusses the importance of ethics and professional development, and provides pointers to online resources for learning more.
- A history of data science
- Why analytics is important
- How data science is used in social media, climate research, and more
- Data science skills
- Data science certifications
- The future of big data