Learn about the concept and importance of data science. Jungwoo introduces core data science terms and solutions.
- Data science is a broad term that refers to a collection of related disciplines focusing on the use of data to create new information and knowledge. The ultimate goal here is to provide useful insights for better decisions. Big data is a subset of data science whose goal is to overcome the challenges of analyzing the huge volume and variety of modern data generated at exceptionally high speed.
Computers have helped automate the mundane task of number crunching and freed us to concentrate on the more creative and meaningful aspects of data science, such as setting up a model and interpreting the results of a data analysis. However, they have also contributed to the production of significantly more data by becoming data sources themselves.
Think about all the computing devices around us today, such as cellphones, security cameras, and Fitbits. They are ubiquitous and constantly generate data, and the number of these things connected to the internet, a network also known as the Internet of Things or IoT, is ever growing. Ironically, computers are once again coming to our rescue to solve this problem.
This time, they not only help us deal with the sheer amount of data, but also allow us to make better decisions by automating some of the data interpretation. In some cases, computers can even make decisions for us based on algorithms that are trusted to make accurate predictions. Data analytics is the name for this enhanced way of taking advantage of our exponentially increasing computing power and storage capacity.
There are some older trends making a comeback in the data science revolution, for example virtualization, cloud computing, distributed processing, distributed file systems, and machine learning. To keep abreast of the rapidly changing landscape of data science, it is crucial for us to develop a decent understanding of how these tools of the data science trade work, and that's the journey we are about to embark on.
- Enabling technologies in data science
- Cloud computing and virtualization
- Installing and working with Proxmox, Hadoop, Spark, and Weka
- Managing virtual machines on Proxmox
- Distributed processing with Spark
- Fundamental applications of machine learning
- Distributed systems and distributed processing
- How Hadoop, Spark, and Weka can work together