In this video, we'll list the various modules in the data science life cycle and identify the ones we'll be discussing in this course.
- [Instructor] Data science is essentially a pipeline of modules that work on data progressively to deliver insights and actions. The process starts with acquiring data from various sources. Connectors to these sources understand, acquire, and transform the data as it is pushed into the pipeline. Next comes data transport. Depending on the data source and destination, this could happen within a LAN or around the globe.
Data transport ensures reliability while delivering data at the speed the business requires. Then there is storage. Raw data acquired from the sources is stored in persistent stores, such as databases. Processing jobs read, process, and transform this data and place the results back into persistent stores. Data in these stores is used for exploratory analytics, to extract insights about the business or entities of interest. It is also used for predictive analytics, to predict future actions or behavior.
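To make the idea of modules working on data progressively more concrete, here is a minimal, purely local sketch in plain Python. It does not use any GCP service. The stage functions (`acquire`, `transport`, `store`, `process`, `explore`), the CSV-like input, and the in-memory lists standing in for databases are all hypothetical illustrations, not part of any real API.

```python
from typing import Iterable

# Hypothetical record type: a plain dict such as {"user": "alice", "amount": "10.0"}
Record = dict


def acquire(raw_rows: Iterable[str]) -> list[Record]:
    """Connector stage: parse raw CSV-like lines into records."""
    records = []
    for line in raw_rows:
        user, amount = line.strip().split(",")
        records.append({"user": user, "amount": amount})
    return records


def transport(records: list[Record]) -> list[Record]:
    """Transport stage: a pass-through copy standing in for a network hop."""
    return [dict(r) for r in records]


def store(records: list[Record], raw_db: list[Record]) -> list[Record]:
    """Storage stage: append raw records to an in-memory stand-in for a database."""
    raw_db.extend(records)
    return raw_db


def process(raw_db: list[Record], results_db: dict[str, float]) -> dict[str, float]:
    """Processing job: convert types, total the amount per user, and write
    the results back into a second in-memory 'persistent store'."""
    for r in raw_db:
        results_db[r["user"]] = results_db.get(r["user"], 0.0) + float(r["amount"])
    return results_db


def explore(totals: dict[str, float]) -> str:
    """Exploratory analytics: report the user with the highest total."""
    return max(totals, key=totals.get)


def run_pipeline(raw_rows: Iterable[str]) -> tuple[dict[str, float], str]:
    raw_db: list[Record] = []
    results_db: dict[str, float] = {}
    stored = store(transport(acquire(raw_rows)), raw_db)
    totals = process(stored, results_db)
    return totals, explore(totals)


totals, top = run_pipeline(["alice,10.0", "bob,4.5", "alice,2.5"])
print(totals, top)
```

Each stage only depends on the output of the previous one, which is what lets a managed platform swap in different implementations (a different connector, a different data store) for a single module without rewriting the rest of the pipeline.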
So, how does Google Cloud Platform, or GCP, support these modules? GCP provides end-to-end support for all the modules and activities in data science. In these pipelines, it can be used as infrastructure, as a platform, or as a service. There are multiple options available for each module. For example, for data storage, GCP supports more than five types of data stores. GCP's services are fully managed, which minimizes the administration and monitoring effort for these modules.
GCP also scales horizontally as data volumes grow and processing jobs multiply. This course focuses on the predictive analytics module of data science. We'll look at ways to build, train, and test models, and to run batch and real-time predictions, using tools available on GCP.
- Evaluating the machine learning tools in GCP
- Understanding the predictive analytics process
- Building models
- Training models with jobs
- Building and running predictions
- Best practices for cost control, testing, and performance monitoring