Discover the various modules in the data science lifecycle and explore which ones are covered in this course.
- [Instructor] Data science is essentially a pipeline that contains a number of modules, which work on data progressively to deliver insights and actions. The process starts with acquisition of data from various sources. Connectors to these sources understand, acquire, and transform data as it is pushed into the pipeline. Next comes data transport. Depending upon the data source and the destination, this could be within a LAN or around the globe.
Data transport ensures reliability while delivering data at the speed required by the business. Then there is storage. Raw data, acquired from the sources, is stored in persistent stores, like databases. Processing jobs clean, process, and transform data and place it back in the persistent stores. Data in these stores is used for exploratory analytics to extract insights about the business or entities of interest.
Data is also used for predictive analytics to predict future actions or behavior. So how does the Google Cloud Platform, or GCP, support these modules? GCP provides end-to-end support for all modules and activities in data science. It can be used as infrastructure, as a platform, or as a service in these pipelines. There are multiple options available for each module. For example, for data storage, GCP supports more than five types of data stores.
GCP is fully managed and minimizes administration and monitoring effort for these modules. It also provides horizontal scaling as data volumes grow and processing jobs multiply. This course focuses on the exploratory data analytics module of data science. We look at ways to segment, profile, analyze, and visualize data using tools available on GCP.
- Setting up Cloud Datalab for exploratory data analytics
- Segmentation and profiling
- Reading and writing data from BigQuery
- Managing Cloud Storage buckets
- Creating visualizations of BigQuery data with the GCP Charting API
- Managing Datalab instances
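
As a rough illustration of the pipeline described in the transcript (acquisition, transport, storage, processing, and exploratory analytics), here is a minimal sketch in plain Python. It does not use any GCP service. The function names, the sample records, and the in-memory list standing in for a persistent store are all assumptions made up for this example. The last step shows a simple form of segmentation and profiling: grouping rows by a key and summarizing each group.

```python
from collections import defaultdict
from statistics import mean


def acquire():
    """Acquisition: a stand-in connector that returns raw records (made-up data)."""
    return [
        {"region": "east", "amount": "120.0"},
        {"region": "west", "amount": "80.5"},
        {"region": "east", "amount": None},  # incomplete record
        {"region": "west", "amount": "99.5"},
    ]


def transport(records):
    """Transport: a simple pass-through here; a real system would add reliability."""
    yield from records


def process(records):
    """Processing: drop incomplete rows and convert amounts to floats."""
    return [
        {"region": r["region"], "amount": float(r["amount"])}
        for r in records
        if r["amount"] is not None
    ]


def profile_by_segment(rows, key, value):
    """Exploratory analytics: segment rows by `key` and profile `value` per segment."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        segment: {"count": len(values), "mean": mean(values)}
        for segment, values in groups.items()
    }


# Storage: an in-memory list stands in for a persistent store.
store = list(transport(acquire()))
clean = process(store)
summary = profile_by_segment(clean, "region", "amount")
print(summary)
```

In the course itself, each of these stand-ins would be replaced by a managed service, for example reading the cleaned data from BigQuery rather than from a Python list.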