From the course: Data Science on Google Cloud Platform: Designing Data Warehouses

Data science modules covered

- [Instructor] Data science is essentially a pipeline made up of a number of modules, which work on data progressively to deliver insights and actions. Let us review the list of modules and which of them are in scope for this course.

The data science process starts with the acquisition of data from various sources. Connectors to the sources understand, acquire, and transform data as it is pushed into the pipeline. Next comes data transport. Depending upon the data source and the destination, this could be within a LAN or around the globe. Data transport ensures reliability while delivering data at the speed required by the business. Then there is storage. Raw data acquired from sources is stored in persistent stores like databases. Processing jobs cleanse, process, and transform the data and store it back into persistent stores. Data in these stores is used for exploratory analytics to extract insights about the business or entities of interest. It is also used for predictive analytics to predict future actions or behavior.

So how does Google Cloud Platform, or GCP, support these modules? GCP provides end-to-end support for all modules and activities in data science. It can be used as infrastructure, as a platform, or as a service in these pipelines. There are multiple options available for each module. For example, for data storage, GCP supports more than five types of data stores. GCP is fully managed and minimizes the administration and monitoring effort for these modules. It also provides horizontal scaling as data volumes grow and processing jobs multiply.

This course focuses on data warehouses, which are part of the storage module of data science. We will explore the different storage options in GCP. Then we will do an in-depth study of Google BigQuery, which is the data warehouse product within GCP.
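As a rough illustration of the module pipeline described above, the sketch below chains acquisition, transport, storage, processing, and exploratory analytics as plain Python functions. It is a toy under stated assumptions, not how GCP implements these modules: the function names, the sample records, and the in-memory dictionary standing in for a persistent store are all hypothetical, and predictive analytics is left out to keep it short.

```python
from statistics import mean


def acquire():
    """Acquisition: pull raw records from a source (hypothetical sample data)."""
    return [
        {"store": "north", "sales": "120"},
        {"store": "south", "sales": "95"},
        {"store": "north", "sales": None},  # a bad record to be cleansed later
        {"store": "south", "sales": "105"},
    ]


def transport(records):
    """Transport: stand-in for moving data to its destination (copies each record)."""
    return [dict(r) for r in records]


def store(db, table, records):
    """Storage: persist records under a table name (a dict stands in for a database)."""
    db[table] = list(records)


def process(db):
    """Processing: cleanse and transform raw data, then store it back."""
    cleaned = [
        {"store": r["store"], "sales": float(r["sales"])}
        for r in db["raw"]
        if r["sales"] is not None
    ]
    store(db, "clean", cleaned)


def explore(db):
    """Exploratory analytics: average sales per store from the cleansed data."""
    by_store = {}
    for r in db["clean"]:
        by_store.setdefault(r["store"], []).append(r["sales"])
    return {name: mean(values) for name, values in by_store.items()}


def run_pipeline():
    db = {}
    store(db, "raw", transport(acquire()))
    process(db)
    return db, explore(db)


db, insights = run_pipeline()
print(insights)
```

Each function maps to one module from the narration, so any single stage could be swapped for a managed service without changing the others.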
