In this video, explore a scenario about enterprise data warehousing.
- [Instructor] Now we'll take a look at data warehousing. This is an area where I've been doing work for almost 10 years, so it's been really interesting to see the evolution of cloud services, GCP included. BigQuery, the core service in data warehousing scenarios, has been around for a long time. However, its applicability to data warehousing is relatively recent, because both partner services and, more importantly, the key sets of services that I like to say surround BigQuery have been growing really quickly. It makes me really excited about the choices customers now have for using BigQuery as a data warehousing solution. In this typical architecture, you use a couple of Cloud Storage staging locations to pull in the external data for the data warehouse. What's interesting, in addition to BigQuery, is the set of services in the middle. After we look at enhancements to BigQuery itself, for data warehousing generally and for enterprise scenarios more broadly, we'll look at some of the newer or enhanced services for getting data ready for use in a data warehouse: Cloud Dataprep, Cloud Data Fusion, and Cloud Dataflow. First, I'll refer you to a really useful white paper called "BigQuery for data warehouse practitioners." If you have experience with on-premises enterprise data warehousing, this white paper clearly explains familiar terms such as data warehouse, data mart, and data lake, and maps them to services in the Google Cloud Platform ecosystem. I also really like how the article explains evolving into using more and more GCP services. It starts with a really simple architecture: you pull in data from source systems, use GCS as a staging area, and then apply BigQuery data warehouse querying patterns.
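The simple architecture above (stage files in GCS, then load them into BigQuery) can be sketched as a small helper that builds a `bq load` command. This is a minimal sketch, not the white paper's method: it assumes the `bq` command-line tool from the Cloud SDK is installed and authenticated, and the dataset, table, and bucket names are hypothetical.

```python
import shlex
import subprocess
from typing import List


def build_bq_load_command(dataset: str, table: str, gcs_uri: str) -> List[str]:
    """Build argv for a `bq load` job that copies a staged CSV from GCS into BigQuery."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI starting with gs://")
    return [
        "bq", "load",
        "--source_format=CSV",    # the staged file is CSV
        "--skip_leading_rows=1",  # skip the header row
        "--autodetect",           # let BigQuery infer the schema
        f"{dataset}.{table}",     # destination table in the default project
        gcs_uri,                  # staged source file in Cloud Storage
    ]


def run_load(cmd: List[str]) -> None:
    """Run the load job; requires an installed, authenticated Cloud SDK."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical dataset, table, and bucket names.
    cmd = build_bq_load_command(
        "warehouse_staging", "orders", "gs://example-staging-bucket/orders.csv"
    )
    print(shlex.join(cmd))  # inspect the command before calling run_load(cmd)
```

Keeping command construction separate from execution makes the staging step easy to review or log before anything is written to the warehouse.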
Now as you continue to read the article, you'll see that it gives you a path to using BigQuery for staging and then to working with other products that might be more useful as storage for intermediate querying. Most of these are NoSQL databases, such as Cloud Bigtable or Cloud Datastore. The article covers both batch and streaming ingestion; streaming is a relatively recent addition to BigQuery's capabilities and really enables a number of new scenarios for my customers. The article also does a good job of explaining proper schema modeling for handling change over time, and then it gets into querying patterns. Speaking of querying patterns, although BigQuery is SQL-compliant, some patterns give you better results both in how long the query takes to execute and in service cost, because cost depends on the amount of data that's scanned. So I recommend this BigQuery cookbook, which covers best practices for query optimization. Now, to bring us to a concrete use case: in preparation, I've uploaded a CSV file, just a really small one with a little over a thousand rows, into a bucket in GCS. I've then created a dataset in BigQuery and loaded the data from that GCS bucket into it. I've written a query against that data and saved another version of the query as a view. That shows you the basic capabilities; if you're unfamiliar with them, you'll want to consult the core documentation. The service itself is completely serverless, which is a tremendous advantage when moving from traditional on-premises, server-based data warehousing. Using BigQuery as a data warehouse lets you bring in all of your enterprise data and run SQL queries against it without managing any servers. In addition to being serverless, BigQuery also has integrated visualization tools.
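To make the load, query, and save-as-view steps concrete without a GCP project, here is a minimal local sketch that uses Python's built-in `sqlite3` module as a stand-in for BigQuery. This is a swapped-in engine, not BigQuery itself, and the table name `sales`, the view name `units_by_region`, the columns, and the sample rows are all hypothetical. In BigQuery, the equivalent steps would be a load job from the GCS bucket followed by a `CREATE VIEW` statement in standard SQL.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the small CSV staged in Cloud Storage.
SALES_CSV = """region,product,units
west,widget,12
east,widget,7
west,gadget,3
east,gadget,9
"""


def build_warehouse(csv_text: str) -> sqlite3.Connection:
    """Load CSV text into an in-memory table and save an aggregate query as a view."""
    conn = sqlite3.connect(":memory:")
    # Explicit schema, similar to defining (or autodetecting) a BigQuery table schema.
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER)")
    rows = [
        (r["region"], r["product"], int(r["units"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # Save the query as a view, naming only the columns it needs.
    conn.execute(
        """CREATE VIEW units_by_region AS
           SELECT region, SUM(units) AS total_units
           FROM sales
           GROUP BY region"""
    )
    return conn


if __name__ == "__main__":
    conn = build_warehouse(SALES_CSV)
    for region, total in conn.execute(
        "SELECT region, total_units FROM units_by_region ORDER BY region"
    ):
        print(region, total)
```

Listing only the needed columns matters far more in BigQuery than in SQLite: BigQuery stores data by column, and on-demand query cost is based on the bytes processed in the columns a query references, so `SELECT *` on a wide table scans every column.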
Once I execute a query, I have the option to explore the results in Data Studio. I've done this in advance, and this is what a query result looks like. This is basically a pivot chart: you have all the fields from the query here, along with visualization types. Other products are also available for integration with BigQuery. For example, I've used Tableau, a third-party product, with several enterprise customers. But the fact that Data Studio is now available and integrated as an explorer really increases the usability for several data warehousing scenarios.
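A pivot chart like the one shown is essentially a cross-tabulation of query results: one field becomes the rows, another becomes the columns, and a numeric field is summed in each cell. As a rough illustration of what such a chart computes (not how Data Studio implements it), here is a small pure-Python pivot over hypothetical result rows.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Mapping


def pivot(
    rows: Iterable[Mapping],
    row_field: str,
    col_field: str,
    value_field: str,
) -> Dict[Hashable, Dict[Hashable, float]]:
    """Sum value_field for each (row_field, col_field) pair, filling gaps with 0."""
    totals = defaultdict(int)
    row_keys: List[Hashable] = []
    col_keys: List[Hashable] = []
    for r in rows:
        rk, ck = r[row_field], r[col_field]
        if rk not in row_keys:
            row_keys.append(rk)  # preserve first-seen order for row headers
        if ck not in col_keys:
            col_keys.append(ck)  # preserve first-seen order for column headers
        totals[(rk, ck)] += r[value_field]
    # Build the full grid so every row has every column, even if no data matched.
    return {rk: {ck: totals[(rk, ck)] for ck in col_keys} for rk in row_keys}


if __name__ == "__main__":
    # Hypothetical query results.
    results = [
        {"region": "west", "product": "widget", "units": 12},
        {"region": "east", "product": "widget", "units": 7},
        {"region": "west", "product": "gadget", "units": 3},
    ]
    for region, cells in pivot(results, "region", "product", "units").items():
        print(region, cells)
```

Filling missing combinations with 0 mirrors how a pivot table typically shows an empty cell rather than dropping the row or column.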