In this video, review and compare technology options available for various services identified in the cloud data archive architecture.
- [Instructor] In this video, I will evaluate suitable technologies for data storage in GCP and choose the best option available. First, let us look at the file storage for chat transcripts. What are the requirements for this technology? First, it should be low maintenance. The file storage is for archival only, for statutory purposes. So, once a file is archived, it should not require any management overhead. Second, it should be massively scalable to accommodate current and future growth. An increase in storage demand should not require re-architecture of the solution. Finally, because it is archival only, it should be low cost. Monthly recurring costs should be kept as low as possible.
What technology options do we have for this on GCP? First, we have Dataproc. Dataproc is GCP's managed Hadoop and Spark service, and its clusters provide HDFS. HDFS is an open source technology that requires management overhead for creating and managing clusters, nodes, files, and folders. While HDFS provides portability, it does not help with low maintenance, especially compared with other technologies available on GCP. HDFS is massively scalable. It can scale horizontally to accommodate petabytes of data. HDFS is not really cost effective. Irrespective of whether a file needs frequent access or not, the cost of the solution is the same. In our case, we need a store-and-forget data store, and there are cheaper options available.
Next, we look at Google Cloud Storage. Cloud Storage is a managed object repository with excellent manageability and scaling. Cloud Storage is managed by GCP, so there is no manual work needed to monitor storage and add nodes. It is auto-scaling. Cloud Storage is massively scalable to petabytes of data, so there is no worry of running into limitations. Finally, Cloud Storage provides multiple storage classes, each optimized for different access use cases.
In our use case, Coldline storage will suffice, and it is among the lowest-cost storage classes. Coldline storage is optimized for objects that are accessed infrequently, which is exactly our use case. Based on this analysis, we will choose Cloud Storage as the technology for storing chat transcripts.
We now move on to evaluate a database to store database transactions. The requirements for this database are, again, the same as for the chat transcript storage. We need a solution that is low maintenance. It should be massively scalable. It should also be cost effective. In addition, we don't need transactions or support for updates.
First, let us look at Google BigQuery. BigQuery is an append-only data warehouse that can be used to archive data and run SQL queries on it. BigQuery is fully managed and hence low maintenance. There is no need to manage databases, clusters, or nodes. It is massively scalable to terabytes of data. BigQuery is cheaper than most GCP RDBMS and NoSQL database products. This is because BigQuery is an append-only database with no transaction support. But it works well as an archival data warehouse.
We next look at Cloud Spanner. Cloud Spanner provides the best of both worlds. It provides consistency and transactions like an RDBMS and can scale like NoSQL. It is also a GCP managed service, so it is low maintenance. It can also scale to terabytes of data. But Cloud Spanner is a relatively expensive solution. It provides the best of the RDBMS and NoSQL worlds. But the key question is, does your use case require it? Given that our use case needs neither transactions nor frequent updates, Cloud Spanner becomes an expensive option.
So which one do we choose? We will choose BigQuery. It provides all the features we need, and we don't have to pay for things we don't need, like transaction support. It provides a SQL interface that can be used to execute ad hoc workloads if needed.
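
The evaluation above can be summarized as a small requirements matrix. The following is only an illustrative sketch in Python: the `Option` class, the `choose` function, and the yes/no ratings are assumptions made for this example, with the ratings taken from the qualitative statements in this video rather than from any pricing or benchmark data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One candidate technology, rated against the three requirements.

    Hypothetical structure for illustration; the ratings below mirror
    the qualitative assessment given in the transcript.
    """
    name: str
    low_maintenance: bool
    massively_scalable: bool
    cost_effective: bool

    def meets_all(self) -> bool:
        # A candidate qualifies only if it satisfies every requirement.
        return (self.low_maintenance
                and self.massively_scalable
                and self.cost_effective)


# File storage candidates for chat transcripts.
FILE_STORAGE = [
    Option("Dataproc (HDFS)", low_maintenance=False,
           massively_scalable=True, cost_effective=False),
    Option("Cloud Storage", low_maintenance=True,
           massively_scalable=True, cost_effective=True),
]

# Database candidates for archived transactions
# (no transaction or update support needed).
DATABASES = [
    Option("BigQuery", low_maintenance=True,
           massively_scalable=True, cost_effective=True),
    Option("Cloud Spanner", low_maintenance=True,
           massively_scalable=True, cost_effective=False),
]


def choose(options):
    """Return the names of all options that meet every requirement."""
    return [option.name for option in options if option.meets_all()]


if __name__ == "__main__":
    print("File storage:", choose(FILE_STORAGE))
    print("Database:", choose(DATABASES))
```

A simple pass/fail matrix like this is enough here because each comparison is decided by one failed requirement. A real evaluation would likely weight the requirements and use actual cost figures.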
- Benefits and shortcomings of GCP
- Enterprise and multicloud integrations
- Comparing GCP technology options
- Outlining solutions for various problems
- Analyzing use cases and best fits