A data warehouse plays a critical role in an enterprise’s business intelligence system. They provide a single source of information for all of an enterprise’s analysis and reporting needs. In this online video tutorial, learn what a data warehouse is and how they fit into the larger BI framework.
- [Instructor] Let's kick off this course with an explanation about what a data warehouse is. A data warehouse plays a critical role in an enterprise's business intelligence system. In a typical scenario, important business data is initially collected in operational or production databases. These can be separate databases for customer transactions and sales data, or human resources and employment records, or corporate financial information. They can even be external data sources maintained by a vendor or a government regulator. All of this data is consolidated through an ETL process that extracts, or copies, the data from the disparate source databases, transforms it into a common format and arrangement, and then loads, or writes, it into a single destination database. This single repository is the data warehouse. Then, once everything is brought into a common, unified location, the data could be used for analysis and reporting, with the knowledge that everyone is pulling business insights from the same set of information. The data warehouse represents the unified truth that everyone in the enterprise can agree on. Data warehouses are designed to collect data over long periods of time, and aggregate and format that data so that it can better support the analysis and reporting requirements of the enterprise. In this way, they differ from the transactional databases that are used for real-time data collection. However, data warehouses use the same table structure with attribute columns and data types and relationships. They also use the same underlying database engine, in our case, SQL Server, and the same query language, SQL. The difference is, is that they only receive data through a standardized method, the extract, transform, and load process, and once loaded, that data doesn't change, where production databases store information that might be entered and modified by customer or employees, or automated sensors, or any number of different data entry points. The ETL process alone is used to populate the data warehouse. This allows it to better organize the data, make it more consistent and predictable across the various inputs, and aggregate it into meaningful groups to support further analysis. The ETL is typically run on a regular schedule, most often nightly, so that the data warehouse can support consistent analyzes for everyone across the organization. So, what are the benefits of incorporating a data warehouse into your enterprise business intelligence strategy? The first is that relieves system resource pressure from your operational databases by removing reporting and analysis workloads. This allows your primary database to focus on the transactional needs of the business, and keeps those resources dedicated to writing and updating records in order to keep up with the real-time workloads of the business. Implementing a data warehouse also helps remove issues related to database isolation level locking contention, which can be caused when complex analysis queries are run against the database, preventing database updates while the query executes. Next, a data warehouse presents a single, consistent point of contact for all analysis and reporting needs, regardless of the original source of the data. This makes queries easier to write and maintain, and greatly simplified the process of gaining insights for users and decision-makers across the organization. Data warehouses also present an opportunity to tune the database for analysis query performance. Because a data warehouse database is solving an inherently different problem from the operational database, different decisions can made about the structure of tables and relationships, and the types of resources dedicated to the system. We're going to talk a lot about this benefit throughout the course. And finally, data warehouses allow an enterprise to maintain historical data, even when the operational databases do not. Again, this relieves pressure from the operational databases by allowing them to focus strictly on the current state of the data. As you can see, a data warehouse is just another database in the enterprise, but they serve a different purpose from the relational databases that you might be used to working with. The primary goal of a data warehouse is to support the decision-making needs of the enterprise, and as such, they require a different approach to their design.
- Transactional databases vs. data warehouses
- Star and snowflake schemas
- Creating a data warehouse
- Designing tables and views
- Rebuilding columnstore indexes
- Creating an Azure SQL Data Warehouse
- Establishing control flow beyond ETL
- Enforcing data quality
- Configuring Master Data Services
- Consuming data from the warehouse in BI services