Learn about the basic principles of a star schema.
- [Instructor] So in the Front Office Database, we generally build Star Schemas, also known as data marts. And a Star Schema is really just this data mart design pattern, consisting of fact and dimension tables that resemble a sort of star-like shape. In Facts what we have are the Subject of our Analysis, something like sales, website visits, online orders, and Dimensions are the Context of our Analysis, so if we're looking at sales, we may think of sales by country, sales by product line, sales by customer.
That is the Context of our Analysis. Now Facts are generally Numerical, they're numbers that we aggregate, we summarize them. Whereas, Dimensions are most often Textual. There's also date values, which arguably are text or not. And facts hopefully are stored at what we call the Atomic Grain. This is the lowest level of detail, the detail that the event actually occurs in. So if somebody visited the website, I want to know the exact time stamp, the page that they landed on, the browser they're using, all of that detail.
I don't want to know how many people visited for the total day, that would be an aggregation, some summary of my atomic fact level data. And dimensions also are slowly changing. That means that as things change over time, we're going to want to record history of that, and this is one of the designing challenges of dimensions. There are multiple types of fact tables, but often we think of them as just an event log, a transaction record of events that have occurred. And dimensions contain most of the attributes.
So when we think about dimensions, it's all attributes, so it's the names of things, it's the categories, it's all the different stuff there. Now facts can be additive or semi-additive, meaning we can add these numbers up. But sometimes we can't, so for example, if in our fact table, we had the monthly count of something or a snapshot. You can't necessarily add different snapshots across time, because they're non-additive. So when we build fact tables, we need to think about all the numbers we're storing in there, and what the impact is on our analysts.
And lastly, dimensions are what we use to slice and dice, any time you hear slice and dice in a BI tool like Tableau, or Power BI or even Excel, what you're talking about is using a dimension value to filter the data. Alright, so let's take a look here and see what this might actually look like, if we visualized a Star Schema. Well, in the middle we would have our fact table, and here we can think of facts as our sales, so sales event, sales orders. And part of the dimensions, right, part of the context here might be Clients, so which clients actually placed those orders? Then you could have Products, so which products did they order? And what Location are they at, or what location did they purchase from? Then we can think of what was the Source, was this a web order, was it an in-store order if we have stores.
And when did it occur? So what time did this sale actually happen? And if we lay it out like this, you can see that it kind of resembles a star. And this is where Star Schemas get their name. Essentially in the middle you have facts and just one join away, in database terms, you have all of the context you could possibly want, about those events that occurred.
- Working with systems and schemas
- Managing of a good data pipeline
- Setting up an environment
- Loading and profiling data
- Testing quality
- Adding data types
- Handling missing values and inferred members
- Performing master data lookups
- Loading schemas and tables
- Creating views