One of the most useful aspects of Tableau Prep is the Profile pane. In this video, Matt explains how it lets you quickly see the shape and makeup of your data, without needing to do anything else.
- [Instructor] After you've connected to the data and added a table to the flow, you can then use the Profile pane to explore the data. This example is going to use the Demo Flights July 2016 file in your exercise folder. And the first thing we need to do in order to examine this data is to add a clean step. So for that, we click on the plus, and Add Step. This now opens up both the Profile pane and the data grid. The Profile pane gives us some high-level information about our data. The first thing it tells us is the number of rows in our data set.
We can see here, we've got around 126,000 rows. We've got nine fields, so that's nine columns' worth of data. So we can see how big our data set is and how many columns we've got to work with. Each one of the columns is shown below, with both the data type and the name of the column. And here we can see the distribution of that data. So for example, Airline Description has all of the unique names for the airlines. And the bar shows how many rows have that particular value.
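To make that summary concrete, here is a minimal sketch in Python of the same kind of numbers: a row count, a field count, and a count of rows per value in one field. The sample rows are made up for illustration and are not taken from the Demo Flights file, and this is not how Tableau Prep computes its profile internally.

```python
from collections import Counter

# Hypothetical sample rows; the real file has about 126,000 rows and nine fields.
rows = [
    {"Airline Description": "Southwest", "Destination State": "CA"},
    {"Airline Description": "Southwest", "Destination State": "TX"},
    {"Airline Description": "Delta", "Destination State": "CA"},
    {"Airline Description": "Southwest", "Destination State": "CA"},
]

# Size of the data set: number of rows and number of fields (columns).
row_count = len(rows)
field_count = len(rows[0])

# Distribution of one field: how many rows carry each distinct value.
airline_counts = Counter(row["Airline Description"] for row in rows)

# most_common() returns the values ordered from highest to lowest count.
sorted_counts = airline_counts.most_common()

print(row_count, field_count)
print(sorted_counts)
```

Each bar in the Profile pane corresponds to one entry in a count like `airline_counts`.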
We can sort them from highest to lowest. And we can see, once the view updates, that Southwest has the highest number of rows, about 29,000. Now what's really cool is that we can click on one of these bars and it highlights the related values in the other columns. So for example, in the Destination State we can see that for all of the flights that go to California, some 16,000 rows, 4,935 of those are Southwest Airlines flights.
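The highlighting described here amounts to filtering on one value and recounting another field. A rough sketch, assuming made-up sample rows and a hypothetical helper named `highlight` (not a Tableau Prep function):

```python
from collections import Counter

# Hypothetical sample rows, not taken from the Demo Flights file.
flights = [
    {"Airline Description": "Southwest", "Destination State": "CA"},
    {"Airline Description": "Southwest", "Destination State": "TX"},
    {"Airline Description": "Delta", "Destination State": "CA"},
    {"Airline Description": "Southwest", "Destination State": "CA"},
]


def highlight(rows, field, value, other_field):
    """Count the values of other_field among rows where field equals value."""
    return Counter(row[other_field] for row in rows if row[field] == value)


# "Click" on California and see how its rows split across airlines.
california = highlight(flights, "Destination State", "CA", "Airline Description")

print(california["Southwest"], "of", sum(california.values()), "California rows")
```

In the video, the same idea gives 4,935 Southwest rows out of roughly 16,000 California rows.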
This allows us to see some of the detail in the data without having to look at row-by-row information. We can also see how many unique values we have. For example, out of those 126,000 rows we actually only have 293 Destination Cities, 52 Destination States, and so on. So the Profile pane gives you a top-level overview of the whole data set. Where we have numbers rather than strings, Tableau looks at the distribution. For dates, it lists every single unique date.
Tableau will group up to a certain level of information, which we can change. So here I'm looking at Date, but I could change it to Date & Time. It's going to take the time portion of that date into account when it shows us the distribution. Another really, really useful use of the Profile pane is to show null values. We can see that for Tail Number for example, null is the biggest value. This can indicate a problem with our data. Do we expect there to be any null values in there? Maybe, maybe not.
If we do, that's fine, but if not, the Profile pane highlights them. If we click on null, we can then look through the rest of our data to see which values these nulls occur with. We can then fix that in subsequent steps.
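The null check can be sketched the same way: count the missing values per field, then look at which other values the null rows carry. The rows and the `null_counts` helper below are assumptions for illustration only, with `None` standing in for null.

```python
from collections import Counter

# Hypothetical sample rows; None stands in for a null Tail Number.
flights = [
    {"Tail Number": None, "Airline Description": "Southwest"},
    {"Tail Number": "N123SW", "Airline Description": "Southwest"},
    {"Tail Number": None, "Airline Description": "Delta"},
    {"Tail Number": None, "Airline Description": "Southwest"},
]


def null_counts(rows):
    """Count how many rows have a null (None) value in each field."""
    counts = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None:
                counts[field] += 1
    return counts


nulls = null_counts(flights)

# "Click" on null in Tail Number and see which airlines those rows belong to.
null_rows = [row for row in flights if row["Tail Number"] is None]
airlines_with_nulls = Counter(row["Airline Description"] for row in null_rows)

print(nulls)
print(airlines_with_nulls)
```

If the nulls cluster under one value, that is a hint about where to apply a fix in a later step.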
- The data prep cycle
- Connecting to data
- Examining data in the preview pane
- Cleaning data
- Combining data using joins and unions
- Reshaping and pivoting data
- Previewing and sharing data
- Data sampling to improve performance