In this case study, see how to use Excel PivotTables to review salary information for government employees in San Francisco, including employee names and titles, as well as base and overtime pay from 2011–2013.
- [Instructor] Next up we've got our San Francisco salary case study. Quick summary of the data set, we're looking at salary information from San Francisco government employees between the years 2011 and 2013. So, just over 24,000 rows, or records of data here, including a number of different dimensions and measures. On the dimensions side, we've got employee name, the year, their employee ID, and their job title. And for measures, we have base pay over time pay, and other pay. So the concepts that we'll cover in this particular case study are sorting and filtering, grouping, calculated fields, and table layouts.
So let's jump in. Alright, so in your Pivot Table Case Study Workbook, go ahead and jump to the San Francisco Salaries tab, we're going to Control A to grab all that data, we're going to insert a new pivot table onto a new worksheet. Let's go ahead and just rename that sheet Salary Pivot, and assign it Tab, color of light green, again, just to kind of differentiate between the raw data and the pivot tabs. So, first thing's first, let's pull an employee name, just so that we're looking at data at the employee level to start.
And I can check the boxes to bring in all of our measures, base pay over time pay, and other pay. And I'll just go in and format all three of those fields as currency, and I don't really need decimal points here, so. Here we go. Alright, so now we're formatted. I also want to insert a new calculated field that we'll just call Total Pay, cause I want to see the sum of all three of those measures, the base pay plus any overtime pay that anyone earned, plus any other pay as well.
So there you go, total pay equals base plus overtime plus other. And now I'm going to approach this in kind of an unbiased way, and just start to explore the data a little bit and see what I find. So as a first pass, you know, maybe we just sort our employee names descending by that new total pay field that we just created. We want to see who were the top earners during this entire three year period. So, Judy Melinek earned 553K in total pay during this period.
Mike Bryling earned 460. And then, then now instead of looking at the overall sample, I want to see the top earners for just a given year. So I'll bring year into my filters box. Let's say just for 2013, who earned the most? So, Gary Altenburg, for instance, had a great year in 2013, he earned a total pay of just under 363,000, but what's kind of interesting about Gary, is that he earned 129,000 in base pay, and then another 221,000 from overtime, which is a huge amount of overtime pay, especially compared to some of these other employees who didn't earn anything in overtime, like Amy Hart, for instance, or Sharon, or James Dudley.
So, that's kind of an interesting trend, and that's something that, as an analyst, I'm interested in diving a little bit deeper into. So, what we can do is actually grab job title and pull that in, just to get a sense of what Gary and Amy actually do. And so when we do that, we see that Gary, turns out he's a Lieutenant, looks like at the fire department, whereas Amy is some sort of department head. You know, and I suppose that makes sense, if firefighters tend to earn more overtime pay, just because of the nature of their work.
You know, I'd buy that, so, kind of an interesting insight there. Now if we pull job title out, we can continue to explore that overtime pay trend using another calculated field. So, what I want to do now is, instead of just look at the volume of overtime in terms of the number of dollars, I want to look at the rate. So, overtime percentage, which can be calculated just as overtime pay out of the total. And this will give me a different kind of lens to look at my data through.
And let's just make it a percentage with a decimal point. And now what this new field allows me to do, is sort my employees descending by that overtime percentage and see kind of who's earning the biggest chunk of their paycheck from overtime, versus base pay or other pay. And in this case, Janie Jennings is at the top of the list, 82.2% of her pay was categorized as overtime. So, 33,000 out of 40,358 dollars.
But there's kind of a lot of noise here in this list, I've got some employees that are only earning, you know, a few hundred, or a few thousand dollars, meaning that they're probably part-time or contract employees. And so we can drill into our sorting filtering options, and what I would do in this case, is apply a value filter greater than, and say, okay, total pay needs to be greater than 50,000, and press OK. That will eliminate a little bit of the noise and give me a more apples-to-apples comparison of the employees that I'm looking at now.
So, once we've done that, now Kimberly King-Stit is at the top of the list, she earned 63% of her pay from overtime. And there's our good friend Gary, right there at number two on the list. So, obviously at this point, the number of different paths that we could take with this analysis are pretty much infinite. Even just the number of ways we can filter and sort this employee list are essentially endless. You know, so for instance, if we wanted to only look at employees whose name starts with Steve, we can use a label filter to do that.
So, label filter begins with Steve. Now we're only looking at the Steves in our data set. But you may notice that now my 50,000 dollar value filter has been overwritten, so I've got some people in here who earned 14,000, you know, or 5,000. And, again, that's, like we talked about earlier in the course, because by default Pivot Table options will only allow you to have one set of filters applied to a column.
I can change that in my tools, I go into Options, Totals and Filters, Allow multiple filters per field, press OK. And now we still have our Steve label filter, so let's go ahead and reapply the greater than value filter, and say total pay is greater than 50,000 again, press OK, and now we've got both of our filters applied, we're looking at only total pay greater than 50, and only names that begin with Steve. Both these check marks indicate that both values have been applied.
So, at this point, I really don't care about employees only named Steve, so I'm going to clear all these filters from the employee name. And let's look at the data at a little bit of a higher level. So I'll pull employee name out, let's aggregate things by job title. So one thing that's interesting here, is that when you look at a metric like base pay, you know, it's tempting to look at this and say, oh wow, account clerks make a lot more money than accountants, for instance, but before you do that, you really need to think about what you're looking at here.
And in this case, the sum of base pay column is a column that's summarizing or adding together the base pays of any employee in the data set that falls into each of these buckets. So, my hypothesis would be that there are many more employees that are getting labeled as account clerks than accountants, which is why we see a base pay of 974,000 dollars for account clerks, and only 65,000 for accountants. And we can check that just by dragging employee name as secondary row labels right here into our view.
And that's exactly what's going on here, so we've got, it looks like 23 employees with a job title of account clerk, and we've got only one accountant, Carlito. So this 973 figure is adding all of the individual salaries of each of these 23 employees. So, pull employee out, and if we do want to get a better sense of apples-to-apples, earnings by job title, which, you know, I think is an interesting thing to look at here, what I can do is change the summarization mode from sum to average.
So rather than edit this one, I'm going to pull in a second instance of base pay so that we can compare. And this is the one that I'll change, the summarization to an average and format it as currency. So now it's taking the average of all 23 of those employees' salaries, and equating that to 42,000, which, in fact, is significantly lower than the accountant average salary of 65,392. So, kind of a subtle but really, really important clarification to make, you know, as you're interpreting numbers in Pivot Table.
From there now that we have our average field in place, we can sort our titles descending by that average of base pay, and now we get kind of a cleaner, more accurate list to make those comparisons. So we see titles like Department Head, Medical Examiner, Deputy Chief up at the top of the list, and then as you scroll down, you start to see more titles like Secretary, General Laborer, Inventory Clerk, Cashier, and so on, and so forth.
So, already some interesting, kind of findings and insights bubbling up. Next up, what I want to do is actually group some of these titles together, since there are quite a few areas of overlap. So I'll sort alphabetically, kind of scroll through to see kind of what these job titles are coming through as, and probably notice that there are a bunch of variations of similar types of job titles. So it looks like there are at least 10 or so airport jobs with slightly different titles, so what I want to do in this case, is create groupings that roll up the job titles to a slightly higher level, we call it job category as opposed to job title.
So to give you an example, why don't we start with some of these at the top of the list, and go through accountant-related titles, right click, group those together, and instead of Group One, call it Accounting. And if you recall, by default, this Pivot is set up as a compact view, meaning that both my title field and the new grouped field that I created, are all kind of nested in the same column. I don't want that to be the case, I want to be able to deal with both of those fields separately.
So I'll go into my tools, design options, report layout, outline form. And that just breaks out those two fields, so that now I can independently edit my original job title field and my new grouped field. So instead of Job Title Two, let's call this Job Category. Now we can kind of just continue the process like I did with the Accounting titles. You know, maybe I want these admin ones together, right click, group those. Instead of Group Two, call it Admin.
And then let's do one more with these airport jobs. I've got something like 12 different airport related titles in here, let's group those together, and call it Airport. So there you go, and now since I've changed to outline mode, if I want to roll up by category level now, I can just pull job title out, and organize the data by that new field that I've just created. So, Admins as a whole, which include those, I believe four roles or job titles that fall within it, you know, generated a total base pay of 2.273 million, and an average of 71,000.
Now, this manual approach to grouping works just fine, but as a side note for those who are interested, a more elegant and sufficient solution to do this would actually be to create a separate lookup table that matches all of those job titles to specific categories. And then from there using either lookup functions or data modeling tools, to actually integrate those job categories as a new field in the raw data itself. And that would save me the trouble of manually scrolling through and grouping values together in the Pivot.
But, that's a topic for another day, and another course. So there you have it, that's our starting point for exploring the San Francisco salary data. Go ahead and play around with it, do some exploring, give the homework a shot, and give me a shout if you have any questions.