From the course: Learning Data Analytics: 1 Foundations

Cleaning data using Excel macros

From the course: Learning Data Analytics: 1 Foundations

Cleaning data using Excel macros

- You may find in an organization, they're using older methods of cleaning data. They may be using Excel Macros. It could be that that vowel just hasn't been touched in a while, or they haven't adopted tools like Power Query. It's important to know when you inherit a file, you inherit the knowledge, and the style of the person who is responsible for that file. However, macros are a great way to determine what needs to be cleaned, and also a great opportunity for process improvement. Okay. Let me show you how to record a few macros, and show you how to look inside a macro. Okay. I'm in my spending trends survey, and every time I download this data, I see the same type of information is structured exactly the same. The only difference is there's new data each time. Okay. So I have my developer tab turned on in my ribbon. You can go to customize your ribbon to turn it on if it's not there. I'll choose record macro. This will be step one. This step one will copy the sheet. Okay, I'll go ahead and right click the sheet. I'll choose move or copy. I'll create a copy. I'll move it to the end and I'll click OK. I'll go ahead and click back on sheet and stop recording. Again, I want to keep a copy of that sheet, so I can refer back to it to test my macros later. Now I'm ready to start the cleaning process. The very first thing I'm going to do is delete the columns I don't need. I'll start by recording a macro. I'll call this one, step two. I'll call it delete columns or delcal and make that plural. Okay, I don't need to collect her ID for my reporting. I also don't need the end date. I'll go ahead and highlight IP address, all the way to that custom data in G. I'll right click and delete that. Okay, great. This has me set up for exactly what I need. I'll go ahead and stop recording. I'll record my next step. Call this one step three, SAS data. Okay, I'll go ahead and hit that select all and double click my column headings. And that will size my data. I'll click back on a one. I'll stop recording. Okay. Again, every time I download this data it has row two. Row two is additional information about the headers. So I need to go ahead and address that in my cleaning. Okay. Let me go ahead and do record macro. I'll call this one step four, clean headers. I'll click OK. All right, I'll go ahead and copy this, self-describe and paste it there into E1. I'll go ahead and delete row two because I no longer need it. I'll go ahead and address these headers. I'll name them, age, gender, education. Perfect. All right, I'll go back over to A1 and then I'll stop recording. Now, I'm ready to test my macro, but at the same time, I actually want to record myself running my steps. So I'll go ahead and delete my sheet. I'll go ahead and name this back to sheet because that's what it'll be looking for is a sheet name named sheet. I'll go record my macro. This is called run all. I'll click OK. I'll go to macros. I'll run step one. You see I'll have sheet two down below. I'll go to macros again. I'll run step two. Do you see how it deleted my columns? I'll go to macros. I'll do step three. It sized my data. Perfect. I'll go to macros one last time and clean my headers. I'll choose run. Awesome. Then I can stop recording. Now going forward, if I want to run that macro, I can just run the run all. Let me show you that. Okay. I'll change the sheet name back to sheet. Again, because when I download it from my survey tool, that's what it's called. I'll go to macros. Run all is selected and I'll choose run. And just like that, I have clean data. So you can see why companies would adopt macros. However, with the advent of power query, these options are made much simpler. Let me show you how you can look at the inside of a macro so you can determine what cleaning steps have occurred. So I'll go to visual basic. I'll expand modules and I'll say module one. When I double click module one, it shows me which all columns, sheets, information I changed. These are a whole lot easier to read when you actually perform those steps, but you will notice it tells you, what's selected if it were copied, and was it pasted and where did it go? You can say where we change the name of things. Again, this is great information to have because you can apply it later. Or if you need tweak the macro, you can simply include it in the code. After 20 years of using Excel, and with data, I know you're going to encounter files with macros just knowing how to look inside the code will let you know what cleaning routines were performed, how to extend them or how to replace them.

Contents