Recognize that even when the number of variables is manageable, the number of values often isn't.
- Another problem with big data is the number of variables that you have to analyze. It can be very difficult to choose a manageable number of variables for an analysis when you have 10, 20, 50, or even more variables available. Even when you have only one or two variables available, the sheer number of values of one of those variables could be overwhelming. For example, suppose that you have only two variables in a sales database, such as product and price. Analysis of the number of units sold for each product seems a reasonable place to start, but what if you have 60 products? In that case, a breakdown of the number sold of each product on a percentage basis might not tell you very much.
You might even find that every product on your list is responsible for one or two percent of your total sales. If you combine that sort of difficulty, too many variables or too many values, with a large number of sales records in the analysis that you undertake, one that uses the individual customer or the individual sale as its unit of analysis,…
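To make the 60-product example concrete, here is a minimal Python sketch of the percentage breakdown described above. Everything in it is an assumption made for illustration: the product codes, the flat price, the unit volumes, and the helper names `make_sales_records` and `percent_of_units` are hypothetical and do not come from the course materials.

```python
from collections import Counter


def make_sales_records(n_products=60):
    """Build hypothetical sales records, one (product, price) row per unit sold.

    Product codes, the flat price, and the volumes are invented for illustration.
    """
    records = []
    for i in range(n_products):
        product = f"P{i + 1:02d}"
        units = 90 + (i * 7) % 40  # between 90 and 129 units per product
        records.extend((product, 9.99) for _ in range(units))
    return records


def percent_of_units(records):
    """Return each product's share of total units sold, as a percentage."""
    counts = Counter(product for product, _price in records)
    total = sum(counts.values())
    return {product: 100.0 * n / total for product, n in counts.items()}


records = make_sales_records()
shares = percent_of_units(records)

print(f"{len(shares)} products, {len(records)} sales records")
print(f"smallest share: {min(shares.values()):.2f}%")
print(f"largest share:  {max(shares.values()):.2f}%")
```

With volumes this evenly spread, every product's share falls between one and two percent, so the breakdown by itself says little about which products matter.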
In this course, Conrad Carlberg explains how to carry out cluster analysis and principal components analysis using Microsoft Excel, which tends to show more clearly what's going on in the analysis. Then he explains how to carry out the same analysis using R, the open-source statistical computing software, which is faster and richer in analysis options than Excel. Plus, he walks through how to merge the results of cluster analysis and factor analysis to help you break down a few underlying factors according to individuals' membership in just a few clusters.
- Reviewing the problems created by an overabundance of data
- Understanding the rationale for clustering and principal components analysis
- Using Excel to extract principal components
- Using R to extract principal components
- Using R for cluster analysis
- Using Excel for cluster analysis
- Setting up confusion tables in Excel
- Using cluster analysis and factor analysis in concert