From the course: Data Science Foundations: Data Mining in Python
Penguin dataset - Python Tutorial
Penguin dataset
- [Instructor] For our demonstrations of clustering, we're going to use a dataset about penguins, the flightless seabirds from the Antarctic and Southern Hemisphere. To access this dataset, you're going to need to install a library called palmerpenguins. You might be able to use the shorter command: uncomment it and run the cell. In my case, I use the longer, more explicit command. Either way, it only needs to be done once per machine. Once you've installed it, you can go ahead and load the following libraries, including the one that lets us load the penguins dataset. Now, we're going to do a few things here. We're going to load the dataset, remove some of the variables that are not helpful for what we're going to do, rename the class variable, and drop the rows that contain NaN ("not a number," meaning missing data). Then we'll take a look at the first five rows, and that's what all of this is going to do.…
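The steps described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the instructor's actual notebook: it assumes the `palmerpenguins` package has been installed (for example with `pip install palmerpenguins`), and the list of dropped columns, the new class-column name `y`, and both function names are hypothetical choices made for this example.

```python
import pandas as pd


def prepare_penguins(df: pd.DataFrame) -> pd.DataFrame:
    """Drop unused columns, rename the class column, and remove rows with missing values."""
    # Assumption: these columns are the ones not needed for clustering.
    unused = ["island", "sex", "year"]
    return (
        df.drop(columns=unused, errors="ignore")  # remove unhelpful variables
        .rename(columns={"species": "y"})  # "y" is a hypothetical name for the class variable
        .dropna()  # drop rows that still contain NaN
        .reset_index(drop=True)
    )


def load_prepared_penguins() -> pd.DataFrame:
    """Load the dataset from the palmerpenguins package and clean it."""
    # Imported here so the cleaning function above works without the package installed.
    from palmerpenguins import load_penguins

    return prepare_penguins(load_penguins())
```

Calling `load_prepared_penguins().head()` would then show the first five rows. Dropping the unused columns before calling `dropna()` is deliberate in this sketch: a row that is missing only a value in a discarded column is kept rather than thrown away.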