From the course: Spark for Machine Learning & AI
Organizing data in DataFrames - Apache Spark Tutorial
- [Narrator] Before we start discussing MLlib, let's take a look at a commonly used data structure called DataFrames. Now I'll start Spark. First I'll show you where I am: I'm in the Spark bin directory, so I will issue the pyspark command. While that's starting, I just want to mention that DataFrames are a table-like data structure with named columns. DataFrames are used in R and in the Python pandas library, and Spark's DataFrames are similar to what's available in Python and in R. Okay, it looks like our PySpark interpreter is ready. Now I'm going to clear the screen by pressing Control+L, which gives us a fresh screen to start with. This shortcut works on Mac and Linux, but it does not work in Windows. The first thing I want to do is load a text file. This file is available in the exercise files, so if you have access to the exercise files, you can follow along and load it too. It's a file of employee data. So I'm going to…