From the course: Spark for Machine Learning & AI

Components of Spark MLlib

- [Instructor] The MLlib package has three types of functions. The first is machine learning algorithms. The set currently includes algorithms for classification, which categorizes something, such as whether a customer is likely to leave for a competitor; regression, which predicts a numeric value like a home price; and clustering, which groups similar items together. Unlike classification, clustering has no predefined groups, so it is really useful when exploring data. And finally, there's topic modeling, which is a way to identify themes in a text. The second group is workflows. Workflow components help organize commonly used steps, like pre-processing operations and tuning. This makes it easy to run a sequence of steps repeatedly while varying some parameters of the process. The third group is utilities, lower-level functions that give you access to distributed linear algebra and statistics functions. In this essentials course, we'll concentrate our efforts on working with…
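
The workflow idea described in the transcript, running the same sequence of steps repeatedly while varying some parameters, can be sketched in plain Python without Spark. This is only an analogy, not MLlib's API: the toy data, the step functions (`scale`, `classify`, `accuracy`), and the parameter names (`factor`, `threshold`) are all hypothetical and chosen purely for illustration.

```python
from itertools import product

# Hypothetical toy data: (feature, label) pairs. Not taken from the course.
DATA = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]


def scale(rows, factor):
    """Pre-processing step: multiply every feature by a constant factor."""
    return [(x * factor, y) for x, y in rows]


def classify(rows, threshold):
    """Model step: predict 1 when the feature exceeds the threshold."""
    return [(1 if x > threshold else 0, y) for x, y in rows]


def accuracy(predictions):
    """Evaluation step: fraction of predictions that match the label."""
    return sum(1 for p, y in predictions if p == y) / len(predictions)


def run_pipeline(rows, params):
    """Run the same fixed sequence of steps for one parameter setting."""
    scaled = scale(rows, params["factor"])
    return accuracy(classify(scaled, params["threshold"]))


def grid_search(rows, grid):
    """Try every parameter combination; return (best_params, best_score)."""
    names = sorted(grid)
    best_params, best_score = None, -1.0
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = run_pipeline(rows, params)
        if score > best_score:  # keep the first setting with the top score
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    grid = {"factor": [1.0, 2.0], "threshold": [2.5, 4.5, 9.0]}
    print(grid_search(DATA, grid))
```

In PySpark itself, the corresponding building blocks are `Pipeline` in `pyspark.ml` and `ParamGridBuilder` and `CrossValidator` in `pyspark.ml.tuning`. The sketch above only mirrors their general shape and, unlike a real tuning run, scores each setting on the same data it was tried on.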
