Learn about the advantages of using Scala for data science tasks.
- [Instructor] Data scientists can use virtually any language for analytics, but Python and R are the most popular. Scala is valuable for data science because it was designed for scalability, which is especially important when working with large datasets. Scala runs on the Java Virtual Machine, so it can run anywhere Java runs. It supports both functional and object-oriented programming paradigms. Functional programming is a style of computation that uses functions to compute values and reduces the amount of state information that has to be maintained.
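As a rough illustration of the functional style described above, the sketch below computes a mean twice: once with a mutable accumulator and once with a fold that reassigns no variable. The names `MeanSketch`, `meanImperative`, and `meanFunctional` are made up for this example and do not come from the course.

```scala
object MeanSketch {
  // Imperative version: a mutable accumulator is updated on every step.
  def meanImperative(xs: Seq[Double]): Double = {
    var total = 0.0
    for (x <- xs) total += x
    total / xs.length
  }

  // Functional version: foldLeft threads the running total through
  // function calls, so no variable is ever reassigned.
  def meanFunctional(xs: Seq[Double]): Double =
    xs.foldLeft(0.0)(_ + _) / xs.length

  def main(args: Array[String]): Unit = {
    val samples = Seq(2.0, 4.0, 6.0)
    println(meanImperative(samples))
    println(meanFunctional(samples))
  }
}
```

Both functions return the same value. The second has less state to track, which is the property the transcript attributes to functional programming.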
Scala also employs object-oriented techniques such as structuring programs around data and methods. Scala programs can query relational databases using SQL. The JDBC drivers used with Java also work with Scala programs for querying data and issuing database commands. Scala is designed to take advantage of multiple cores. Abstractions like parallel collections make it easy to parallelize computations over large datasets.
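To make the multi-core point concrete, here is a minimal sketch of splitting a computation into chunks that can run on separate cores. Parallel collections ship as a separate module in Scala 2.13 and later, so this sketch swaps in standard-library `Future`s instead; it shows the same split-and-combine idea, not the parallel collections API itself. The names `ParallelSumSketch`, `sumOfSquares`, and `chunkSize` are hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSumSketch {
  // Split the data into chunks, square-and-sum each chunk in its own Future
  // (scheduled on the global thread pool), then combine the partial sums.
  // chunkSize must be positive.
  def sumOfSquares(data: Vector[Long], chunkSize: Int): Long = {
    val partials: List[Future[Long]] =
      data.grouped(chunkSize).toList.map { chunk =>
        Future(chunk.map(x => x * x).sum)
      }
    val combined: Future[Long] = Future.sequence(partials).map(_.sum)
    // Blocking is acceptable in a small demo; 30 seconds is an arbitrary limit.
    Await.result(combined, 30.seconds)
  }

  def main(args: Array[String]): Unit = {
    val data = (1L to 1000L).toVector
    println(sumOfSquares(data, chunkSize = 100))
  }
}
```

With parallel collections available, the chunking and combining would be handled by the collection itself, which is the convenience the transcript refers to.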
Apache Spark is a widely used big data analytics platform that's written in Scala. Although Spark supports Java, Python, and R programs, Scala is a popular choice for Spark applications that need to take full advantage of fast execution times. This concludes our brief look at the advantages of Scala for data science.
Dan also focuses on using Scala with Spark, a distributed processing platform. He first describes how to work with Resilient Distributed Datasets (RDDs), a fundamental Spark data structure, and then explains how to use Scala with Spark DataFrames, a new class of data structure specially designed for analytic processing. He wraps up the course by summarizing the advantages of using Scala for data science.
- The advantages of Scala for data science
- Scala data types
- Scala arrays, vectors, and ranges
- Parallel processing in Scala
- Mapping functions over parallel collections
- When and when not to use parallel collections
- Using SQL in Scala
- Scala and Spark RDDs
- Scala and Spark DataFrames
- Creating DataFrames