
From the course: Scala Essential Training for Data Science

Advantages of parallel collections - Scala Tutorial

Advantages of parallel collections

- [Instructor] Let's consider the advantages of parallel collections. Multi-core processors are common today. Many desktop machines have two or four cores, and servers typically have several times as many. Scala makes it easy to take advantage of multiple cores and hyper-threaded processors with the use of parallel collections.

A common programming practice is to use for-loops to process each element of a collection, one at a time. This works well for small collections, but when we have thousands or more items in a collection, the processing time can begin to add up. Just as an assembly line speeds up when several workers handle items at once, we can process data in a collection faster if we work on multiple elements at a time. A parallel collection is a collection that allows us to do just that.

Let's consider a case where we have an array of 1000 numbers and we need to multiply each number by two. Let's say we use a for-loop and multiply each number one at a time. Then it will take, let's say, a thousand units of time. Now if we split the array in two and process both halves at once, we could ideally finish in 500 units of time. On a quad-core processor with hyper-threading, we could run eight threads in parallel and, in the best case, finish the task in 125 units of time.

The primary advantage of using parallel collections is that they allow us to finish a computation faster than we would with sequentially processed collections. Another advantage is ease of use. Other programming languages have support for parallel processing, but Scala makes parallel processing nearly as easy to write as sequential processing. The overhead of using Scala parallel collections is fairly low. For some collection types, using the parallel collection version does not incur any noticeable overhead when compared to using the sequential version. Scala has a variety of parallel collection types, including the parallel array, or ParArray, ParVector, ParHashMap, and ParSet. Additional parallel collections are described in the Scala documentation.
In our discussion here, we'll focus on using parallel arrays and parallel vectors.
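
To make the doubling example concrete, here is a minimal sketch rather than code from the course itself. It assumes Scala 2.13 or later with the separate `scala-parallel-collections` module on the classpath; on Scala 2.12 and earlier, `.par` is built in and the import should be dropped. The object and method names (`DoubleInParallel`, `doubleSequential`, `doubleParallel`) are made up for illustration.

```scala
// Assumption: Scala 2.13+ with the scala-parallel-collections module available.
// On Scala 2.12 and earlier, remove this import; .par is part of the standard library.
import scala.collection.parallel.CollectionConverters._

object DoubleInParallel {
  // Sequential version: elements are processed one at a time on a single thread.
  def doubleSequential(xs: Array[Int]): Array[Int] =
    xs.map(_ * 2)

  // Parallel version: .par converts the array to a ParArray, and map may
  // split the work across the available cores. The result order is preserved.
  def doubleParallel(xs: Array[Int]): Array[Int] =
    xs.par.map(_ * 2).toArray

  def main(args: Array[String]): Unit = {
    val numbers = (1 to 1000).toArray

    val sequential = doubleSequential(numbers)
    val parallel = doubleParallel(numbers)

    // Both versions produce the same elements in the same order.
    println(sequential.sameElements(parallel)) // prints true
  }
}
```

The only difference between the two methods is the call to `.par`, which is the ease-of-use point made above. The 500-unit and 125-unit figures are best-case estimates; for an operation as cheap as doubling 1000 integers, the cost of splitting and combining the work may outweigh the gain, so any real speedup should be measured.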
