From the course: Big Data Analytics with Hadoop and Apache Spark

Compression

- [Instructor] When storing big data, compressing it is important because it saves significant disk space and hence lowers the user's operational cost. In this video, I will review the various file compression options available. The most popular compression codecs are Snappy, LZO, GZIP, and bzip2. You can also develop your own codec if required. Snappy is a compression codec developed by Google. It provides moderate compression but excellent read/write performance. Snappy compresses the entire file as opposed to compressing it element by element. It is not splittable and hence not suitable for parallel operations. LZO is similar to Snappy in that it provides moderate compression and excellent processing performance. It also supports splitting files and hence has an advantage with parallel processing, but it requires a separate license that needs to be carefully evaluated for possible costs. GZIP is a popular codec that…
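The trade-off between compression ratio and processing speed described above can be explored with a small experiment. The following is a minimal sketch, not part of the course material: it uses only the `gzip` and `bz2` modules from Python's standard library, because Snappy and LZO require third-party packages. The helper name `compare_codecs` and the synthetic CSV-like sample data are assumptions made for illustration.

```python
import bz2
import gzip
import time


def compare_codecs(data: bytes) -> dict:
    """Compress `data` with gzip and bzip2 and report size, ratio, and time.

    Hypothetical helper for illustration; Snappy and LZO are omitted
    because they are not available in the Python standard library.
    """
    codecs = {"gzip": gzip.compress, "bzip2": bz2.compress}
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        compressed = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = {
            "compressed_bytes": len(compressed),
            # Ratio above 1.0 means the compressed output is smaller.
            "ratio": len(data) / len(compressed),
            "seconds": elapsed,
        }
    return results


# Synthetic, repetitive CSV-like records (assumed sample data, not real).
sample = b"".join(
    b"%d,customer_%d,store_%d,%d.99\n" % (i, i % 500, i % 20, i % 100)
    for i in range(50_000)
)

if __name__ == "__main__":
    print(f"original size: {len(sample)} bytes")
    for codec, stats in compare_codecs(sample).items():
        print(
            f"{codec}: {stats['compressed_bytes']} bytes, "
            f"ratio {stats['ratio']:.1f}x, {stats['seconds']:.3f}s"
        )
```

The exact numbers depend on the data and the machine, so treat the output as a rough indication only. In a Hadoop or Spark job, the codec is normally chosen through the storage format's write options rather than by compressing bytes directly.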
