From the course: Big Data Analytics with Hadoop and Apache Spark
Compression
- [Instructor] When storing big data, compression is important because it saves significant disk space and hence reduces the user's operational cost. In this video, I will review the various file compression options available. The most popular compression codecs are Snappy, LZO, GZIP, and bzip2. You can also develop your own codec if required. Snappy is a compression codec developed by Google. It provides moderate compression but excellent read/write performance. Snappy compresses the entire file as opposed to compressing it element by element, so a Snappy-compressed file is not splittable and hence not suitable for parallel operations. LZO is similar to Snappy in that it provides moderate compression and excellent processing performance. Unlike Snappy, though, LZO files can be split, which gives LZO an advantage in parallel processing. However, LZO requires a separate license, which needs to be carefully evaluated for possible costs. GZIP is a popular codec that…
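To make the space-versus-speed trade-off concrete, here is a minimal sketch that compresses the same data with two of the codecs mentioned and reports the compressed size, ratio, and time for each. It assumes Python's standard-library `gzip` and `bz2` modules as stand-ins for GZIP and bzip2; Snappy and LZO are not in the standard library, so they are left out. The function name `compare_codecs` and the sample data are hypothetical, and the numbers you get will vary by machine and input.

```python
import bz2
import gzip
import time


def compare_codecs(data: bytes) -> dict:
    """Compress `data` with two stdlib codecs and report size, ratio, and time."""
    results = {}
    for name, compress, decompress in (
        ("gzip", gzip.compress, gzip.decompress),
        ("bzip2", bz2.compress, bz2.decompress),
    ):
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Round-trip check: a lossless codec must restore the exact input.
        assert decompress(packed) == data
        results[name] = {
            "compressed_bytes": len(packed),
            "ratio": len(data) / len(packed),
            "seconds": elapsed,
        }
    return results


if __name__ == "__main__":
    # Hypothetical sample: repetitive CSV-like rows, which compress well.
    rows = "".join(f"{i},sensor-{i % 10},{i * 0.5}\n" for i in range(100_000))
    for codec, stats in compare_codecs(rows.encode()).items():
        print(f"{codec}: {stats['compressed_bytes']} bytes, "
              f"ratio {stats['ratio']:.1f}x, {stats['seconds']:.3f}s")
```

On typical text, bzip2 often produces smaller output than GZIP but takes longer, which mirrors the "compression versus processing performance" choice the codecs above represent.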
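The notion of a "splittable" file can be illustrated with a small sketch: if data is compressed in independent blocks and the byte offset of each block is recorded, any single block can be decompressed on its own, so separate workers could process separate blocks in parallel. A file compressed as one continuous stream has no such boundaries and must be read from the beginning. This is only an illustration of the concept using Python's `gzip` module; the helper names `compress_in_blocks` and `read_block` and the block size are hypothetical, and Hadoop's actual mechanisms for splitting compressed files differ in detail.

```python
import gzip


def compress_in_blocks(data: bytes, block_size: int) -> tuple[bytes, list[tuple[int, int]]]:
    """Compress each block independently.

    Returns the concatenated compressed bytes and an index of
    (offset, length) pairs locating each compressed block.
    """
    out = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        packed = gzip.compress(data[start:start + block_size])
        index.append((len(out), len(packed)))
        out.extend(packed)
    return bytes(out), index


def read_block(blob: bytes, index: list[tuple[int, int]], i: int) -> bytes:
    """Decompress block i without reading any other block."""
    offset, length = index[i]
    return gzip.decompress(blob[offset:offset + length])


if __name__ == "__main__":
    data = b"".join(b"record-%06d\n" % i for i in range(50_000))
    blob, index = compress_in_blocks(data, block_size=64 * 1024)
    # A hypothetical worker could decode just block 3, skipping the rest.
    block = read_block(blob, index, 3)
    print(f"{len(index)} blocks; block 3 holds {len(block)} bytes")
```

Because each block is a complete gzip member, the concatenated output is still a valid multi-member gzip file, so `gzip.decompress(blob)` also restores the whole input when sequential reading is all that is needed.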