From the course: Cloud Hadoop: Scaling Apache Spark
Spark architecture for genomics - Apache Spark Tutorial
Spark architecture for genomics
- [Instructor] In this next scenario, we're going to look at genomic variant pipelining that includes Hadoop and Spark. In earlier movies in this course, we talked about augmenting Hadoop libraries, such as Spark, with additional open source or commercial libraries, and I showed and talked a little bit about ADAM for genomic processing. You may remember that the ADAM set of libraries, which wrap around Spark, include domain-specific implementations of items such as schemas for the incoming files, which come in specific formats: you can see SAM, BAM, or VCF. These files would be coming in from genomic sequencing machines, such as those made by Illumina. This is a simplified pipeline. You see the source files coming directly into Amazon S3, since this is an Amazon implementation, and the focus here is showing that the ADAM libraries are running on top of an Amazon EMR cluster, which is running Spark. In…
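To make the idea of a "schema for the incoming files" a little more concrete, here is a minimal sketch in plain Python of what a typed record for one VCF data line might look like. This is not ADAM's API and it does not run on Spark; the names `VariantRecord`, `parse_vcf_line`, `parse_vcf`, and the sample lines are hypothetical and exist only for illustration. It assumes only the eight fixed, tab-separated VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and ignores per-sample genotype columns.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class VariantRecord:
    """Hypothetical typed view of the eight fixed VCF columns."""
    chrom: str
    pos: int
    id: Optional[str]        # None when the file has "." (missing)
    ref: str
    alt: Tuple[str, ...]     # comma-separated alternate alleles
    qual: Optional[float]    # None when the file has "." (missing)
    filter: str
    info: str                # kept as the raw INFO string in this sketch


def parse_vcf_line(line: str) -> Optional[VariantRecord]:
    """Parse one VCF line; return None for blank, meta, and header lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    return VariantRecord(
        chrom=chrom,
        pos=int(pos),
        id=None if vid == "." else vid,
        ref=ref,
        alt=tuple(alt.split(",")),
        qual=None if qual == "." else float(qual),
        filter=flt,
        info=info,
    )


def parse_vcf(lines: Iterable[str]) -> List[VariantRecord]:
    """Parse an iterable of VCF lines, skipping non-data lines."""
    parsed = (parse_vcf_line(line) for line in lines)
    return [record for record in parsed if record is not None]


# Made-up sample lines for illustration only, not real sequencing output.
SAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tDP=14",
    "20\t1110696\t.\tA\tG,T\t.\tq10\tDP=10",
]

if __name__ == "__main__":
    for record in parse_vcf(SAMPLE_VCF):
        print(record)
```

In the pipeline described here, a library such as ADAM would take over this job at cluster scale, reading the files from storage such as Amazon S3 and exposing them to Spark; the sketch only shows the kind of structure a schema gives each record.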