Want to get up and running with Apache Spark as soon as possible? This practical, hands-on course shows Python users how to use PySpark to leverage the power of Apache Spark for data science.
- [Jonathan] Over the last couple of years, Apache Spark has evolved into the big data platform of choice. It's used by everyone from startups to household names such as Amazon, eBay, and TripAdvisor. There are a few really good reasons why it's become so popular: it's simple, it's fast, and it supports a range of programming languages. If you know Python, then PySpark allows you to access the power of Apache Spark. Don't worry if you're a beginner. In my course on PySpark, we'll be using real data from the city of Chicago as our primary data set.
We'll learn the basics of pulling in data, transforming it, and joining it with other data. My aim is that by the end of this course, you should be comfortable using PySpark and ready to explore other areas of this technology. Hi, I'm Jonathan Fernandes, and I work in big data and AI for a consultancy. I use the concepts I'm teaching in this course on a daily basis for several customers. I have created this course to get you learning and using Apache Spark as quickly as possible.
Topics include:
- Benefits of the Apache Spark ecosystem
- Working with the DataFrame API
- Working with columns and rows
- Leveraging built-in Spark functions
- Creating your own functions in Spark
- Working with Resilient Distributed Datasets (RDDs)