Learn why you would use Presto.
- [Instructor] There are four main reasons why I think you would want to use Presto, and I alluded to them a little in the previous clip about where Presto came from. The first one is speed. I think this is where Presto really shines, because when you're working with large data volumes, slow queries can really drag down the analysis process. Many times you just accept that fast answers aren't possible with such large volumes of data. However, this can cost your company a lot of money. If I'm a highly paid data scientist making over six figures per year, and you add up all the time I spend waiting for queries to finish so I can get the answers I'm looking for, it will cost the company a ton of money over the years that I work there.
So speed is incredibly important, and speeding up those queries is often worth whatever it costs, especially at a data-centered company like a high-tech organization in Silicon Valley. Another important reason to use Presto is the open source nature of the platform itself. Big data really works best when you have an open platform that a large community can get behind, support, and advance without worrying about licensing restrictions or the proprietary nature of a vendor's software.
Open source has won this battle, and Presto is one of those systems with a large community behind it, just like Hadoop, Hive, and many of the other big data platforms out there. One of the really interesting features of Presto, which we'll talk more about coming up, is the pluggable nature of the platform. This pluggability lets you extend Presto to query your data no matter where it lives. So if your data is in a relational database, such as MySQL or Oracle, you can still use Presto to query it.
In fact, you can combine those data sources in a single query in Presto. So with this pluggable architecture, Presto becomes a query abstraction layer, which lets you access all of your data sources regardless of the underlying data platform. This is incredibly powerful: it will save your analysts tons of time and eliminate the need to move data between these systems just to run your queries. The last reason I'll mention for using Presto is its scalability. Because it's a distributed engine, it lets you scale out as large as you need to.
And I'll talk a bit more in a second about how Facebook, Netflix, and Airbnb query their huge data volumes using this powerful platform.
Data science expert Ben Sullins helps you get up to speed with Presto and leverage it to accomplish a wide range of data science and analytics tasks. He uses different interfaces with Presto, such as R and Tableau, and digs into the expressive SQL language that Presto offers for your analysis. By the end of this course, you'll know the key concepts of Presto and how to use them to take full advantage of your modern big data system.
- What does Presto do?
- Running Presto
- Connecting from Tableau and R
- Connecting to Hive, MySQL, and the local system
- Retrieving data
- Combining data sources
- Basic SQL functions
- Advanced SQL functions
- Migrating from Hive