Learn what Presto does and its primary functions.
- [Narrator] Presto, as I mentioned, is a scalable query engine optimized for high-speed analytics on large data volumes. Specifically, what Presto does is enable you to query data where it lives. You no longer need to move data between different systems, or pull your data out of Hadoop and put it into a third-party system, just to run fast queries on it. You can query the data where it resides without having to move it, and even combine multiple data sources in a single query. Now, this is a pretty unique feature and something that Presto does exceedingly well.
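To make "combine multiple data sources in a single query" a little more concrete, here is a minimal sketch in Python that only builds the SQL text for such a query. Presto addresses tables as `catalog.schema.table`, so one statement can join tables from different catalogs. The catalog names (`hive`, `mysql`), the schemas, the tables, and the columns below are hypothetical placeholders, not anything shown in the course, and the sketch does not connect to a Presto server.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query() -> str:
    """Build one SQL statement that joins tables from two different catalogs."""
    # Hypothetical names: a Hive-backed table and a MySQL-backed table.
    page_views = qualified("hive", "web", "page_views")
    customers = qualified("mysql", "crm", "customers")
    return (
        "SELECT c.country, count(*) AS views\n"
        f"FROM {page_views} AS v\n"
        f"JOIN {customers} AS c ON v.user_id = c.id\n"
        "GROUP BY c.country"
    )


if __name__ == "__main__":
    # Print the query text; running it would require a configured Presto cluster.
    print(build_federated_query())
```

The point of the sketch is only the naming scheme: because each table name carries its catalog, the join itself looks like ordinary SQL even though the two tables would live in different systems.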
It also runs queries in an MPP (massively parallel processing), distributed fashion, which makes things really fast. And when it comes to scale, it's hard to imagine a company larger than Facebook that uses Presto as its main analytical engine. At Facebook, they're running over 30,000 queries every day against a 300-plus-petabyte data warehouse, and over 1,000 employees use Presto every day to find answers in their huge data volumes. Now, that is of course unless you're Google, and they use other tools, but for the most part you're not going to find anyone out there with a larger data volume to query than Facebook.
Netflix also uses Presto, and they have a somewhat smaller data warehouse, around 25 petabytes. But one of the interesting things about how they use it is that they query data living in Amazon S3 buckets. These are essentially network drives, and the data there is stored in flat files. So, as I mentioned, you can connect Presto to almost any type of data or data store. That includes flat files living on a network share. Airbnb is another one, and they are actually pushing the envelope with Presto and really doing quite a bit here.
They have a much smaller data warehouse compared to Facebook and Netflix, but 1.5 petabytes is still quite large. And the additional software and tools they've built are really impressive. Let's jump over to their website now and see what they have going on. So here on AirBnB.io you can see all of the open source projects that they're working on. I'll point out a few of them. One is Superset, which is a visual, web-based platform that helps you analyze your data and create dashboards, interactive charts, and graphs, all in the browser on this open source platform.
Also, if I go down, I can find Airpal. Airpal is a web UI for PrestoDB. It allows you, and your analysts specifically, to run queries against Presto from the web without having to go through all the setup steps of installing it locally with all the drivers.
Data science expert Ben Sullins helps you get up to speed with Presto and leverage it to accomplish a wide range of data science and analytics tasks. He uses different interfaces with Presto, such as R and Tableau, and digs into the expressive SQL language that Presto offers for your analysis. At the end of this course, you'll know the key concepts of Presto and how to use them to take full advantage of your modern big data system.
- What does Presto do?
- Running Presto
- Connecting from Tableau and R
- Connecting to Hive, MySQL, and the local system
- Retrieving data
- Combining data sources
- Basic SQL functions
- Advanced SQL functions
- Migrating from Hive