Use SageMaker to create a Jupyter notebook instance. Create a starter notebook in Python.
- [Instructor] So as we get started working with SageMaker, we're going to work with Jupyter Notebooks. If you're not familiar, these are based on an open-source project, Project Jupyter, and they're presented as web interfaces, so web pages. They're designed as an alternative IDE, or integrated development environment, and they are most commonly used by data scientists and people working with machine learning. Why would you use something like a Jupyter Notebook? Well, I think of them as a more powerful terminal. You can document your data science experiment very thoroughly, because Notebooks let you work with three things: text in the form of Markdown, so you can annotate, or write in English, what you're doing; code, and you can see the example Notebook I have here is set up to integrate with Python 3, so you can actually run code, like in a terminal; and, most importantly, visualizations.
This is really key. When we're working with so much data and complex algorithms, the visual component is essential both for presenting our findings and for understanding what our data looks like as we work through an experiment. Some aspects of SageMaker Jupyter Notebooks are the following. When you create a notebook instance, SageMaker will launch a machine learning compute instance and its associated network interface.
It'll install Anaconda packages and libraries for a number of runtimes. It'll attach a five gigabyte machine learning storage volume, kind of a scratch disk for you to work with your data. And it includes a large number of example Jupyter Notebooks that help you understand how to work with the algorithms that Amazon has optimized for use with SageMaker. So in order to work with Jupyter Notebooks, I'm in the AWS console, in SageMaker.
And I'm going to click on the orange Create notebook instance button. I'm going to give this a name of Demo. And you'll notice when I scroll down, I can select from three different instance types. I've run this previously, and when I did that I created the IAM role by default.
Now when you're accessing services outside of SageMaker, and that's typically data, which could be in S3 if it's files, or in Redshift, for example, if it's data warehouse data, you need to set up your IAM role with appropriate permissions for those external data stores. You can optionally associate your notebook with a VPC, and you can optionally encrypt its data. I'm going to click the orange Create notebook instance button. And now my instance is being created. I'm going to close this dialog box, and you can see that the status is Pending. It takes a couple of minutes for this to set up, so I'll come back once it's ready, and we'll open it and look at a Notebook.
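The console steps above can also be scripted. The following is a minimal sketch using boto3, the AWS SDK for Python, and is not what the demo itself does. The role ARN, region, and instance type are placeholder assumptions; the optional subnet, security group, and KMS key arguments correspond to the VPC and encryption options mentioned in the walkthrough.

```python
def build_notebook_request(name, role_arn, instance_type="ml.t2.medium",
                           subnet_id=None, security_group_ids=None,
                           kms_key_id=None):
    """Assemble keyword arguments for SageMaker's create_notebook_instance."""
    request = {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,  # assumed type; pick one offered in your region
        "RoleArn": role_arn,            # IAM role the notebook instance will assume
    }
    if subnet_id:
        # Placing the instance in a VPC needs a subnet plus security groups.
        if not security_group_ids:
            raise ValueError("security_group_ids is required with subnet_id")
        request["SubnetId"] = subnet_id
        request["SecurityGroupIds"] = list(security_group_ids)
    if kms_key_id:
        # Optional encryption of the attached storage volume.
        request["KmsKeyId"] = kms_key_id
    return request


def create_and_wait(request, region_name="us-east-1"):
    """Create the instance and block until it is in service.

    Needs AWS credentials and the boto3 package; the region is an assumption.
    """
    import boto3  # imported here so the request builder works without boto3

    client = boto3.client("sagemaker", region_name=region_name)
    client.create_notebook_instance(**request)
    # Creation takes a few minutes; the waiter polls until it finishes.
    waiter = client.get_waiter("notebook_instance_in_service")
    waiter.wait(NotebookInstanceName=request["NotebookInstanceName"])


# Hypothetical role ARN, for illustration only.
request = build_notebook_request(
    "Demo", "arn:aws:iam::123456789012:role/ExampleSageMakerRole"
)
print(request)
```

Calling `create_and_wait(request)` would start a billable instance, so the sketch only builds and prints the request by default.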
Now our instance is in service, so let's go ahead and click Open. And we're in our Jupyter environment. We're going to click the dropdown by New, and you can see that we have a number of runtimes, or environments, available here. We have connectors for Spark, so Sparkmagic for PySpark and PySpark3, and regular Spark and SparkR. We have environments for MXNet with Python 2.7 and 3.6, plain Python 2 and Python 3, and TensorFlow with Python 2.7 and 3.6.
So that's pretty powerful in and of itself, that we have all these runtimes available. So I'm going to go ahead and create a new folder, rename it to Demo, and then open it. And inside of that folder, I'm going to create a new Python 3 Notebook.
And I'm going to rename this HelloJupyter. And I'll start with my code cell here. When I'm done with my cell, I can press Shift + Enter, or I can click the Run button, to run it. And there's my code output. Now if I wanted to insert a new cell, I could choose Insert Cell Above.
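The transcript doesn't spell out what goes in that first code cell. As a stand-in, assuming a simple greeting (the `greet` function is a hypothetical example, not the instructor's code), a starter cell might look like this:

```python
def greet(name):
    """Return a greeting string for the given name."""
    return f"Hello, {name}!"


# Running the cell with Shift + Enter shows the printed text below the cell.
print(greet("Jupyter"))
```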
And I can continue to work in Code, or I could switch this cell to Markdown, and then execute it by pressing Shift + Enter, and that will render the Markdown. So when I'm done with this, I can work with it like a regular file, so I'm going to Save and Checkpoint. And notice that I can download this as a Notebook (.ipynb) file, as a Python script, and in the rest of the listed file formats as well.
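To make "work with it like a regular file" concrete: a saved Notebook is a JSON document. The sketch below builds a two-cell notebook, one Markdown and one code, using only the standard library. It assumes notebook format version 4 and a kernel named `python3`; the helper names are hypothetical, and SageMaker's own kernels carry different names.

```python
import json


def markdown_cell(text):
    """A Markdown cell: rendered as formatted text when executed."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """A code cell that has not been run yet, so it has no outputs."""
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,
        "outputs": [],
        "source": source,
    }


def make_notebook(cells):
    """Wrap cells in the top-level structure of a version 4 notebook."""
    return {
        "nbformat": 4,
        "nbformat_minor": 2,
        "metadata": {
            "kernelspec": {
                "name": "python3",  # assumed kernel name
                "display_name": "Python 3",
                "language": "python",
            }
        },
        "cells": cells,
    }


notebook = make_notebook([
    markdown_cell("# HelloJupyter\nNotes about the experiment go here."),
    code_cell('print("Hello, Jupyter!")'),
])

# Writing this string to HelloJupyter.ipynb would give a file Jupyter can open.
notebook_json = json.dumps(notebook, indent=1)
print(notebook_json)
```

Because the file is plain JSON text, it can be saved, downloaded, and compared between versions like any other file.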
Now it's a best practice, when you're working with Jupyter Notebooks, to save your work. Typically, you integrate with your source code repository, so I work with Git or GitHub. This is a really important best practice when you are working with your machine learning models, because then, of course, you have a history of all of the iterations. If you're new to Jupyter Notebooks, another thing that I want to point out is that we have this Kernel menu item here, and in certain cases you're going to need to Interrupt, Restart, or Reconnect the kernel, and so forth.
So you'll want to consult the Jupyter documentation to learn more about Notebooks if you're new to them. But here's a tip from working in the real world: sometimes you need to clear out what is running in the Kernel in order for your Notebook to work properly, particularly when you're experimenting.
- Describe business scenarios that benefit from machine learning. Identify the different types of algorithms used in machine learning. Explain how Rekognition is used to predict image and video labels. Demonstrate how to use custom machine learning algorithms with SageMaker. Compare and contrast deep learning and traditional learning. Summarize how VariantSpark is used when working with genomic scale data.