From the course: SQL Server Machine Learning Services: R

What is Machine Learning Services (MLS)?

From the course: SQL Server Machine Learning Services: R

Start my 1-month free trial

What is Machine Learning Services (MLS)?

- [Instructor] Let's kick things off with a discussion of exactly what Machine Learning Services means in the context of SQL Server, and how that relates to the R language. First, what is R? R is an open source statistical computing language, which is popular with data scientists and researchers. It provides a large number of functions that can be used to develop statistical and predictive models, perform classification studies, or clustering analysis, and generate publication quality graphs and plots. The R language is also highly extensible and additional packages can be installed, that customize the functionality to meet specific needs. In the past, in order to use R to analyze data stored in a SQL Server database, data scientists would need to export that data first. But that changed in 2016 when Microsoft added support for in-database processing with R, to all editions of SQL Server. In SQL Server 2017, support for the Python language joined the party, and in 2019 support for Java was added. Today, Microsoft collects all three of these languages under a new feature that they call Machine Learning Services. Now, while the name, "Machine Learning Services," imply support for developing advanced machine learning models with your data, and you can certainly do that with R, it's more simply the ability to run any scripts on your relational data, from basic arithmetic on up. And with a the large number of add-on packages that R supports, the doors are wide open to what you can do with this capability. So don't let the name, "Machine Learning Services," scare you off, that this is some high level complex and niche functionality. Support for R is flexible. It can be used by anyone from an entry level analyst, up to a seasoned pro with many years of R experience under their belt. R support in Machine Learning Services provides a number of important new capabilities, when it comes to processing large amounts of data. I mentioned that processing now occurs in database, and here are three reasons why that's significant. First, because data no longer needs to be extracted from the database, before it can be processed through a script, you can maintain the benefits of SQL Servers security wrapper. The data stays in the database, where access can be controlled, logged and audited. This is particularly advantageous when working with sensitive financial or personal data, where compliance with handling protocols needs to be strictly maintained. Second, copying large amounts of data out of a database could put significant strain on network resources. Because of this, in the past, many data analysts would only process a smaller representative sample of the data, rather than work with the full data set. Since the data no longer needs to leave SQL Server first, those restrictions are removed, which opens the doors to a fuller and more complete analysis. And finally, because the data stays in place, R scripts can take full advantage of the performance benefits brought by SQL Server technologies, such as in-memory tables and columnstore indexes. In order to run R code in SQL Server, you'll use the same query editors that you're probably already familiar with. You don't need any additional software or application development tools to get started. For most people, that means opening up a new query window in SQL Server Management Studio, or Azure Data Studio, and just start typing the code that you want to execute. Further, you can save your R scripts right inside of the database as stored procedures, so that other database users can easily execute your scripts without having to write the code themselves. And since R support is available in every edition of SQL Server, starting in 2016, even the free to use Express Edition, no matter the scale of your application, you can take advantage of R integration.

Contents