Learn how F# type providers can give strongly typed access to external data sources. See how the CSV type provider can let you access Quandl.com data in just a few lines of code.
- [Narrator] So how are we going to go and get all that data programmatically? Well, we're going to need another project. Add new project. Let's go for another library. 4.6.1. It's going to be called "Data." As always, I'm going to remove everything in the scaffold. I'm going to paste some code in and just explain that to you. By the way, if you don't want to type this code, you can paste it from the file 03Snippets.txt, which is provided with the exercise files.
We've opened some namespaces. We've declared a couple of types. And we've got something to build our URL. Now, to get rid of those errors, I'm going to need to add references to some stuff within the project. Common and Parser. And I'm going to need to put this all in a module. The two data types are fairly self-explanatory: there's a Price data type, which simply has a date field and a closing price field, and a Prices data type, which has a ticker that the prices are for, and an array of Price instances, so that's an array of this type here.
And then this buildUrl function is just a convenience thing. It takes a ticker, that's a string, and a from and an until date, that are DateTimes, and builds up a URL essentially like the one we looked at in the previous video, where we were able to download some data from Quandl.com. Up here, we've got a little convenience function called tilde-tilde (~~), just for brevity, really. It takes the DateTime and formats it appropriately for the URL. So that's all just convenience stuff.
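The types and helpers described above might look roughly like the following. This is a sketch, not the code from 03Snippets.txt: the field names, the parameter names fromDate and untilDate, the yyyy-MM-dd date format, and the exact URL shape are all assumptions made for illustration.

```fsharp
open System
open System.Globalization

// A single closing price on a given date.
type Price = { Date: DateTime; Close: float }

// All the prices retrieved for one ticker.
type Prices = { Ticker: string; Prices: Price[] }

// Prefix operator "tilde-tilde": formats a DateTime for use in the URL.
// The yyyy-MM-dd format is an assumption.
let (~~) (d: DateTime) =
    d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)

// Builds the download URL. The URL shape below is an assumption,
// not taken from the video.
let buildUrl (ticker: string) (fromDate: DateTime) (untilDate: DateTime) =
    sprintf "https://www.quandl.com/api/v3/datasets/WIKI/%s.csv?start_date=%s&end_date=%s"
        ticker (~~fromDate) (~~untilDate)

printfn "%s" (buildUrl "MSFT" (DateTime(2016, 1, 1)) (DateTime(2016, 12, 20)))
```

Defining the formatter as a prefix operator keeps the call sites short, which is the brevity the narrator mentions; an ordinary named function would work just as well.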
So then the question arises, how are we going to get the data? Well, we could just download it using a web client, and then somehow parse the string, break it up on commas and quotes and that kind of thing. But that's a lot of work, and it's very error-prone. Instead, we're going to use a thing called a type provider, which gives you strongly typed access to external data sources. Now there are all sorts of type providers: there are relational database type providers, there's an XML type provider. But we're going to use the CSV type provider. So just to reiterate, that will give us strongly typed access in our code to data that comes from CSV.
To use it, we're going to NuGet in a reference to FSharp.Data. Verify we've got dependency behavior Highest. Install that. I forgot to rename this to Data.fs, let's do that while we're there. I'll need to open the FSharp.Data namespace. Using a type provider always follows a similar pattern.
You define a literal. I'll explain the reason for that in a second. And in the case of the CSV type provider, that is going to be the path of a sample file, which the type provider can use to deduce the schema of the dataset, so it's like a prototypical file that needs to look like all the others this type provider's ever going to encounter. We're going to put that somewhere near our source. I can use this built-in value called __SOURCE_DIRECTORY__, and let's just call the file StockPrices.csv.
Let me just add an underscore there, that should make the squiggly go away. And the next step in using a type provider is to declare a type. Give it some name that's appropriate to the dataset you're dealing with. And define it using the CsvProvider type, which is in the FSharp.Data namespace, and then as its argument in angle brackets, you use the template literal that we defined just before. And that, by the way, is the reason it needs to be a literal: it needs to be visible at compile time.
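The two-step pattern, a literal and then a type declared from it, can be sketched as an F# script. Some assumptions: the video points the literal at a sample file (something like __SOURCE_DIRECTORY__ + "/StockPrices.csv"), whereas here the sample is inlined with made-up rows so the sketch needs no file on disk; the name Template is a guess based on the narration; and the script form uses `#r "nuget: ..."` in place of the NuGet dialog.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Step 1: a compile-time literal describing the sample.
// Illustrative rows only; in the video this is a file path instead.
[<Literal>]
let Template =
    "Date,Open,High,Low,Close,Volume\n2016-12-19,62.5,63.8,62.4,63.6,100\n2016-12-20,63.7,63.8,63.0,63.5,200"

// Step 2: declare a type from the provider, passing the literal.
// The column names and types are inferred from the sample at compile time.
type Quandl = CsvProvider<Template>

// Parse text with the same shape as the sample; Load would take a file path or URL.
let data = Quandl.Parse(Template)

for r in data.Rows do
    // r.Date and r.Close are generated properties, one per CSV heading.
    printfn "%s %M" (r.Date.ToString("yyyy-MM-dd")) r.Close
```

Because the provider reads the sample while compiling, the argument has to be a literal; an ordinary let-bound string would not be available at that point.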
Now we're getting a red squiggly there. And if you read that carefully, that's "Cannot read sample CSV," because it could not find the file. And that, of course, is because we haven't even got a file in there. So what we've got to do is go to our downloads folder. And there's the file I downloaded in the previous demo from Quandl. Going to copy that. Go to our source directory, paste it in, and rename it, because the fact that it happens to be MSFT's prices isn't really relevant as long as it's a fairly generic dataset in shape.
And by the way, that red squiggly would've eventually gone away by itself. But I just did a recompile just to get rid of it quickly. Just to illustrate the fact that this stuff is being done at compile time, if I just deliberately mangle that file name slightly, the red squiggly will come back. So that tells you something about the way the type provider is working.
It's actually checking for the existence of that sample file at compile time. Okay, now, all that's a bit mysterious, but it'll become clear in a moment why we've done all that. Let's make ourselves a GetData function. It's just going to take one argument, query, which is the Query type that comes from the earlier project. We're going to build ourselves a URL. And that's going to use the ticker from the query, the from date from the query, and the until date from the query.
So I'm just going to verify the type of that. Yes, that's a string, so that's good. We're going to get ourselves some data. And we're going to use the Quandl type we defined just before. And we're going to use the Load method from that, and we're going to pass in the URL that we defined just previously. So if we check the type of that, that's a CsvProvider, angle brackets, something mysterious.
So where's that actually getting us? Well, we say let prices equals, now watch what happens here, data dot Rows. So this data object has mysteriously got all sorts of exciting things without really very much code at all. And if we have a look at Rows, that's a sequence of something, and a sequence is kind of like a ghostly array that doesn't exist until you actually pull the rows from it. So I'm going to pipe that into the Seq module, which, a bit like the Array module, has a bunch of functions you can use to iterate over the sequence and to find the largest value in the sequence, and that kind of thing.
We're going to use Seq.map and a lambda function. And in the body of that lambda, we're going to build ourselves an instance of Price. Date equals r dot. When I dot into that, look, I've got all the values from the CSV. I didn't have to do any special work for that, that all came magically from the schema by virtue of the fact that I created Quandl using the CSV provider with reference to an example file that had headings in.
So that all means, I've got a date field. And I've got a close field. And the only problem with the close field, if I hover over it, is that it's a decimal, which actually is a good way of representing money, but it's going to confuse some of the other stuff we do downstream from this, so I'm simply going to cast that into a double. That's the end of my record definition, and the end of my lambda. So let's hover over prices to see what's in there.
And it's a sequence of Price objects, that's pretty good. So we're generating a sequence of these guys, but if we actually want it to be a real array, a concrete array, we simply say, Array dot ofSeq. So I'll check that again. And now prices is exactly what we want it to be, an array of prices. So that means we can go ahead and create an instance of the Prices object by saying, remember, curly brackets almost always tell you you're dealing with a record definition.
Prices equals prices. So let's look at the signature of the GetData function. And lo and behold, it does exactly what we want. It takes a Query object and returns a Prices object, and the Prices object has a ticker field and a prices field. Pretty good.
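The mapping step just described (Seq.map over the rows, convert the decimal close to a float, then Array.ofSeq) can be tried without the type provider. In this sketch the rows value is a hypothetical stand-in for data.Rows, with made-up values; the record field names and the hard-coded ticker are assumptions as well.

```fsharp
open System

type Price = { Date: DateTime; Close: float }
type Prices = { Ticker: string; Prices: Price[] }

// Hypothetical stand-in for data.Rows: a lazy sequence of (date, close) pairs,
// with the close as a decimal, as the CSV provider infers it.
let rows : seq<DateTime * decimal> =
    seq {
        yield DateTime(2016, 12, 19), 63.62m
        yield DateTime(2016, 12, 20), 63.54m
    }

let prices =
    rows
    // Build one Price record per row, converting decimal to float.
    |> Seq.map (fun (date, close) -> { Date = date; Close = float close })
    // Force the lazy sequence into a concrete array.
    |> Array.ofSeq

// Curly brackets again: a record expression for the Prices type.
let result = { Ticker = "MSFT"; Prices = prices }

printfn "%s: %d prices" result.Ticker result.Prices.Length
// prints "MSFT: 2 prices"
```

In the video the ticker would come from the query argument and the rows from the Load call; both are fixed here so the snippet runs on its own.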
Released 12/20/2016
- Defining values and calling functions in F#
- Defining and identifying discriminated unions
- Working with if-else expressions
- Writing unit tests
- Using type providers to access data
- Analyzing data with collection functions
- Plotting data using the R type provider
- Using railway-oriented programming to handle errors
- Integrating with Twitter
- Deploying an F# application to Azure
Video: Using the CSV type provider to get data