Learn how to call into the library Math.NET to use its implementation of the Least Squares algorithm for fitting a line through our sample data and thus obtaining the slope.
- [Instructor] Now that we've finished extracting the data we want, filtering it, and parsing it, we have the data types we're looking for. Now, we will perform our calculation. For this example, expand the list of unit tests in the Test Explorer, and you'll find a FitLineToFilteredSample unit test. Double-click it. What you'll see is that we're trying to do more than one thing here. In this unit test, we are going to retrieve data from a file again, the same sample file as before.
We'll use the StreamReader object to do ReadLines for us. We'll say text.ReadLine to skip the first row. And then we want to do extract, transform, load. However, the calculation will be done with a library called MathNet, specifically the Numerics part of that library. So for the extract, same as before, we will read data. But now, we want to do extract, transform, and load, so that the data is in a shape that's useful to the MathNet library.
Now, remember, we have a NuGet library, MathNet.Numerics. Fit is the class and Line is the method. This method expects two collections, two arrays of double-precision floating-point numbers. The first array is the X axis of our graph and the second array is the Y axis. So we have to produce our data in this format for this library to be able to calculate the slope that we're looking for. So I'll comment this out temporarily.
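As a rough sketch of the call being described, the snippet below passes two arrays of doubles to Fit.Line. It assumes the MathNet.Numerics NuGet package is referenced by the project. The sample numbers and the class and method names (FitLineExample, FitSample) are made up for illustration and are not from the course files.

```csharp
using System;
using MathNet.Numerics; // assumption: the MathNet.Numerics NuGet package is installed

public static class FitLineExample
{
    // Fits a line through a small made-up sample and returns the result.
    public static (double Intercept, double Slope) FitSample()
    {
        // X axis: hours since the start. Y axis: pressure at each point.
        // Both arrays must have the same length.
        double[] hours    = { 0.0, 1.0, 2.0, 3.0 };
        double[] pressure = { 30.0, 29.9, 29.8, 29.7 };

        // Least-squares fit of y = intercept + slope * x.
        var (intercept, slope) = Fit.Line(hours, pressure);
        return (intercept, slope);
    }

    public static void Main()
    {
        var fit = FitSample();
        // For this sample the slope should come out close to -0.1 per hour.
        Console.WriteLine($"intercept: {fit.Intercept:F3}, slope: {fit.Slope:F3}");
    }
}
```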
So let's do our extract, transform, load, same as before. Now, you'll remember we have the WeatherData that can read a range, no problem, from the text. We'll read a range, start to end. However, that's just the first part of extraction. I want this data to be shaped slightly differently. So we'll use the LINQ notation: from wo, a weather observation, in that collection. This will be the extraction part. And we will say, select a new anonymous object with a Minute field, which will be equal to some value.
And wo.Barometric_Pressure as the second property. So just pay close attention to what's happening here. So on line 105, we are calling ReadRange and we will get an IEnumerable of all of the data in the file, but filtered based on the start and end dates. Now, for each of those, for each of the values in that IEnumerable, when we look at the value, we will retrieve not the underlying weather observation, but rather, a new object that doesn't have a name to its type, but it has these two properties.
One is called Minute and the other is called Barometric_Pressure. And what do I mean by Minute? Well, when we're looking at the underlying data, the time stamps are down to the second. But we're not really interested in seconds, are we? Barometric pressure changing by the second may be an interesting science experiment, but it's not going to help us forecast the weather. So I'm not even sure I want it by the minute. I probably want to see a change in barometric pressure over a matter of hours. So let's change this to Hours and say, you know what, we know the start date time.
So when we're looking at the weather observation object and its time stamp, which has already been filtered to fall on or after the start date, we subtract the start variable. These are two date times, and the difference between two date times is a duration. So we can ask that duration, which in .NET is called a TimeSpan, for the total number of hours as a number. What's going to happen here is we have a new object with the property Hours, which is the difference between the start and the observation, so zero and upwards, as a number in hours, and the pressure at that point in time.
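To make the date arithmetic concrete, here is a minimal sketch that uses only the .NET base library. The helper name HoursSince and the sample dates are invented for illustration.

```csharp
using System;

public static class ElapsedHoursExample
{
    // Hours elapsed between the start of the range and one observation.
    public static double HoursSince(DateTime start, DateTime timeStamp)
    {
        TimeSpan elapsed = timeStamp - start; // DateTime minus DateTime yields a TimeSpan
        return elapsed.TotalHours;            // whole duration as fractional hours
    }

    public static void Main()
    {
        var start    = new DateTime(2016, 1, 1, 0, 0, 0);
        var observed = new DateTime(2016, 1, 1, 2, 30, 0);

        // Two and a half hours after the start.
        Console.WriteLine(HoursSince(start, observed));
    }
}
```

Note that the sketch reads TotalHours rather than Hours. The Hours property is only the whole-hour component of the duration, so it would drop the minutes and wrap around after a day.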
And so when we start doing our math, we will now start seeing the slope as the rate of change in barometric pressure over an hour or per hour. Now, that's close to what we want, but we're not quite there yet. So remember, we need those two arrays. One for an X, but I don't know how many values there are. And an array for Y.
And I will need to populate those two with the two values we just selected out of our IEnumerable. So foreach weather observation in the data. I say weather observation, but it is of this new type. I actually don't know what the data type is here. It's an anonymous type, but it is an object. So with the wo variable, I can access the Hours and the pressure properties. And I can say arrX, add please, wo.Hours.
And arrY.Add, wo.Barometric_Pressure. And you can see here the naming is done for us. Hours, we had to name the property up here on line 108. Barometric pressure was the name of the property we specified for the second property of the anonymous object and so that is automatically the name of that second property. Now, having done that, we now have our two collections, our two arrays. So arrX, oh, but it's a list, so .ToArray.
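The shaping step might look roughly like the sketch below. The WeatherObservation class here is a hypothetical stand-in with only the two members the narration mentions, and ToArrays is an invented helper name, so the real course types will differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the course's weather observation type.
public class WeatherObservation
{
    public DateTime TimeStamp { get; set; }
    public double Barometric_Pressure { get; set; }
}

public static class ShapeForFitExample
{
    public static (double[] X, double[] Y) ToArrays(
        IEnumerable<WeatherObservation> observations, DateTime start)
    {
        var shaped = from wo in observations
                     select new
                     {
                         Hours = (wo.TimeStamp - start).TotalHours, // name given explicitly
                         wo.Barometric_Pressure                     // name inferred from the member
                     };

        // The number of rows is not known up front, so collect into lists first.
        var arrX = new List<double>();
        var arrY = new List<double>();
        foreach (var point in shaped)
        {
            arrX.Add(point.Hours);
            arrY.Add(point.Barometric_Pressure);
        }

        return (arrX.ToArray(), arrY.ToArray());
    }

    public static void Main()
    {
        var start = new DateTime(2016, 1, 1);
        var data = new[]
        {
            new WeatherObservation { TimeStamp = start.AddHours(1), Barometric_Pressure = 29.9 },
            new WeatherObservation { TimeStamp = start.AddHours(2), Barometric_Pressure = 29.8 }
        };

        var arrays = ToArrays(data, start);
        Console.WriteLine($"{arrays.X.Length} x values, {arrays.Y.Length} y values");
    }
}
```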
And arrY, it also needs to be an array. And we're done. So what we've done here is extract the data, transform it, and then load it into a data structure of our choice. The ETL part happens mostly with LINQ. Now, do note, I wrote this very carefully in this order so that I only scan the file once.
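To illustrate why the order matters, the sketch below counts how often a lazy sequence is walked. ReadRows is a hypothetical stand-in for reading the file, and the Scans counter exists only for this demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SingleEnumerationExample
{
    // Counts how many times the source is read from the beginning.
    public static int Scans;

    // Hypothetical lazy source standing in for reading rows from a file.
    public static IEnumerable<(double Hours, double Pressure)> ReadRows()
    {
        Scans++; // runs each time an enumeration starts, not when ReadRows is called
        yield return (0.0, 30.0);
        yield return (1.0, 29.9);
        yield return (2.0, 29.8);
    }

    public static int CountScansWithTwoProjections()
    {
        Scans = 0;
        var rows = ReadRows();
        double[] x = rows.Select(r => r.Hours).ToArray();    // first walk
        double[] y = rows.Select(r => r.Pressure).ToArray(); // second walk
        return Scans;
    }

    public static int CountScansWithOneLoop()
    {
        Scans = 0;
        var xs = new List<double>();
        var ys = new List<double>();
        foreach (var r in ReadRows()) // single walk fills both lists
        {
            xs.Add(r.Hours);
            ys.Add(r.Pressure);
        }
        return Scans;
    }

    public static void Main()
    {
        Console.WriteLine($"Two projections: {CountScansWithTwoProjections()} scans"); // 2 scans
        Console.WriteLine($"One loop: {CountScansWithOneLoop()} scan");                // 1 scan
    }
}
```

With a real file behind the sequence, each extra walk would mean reading the file again, which is the cost the single loop avoids.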
You can change this code in interesting ways that cause that IEnumerable to be enumerated more than once. I was very careful here to only have that happen once. So what does Line return? A tuple. If you're new to C# or you haven't kept up, a tuple is a set of values treated as one. And so here I have the intercept and the slope coming back as a tuple, as a pair of values, two doubles.
And by making the assignment to this parenthesis pair with a var keyword in front, what I'm actually getting back from here, from the Line, is variable one and variable two. And so now, I can actually read the slope. Check.That, slope, IsNotZero. Not that I know what its value is at this time. We'll set a break point and find out. I'll set a break point at the end of the method.
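The tuple assignment can be tried without the library. In the sketch below, FitLineSketch is a hand-rolled ordinary least-squares routine that stands in for the library call. It is not Math.NET's implementation, and the sample values are made up.

```csharp
using System;

public static class TupleExample
{
    // Hand-rolled least squares for y = intercept + slope * x (illustrative stand-in).
    public static (double Intercept, double Slope) FitLineSketch(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("Need two arrays of equal length with at least two points.");

        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.Length; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= x.Length;
        meanY /= y.Length;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < x.Length; i++)
        {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }

        double slope = sxy / sxx;
        return (meanY - slope * meanX, slope);
    }

    public static void Main()
    {
        double[] hours    = { 0.0, 1.0, 2.0, 3.0 };
        double[] pressure = { 30.0, 29.9, 29.8, 29.7 };

        // The parenthesized pair with var in front declares two variables at once.
        var (intercept, slope) = FitLineSketch(hours, pressure);

        Console.WriteLine(slope < 0 ? "Pressure is falling" : "Pressure is not falling");
        // prints "Pressure is falling"
    }
}
```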
Notice, we're still in the unit test. So FitLineToFilteredSample. Right-click the unit test, debug. And we have an exception. The checked value is not zero. It is a negative slope for the values specified. And so at least now, we know we have a value. And so our method ran successfully. Now, I haven't looked to see if that's the correct value, but it certainly is the value computed from the data we've specified.
So I'll stop. And I'll put in here IsLessThan zero. Clear the breakpoint, right-click, run our test. And success. And so now, we've written a piece of code that essentially does extract, transform, and load. And I'll just clean up the formatting here. So we extract by using LINQ. We go through the IEnumerable, we apply our filtering.
We do transformation. And so in this case, we use a select new to create an anonymous object. And then we load into the data structure required by the method that's going to do the math. And we use the library MathNet.Numerics to actually fit the line to our data and tell us what the slope is for that selection of data. And now, we can move on to visualization.