Join Barton Poulson for an in-depth discussion in this video, Calculus, part of Data Science Foundations: Fundamentals.
- [Voiceover] Calculus is at the foundation of a huge number of things in data science. Specifically, anytime you're dealing with rates of change or with optimization, you need calculus. There are three places in particular where it comes up in data science. Number one, it's part of the statistical foundations of the practice: methods like least squares regression and probability distributions are developed using calculus. Number two is change: measuring quantities or rates that change over time is inherently a calculus problem.
And number three is maxima and minima: any time you're trying to maximize or minimize something, you will generally be using calculus. Now, there are two kinds of calculus that we generally need to be concerned with. The first is differential calculus, which finds the rate of change at a specific point in time. People sometimes refer to it as the calculus of change. The second is integral calculus, which determines the quantity of something at a specific time, given its rate of change. People sometimes call this the calculus of accumulation.
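As a minimal illustration of the two kinds (a sketch, not from the course), the Python below uses $y = x^2$ as a stand-in curve. A central difference estimates the rate of change at a point, which is the differential idea. A trapezoid-rule sum accumulates the rate $2x$ back into a quantity, which is the integral idea. The function names, the step size `h`, and the slice count `n` are illustrative assumptions.

```python
def f(x):
    """The example curve y = x**2."""
    return x ** 2


def rate_of_change(func, x, h=1e-5):
    """Differential calculus: approximate the slope of func at x
    with a central difference."""
    return (func(x + h) - func(x - h)) / (2 * h)


def accumulate(rate, a, b, n=10_000):
    """Integral calculus: approximate the total accumulated between a and b
    from a rate function, using the trapezoid rule with n slices."""
    width = (b - a) / n
    total = 0.5 * (rate(a) + rate(b))
    for i in range(1, n):
        total += rate(a + i * width)
    return total * width


if __name__ == "__main__":
    # Slope of x**2 at x = 3 (close to 6).
    print(round(rate_of_change(f, 3.0), 4))
    # Accumulating the rate 2x from 0 to 3 (close to 3**2 = 9).
    print(round(accumulate(lambda x: 2 * x, 0.0, 3.0), 4))
```

Accumulating the rate $2x$ from 0 to 3 gives back $3^2 = 9$. In this sense, the two kinds of calculus undo each other.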
Let me show you a little of how it works. Here I have a parabola, $y = x^2$, and I can put a point on it at $x = -2$. It's easy to find y: we just square x, which gives us a value of four. But what is the slope at $x = -2$? Because this is a curve, the slope is constantly changing. To find the slope at that point, we need the derivative, and this is a very simple kind of derivative. We use the power rule: the derivative of $x^n$ is $n x^{n-1}$. In our case, $x^n$ is $x^2$, so $n = 2$.
We replace each n with a two to get $2x^{2-1}$. Two minus one is one, and an exponent of one disappears, so the derivative of $x^2$ is $2x$. That gives us the slope at any particular point. To use it, we plug in our x and run the calculation: $2 \times (-2) = -4$, a slope of negative four, which you can see with this tangent line right here. And if we move the point over to $x = 3$, we get the slope the same way: $2x$ comes to six, and there's the slope.
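The power-rule steps above can be sketched in Python. The helper names are hypothetical. `power_rule` returns the coefficient and exponent of the derivative of $c x^n$, and `slope_of_x_squared` evaluates the resulting $2x$ at a given point.

```python
def power_rule(coefficient, exponent):
    """Differentiate c * x**n with the power rule: d/dx (c * x**n) = c*n * x**(n-1).

    Returns the (coefficient, exponent) pair of the derivative.
    """
    return coefficient * exponent, exponent - 1


def slope_of_x_squared(x):
    """Slope of y = x**2 at x, using the derivative 2x from the power rule."""
    coef, exp = power_rule(1, 2)  # x**2 -> 2 * x**1
    return coef * x ** exp


for x in (-2, 3):
    print(f"x = {x}: y = {x ** 2}, slope = {slope_of_x_squared(x)}")
```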
Now, for a real-life example, imagine that you have developed an amazing online matchmaking service, and you want to find the ideal price to maximize your revenue. That's a problem for calculus. Here's the setup: you currently sell annual subscriptions for 500 dollars, which is something some of these services can charge. Let's say you get 180 sales, or new subscriptions, per week. And you have some data suggesting that for every five dollars you discount from the price, you will get three more sales per week.
Also, because it's an online service and you're not manufacturing physical products, let's say that the added overhead from new customers is negligible. The question, then, is what price maximizes revenue? We'll express price as 500 dollars, the current price, minus five dollars times d, the number of units of discount: $\text{price} = 500 - 5d$. We use d so that price and sales are expressed in the same units. Then sales equals 180 new subscriptions per week, plus three for each unit of discount: $\text{sales} = 180 + 3d$.
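The two equations can be written as small Python functions of the discount units d. The names `price` and `sales` are illustrative and mirror the equations above. The loop prints a few rows to show prices dropping as sales rise.

```python
def price(d):
    """Annual subscription price in dollars after d units of $5 discount."""
    return 500 - 5 * d


def sales(d):
    """Expected new subscriptions per week after d units of discount."""
    return 180 + 3 * d


for d in (0, 10, 20):
    print(f"d = {d}: price = ${price(d)}, sales = {sales(d)} per week")
```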
Now we want to express sales as a function of price, and to do that we need to rewrite the equations as a line with an intercept and a slope. We'll start with the y-intercept, which means finding the expected sales when the price is zero. Of course, we wouldn't price our service at zero, but we need that value for the mathematics. So how many units of discount does it take to bring the price down to zero? We set the price equation equal to zero and solve for d. First, we subtract 500 from each side.
The 500s cancel on the right side, leaving $-500 = -5d$, and then we divide each side by negative five dollars to isolate d. This gives us $d = 100$. That is, 100 discounts in steps of five dollars bring the price to zero, and that's where we'll find the y-intercept. Now we take that value of d, the units of discount, plug it into the sales equation, and see what sales would be when the price is zero. We put 100 in place of d, multiply, and add: $180 + 3 \times 100 = 480$.
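Here is the same intercept calculation as a short Python sketch. It uses `fractions.Fraction` so the division stays exact. The constant names are illustrative labels for the numbers in the example.

```python
from fractions import Fraction

START_PRICE, STEP = 500, 5      # $500 price, $5 per unit of discount
START_SALES, GAIN = 180, 3      # 180 sales/week, +3 per unit of discount

# Solve 0 = START_PRICE - STEP * d for d: subtract 500, then divide by -5.
d_at_zero_price = Fraction(0 - START_PRICE, -STEP)

# Plug that d into the sales equation to get the y-intercept.
intercept = START_SALES + GAIN * d_at_zero_price

print(d_at_zero_price, intercept)
```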
That's the y-intercept, or the expected sales when the price is zero. Again, we wouldn't sell at that price, but we need it to write the line. Next, the slope of the line is the change in y, the vertical axis, for each unit change in x, the horizontal axis. Sales is our y, or outcome, while price is our x, or predictor. We get the slope from the coefficients that each equation puts on the discount d. We take the three from sales and the negative five from price and divide them, $3 / (-5)$, to get a slope of $-0.6$. In other words, each one-dollar increase in price costs 0.6 expected sales per week.
Combining that slope with the intercept of 480 we found earlier, we get a final equation for sales as a function of price: $\text{sales} = 480 - 0.6 \times \text{price}$. This, in turn, gives us a formula for revenue, which is the goal of this exercise. Revenue equals sales times price. Substituting in the equation for sales that we just got expresses revenue solely in terms of price, and multiplying through gives $\text{revenue} = (480 - 0.6 \times \text{price}) \times \text{price} = 480 \times \text{price} - 0.6 \times \text{price}^2$. Now we can take the derivative, because we're trying to maximize something.
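Here is a small Python sketch of the resulting line and the revenue formula. The function names are hypothetical. The loop is a sanity check, added for illustration, confirming that the price-based line reproduces the original discount-based equations for every tenth discount step.

```python
from fractions import Fraction

INTERCEPT = 480                 # expected sales when price is $0 (from d = 100)
SLOPE = Fraction(3, -5)         # +3 sales per -$5 of price, i.e. -0.6


def sales_from_price(p):
    """Expected weekly sales as a linear function of price: 480 - 0.6 * p."""
    return INTERCEPT + SLOPE * p


def revenue(p):
    """Weekly revenue = sales * price = 480p - 0.6p^2."""
    return sales_from_price(p) * p


# Sanity check: the price-based line agrees with price = 500 - 5d, sales = 180 + 3d.
for d in range(0, 101, 10):
    p = 500 - 5 * d
    assert sales_from_price(p) == 180 + 3 * d

print(float(SLOPE), revenue(500))
```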
That's the calculus part of this problem. The derivative of 480 times price is just 480, so price drops out of that term. The derivative of the second term works just like the earlier example of finding the slope at a point on a curve: move the two down in front, multiply it by $-0.6$, and reduce the exponent by one, which gives $-1.2 \times \text{price}$. So the derivative of revenue is $480 - 1.2 \times \text{price}$. Now we set that equal to zero and solve. Here's why we want to do that. Our revenue equation gives an inverted, downward-opening parabola, and y, which is revenue in our example, is at its maximum where the slope of the curve is zero, because that means you're at the very top.
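Writing R for weekly revenue and P for price, the derivative can be written out term by term, applying the power rule to each term:

```latex
\begin{aligned}
R(P) &= P\,(480 - 0.6P) = 480P - 0.6P^{2} \\
\frac{dR}{dP} &= 480 \cdot 1 \cdot P^{0} - 0.6 \cdot 2 \cdot P^{1} = 480 - 1.2P
\end{aligned}
```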
So here's a line where the slope is zero, which means it's perfectly flat. We want to find the location of this dot, the point where the curve meets that flat line. Back to our equation: we set the derivative equal to zero and solve for price. Subtract 480 from both sides, then divide by $-1.2$ to isolate price: $\text{price} = -480 / -1.2 = 400$. That gives us a price of 400 dollars as the point where revenue is maximized. We can find the expected sales at that price with our equation that expresses sales in terms of price.
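Here is a sketch of the "set the derivative to zero" step in Python, using exact fractions. It also checks the sign of the second derivative ($2 \times -0.6 = -1.2$), which confirms that the flat point is a maximum rather than a minimum. That check is an addition for illustration, and the names are hypothetical.

```python
from fractions import Fraction

A = 480                 # linear coefficient of revenue R(P) = 480P - 0.6P^2
B = Fraction(-3, 5)     # quadratic coefficient (-0.6)


def revenue(p):
    """Weekly revenue for price p."""
    return A * p + B * p ** 2


def revenue_slope(p):
    """dR/dP from the power rule: 480 - 1.2P."""
    return A + 2 * B * p


# Set dR/dP = 0: subtract 480 from both sides, then divide by -1.2.
best_price = -A / (2 * B)

# The second derivative 2B = -1.2 is negative, so this flat point is a maximum.
assert 2 * B < 0

print(best_price, revenue_slope(best_price), revenue(best_price))
```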
Plug in the 400, multiply, and subtract: $480 - 0.6 \times 400 = 240$ expected new subscriptions each week. Let's see how this would affect revenue. Under the current model, revenue equals $180 \times 500$, or 90,000 dollars per week. We can compare that to the new model, whose expected revenue is $240 \times 400$, or 96,000 dollars per week. To get the improvement, we divide the two numbers: $96{,}000 / 90{,}000 \approx 1.067$. That means revenue should go up by roughly seven percent (6.7 percent) with the new pricing plan.
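The comparison is simple arithmetic, but a short Python sketch makes the rounding explicit. The exact revenue gain is 6.7 percent, which the transcript rounds to seven. The `pct_change` helper is a hypothetical name.

```python
current_price, current_sales = 500, 180
new_price = 400
new_sales = 480 - new_price * 3 / 5      # sales line: 480 - 0.6 * price

current_revenue = current_price * current_sales   # dollars per week
new_revenue = new_price * new_sales               # dollars per week


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


print(f"revenue ratio: {new_revenue / current_revenue:.3f}")
print(f"price change: {pct_change(current_price, new_price):.1f}%")
print(f"sales change: {pct_change(current_sales, new_sales):.1f}%")
print(f"revenue change: {pct_change(current_revenue, new_revenue):.1f}%")
```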
To recap, we lower the price of the service by 20 percent, from 500 dollars per year to 400. That's expected to increase weekly sales by 33 percent, from 180 new subscriptions per week to 240. Together, those changes would increase revenue by about seven percent. Now, I've gone through this whole procedure manually, and you can do the calculations by hand, but you'll be glad to know that you can also do them on the computer. I'll briefly show you how to do them in R. Here in R, the first thing I'm going to do is write the formula for sales as a function of price.
I save that into an object called sales. Then I write another formula, for revenue as a function of price and sales. Next, I'm going to make two graphs side by side, and I adjust the graphical parameters a little to do that. First, I graph sales as a function of price. There's our first graph; let me zoom in on that. There it is: you can see that as price goes up, sales go down, and as price goes down, sales go up.
Now we're going to graph revenue as a function of price and sales, right here next to the first graph. That's the curve, the inverted parabola that we saw earlier. We draw a vertical line at the price for maximum revenue, which we happen to know is 400. Then we draw a horizontal line where the slope of the revenue curve is zero, at the maximum revenue, which we happen to know is 96,000. There it is, going across the top. We put a dot at that intersection, and there you can see it. Now I'm going to restore the graphical parameters, which you want to do whenever you change them.
Then I'm going to do all the calculus in one simple step with a built-in function, optimize. I tell it that I'm trying to optimize revenue, that it should search somewhere between 100 dollars and 700 dollars, and that we're looking for a maximum. When I run that, you can see the answer it gives me right here: a value of 400, which is our price to maximize revenue, at 96,000 dollars per week. That's the same answer we got from the hand calculations, where we worked through the derivative to get an optimal solution.
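R's `optimize` performs a one-dimensional search over an interval. Its documentation describes a combination of golden-section search and successive parabolic interpolation. As a rough Python analogue, here is a plain golden-section search for a maximum, applied to the revenue function over the same 100 to 700 dollar interval. This swapped-in technique is only an illustration, not R's implementation. The function names and the tolerance `tol` are assumptions.

```python
import math


def revenue(price):
    """Weekly revenue as a function of price: 480p - 0.6p^2."""
    return (480 - 0.6 * price) * price


def golden_section_max(func, lower, upper, tol=1e-6):
    """Find the x that maximizes a unimodal func on [lower, upper]
    by golden-section search; returns (x, func(x))."""
    inv_phi = (math.sqrt(5) - 1) / 2          # about 0.618
    a, b = lower, upper
    c = b - inv_phi * (b - a)                 # lower interior probe
    d = a + inv_phi * (b - a)                 # upper interior probe
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; the old c becomes the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:
            # The maximum lies in [c, b]; the old d becomes the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    x = (a + b) / 2
    return x, func(x)


best_price, best_revenue = golden_section_max(revenue, 100, 700)
print(f"price = {best_price:.2f}, revenue = {best_revenue:.2f}")
```

Each iteration reuses one of the two interior function values, so only one new evaluation of `revenue` is needed per step. This works here because the revenue curve is a single-peaked parabola on the search interval.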
So, given this extended discussion, what conclusions can we reach? First, calculus is vital to data science. Second, it's part of the foundation of statistical analyses. And third, it's used directly for optimization problems.