This video distinguishes the subset of powerful analytical algorithms most useful for edge analytics applications.
- [Instructor] When we started building enterprise-scale data lakes using big data technology, we opened up an entirely new world of insights coming from data-driven analytics. Before big data, when we primarily relied on data warehouses, data marts, and statistical packages for our analytics, we were usually constrained by a variety of factors such as data volumes, data structures, available computing power, disjointed data, and things along those lines. With a well-constructed data lake, though, we have the entire continuum of analytics at our disposal.
We can produce descriptive analytics to tell us what happened in the past as well as what is happening right now. Predictive analytics tell us what is likely to happen, and we can also change certain data and, in a sense, go back in time to see what might have happened if we had done something differently. We can turn our computing power loose on mountains of data in search of hidden patterns, causal relationships, and other interesting and important discoveries. We can even bring all of this together with prescriptive analytics, using the data and algorithms to tell us what options we have and then which of those options are best for us.
We're able to deliver this wide range of powerful analytical capabilities from data lakes because we can store, manage, and access incredibly large volumes of data. Depending upon the specific models and algorithms, and on factors such as the amount and variety of data being fed into those models, it's conceivable that our analytics could run for hours or even days before they produce a final result. You know, this might seem like a long time, but for certain use cases that are more strategic in nature than operational, where an immediate answer isn't called for, it can be perfectly normal and acceptable.
In fact, if we're engaged in exploratory or discovery analytics, in other words, turning our computing power loose on incredibly large volumes of data because we're trying to find that proverbial needle in the data haystack, it's not uncommon for our analytics engines to be engaged in a given task for quite some time. It's also not all that unusual for some models to come up empty. Maybe, despite clues to the contrary, there simply aren't any important patterns or causal relationships to be found in all of that data.
It might not be the answer that we're looking for, but it's the correct one, which is why we perform our analytics in a very disciplined manner. So what does all of this have to do with edge analytics? Well, it's simple. Even though we may be getting accustomed to this world of analytical opportunities with data lakes, when it comes to edge analytics we don't have that same broad, unencumbered realm of possibilities at our disposal. Now, this isn't necessarily a bad thing; it's just a statement of fact.
The more we understand what we can and can't do, or should and shouldn't try to do, with our edge analytics, the better off we'll be. To illustrate the constraints of edge analytics, let's again consider the autonomous vehicle, otherwise known as the self-driving car. In an autonomous vehicle, we have a number of sensors, cameras, radar, and lasers that are all working nonstop to detect and process data and then, as part of a closed-loop system, direct various systems and subsystems on the vehicle to either do something or not do something as dictated by the current conditions.
Here are three simple examples of edge analytics in an autonomous vehicle and the types of responses we can expect based on those analytics. Let's say that our sensors detect another car suddenly swerving into our lane just ahead of our vehicle. In other words, for whatever reason, somebody's cutting us off and doing so much too closely. The vehicle's reaction should be to brake and possibly, if the coast is clear, to quickly change lanes away from the encroaching car. Or suppose that the vehicle detects that the roads have suddenly become icy because of a winter storm.
The direction to the vehicle might be, or probably should be, to slow down until it's traveling at an ultra-safe speed given the road conditions. Now suppose that our mapping system, in conjunction with the vehicle's camera, detects an unexpected road closure on our pre-programmed route. The vehicle's reaction might be to reroute itself and take a different street or highway to reach the planned destination. So what do these three detection-and-response scenarios, and hundreds of others for autonomous vehicles, have in common? In every situation, the response needs to occur very quickly, almost immediately.
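All three scenarios share a detect-then-respond shape. Here is a minimal Python sketch of that closed loop. It assumes the detection itself has already happened upstream. The `Condition` values, the action strings such as `reduce_speed_to_safe_limit`, and the 100 ms default budget are hypothetical illustrations, not the API of any real vehicle platform.

```python
import time
from dataclasses import dataclass
from enum import Enum, auto


class Condition(Enum):
    """Hypothetical conditions an edge model might detect (illustrative only)."""
    CUT_IN = auto()       # another car swerving into our lane much too closely
    ICY_ROAD = auto()     # road surface has suddenly become icy
    ROAD_CLOSED = auto()  # unexpected closure on the pre-programmed route


@dataclass
class Detection:
    condition: Condition
    adjacent_lane_clear: bool = False  # only consulted for CUT_IN


def respond(detection: Detection) -> list[str]:
    """Map one detection to hypothetical actuator commands."""
    if detection.condition is Condition.CUT_IN:
        actions = ["brake"]
        if detection.adjacent_lane_clear:  # change lanes only if the coast is clear
            actions.append("change_lane")
        return actions
    if detection.condition is Condition.ICY_ROAD:
        return ["reduce_speed_to_safe_limit"]
    if detection.condition is Condition.ROAD_CLOSED:
        return ["reroute"]
    return []  # defensive default for conditions this sketch does not handle


def handle(detection: Detection, budget_s: float = 0.1) -> tuple[list[str], bool]:
    """Compute the response and report whether it fit in the (assumed) latency budget."""
    start = time.perf_counter()
    actions = respond(detection)
    elapsed = time.perf_counter() - start
    return actions, elapsed <= budget_s


if __name__ == "__main__":
    print(handle(Detection(Condition.CUT_IN, adjacent_lane_clear=True)))
    print(handle(Detection(Condition.ICY_ROAD)))
```

Keeping the response logic to a small, fixed lookup is one way to keep the cycle time short and predictable. A real vehicle would also have to account for sensor-fusion and actuator latency, which this sketch ignores.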
We don't have situations where a particular analytical model somewhere in that vehicle has hours or even days to come up with an appropriate response. Most of the time, we don't even have minutes. The cycle time from one or more sensors picking up data of interest until that data drives a response from the vehicle is often only seconds, or even less. So how do we make this happen? First, let's again look at that broad analytics continuum that our data lakes are capable of supporting.
But let's look at it from an edge analytics perspective. When it comes to edge analytics, we can take several major categories of analytics off the table because they don't really apply to what needs to occur. We might still have an interest in our backward-looking descriptive analytics, but we're going to leave historical reporting, trending, and other traditional business intelligence capabilities to the data lake rather than try to support those functions at the edge. The same goes for our data-driven time traveling, where we change certain variables in the past and try to ascertain what different outcomes might have resulted from those changes.
You know, again, this is very important, but it's a class of analytics that is really better suited to the data lake than to the edge analytics environment. Take also the case of exploratory or discovery analytics, where we're looking for interesting patterns. For the most part, these should also be relegated to the data lake. Those are the models that may run for hours or days against very large volumes of data and many different data objects. Now, it's always possible that a limited subset of these analytics might apply at the edge, depending upon the particular business, function, scenario, or use case, but again, for the most part, these classes of analytics are best left to the data lake.
So that leaves us with three categories of analytics that are absolutely relevant at the edge, and for which we would build what we might call edge-appropriate models, something I'll get to in a moment. We definitely need to ascertain from raw data what is happening right now. With an autonomous vehicle, we need to know that road conditions have changed and what they are right now, what traffic congestion is like right now, whether any of the systems in our vehicle have suddenly failed, and so on. We also want to predict, based on sensor readings, whether our braking system, our cooling system, or some other system on the vehicle is likely to have problems in the very near future.
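Two of these edge-appropriate categories can be sketched over a bounded window of recent readings. The first is knowing what is happening right now. The second is predicting what is likely to happen very soon. This minimal Python example assumes a single timestamped numeric sensor, for instance a hypothetical brake-temperature reading. The window size and threshold are illustrative. The least-squares linear trend is just one simple predictive technique chosen for the sketch, not a model prescribed by the course.

```python
from collections import deque
from statistics import fmean
from typing import Optional


class SensorWindow:
    """Holds only the most recent `size` timestamped readings (a bounded input)."""

    def __init__(self, size: int = 10):
        # deque(maxlen=...) silently drops the oldest reading once full
        self.times = deque(maxlen=size)
        self.values = deque(maxlen=size)

    def add(self, t: float, value: float) -> None:
        self.times.append(t)
        self.values.append(value)

    def current(self) -> float:
        """Descriptive: what is happening right now (mean over the window)."""
        return fmean(self.values)

    def slope(self) -> float:
        """Least-squares slope of value versus time across the window."""
        t_mean = fmean(self.times)
        v_mean = fmean(self.values)
        num = sum((t - t_mean) * (v - v_mean) for t, v in zip(self.times, self.values))
        den = sum((t - t_mean) ** 2 for t in self.times)
        return num / den if den else 0.0

    def seconds_until(self, threshold: float) -> Optional[float]:
        """Predictive: extrapolate the linear trend from the latest reading.

        Returns 0.0 if the threshold is already reached, or None if the trend
        is flat or falling and so never reaches it.
        """
        latest = self.values[-1]
        if latest >= threshold:
            return 0.0
        m = self.slope()
        if m <= 0:
            return None
        return (threshold - latest) / m


if __name__ == "__main__":
    w = SensorWindow(size=5)
    for t, v in enumerate([100.0, 102.0, 104.0, 106.0, 108.0]):
        w.add(float(t), v)
    print(w.current(), w.slope(), w.seconds_until(120.0))
```

Consider readings of 100, 102, 104, 106, and 108 at one-second intervals. In that case `current()` returns 104.0, `slope()` returns 2.0, and `seconds_until(120.0)` returns 6.0. The fixed `maxlen` is what keeps the model's input bounded, in contrast to the effectively unlimited inputs a data lake model can draw on.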
We might also predict that a traffic jam is one mile ahead of us, which means that our mapping and routing system needs to adjust its timing and that the vehicle should very soon reduce its speed. Because of the closed-loop nature of the autonomous vehicle, in which all of that data is processed and then results in actions by one or more of the vehicle's subsystems, prescriptive analytics, meaning options and recommended actions, is essential at the edge. Further, if we compare data lake analytics with edge analytics, we can see that, for the data lake, we can operate with theoretically unlimited volumes of data across numerous data objects.
We have a great deal of flexibility in how long our models may run based on the use case, the models themselves, the amounts of data, and other factors. Our data lake analytics also ideally should end up with some prescriptive measure for options and actions. Unfortunately, that's not always a built-in part of data lake analytics, even though it should be. With edge analytics, though, we may still deal with fairly large volumes of data, but not the mega volumes that a data-lake-driven model might process. Additionally, for any given model, we are likely to be dealing with a relatively bounded number of data objects.
Think of them as data fields in a database, rather than the theoretically unlimited number of objects or fields that might be fed from a data lake into analytical models. And when it comes to prescriptive analytics, which, again, are the options and the specific actions from among those options that need to be taken, they are a fundamental aspect of edge analytics rather than an idealized but not necessarily mandatory one. Remember also that, when it comes to edge analytics, we don't have a single pattern for our analytical models.
It's the use cases and the scenarios, depending upon the individual business and business process at work, that will drive all of the models. But the common factors that we've looked at here will all come into play to some extent.
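The prescriptive step means laying out the options and then choosing the best one. Here it is sketched for the traffic-jam case from earlier in this video. This minimal Python example assumes the options and their predicted arrival times already come from upstream predictive models. The `Option` fields, option names, ETA values, and the "drop unsafe options, then minimize ETA" rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Option:
    name: str
    eta_minutes: float  # predicted time to destination if this option is taken
    safe: bool          # whether the option satisfies the vehicle's safety rules


def prescribe(options: list[Option]) -> Optional[Option]:
    """Prescriptive step: drop unsafe options, then pick the fastest remaining one."""
    feasible = [o for o in options if o.safe]
    if not feasible:
        return None  # no acceptable option; a real system would need a fallback
    return min(feasible, key=lambda o: o.eta_minutes)


if __name__ == "__main__":
    choices = [
        Option("stay_on_route", eta_minutes=25.0, safe=True),  # into the traffic jam
        Option("reroute_via_highway", eta_minutes=18.0, safe=True),
        Option("reroute_via_shoulder", eta_minutes=12.0, safe=False),
    ]
    best = prescribe(choices)
    print(best.name if best else "no safe option")
```

With these illustrative inputs the sketch picks `reroute_via_highway`. The faster shoulder option is discarded first because it fails the safety check. Filtering before optimizing is one simple way to make sure the "best" option is always an acceptable one.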