As in the single-threaded view, you get a summary of the results. You can immediately see a faster execution time, six seconds shorter than the single-threaded version. You also see different functions listed under Top Hotspots, as well as a different CPU usage histogram. With a multithreaded application, the CPU usage histogram becomes more interesting to interpret.
- [Instructor] Notice the total elapsed time is now 13 seconds. This is six seconds faster than what we saw in the single-threaded version. Here we see that the CPU time is higher than the elapsed time because this is a multithreaded application, where two or more threads are executing in parallel. This is shown by the total thread count, which we can see here is 14. If we expand the CPU time, we can see other time metrics. Next is the CPI rate, which stands for cycles per instruction. For example, on a processor that can issue four instructions per cycle, a CPI of 0.25 is ideal.
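The CPI arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names and the cycle and instruction counts below are hypothetical and are not taken from the VTune results.

```python
def ideal_cpi(issue_width: int) -> float:
    """Best-case cycles per instruction for a processor that can
    issue `issue_width` instructions every cycle."""
    if issue_width <= 0:
        raise ValueError("issue_width must be positive")
    return 1.0 / issue_width


def cpi(cycles: int, instructions_retired: int) -> float:
    """Measured cycles per instruction: total cycles divided by
    the number of instructions that actually retired."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return cycles / instructions_retired


# Four instructions issued per cycle gives the ideal of 0.25.
best_case = ideal_cpi(4)

# Hypothetical counter values chosen to produce a CPI of 0.5.
measured = cpi(cycles=2_000_000, instructions_retired=4_000_000)

print(f"ideal CPI: {best_case}, measured CPI: {measured}")
```

A lower CPI means more instructions complete per cycle, so a measured value sitting above the ideal points to cycles lost somewhere in the pipeline.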
However, other factors come into play, such as non-retired instructions caused by branch mispredictions, or long-latency memory issues. Here we are seeing a CPI rate of 0.5. A CPI of one is considered acceptable, and I encourage you to review your results carefully by checking out this little question mark. When you hover your mouse over it, a little yellow dialog box pops up telling you more about that particular metric. Moving on to the Top Hotspots, we see different functions in the list. At the top, we can see alldiv and alldvrm.
These are used for dividing two long long integers, which are 64 bits in length, and that is a very costly operation. Both of these functions take up most of the execution time, which is expected since we have that do work function being called in every task to keep the CPU busy for a fixed amount of time. The CPU usage histogram is a lot more interesting now with a multithreaded application. If we hover our mouse over the long bar, a little dialog pops up showing us how long one core was being used.
Here, we can see that only one core was being utilized for nine seconds. Then, two cores were being used for a total of two seconds. This is when two or more tasks could be executed in parallel, like pour foundation and pour driveway. And then, after the frame is built, we are able to execute three tasks in parallel, which is why we can see here that three cores were being utilized for a total of two seconds. The horizontal axis shows you the number of cores. This is a 40-core system, so we can see 40 over here to the right. On the vertical axis, we have the amount of time that number of cores was in use.
So make sure you know how to interpret this chart. It comes in handy when optimizing multithreaded applications.
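To make the reading of the chart concrete, here is a minimal Python sketch of how such a histogram could be built. It assumes a simplified model with one sample per second recording how many cores were busy; the `timeline` data and the function name are hypothetical, arranged only so the bars match the nine, two, and two seconds described above. It is not how VTune itself collects the data.

```python
from collections import Counter


def cpu_usage_histogram(active_cores_per_second):
    """Map 'number of cores busy at once' to 'total seconds spent
    at that level', given one sample per second."""
    counts = Counter(active_cores_per_second)
    return dict(sorted(counts.items()))


# Hypothetical timeline: the order of the samples is an assumption;
# only the totals per utilization level matter for the histogram.
timeline = [1] * 9 + [2] * 2 + [3] * 2

histogram = cpu_usage_histogram(timeline)

# Elapsed time is the sum of all bars.
elapsed_seconds = sum(histogram.values())

# Under this simplified model, CPU time weights each bar by its
# core count, which is why it exceeds the elapsed time.
cpu_seconds = sum(cores * secs for cores, secs in histogram.items())

for cores, secs in histogram.items():
    print(f"{cores} core(s) busy: {secs} s")
print(f"elapsed: {elapsed_seconds} s, CPU time: {cpu_seconds} s")
```

The bars add up to the 13 seconds of elapsed time, while the core-weighted sum is larger, which mirrors why the summary reports a CPU time higher than the elapsed time.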
- Installing VTune Amplifier
- Exploring the single-threaded source code
- Analyzing single-threaded apps
- Analyzing multithreaded apps
- Identifying hotspots
- Comparing results of single-threaded vs. multithreaded analysis