The Intel VTune Amplifier XE performance profiling tool lets developers tune their software so that it runs faster, more smoothly, and more efficiently. The tool provides a rich set of insights into CPU and GPU performance, threading performance, bandwidth, caching, and more. It lets you visualize how your application performs and analyze the results to improve it.
- [Instructor] Intel's VTune Amplifier performance profiling tool lets developers tune their software so that it runs faster, more smoothly, and more efficiently. The tool provides a rich set of insights into CPU and GPU performance, threading performance, bandwidth, caching, and more. It is a very powerful tool that lets you visualize how your application performs and analyze it to improve it. To get a better grasp of the features this tool provides, let's scroll down to the bottom where it says Product Brief, and click to open the PDF.
In this PDF, we see what new features have been added to the tool. Going to the next page, under the Get the Data You Need section, we see Hotspots and Thread profiling with locks and waits analysis. This will be our area of focus for this course. From this initial list of the types of data the tool can collect, you can gather the data you need for your application and tune it accordingly. The tool has very low overhead while it's collecting data, because Intel processors have an on-chip performance monitoring unit, called the PMU, that does the counting in hardware.
The sampling resolution can be changed so that a sample is collected every 1 millisecond, but this increases the overhead of the tool itself. For this course, we leave the sampling resolution at the default of 10 milliseconds. This means that every 10 milliseconds an interrupt is triggered, and the data collected by the PMU is recorded. This is done by the Sampling Enabling Product driver, commonly called the SEP driver. If we scroll down to the Tune the threading with locks and waits analysis section, we can see red, yellow, green, and blue bars.
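To put the two sampling intervals in perspective, here is a minimal Python sketch of the arithmetic. The function name `expected_samples` and the 5-second run length are hypothetical, chosen only for illustration. The sketch counts timer interrupts and says nothing about how much each one actually costs.

```python
def expected_samples(duration_ms: int, interval_ms: int) -> int:
    """Number of timer-driven samples taken during a run of duration_ms."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return duration_ms // interval_ms


run_ms = 5_000  # hypothetical 5-second profiling run

default_samples = expected_samples(run_ms, 10)  # 10 ms interval
fine_samples = expected_samples(run_ms, 1)      # 1 ms interval

print(default_samples, fine_samples)  # 500 5000
```

Going from 10 milliseconds to 1 millisecond means ten times as many interrupts over the same run, which is where the extra overhead comes from.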
Let's spend a little more time on what the locks and waits analysis is all about. Here we see that concurrency analysis is used to show where an application has poor task parallelism. The application becomes serialized if threads wait too long on synchronization objects, like a mutex or a semaphore. These objects are called locks. Performance suffers greatly when threads have to wait for one of these synchronization objects before they can use a critical resource that is shared among threads. While threads wait to enter the critical section and get at this shared resource, the processor is underutilized, and as a result, the execution time of the application is much longer than it should be.
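The serialization described above can be reproduced in a few lines. The following is a minimal Python sketch, not the course's sample application: `run_workers`, the thread count, and the hold time are all hypothetical. Each thread records how long it waited for one shared lock, and `time.sleep` stands in for work done inside the critical section.

```python
import threading
import time


def run_workers(num_threads: int, hold_seconds: float):
    """Start threads that all contend for one lock; return (elapsed, waits)."""
    lock = threading.Lock()
    wait_times = [0.0] * num_threads

    def worker(index: int) -> None:
        requested = time.perf_counter()
        with lock:  # only one thread at a time gets past this line
            wait_times[index] = time.perf_counter() - requested
            time.sleep(hold_seconds)  # simulated work in the critical section

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, wait_times


elapsed, waits = run_workers(num_threads=4, hold_seconds=0.05)
print(f"elapsed: {elapsed:.2f}s, longest wait: {max(waits):.2f}s")
```

Because every thread holds the lock for its entire workload, the four threads run one after another, and the total time is roughly four times the hold time instead of roughly one. The thread with the longest wait is the kind of entry that a locks and waits analysis is meant to surface.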
When the application's execution time increases because of processor underutilization, system resources are consumed for a longer period of time. This in turn has a domino effect: the application might not release memory as soon as it should, and it consumes power for a longer period of time. We will focus on the objects with long wait times, which are indicated by the red bars in the VTune results.
By the end of this course, you will know how to use the Locks and Waits analysis on your own application and improve the efficiency of parallel task execution on Windows.
- Installing VTune Amplifier
- Choosing options for the Locks and Waits analysis
- Working with the VTune Amplifier GUI
- Viewing the analysis summary
- Removing the lock
- Conducting lock-removed analysis
- Comparing results
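
As a rough illustration of what removing a lock can look like, here is a minimal Python sketch. It is an assumption about one common approach, not the change the course makes to its own sample application, and the names `count_with_lock` and `count_without_shared_lock` are hypothetical. The first version takes a shared lock on every increment. The second lets each thread accumulate privately and combines the partial results after the threads have finished.

```python
import threading


def count_with_lock(num_threads: int, increments: int) -> int:
    """Every increment takes the shared lock, so threads contend constantly."""
    lock = threading.Lock()
    total = 0

    def worker() -> None:
        nonlocal total
        for _ in range(increments):
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_without_shared_lock(num_threads: int, increments: int) -> int:
    """Each thread counts locally and writes only its own slot: no lock needed."""
    partials = [0] * num_threads

    def worker(index: int) -> None:
        local = 0
        for _ in range(increments):
            local += 1
        partials[index] = local  # no other thread touches this slot

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)  # combined after all threads have joined


print(count_with_lock(4, 1000), count_without_shared_lock(4, 1000))  # 4000 4000
```

Both versions produce the same total, and only the first makes threads wait on each other. Profiling before and after a change like this is the kind of comparison the last two topics refer to.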