We look at the demo app on which we will perform the VTune Locks and Waits analysis. This demo app is a sample that comes installed with Intel VTune. Here we load it in Visual Studio and ensure all of the project properties are set. We discuss some of the source code and look at the functions we saw earlier in the Top-Down Tree window: pthread_mutex_lock and render_one_pixel.
- [Narrator] In the exercise files folder for this video, click on tachyon. Then tachyon again. Then vc10. And then let's double-click the Visual Studio solution to open it in Visual Studio. We're going to look briefly at the source code for this large project. On the left you'll see three projects. The first that I want to look at is tachyon common. This handles all of the prep work for 3D drawing. It's a very large project, and I encourage you to look at the files individually if you are interested.
But this handles 3D rendering and simplifies all the prep work. The project that we're going to look at and optimize is analyze_locks. For those following along, this application project is included in the installation directory of VTune. It comes with VTune when you install it, so you can still follow along. Now there's this render_one_pixel function. This handles all the prep work.
It works in RGB and has a color struct, which tells us that each pixel is 24 bits in size. This handles all of the prep work needed to render a pixel to a specific coordinate on the display. It even has some anti-aliasing, which smooths out the pixels to reduce the stair-like edges that might otherwise occur. Scrolling down, we finally get to the draw_task function.
It is here that we are going to optimize the application. We can see that there's this call to pthread_mutex_lock, where it acquires the mutex object, rgb_mutex. This is what each thread must acquire before it can enter the critical section, which is this highlighted region right here. This is where each thread accesses the pixel calculations and then releases the mutex so the next thread in line can enter the critical section. For example, say this application uses four threads.
Three threads must wait while one thread holds the mutex. That thread accesses the pixel calculation resources, draws to the screen through render_one_pixel, and then gives up rgb_mutex to the next thread in line. This is known as contention, where a thread must wait before accessing a shared resource. This appears to make the application more serial than parallel, but we will use the VTune Amplifier tool to confirm this. What we need to look at, and where we're going to spend our optimization efforts, is how this critical section is accessed by each thread.
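To make the four-thread example concrete, here is a minimal sketch of the pattern just described. It is not the tachyon sample's actual code: only the names rgb_mutex, draw_task, render_one_pixel, and pthread_mutex_lock come from the video, while the framebuffer, the Band struct, render_frame, and the pixel formula are hypothetical stand-ins. It assumes a POSIX threads implementation is available.

```cpp
#include <pthread.h>
#include <cassert>
#include <vector>

// Illustrative stand-ins only; the real sample's data structures differ.
static pthread_mutex_t rgb_mutex = PTHREAD_MUTEX_INITIALIZER;
static const int kWidth = 64, kHeight = 64, kThreads = 4;
static std::vector<unsigned> framebuffer(kWidth * kHeight, 0);

// Hypothetical pixel routine: packs 8 bits each of R, G, B into 24 bits.
static unsigned render_one_pixel(int x, int y) {
    unsigned r = (x * 4) & 0xFF, g = (y * 4) & 0xFF, b = (x ^ y) & 0xFF;
    return (r << 16) | (g << 8) | b;
}

struct Band { int first_row; int last_row; };  // rows assigned to one thread

// Each thread draws its own band of rows, but does all of the pixel work
// while holding rgb_mutex, so only one thread makes progress at a time.
static void* draw_task(void* arg) {
    const Band* band = static_cast<const Band*>(arg);
    for (int y = band->first_row; y < band->last_row; ++y) {
        pthread_mutex_lock(&rgb_mutex);      // the other threads wait here
        for (int x = 0; x < kWidth; ++x)
            framebuffer[y * kWidth + x] = render_one_pixel(x, y);
        pthread_mutex_unlock(&rgb_mutex);    // next thread in line may enter
    }
    return nullptr;
}

// Starts four threads and waits for all of them to finish.
static void render_frame() {
    pthread_t threads[kThreads];
    Band bands[kThreads];
    const int rows = kHeight / kThreads;
    for (int i = 0; i < kThreads; ++i) {
        bands[i].first_row = i * rows;
        bands[i].last_row = (i + 1) * rows;
        int rc = pthread_create(&threads[i], nullptr, draw_task, &bands[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < kThreads; ++i)
        pthread_join(threads[i], nullptr);
}
```

Calling render_frame() from main fills the framebuffer correctly, but because the lock wraps the whole row computation, the four threads effectively take turns.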
By reducing the time each thread must wait to access this critical section, we will reduce the application's overall execution time.
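One common way to reduce that wait is to shrink the critical section so the expensive work happens outside the lock. The sketch below illustrates the idea under stated assumptions; it is not necessarily the change made later in this course. Only rgb_mutex, draw_task, and render_one_pixel are names from the video, and the framebuffer, rows_done counter, Band struct, and render_frame are hypothetical.

```cpp
#include <pthread.h>
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative stand-ins only; the real sample's data structures differ.
static pthread_mutex_t rgb_mutex = PTHREAD_MUTEX_INITIALIZER;
static const int kWidth = 64, kHeight = 64, kThreads = 4;
static std::vector<unsigned> framebuffer(kWidth * kHeight, 0);
static int rows_done = 0;  // shared progress counter, protected by rgb_mutex

// Hypothetical pixel routine: packs 8 bits each of R, G, B into 24 bits.
static unsigned render_one_pixel(int x, int y) {
    unsigned r = (x * 4) & 0xFF, g = (y * 4) & 0xFF, b = (x ^ y) & 0xFF;
    return (r << 16) | (g << 8) | b;
}

struct Band { int first_row; int last_row; };  // rows assigned to one thread

static void* draw_task(void* arg) {
    const Band* band = static_cast<const Band*>(arg);
    std::vector<unsigned> row(kWidth);  // thread-private scratch buffer
    for (int y = band->first_row; y < band->last_row; ++y) {
        // Expensive part: runs in parallel, no lock held.
        for (int x = 0; x < kWidth; ++x)
            row[x] = render_one_pixel(x, y);
        // Short critical section: publish the finished row to shared state.
        pthread_mutex_lock(&rgb_mutex);
        std::copy(row.begin(), row.end(), framebuffer.begin() + y * kWidth);
        ++rows_done;
        pthread_mutex_unlock(&rgb_mutex);
    }
    return nullptr;
}

// Starts four threads and waits for all of them to finish.
static void render_frame() {
    pthread_t threads[kThreads];
    Band bands[kThreads];
    const int rows = kHeight / kThreads;
    for (int i = 0; i < kThreads; ++i) {
        bands[i].first_row = i * rows;
        bands[i].last_row = (i + 1) * rows;
        int rc = pthread_create(&threads[i], nullptr, draw_task, &bands[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < kThreads; ++i)
        pthread_join(threads[i], nullptr);
}
```

Each thread now holds the mutex only long enough to copy one row and bump the counter, so the time other threads spend waiting should drop. Whether that shows up as shorter overall execution time is what a Locks and Waits analysis would confirm.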
By the end of this course you will know how to use the Locks and Waits analysis on your own application and improve the efficiency of parallel task execution on Windows.
- Installing VTune Amplifier
- Choosing options for the Locks and Waits analysis
- Working with the VTune Amplifier GUI
- Viewing the analysis summary
- Removing the lock
- Conducting lock-removed analysis
- Comparing results