The Top-Down Tree tab shows the hotspot functions in the call tree along with the performance metrics for each function. It is especially useful when access to the source code is restricted: by inspecting the Top-Down Tree you can see how the application behaves.
- [Narrator] The Top-Down Tree is very useful to look at, especially when access to the source code is restricted. By expanding and inspecting the Top-Down Tree in this window, you can see how the application behaves. Let's expand this total by clicking on the drop-down arrow and perform a technique called call stack walking. By clicking on the drop-down arrows, we can see the functions that are being called by the application. If we keep expanding the main thread that handles the program execution, we eventually get taken to the main entry point of the application.
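To picture what expanding the tree level by level is showing, here is a minimal sketch of how a top-down call tree could be built from sampled call stacks. It assumes a profiler that records each stack from the outermost caller to the innermost callee together with a time value. `CallNode`, `Sample`, `add_sample`, and `find_node` are hypothetical names for illustration, not VTune APIs.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// One node of a top-down call tree: a function plus everything it called.
struct CallNode {
    double total_ms = 0.0;  // time in this function and all of its callees
    double self_ms = 0.0;   // time spent in this function's own code
    std::map<std::string, std::unique_ptr<CallNode>> children;
};

// A hypothetical profiler sample: a call stack listed from the outermost
// caller to the innermost callee, and the time attributed to that stack.
struct Sample {
    std::vector<std::string> stack;
    double ms;
};

// Merge one sample into the tree, walking its stack from the root down.
void add_sample(CallNode& root, const Sample& s) {
    CallNode* node = &root;
    node->total_ms += s.ms;
    for (const std::string& fn : s.stack) {
        std::unique_ptr<CallNode>& child = node->children[fn];
        if (!child) child = std::make_unique<CallNode>();
        node = child.get();
        node->total_ms += s.ms;
    }
    node->self_ms += s.ms;  // the innermost frame is where time was spent
}

// "Expand" the tree along a path of function names; nullptr if absent.
const CallNode* find_node(const CallNode& root,
                          const std::vector<std::string>& path) {
    const CallNode* node = &root;
    for (const std::string& fn : path) {
        auto it = node->children.find(fn);
        if (it == node->children.end()) return nullptr;
        node = it->second.get();
    }
    return node;
}
```

Each call to `find_node` with a longer path corresponds to clicking one more drop-down arrow: the total at a node is the sum of everything beneath it, which is why the hot path can be followed from the top of the tree down to the function that does the work.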
By clicking on the common main function here, we will eventually get taken to the main entry point of the application. If we expand this, we can see this video_main_loop. We can see it'll loop once, and then we see this UserCallbackDispatcher, DispatchMessage, PeekMessage. What's happening here is that the Windows message pump system is passing messages to this thread's queue. The thread then takes messages off the queue and hands them off to its callback procedures. This is evident from the PeekMessage call, the DispatchMessage call, and the callback procedure call.
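The peek, dispatch, and callback pattern described here can be sketched in portable C++. This is a simplified stand-in rather than the Win32 API: `Message`, `MessagePump`, and its member functions are hypothetical names, and a real Windows message carries more fields and is routed through a window procedure.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type; real Windows messages carry more fields.
struct Message {
    int id;
    std::string payload;
};

class MessagePump {
public:
    using Callback = std::function<void(const Message&)>;

    // Register the callback procedure that handles one message id.
    void on(int id, Callback cb) { handlers_[id] = std::move(cb); }

    // Another part of the system posts a message to this thread's queue.
    void post(Message m) { queue_.push_back(std::move(m)); }

    // Non-blocking check, in the spirit of PeekMessage with removal:
    // returns false immediately when the queue is empty.
    bool peek(Message& out) {
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

    // Hand the message to its callback, in the spirit of DispatchMessage.
    void dispatch(const Message& m) {
        auto it = handlers_.find(m.id);
        if (it != handlers_.end()) it->second(m);
    }

    // One pass of the loop: drain whatever is queued, then return so the
    // caller can do other work. Returns how many messages were taken.
    int pump_once() {
        int taken = 0;
        Message m{};
        while (peek(m)) {
            dispatch(m);
            ++taken;
        }
        return taken;
    }

private:
    std::deque<Message> queue_;
    std::map<int, Callback> handlers_;
};
```

Because the check does not block, a loop built this way can handle input and still return to other work on every pass, which matches the peek, dispatch, callback sequence visible in the call tree.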
This is how the application is handling input. Now if we go to the thread video at the very top and expand that, we see this tachyon_video_on_process call. If we expand that, we eventually get taken to this renderscene. This is how the graphics are being displayed. If we expand this, we can see the renderscene, followed by some traceregion calls.
Eventually we get to a TBB parallel_for_on_class_draw_task. TBB stands for Intel Threading Building Blocks. These are Intel APIs that help with parallel execution. If we expand this and keep expanding, we eventually get to a draw_task call. This is how the graphics that we saw on the display are being drawn. If we expand this, we can see pthread_mutex_lock, next_frame, and render_one_pixel.
render_one_pixel handles all the prep work needed to render a pixel to a specific coordinate on the display. If we expand pthread_mutex_lock, we can see that it makes a call to enter the critical section. What this tells us is that this is a function that acquires the synchronization object, which here is the mutex. The mutex is what is used to access the critical section, which is where the shared resource is, the one that must be shared among the multiple threads.
So a thread first needs to acquire the mutex object by making a call here, and then it can enter the critical section. By optimizing how the mutex lock is acquired between the different threads, we can reduce the time that each thread must wait before it can enter the critical section. By reducing the time that each thread must wait, we will reduce the application's overall execution time.
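As a rough illustration of why the way the lock is acquired matters, the sketch below compares two workers that produce the same result: one takes the mutex for every pixel, the other accumulates privately and takes the mutex once per row. This is an assumed example using `std::mutex` and `std::thread`, not the course's sample code or the change the course applies. `Shared`, `Totals`, `per_pixel`, `per_row`, and `run_variant` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// State shared by all worker threads (hypothetical, for illustration).
struct Shared {
    std::mutex lock;        // the synchronization object
    long pixels_drawn = 0;  // the shared resource it protects
    long acquisitions = 0;  // how many times any thread took the lock
};

struct Totals {
    long pixels;
    long acquisitions;
};

// Variant A: enter the critical section once for every pixel.
void per_pixel(Shared& s, int rows, int width) {
    for (int i = 0; i < rows * width; ++i) {
        std::lock_guard<std::mutex> guard(s.lock);  // acquire the mutex
        ++s.acquisitions;
        ++s.pixels_drawn;
    }  // guard releases the mutex at the end of each iteration
}

// Variant B: count privately, then enter the critical section once per row.
void per_row(Shared& s, int rows, int width) {
    for (int y = 0; y < rows; ++y) {
        long local = 0;
        for (int x = 0; x < width; ++x) ++local;  // no lock needed here
        std::lock_guard<std::mutex> guard(s.lock);
        ++s.acquisitions;
        s.pixels_drawn += local;
    }
}

// Run one variant on several threads and report the totals.
Totals run_variant(void (*worker)(Shared&, int, int),
                   int threads, int rows, int width) {
    Shared s;
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i)
        pool.emplace_back(worker, std::ref(s), rows, width);
    for (std::thread& t : pool) t.join();
    return {s.pixels_drawn, s.acquisitions};
}
```

With 4 threads, 20 rows, and 30 pixels per row, both variants count 2400 pixels, but the first acquires the mutex 2400 times and the second only 80 times. Fewer acquisitions means fewer chances for one thread to wait on another, which is the kind of reduction in wait time a Locks and Waits analysis is meant to expose.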
By the end of this course you will know how to use the Locks and Waits analysis on your own application and improve the efficiency of parallel task execution on Windows.
- Installing VTune Amplifier
- Choosing options for the Locks and Waits analysis
- Working with the VTune Amplifier GUI
- Viewing the analysis summary
- Removing the lock
- Conducting lock-removed analysis
- Comparing results