CPU profiler on Windows
The Back Traces tab shows the hierarchy of callers; use this view to trace which methods call the selected method. Merge Callees summarizes all methods that are called by the selected one. The Flame Graph tab shows function calls and the percentage of execution time spent in each call. The Method List collects all methods in the profiled data and sorts them by cumulative sample time.

Start by analyzing the thread that completed the activity to determine how it spent most of its time: running, ready, or waiting. Significant running time indicates that direct CPU usage might have contributed to the duration of the critical path. Time spent in the Ready state indicates that other threads contributed to the duration of the critical path by preventing a thread on the critical path from executing.
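
To make these states concrete, here is a minimal C++ sketch (all names, timings, and loop counts are hypothetical, not taken from any trace discussed here). One thread blocks on a condition variable (Waiting), a second thread signals it and thereby becomes its readying thread, the signaled thread is then Ready until the scheduler gives it a processor, and finally it burns CPU (Running):

```cpp
// Minimal sketch of the thread states: Waiting -> Ready -> Running.
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

int main() {
    std::mutex m;
    std::condition_variable cv;
    bool go = false;

    std::thread worker([&] {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return go; });      // Waiting: blocked, no CPU used
        // After notify_one the thread is Ready until a processor is free,
        // then Running while it executes this loop.
        long long sum = 0;
        for (long long i = 0; i < 10'000'000; ++i) sum = sum + i;  // Running
        std::cout << "worker done: " << sum << "\n";
    });

    std::thread signaler([&] {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        { std::lock_guard<std::mutex> lock(m); go = true; }
        cv.notify_one();                        // this thread "readies" the worker
    });

    signaler.join();
    worker.join();
}
```

In a trace, the signaler would appear as the readying thread for the worker, which is why the thread that readied a critical-path thread becomes the next link to analyze.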

Each thread that readied the current thread is probably another link in the critical path, and can also be analyzed until the duration of the critical path is accounted for. The following procedure assumes that you have identified an activity in the Activities graph for which you want to find the critical path. You can identify the process that completed the activity by hovering over the activity in the Activities graph.

Zoom to the affected activity, and apply the Utilization by Process, Thread preset. Remove the Ready us [Max] and Waits us [Max] columns. Search for the NewThreadIds in the process that has the highest amount of time spent in Running, Ready, or Waiting state.

Threads that spend significant time in the Running or Ready states might represent direct CPU usage on the critical path. Note their thread IDs. Stacks that begin with KiDispatchInterrupt are not related to another thread. To determine what the thread was waiting for in these stacks, expand KiDispatchInterrupt and view the functions on the child stack.

KiTimerExpiration indicates that the readied thread was waiting for a timer. Repeat this step until you find a thread that is running, ready, waiting for another reason, or waiting on a different process.

If the thread is waiting on a different process, repeat this procedure using that process. A search in the Issues pane shows that seven delay-type issues are reported for this activity. Each of these issues can be reviewed as a segment of the critical path. The key segments identified include a thread of the TestBootStrapper.exe process. The issues show that this activity was delayed by roughly 5 seconds, and these delays make up a large proportion of the activity's overall duration of roughly 6 seconds. The TestBootStrapper.exe process is examined later in this procedure.

In this case, the process of interest is Explorer.exe. Expand the Explorer.exe node. Sort by the CPU Usage ms column, the Ready us [Sum] column, and the Waits us [Sum] column in turn to find the top child rows for each; in this example, the top Ready us [Sum] row is about 46 ms, and the top two Waits us [Sum] rows show the largest wait times. Because these rows contribute significantly to the delay, you should investigate them further. The first thread spends most of its time waiting on the RunOnce.exe process.

You should investigate why the RunOnce.exe process takes so long. The second thread is waiting on the first thread and is probably an insignificant link in the same wait chain. Repeat the steps in this procedure for RunOnce.exe.

The primary contributing column is Waits us, and it has four possible contributors. Expand each contributor to see that the first three contributors are each waiting on the fourth contributor.

This situation makes the first three contributors insignificant to the wait chain. The fourth contributor is waiting on another process, TestBootStrapper.exe.

Repeat the steps in this procedure for TestBootStrapper.exe. The results are shown in the following three figures. The thread spent approximately 1 second running, 2 seconds ready, and about 1 second waiting. Because this thread is also the readying thread for the thread analyzed earlier, the running and ready times probably contribute to the delay. The assessment reports issues whose times match these delays.

To find other contributing issues, view the event for which the thread was waiting. Expand ReadyThreadStack to view the contributors to the roughly 1-second wait. When you started to investigate the critical path, you analyzed only the most significant wait cause in Explorer.exe. To capture this previously disregarded section of the critical path, you must look at the timeline.

Filter to include only the processes identified as part of the critical path. The resulting graph is shown in Figure 36, Critical Path Timeline.

Figure 36, Critical Path Timeline, shows the activity of Explorer.exe along the critical path. Zoom to the time period after the previously analyzed wait chain and perform another analysis. In this case, the analysis reveals a large number of threads that are internal to Explorer.exe.

In this case, further analysis is not likely to yield actionable insights.

Activities are often delayed because a thread on the critical path uses significant CPU time. By using the thread state model, you can see that this problem is characterized by a thread on the critical path that spends an exceptional amount of time in the Running state.

On some hardware, this heavy CPU usage can contribute to delays. Many assessments use heuristics to identify problems related to direct CPU usage. Significant CPU usage on the critical path is reported as an issue in the following form: CPU use by process P delays the impacted activity A for x seconds.

Here, P is the process that is running, A is the activity, and x is the time in seconds. If these issues are reported for an activity that incurs delays, direct CPU usage might be the cause. Zoom to an area of interest in the graph and select the Utilization by Process and Thread preset. By default, the table displays the rows with the highest aggregate CPU usage at the top. If the processes and threads that consume the most CPU correspond to any threads in the critical path, direct CPU usage is probably a factor.

This example is shown in Figure 37. After you discover that direct CPU usage contributes to a delay on the critical path, you must identify the specific modules and functions that contribute to the delay. You can expand an assessment-reported direct CPU usage issue to display the critical path that is impacted by that CPU usage.

Expanding the node that is associated with the CPU usage displays the relevant stacks and modules. If the assessment did not report an issue, or if you require additional verification, you can use the CPU Usage Sampled graph to manually collect information on the modules and functions that are involved in a CPU usage issue.

To do this, zoom to the area of interest and view the stacks sorted by CPU usage. Zoom the timeline to display only the portion of the critical path that is affected by the CPU issue. Add the Stack column to the display, and then drag it to the right of the Thread ID column (to the left of the bar).

This puts the most interesting stacks on top. To extract a copy of a stack, select all of its rows, right-click, and click Copy Selection. Direct CPU usage has a higher impact on computers with lower-end processors. In these cases, you can add more processing power to the computer, or you might be able to remove the problem modules from the critical path or from the system. If you can change the components, consider a redesign effort to reduce the direct CPU usage on the critical path.
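
As one hedged example of such a redesign, the following C++ sketch (the function names are illustrative, not part of the scenario above) moves an expensive, CPU-bound computation off the activity's critical path onto a background task with std::async, so the user-visible part of the activity no longer waits for it:

```cpp
// Hypothetical sketch: keep heavy CPU work off the activity's critical path.
#include <future>
#include <iostream>
#include <numeric>
#include <vector>

// Expensive, CPU-bound work that does not have to finish before the
// activity is considered complete.
long long BuildSearchIndex() {
    std::vector<long long> data(5'000'000, 1);
    return std::accumulate(data.begin(), data.end(), 0LL);
}

void CompleteActivity() {
    // The part of the activity the user actually waits for.
    std::cout << "activity complete\n";
}

int main() {
    // Before the redesign, BuildSearchIndex() ran synchronously here and
    // its CPU time showed up on the activity's critical path.
    auto index = std::async(std::launch::async, BuildSearchIndex);

    CompleteActivity();   // the critical path no longer includes the heavy work

    std::cout << "index ready: " << index.get() << "\n";   // consumed later
}
```

Deferring work like this is only safe when nothing on the critical path consumes its result.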

CPU usage by threads that are not on the critical path (and that might be unrelated to the activity) can delay threads that are on the critical path. The thread state model shows that this problem is characterized by threads on the critical path that spend an unusual amount of time in the Ready state. Many assessments use heuristics to identify interference-related problems.

These are reported in one of the following two forms: "Process P is starved. The starvation causes a delay to the impacted activity A of x ms," and "Process P is preempted. The preemption causes a delay to the impacted activity A of x ms." The first form reflects interference from threads at the same priority level as the thread on the critical path. The second form reflects interference from threads at a higher priority level than the thread on the critical path. If these types of issues are reported for a delayed activity, thread interference can be the cause.
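
To make the two forms concrete, here is a hypothetical Windows C++ sketch (thread roles and loop counts are invented). A background thread at the same priority as a critical-path thread competes with it for processor time (starvation); if it ran at a higher priority it would preempt the critical-path thread instead; lowering its priority with SetThreadPriority reduces both kinds of interference:

```cpp
// Hypothetical sketch: thread priority determines starvation vs. preemption.
#include <windows.h>
#include <thread>

void BackgroundMaintenance() {
    // At THREAD_PRIORITY_NORMAL this loop competes with critical-path threads
    // of the same priority (starvation); at a higher priority it would preempt
    // them. Running below normal lets the critical-path threads go first.
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL);

    unsigned long long work = 0;
    for (unsigned long long i = 0; i < 200'000'000ULL; ++i) work = work + i;
    (void)work;
}

int main() {
    std::thread background(BackgroundMaintenance);

    // Critical-path work stays at the default (normal) priority.
    unsigned long long critical = 0;
    for (unsigned long long i = 0; i < 50'000'000ULL; ++i) critical = critical + i;
    (void)critical;

    background.join();
}
```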

Zoom to the interval and apply the Utilization by CPU preset. Then apply the Utilization by Process, Thread preset and sort by the first Ready us column, which is the column that includes the Sum aggregation. Expand the process of the affected activity and look at the Ready time for threads on the critical path. This value is the maximum amount by which the delay can be reduced by resolving any thread interference issue.

A value that is significant relative to the delay being investigated indicates that a thread interference problem exists. After the issue is identified, you must determine why the affected thread spent so much time in the Ready state. Start by determining whether the thread is restricted to certain processors.

Although you cannot directly obtain this information, you can examine the CPU usage history of a thread during periods of high CPU utilization, because these are the periods when threads tend to switch frequently between processors. The value in the Cpu column reflects the number of processors on which the thread ran during the current time interval.

If the value is less than the number of available processors, the thread is probably restricted to certain CPUs. Next, determine whether the thread was starved or preempted: identify the intervals that the thread spent in the Ready state, and then examine which other threads or processes were running during those intervals.

Construct a graph that shows when the thread was in the Ready state, apply the Utilization by Process, Thread preset, and click OK. In this case, the thread spent significant time in the Ready state. To determine its typical priority, add an Average aggregation to the NewInPri column. The resulting average priority indicates that this is probably a background thread that never receives priority elevations.

Select the relevant CPUs. Open the Advanced view and add a filter for the priority that you found earlier to filter out that thread. This scenario is shown in Figure 44, Thread Filter. The middle graph shows the time that the thread was ready, and the bottom graph shows activity on the CPUs on which the thread was allowed to run (in this case, CPU 1).

Zoom into a region where the thread was ready but did not run for most of that interval. Threads or processes with a priority equal to the target thread's priority account for time during which the thread was starved. Threads or processes with a priority higher than the target thread's priority account for time during which the thread was preempted.

You can calculate the total time that the thread was preempted by adding up the times of all preempting threads and actions. Figure 46, Usage by Priority When Target Thread was Ready, shows how much of the thread's time was spent preempted and how much was spent starved.

This figure is zoomed to a short, millisecond-scale interval. To determine which threads are responsible for the preemption and starvation of this thread, add the NewProcess column to the right of the NewInPri column and review the priority levels at which processes were running.

In this case, the preemption and starvation were primarily caused by another thread in the same process and by TestResidentApp.exe. You can assume that these processes receive periodic priority elevations above their base priority.

You can resolve preemption or starvation issues by changing the configuration or components. Consider remedies such as the following:

Change the time when the problematic processes run; for example, delay their start time to occur when the computer reboots.

The Diagnostic Tools window appears automatically unless you have turned it off.

When the app finishes loading, the Summary view of the Diagnostic Tools window appears. For more information on the events, see Searching and filtering the Events tab of the Diagnostic Tools window. You can only view this collected data while your application is halted at a breakpoint.

You now have performance data for your application, specifically for the region of code that runs between the two breakpoints. If you want to analyze a more specific region of code, select a region in the CPU timeline (it must be a region that shows profiling data).
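
For example, in a native C++ app you might bracket the suspect region with two breakpoints like this (the function and data are purely illustrative); the CPU usage reported while halted at the second breakpoint then covers only the code that ran between them:

```cpp
// Illustrative only: profile just the region between the two breakpoints.
#include <algorithm>
#include <random>
#include <vector>

int main() {
    std::vector<int> values(1'000'000);
    std::mt19937 rng(42);
    for (auto& v : values) v = static_cast<int>(rng() % 1000);

    // <-- set the first breakpoint here
    std::sort(values.begin(), values.end());    // region of interest
    // <-- set the second breakpoint here and inspect CPU Usage while halted

    return values.front();
}
```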

When trying to identify performance issues, take multiple measurements. Performance naturally varies from run to run, and code paths typically execute more slowly the first time they run because of one-time initialization work such as loading DLLs, JIT-compiling methods, and initializing caches. By taking multiple measurements, you get a better idea of the range and median of the metric being shown, which allows you to compare the first-time performance of an area of code against its steady-state performance.
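
A minimal C++ sketch of this practice (names and iteration counts are arbitrary) records the first, cold run separately, discards a few warm-up runs, and then reports the median of several steady-state runs:

```cpp
// Minimal sketch: compare the cold first run against the steady-state median.
#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>

// Placeholder for the code path being measured.
long long WorkUnderTest() {
    std::vector<long long> data(1'000'000, 3);
    return std::accumulate(data.begin(), data.end(), 0LL);
}

double TimeOnceMs() {
    auto start = std::chrono::steady_clock::now();
    long long result = WorkUnderTest();
    auto stop = std::chrono::steady_clock::now();
    if (result < 0) std::cout << "";            // keep the work from being elided
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
    const int kWarmupRuns = 3;
    const int kMeasuredRuns = 10;

    double firstRun = TimeOnceMs();             // cold: DLL loads, cache misses, etc.
    for (int i = 1; i < kWarmupRuns; ++i) TimeOnceMs();

    std::vector<double> runs;
    for (int i = 0; i < kMeasuredRuns; ++i) runs.push_back(TimeOnceMs());
    std::sort(runs.begin(), runs.end());

    std::cout << "first run:  " << firstRun << " ms\n";
    std::cout << "median run: " << runs[kMeasuredRuns / 2] << " ms\n";
}
```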

We recommend that you begin analyzing your data by examining the list of functions under CPU Usage, identifying the functions that are doing the most work, and then taking a closer look at each one. Functions are listed in order starting with those doing the most work (they're not in call order).

This helps you quickly identify the longest-running functions. In this view, the selected function shows up in the heading and in the Current Function box (GetNumber, in this example). The function that called the current function is shown on the left under Calling Functions, and any functions called by the current function are shown in the Called Functions box on the right. You can select either box to change the current function.

This view shows the total time (in ms) and the percentage of the overall app running time that the function has taken to complete. Function Body also shows the total amount of time, and the percentage of time, spent in the function body, excluding time spent in calling and called functions. In this example, most of the time was spent in the function body itself, and the remaining 22 ms were spent in external code called by this function.

High values in Function Body may indicate a performance bottleneck within the function itself. To see a higher-level view showing the order in which the functions are called, select Call Tree from the drop-down list at the top of the pane. Total CPU indicates how much work was done by the function and any functions called by it.

High total CPU values point to the functions that are most expensive overall. Self CPU indicates how much work was done by the code in the function body, excluding the work done by functions that were called by it. High Self CPU values may indicate a performance bottleneck within the function itself.
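
As a hypothetical illustration of the difference, in the following C++ sketch ProcessAll would show high Total CPU but low Self CPU, because nearly all of its time is spent in its callee, while Transform would show high Self CPU because the work happens in its own body:

```cpp
// Hypothetical example: ProcessAll -> high Total CPU, low Self CPU;
// Transform -> high Self CPU, because its loop body does the actual work.
#include <cmath>
#include <vector>

double Transform(const std::vector<double>& values) {
    double total = 0.0;
    for (double v : values) total += std::sqrt(v) * std::log(v + 1.0);
    return total;
}

double ProcessAll(const std::vector<std::vector<double>>& batches) {
    double total = 0.0;
    for (const auto& batch : batches) total += Transform(batch);   // delegates the work
    return total;
}

int main() {
    std::vector<std::vector<double>> batches(50, std::vector<double>(100'000, 2.0));
    return ProcessAll(batches) > 0.0 ? 0 : 1;
}
```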
