The Profile Window displays all the counter data collected during the profile. This data can be saved from the File->Save As menu and reloaded later. It does not have to be reloaded on the same system, so you can easily send the profile data to another developer for their own analysis.
- *.pro files store the counter results, counter information, state grouping info, Analysis, and system / frame information for complete interaction with the data when it is reloaded.
- *.csv files store the counter results and state grouping info. These files cannot be reloaded.
Across the top of the Data tab are options for grouping draw calls by state bucket. By default, none of the options are selected, so the top draw call in the list will be the most expensive draw call. Use the Select All or Clear All buttons to quickly set or clear the selected options. The state bucket options that appear depend on which API your application uses and which features or API calls it makes.
Clear calls do not have shaders as part of their state. Calls that clear render targets are given a state based on the render target being cleared, and calls that clear depth / stencil are given a state based on the associated buffer.
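Here is a list of the available options for each API; SrcResource through PerfMarker are the DirectX options, and FBO, Program, and the second Depth and Stencil entries are the OpenGL options: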
- SrcResource - The input resource to a copy or resolve call.
- DestResource - The resource which is being copied or resolved into.
- RT - The set of bound render targets (must be bound in the same order to be considered matching).
- CS - The bound Compute Shader.
- VS - The bound Vertex Shader.
- HS - The bound Hull Shader.
- DS - The bound Domain Shader.
- GS - The bound Geometry Shader.
- PS - The bound Pixel Shader.
- Depth - The bound DepthStencilView if depth is enabled.
- Stencil - The bound DepthStencilView if stencil is enabled.
- PerfMarker - The current hierarchy of PerfMarkers.
- FBO - The ID of the bound framebuffer object.
- Program - The bound shader program.
- Depth - The bound depth attachment and whether or not depth is enabled.
- Stencil - The bound stencil attachment and whether or not stencil is enabled.
The per-draw profile results are grouped into state buckets based on the currently selected state bucket options. State Bucket 0 is reserved for draw calls that do not fall into any of the selected state bucket options, and it holds every draw call when no options are selected. When the results are first displayed, the data is sorted by the first counter that was profiled, with the highest value at the top. If GPUTime is the first counter, this means the top draw call in the list is the most expensive call. When state bucket options are selected, the table is sorted hierarchically: state buckets are sorted first, and then the draw calls within each state bucket are sorted. Because draw calls in the same state bucket share similar properties, it is often better to improve the performance of an expensive state bucket than that of a single expensive draw call, as this yields the largest gains over the entire frame.
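The grouping and hierarchical sort described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the profiler's actual implementation: `DrawCall`, `bucket_key`, `bucket_and_sort`, and `BUCKET_0` are hypothetical names; a draw call missing any selected state (for example a clear call, which has no shaders) is assumed to land in State Bucket 0; and buckets are assumed to be ranked by the sum of the first counter over their draw calls.

```python
from collections import defaultdict
from dataclasses import dataclass, field

BUCKET_0 = None  # hypothetical key standing in for the reserved State Bucket 0


@dataclass
class DrawCall:
    index: int    # order in which the call was executed
    value: float  # first profiled counter, e.g. GPUTime
    state: dict = field(default_factory=dict)  # e.g. {"VS": 7, "RT": (1, 2)}


def bucket_key(draw, options):
    """Return the state-bucket key for one draw call (assumed matching rule)."""
    if not options or any(opt not in draw.state for opt in options):
        return BUCKET_0
    # Tuples compare element by element, so RT=(1, 2) and RT=(2, 1) differ,
    # mirroring the rule that render targets must be bound in the same order.
    return tuple(draw.state[opt] for opt in options)


def bucket_and_sort(draws, options):
    """Group draws into buckets, then sort buckets and their draws descending."""
    buckets = defaultdict(list)
    for d in draws:
        buckets[bucket_key(d, options)].append(d)
    rows = []
    for key, members in buckets.items():
        members.sort(key=lambda d: d.value, reverse=True)  # draws within a bucket
        rows.append((key, sum(d.value for d in members), members))
    rows.sort(key=lambda row: row[1], reverse=True)  # buckets are ranked first
    return rows
```

Ranking buckets by their summed cost reflects the advice above: a bucket whose many cheap draws add up can outrank a single expensive draw call.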
The rows in the table are color-coded such that state buckets are in a shade of orange, and draw calls are in a shade of blue.
State buckets can also be collapsed by clicking the +/- sign on their node in the tree. Alternatively, all the state buckets can be expanded or collapsed via the context menu on the state bucket cells.
Some columns always appear in the results table, while the other columns are based on the selected counters.
- State Bucket - Shows a short name representing the state group or the name of the draw call. Hover the mouse over a state bucket node to see the states that define that state bucket, or over a draw call to see its parameters.
- Draw Call # - For State Buckets, shows the number of draw calls within the state bucket; for draw calls, shows an index representing the order the calls were executed in.
The data in the state bucket rows is aggregated from all the draw calls within that bucket, except that counters representing percentages are displayed as an average. How this average is computed depends on whether the "GPUTime" counter has been chosen. If "GPUTime" has not been selected, the average is non-weighted: it is the sum of all the percentages in that column divided by the number of draw calls, and it does not take into account the time taken by each draw call. If the GPUTime counter is selected, the average is weighted by the time spent in each draw call. As an example, suppose there are 3 draw calls in an application. Two of them take 10ms and keep the GPU busy for 20% of the time, and the third takes 1000ms but keeps the GPU busy for 80% of the time. The non-weighted GPU busy percentage is (20 + 20 + 80) / 3 = 40%. The weighted calculation takes GPU time into account, giving an average of ((20 * 10) + (20 * 10) + (80 * 1000)) / (10 + 10 + 1000) = 80400 / 1020 = 78.8%.
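Both averages can be reproduced with a short Python sketch using the three draw calls from the example above. The helper name `average_percentage` is hypothetical; it simply applies the non-weighted and time-weighted formulas as described.

```python
def average_percentage(percentages, gpu_times=None):
    """Average a percentage counter over the draw calls in a state bucket.

    Without GPUTime the mean is non-weighted; with GPUTime each draw call's
    percentage is weighted by the time that draw call took.
    """
    if gpu_times is None:
        return sum(percentages) / len(percentages)
    weighted = sum(p * t for p, t in zip(percentages, gpu_times))
    return weighted / sum(gpu_times)


busy = [20.0, 20.0, 80.0]     # GPU busy % for each draw call
times = [10.0, 10.0, 1000.0]  # GPUTime in ms for each draw call

print(round(average_percentage(busy), 1))         # 40.0
print(round(average_percentage(busy, times), 1))  # 78.8
```

The weighted result is dominated by the 1000ms draw call, which is why it lands close to that call's 80%.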
A screenshot demonstrating this more fully is shown below:
These values allow you to easily identify which stages are bottlenecking the entire state bucket.
If the API Trace or Frame Debugger window is open, selecting a draw call in the results table also selects that draw call in the other windows. Likewise, the results table highlights the selected draw call if it was changed in either of the other windows.
At the top right of the Data tab is a drop-down list for comparing two profiles. The first profile serves as the base profile for comparison. Clicking New Profile generates a new profile, and the change between the base profile and the new profile is displayed as a percentage. Green indicates a performance improvement and red indicates a performance regression. The first number is from the base profile and the second number is from the new profile.
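The comparison can be pictured with a small Python sketch. The formula and coloring rule here are assumptions, not documented behavior: the change is taken as the relative difference from the base profile, and for time-like counters such as GPUTime a decrease is treated as an improvement. `percent_change`, `change_color`, and the `lower_is_better` flag are hypothetical names.

```python
def percent_change(base, new):
    """Assumed relative change from the base profile to the new one, in percent."""
    if base == 0:
        return None  # undefined: no base value to compare against
    return (new - base) / base * 100.0


def change_color(base, new, lower_is_better=True):
    """Pick a highlight color for one cell (assumed rule, see lead-in)."""
    change = percent_change(base, new)
    if not change:  # None (undefined) or exactly 0.0: leave uncolored
        return None
    improved = change < 0 if lower_is_better else change > 0
    return "green" if improved else "red"
```

For example, a draw call whose GPUTime drops from 10.0 in the base profile to 8.0 in the new profile shows a change of -20% and, under these assumptions, is colored green.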
Allows you to select which counter groups are displayed in the tables in the Data view. Only groups from which at least one counter was enabled are shown in the list. This provides an easy way to filter out data that is not relevant to what you are currently investigating.
- Show All - Selects all the available counter groups.
- Clear All - Deselects all the counter groups. Useful for deselecting them all prior to selecting a desired set.
- Hide empty columns - If this is checked, counters which returned no data (i.e., 0) for all draw calls will be hidden.
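The group filter and the "Hide empty columns" option combine into a simple column-visibility rule, sketched here in Python. The function `visible_columns`, the `(name, group, values)` column layout, and the group and counter names in the usage note are hypothetical placeholders; only GPUTime comes from this document.

```python
def visible_columns(columns, selected_groups, hide_empty):
    """Return the counter columns to display in the Data view.

    columns: list of (counter_name, group_name, per_draw_call_values).
    """
    shown = []
    for name, group, values in columns:
        if group not in selected_groups:
            continue  # this counter group is deselected in the filter
        if hide_empty and all(v == 0 for v in values):
            continue  # "Hide empty columns": 0 for every draw call
        shown.append(name)
    return shown
```

With columns `("GPUTime", "Timing", [1.0, 2.0])`, `("CounterA", "GroupA", [0, 0])`, and `("CounterB", "GroupA", [0, 3])`, selecting both groups with `hide_empty=True` keeps GPUTime and CounterB and hides CounterA.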
Displays a brief analysis of whether the application is CPU or GPU bound and where most of the time is spent. This analysis is only performed if the Frame Profiler option is enabled in the Settings Dialog.
a) System Information
Shows the type of graphics card and associated DeviceID of the card that the profile was taken on.
b) Frame Image
Shows a capture of the backbuffer of the frame that was profiled.