Timing | CSBusy DepthStencilTestBusy DSBusy GPUTime GPUBusy GSBusy HSBusy InterpBusy PrimitiveAssemblyBusy PSBusy ShaderBusy ShaderBusyVS ShaderBusyGS ShaderBusyPS ShaderBusyHS ShaderBusyDS ShaderBusyCS TessellatorBusy TexUnitBusy VSBusy |
VertexShader | VertexMemFetched VertexMemFetchedCost VSALUBusy VSALUEfficiency VSALUInstCount VSALUTexRatio VSSALUBusy VSSALUInstCount VSTexBusy VSTexInstCount VSVALUBusy VSVALUInstCount VSVerticesIn |
HullShader * | HSALUBusy HSALUEfficiency HSALUInstCount HSALUTexRatio HSTexBusy HSTexInstCount HSPatches HSSALUBusy HSSALUInstCount HSVALUBusy HSVALUInstCount |
GeometryShader | GSALUBusy GSALUEfficiency GSALUInstCount GSALUTexRatio GSExportPct GSPrimsIn GSSALUBusy GSSALUInstCount GSTexBusy GSTexInstCount GSVALUBusy GSVALUInstCount GSVerticesOut |
PrimitiveAssembly | ClippedPrims CulledPrims PAStalledOnRasterizer PrimitivesIn PAPixelsPerTriangle |
DomainShader * | DSALUBusy DSALUEfficiency DSALUInstCount DSALUTexRatio DSTexBusy DSTexInstCount DSVerticesIn |
PixelShader | PSALUBusy PSALUEfficiency PSALUInstCount PSALUTexRatio PSExportStalls PSPixelsIn PSPixelsOut PSSALUBusy PSSALUInstCount PSTexBusy PSTexInstCount PSVALUBusy PSVALUInstCount |
ComputeShader * | CSALUBusy CSALUFetchRatio CSALUInsts CSALUPacking CSALUStalledByLDS CSCacheHit CSCompletePath CSFastPath CSFetchInsts CSFetchSize CSGDSInsts CSLDSBankConflict CSLDSFetchInsts CSLDSWriteInsts CSMemUnitBusy CSMemUnitStalled CSPathUtilization CSSALUBusy CSSALUInsts CSTexBusy CSThreadGroups CSThreads CSVALUBusy CSVALUInsts CSVALUUtilization CSVFetchInsts CSVWriteInsts CSWaveFronts CSWriteInsts CSWriteSize |
TextureUnit | TexAveAnisotropy TexCacheStalled TexCostOfFiltering TexelFetchCount TexMemBytesRead TexMissRate TexTriFilteringPct TexVolFilteringPct |
TextureFormat | Pct64SlowTexels Pct128SlowTexels PctCompressedTexels PctDepthTexels PctInterlacedTexels PctTex1D PctTex1DArray PctTex2D PctTex2DArray PctTex2DMSAA PctTex2DMSAAArray PctTex3D PctTexCube PctTexCubeArray PctUncompressedTexels PctVertex64SlowTexels PctVertex128SlowTexels PctVertexTexels |
DepthAndStencil | HiZQuadsCulled HiZTilesAccepted PostZQuads PostZSamplesFailingS PostZSamplesFailingZ PostZSamplesPassing PreZQuadsCulled PreZSamplesFailingS PreZSamplesFailingZ PreZSamplesPassing PreZTilesDetailCulled ZUnitStalled |
ColorBuffer ** | CBMemRead CBMemWritten CBSlowPixelPct |
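Each group line above pairs a group name with a space-separated list of counter names. The following is a minimal Python sketch of splitting one such line, assuming exactly that `Group | counter counter ... |` layout. The function name `parse_group_line` is hypothetical. The trailing `*`/`**` markers are returned unchanged because this table does not define what they mean.

```python
from __future__ import annotations


def parse_group_line(line: str) -> tuple[str, str, list[str]]:
    """Split a 'Group | counter counter ... |' line into (group, marker, counters).

    Any trailing '*' characters on the group name are returned separately as
    `marker`; their meaning is not defined by the table itself.
    """
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 2 or not fields[0]:
        raise ValueError(f"not a group line: {line!r}")
    name, marker = fields[0], ""
    # Peel off trailing asterisks, e.g. "ColorBuffer **" -> ("ColorBuffer", "**").
    while name.endswith("*"):
        marker += "*"
        name = name[:-1].rstrip()
    return name, marker, fields[1].split()


print(parse_group_line("HullShader * | HSALUBusy HSPatches |"))
```

Keeping the marker as an opaque string lets a caller cross-check counter names against the descriptions below without guessing at the annotation's meaning.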
ClippedPrims | The number of primitives that required one or more clipping operations due to intersecting the view volume or user clip planes. |
CulledPrims | The number of culled primitives. Typical reasons include scissor, the primitive having zero area, and back or front face culling. |
DepthStencilTestBusy | Percentage of GPUTime spent performing depth and stencil tests. |
GPUBusy | Percentage of time the GPU was busy.
GPUTime | Time, in milliseconds, that this API call took to execute on the GPU. Does not include time during which draw calls are processed in parallel.
GSALUBusy | The percentage of GPUTime during which ALU instructions are being processed by the GS.
GSALUEfficiency | ALU vector packing efficiency. Values below 70 percent indicate that ALU dependency chains may be preventing full utilization of the processor.
GSALUInstCount | Average number of ALU instructions executed in the GS. Affected by flow control.
GSALUTexRatio | The ratio of ALU to texture instructions in the GS. This can be tuned appropriately to match the target hardware.
GSPrimsIn | The number of primitives passed into the GS.
GSTexBusy | The percentage of GPUTime during which texture instructions are being processed by the GS.
GSTexInstCount | Average number of texture instructions executed in the GS. Affected by flow control.
GSVerticesOut | The number of vertices output by the GS. |
HiZQuadsCulled | Percentage of quads that did not have to continue down the pipeline after HiZ. They may be written directly to the depth buffer or culled completely. Consistently low values here may suggest that the Z-range is not being fully utilized.
HiZTilesAccepted | Percentage of tiles that are accepted by HiZ and will be rendered to the depth or color buffers.
PAStalledOnRasterizer | Percentage of GPUTime that primitive assembly waits for rasterization to be ready to accept data. This roughly indicates the percentage of time the pipeline is bottlenecked by pixel operations.
Pct128SlowTexels | Percentage of texture fetches from a 128-bit texture (slow path). There are also 128-bit formats that take a fast path; they are included in PctUncompressedTexels. |
PctCompressedTexels | Percentage of texture fetches from compressed textures. |
PctDepthTexels | Percentage of texture fetches from depth textures. |
PctInterlacedTexels | Percentage of texture fetches from interlaced textures. |
PctTex1D | Percentage of texture fetches from a 1D texture. |
PctTex1DArray | Percentage of texture fetches from a 1D texture array. |
PctTex2D | Percentage of texture fetches from a 2D texture. |
PctTex2DArray | Percentage of texture fetches from a 2D texture array. |
PctTex2DMSAA | Percentage of texture fetches from a 2D MSAA texture. |
PctTex2DMSAAArray | Percentage of texture fetches from a 2D MSAA texture array. |
PctTex3D | Percentage of texture fetches from a 3D texture. |
PctTexCube | Percentage of texture fetches from a cube map. |
PctUncompressedTexels | Percentage of texture fetches from uncompressed textures. Does not include depth or interlaced textures. |
PostZQuads | Percentage of quads for which the pixel shader will run and may be postZ tested. |
PostZSamplesFailingS | Number of samples that were tested for Z after shading and failed the stencil test.
PostZSamplesFailingZ | Number of samples that were tested for Z after shading and failed the Z test.
PostZSamplesPassing | Number of samples that were tested for Z after shading and passed.
PreZQuadsCulled | Percentage of quads rejected because they were not actually covered by a primitive. High values here suggest that very small primitives were being rendered and a lower mesh LOD could improve performance. |
PreZSamplesFailingS | Number of samples that were tested for Z before shading and failed the stencil test.
PreZSamplesFailingZ | Number of samples that were tested for Z before shading and failed the Z test.
PreZSamplesPassing | Number of samples that were tested for Z before shading and passed.
PreZTilesDetailCulled | Percentage of tiles rejected because the associated primitive had no contributing area.
PrimitiveAssemblyBusy | Percentage of GPUTime that primitive assembly (clipping and culling) is busy. High values may be caused by many small primitives; mid to low values may indicate a pixel shader or output buffer bottleneck.
PrimitivesIn | The number of primitives received by the hardware. |
PSALUBusy | The percentage of GPUTime during which ALU instructions are being processed by the PS.
PSALUEfficiency | ALU vector packing efficiency. Values below 70 percent indicate that ALU dependency chains may be preventing full utilization of the processor. |
PSALUInstCount | Average number of ALU instructions executed in PS. Affected by flow control. |
PSALUTexRatio | The ratio of ALU to texture instructions in the PS. This can be tuned appropriately to match the target hardware. |
PSExportStalls | Percentage of GPUTime that PS output is stalled. Should be zero when the bottleneck is in the PS or further upstream; if not zero, it indicates a bottleneck in late Z testing or in the color buffer.
PSPixelsIn | The number of pixels processed by the PS. Does not count pixels culled by early Z or stencil tests.
PSPixelsOut | The number of pixels exported from the shader to the color buffers. Does not include killed or alpha-tested pixels. With multiple render targets, each receives one export, so one pixel written to two render targets counts as 2.
PSTexBusy | The percentage of GPUTime during which texture instructions are being processed by the PS.
PSTexInstCount | Average number of texture instructions executed in PS. Affected by flow control. |
TexAveAnisotropy | The average degree of anisotropy applied. A number between 1 and 16. The anisotropic filtering algorithm only applies samples where they are required (e.g. there will be no extra anisotropic samples if the view vector is perpendicular to the surface) so this can be much lower than the requested anisotropy. |
TexCacheStalled | Percentage of GPUTime the texture cache is stalled. If possible, try reducing the number of textures or the number of bits per pixel (i.e., use compressed textures).
TexCostOfFiltering | The effective cost of all texture filtering, expressed as a percentage relative to all filtering being done as bilinear. Should always be greater than or equal to 100 percent. Significantly higher values indicate heavy use of trilinear or anisotropic filtering.
TexelFetchCount | The total number of texels fetched. This includes all shader types, and any extra fetches caused by trilinear filtering, anisotropic filtering, color formats, and volume textures. |
TexMemBytesRead | Texture memory read in bytes. |
TexMissRate | Texture cache miss rate (bytes/texel). A normal value for mipmapped textures on typical scenes is approximately (texture_bpp / 4). For 1:1 mapping, it will be texture_bpp. |
TexTriFilteringPct | Percentage of pixels that received trilinear filtering. Note that not all pixels for which trilinear filtering is enabled will receive it (e.g. if the texture is magnified). |
TexUnitBusy | Percentage of GPUTime the texture unit is active. This is measured with all extra fetches and any cache or memory effects taken into account. |
TexVolFilteringPct | Percentage of pixels that received volume filtering. |
VertexMemFetched | Number of bytes read from memory due to vertex cache miss. |
VSALUBusy | The percentage of GPUTime during which ALU instructions are being processed by the VS.
VSALUEfficiency | ALU vector packing efficiency. Values below 70 percent indicate that ALU dependency chains may be preventing full utilization of the processor.
VSALUInstCount | Average number of ALU instructions executed in the VS. Affected by flow control.
VSALUTexRatio | The ratio of ALU to texture instructions in the VS. This can be tuned appropriately to match the target hardware.
VSTexBusy | The percentage of GPUTime during which texture instructions are being processed by the VS.
VSTexInstCount | Average number of texture instructions executed in the VS. Affected by flow control.
VSVerticesIn | The number of vertices processed by the VS.
ZUnitStalled | Percentage of GPUTime the depth buffer spends waiting for the color buffer to be ready to accept data. High figures here indicate a bottleneck in color buffer operations. |
GSBusy | Percentage of GPUTime that the GS is busy.
InterpBusy | Percentage of GPUTime that the interpolator is busy.
PSBusy | Percentage of GPUTime that the PS is busy.
VSBusy | Percentage of GPUTime that the VS is busy.
CBMemRead | Number of bytes read from the color buffer. |
CBMemWritten | Number of bytes written to the color buffer. |
Pct64SlowTexels | Percentage of texture fetches from a 64-bit texture (slow path). There are also 64-bit formats that take a fast path; they are included in PctUncompressedTexels. |
PctTexCubeArray | Percentage of texture fetches from a cube map array. |
PctVertex64SlowTexels | Percentage of texture fetches from a 64-bit vertex texture (slow path). There are also 64-bit formats that take a fast path; they are included in PctVertexTexels. |
PctVertex128SlowTexels | Percentage of texture fetches from a 128-bit vertex texture (slow path). There are also 128-bit formats that take a fast path; they are included in PctVertexTexels. |
PctVertexTexels | Percentage of texture fetches from vertex textures. |
VertexMemFetchedCost | The percentage of GPUTime that is spent fetching from vertex memory due to cache miss. Improve vertex reuse or use smaller vertex formats to reduce this cost. |
CBSlowPixelPct | Percentage of pixels written to the color buffer using a half-rate or quarter-rate format. |
PAPixelsPerTriangle | The ratio of rasterized pixels to the number of triangles after culling. This does not account for triangles generated due to clipping. |
ShaderBusy | Percentage of GPUTime that the shader unit is busy. |
ShaderBusyGS | Percentage of work done by shader units for GS. |
ShaderBusyPS | Percentage of work done by shader units for PS. |
ShaderBusyVS | Percentage of work done by shader units for VS. |
CSWriteInsts | The average number of write instructions executed in the compute shader per execution. Affected by flow control.
CSBusy | The percentage of time the ShaderUnit has compute shader work to do. |
DSBusy | The percentage of time the ShaderUnit has domain shader work to do. |
GSBusy | The percentage of time the ShaderUnit has geometry shader work to do. |
HSBusy | The percentage of time the ShaderUnit has hull shader work to do. |
PSBusy | The percentage of time the ShaderUnit has pixel shader work to do. |
VSBusy | The percentage of time the ShaderUnit has vertex shader work to do. |
CSFetchSize | The total kilobytes fetched from the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
CSGDSInsts | The average number of instructions to/from the GDS executed per work-item (affected by flow control).
CSMemUnitBusy | The percentage of GPUTime the memory unit is active. The result includes the stall time (CSMemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound).
CSMemUnitStalled | The percentage of GPUTime the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad).
CSSALUBusy | The percentage of GPUTime during which scalar ALU instructions are processed. Value range: 0% (bad) to 100% (optimal).
CSSALUInsts | The average number of scalar ALU instructions executed per work-item (affected by flow control). |
CSThreadGroups | Total number of thread groups. |
CSVALUBusy | The percentage of GPUTime during which vector ALU instructions are processed. Value range: 0% (bad) to 100% (optimal).
CSVALUInsts | The average number of vector ALU instructions executed per work-item (affected by flow control). |
CSVALUUtilization | The percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence within a wave or that the work-group size is not a multiple of 64. Value range: 0% (bad) to 100% (ideal, no thread divergence).
CSVFetchInsts | The average number of vector fetch instructions from the video memory executed per work-item (affected by flow control). |
CSVWriteInsts | The average number of vector write instructions to the video memory executed per work-item (affected by flow control). |
CSWaveFronts | The total number of wavefronts used for the CS. |
CSWriteSize | The total kilobytes written to the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
GSExportPct | The percentage of GS work that is related to exporting primitives. |
GSSALUBusy | The percentage of GPUTime during which scalar ALU instructions are being processed by the GS.
GSSALUInstCount | Average number of scalar ALU instructions executed in the GS. Affected by flow control.
GSVALUBusy | The percentage of GPUTime during which vector ALU instructions are being processed by the GS.
GSVALUInstCount | Average number of vector ALU instructions executed in the GS. Affected by flow control.
HSSALUBusy | The percentage of GPUTime during which scalar ALU instructions are being processed by the HS.
HSSALUInstCount | Average number of scalar ALU instructions executed in the HS. Affected by flow control.
HSVALUBusy | The percentage of GPUTime during which vector ALU instructions are being processed by the HS.
HSVALUInstCount | Average number of vector ALU instructions executed in the HS. Affected by flow control.
PSSALUBusy | The percentage of GPUTime during which scalar ALU instructions are being processed by the PS.
PSSALUInstCount | Average number of scalar ALU instructions executed in the PS. Affected by flow control.
PSVALUBusy | The percentage of GPUTime during which vector ALU instructions are being processed by the PS.
PSVALUInstCount | Average number of vector ALU instructions executed in the PS. Affected by flow control.
VSSALUBusy | The percentage of GPUTime during which scalar ALU instructions are being processed by the VS.
VSSALUInstCount | Average number of scalar ALU instructions executed in the VS. Affected by flow control.
VSVALUBusy | The percentage of GPUTime during which vector ALU instructions are being processed by the VS.
VSVALUInstCount | Average number of vector ALU instructions executed in the VS. Affected by flow control.
DSALUBusy | The percentage of GPUTime during which ALU instructions are being processed by the DS.
DSALUEfficiency | ALU vector packing efficiency. Values below 70 percent indicate that ALU dependency chains may be preventing full utilization of the processor.
DSALUInstCount | Average number of ALU instructions executed in the DS. Affected by flow control.
DSALUTexRatio | The ratio of ALU to texture instructions in the DS. This can be tuned appropriately to match the target hardware.
DSTexBusy | The percentage of GPUTime during which texture instructions are processed by the DS.
DSTexInstCount | Average number of texture instructions executed in the DS. Affected by flow control.
DSVerticesIn | The number of vertices processed by the DS.
HSALUBusy | The percentage of GPUTime during which ALU instructions are processed by the HS.
HSALUEfficiency | ALU vector packing efficiency. Values below 70 percent indicate that ALU dependency chains may be preventing full utilization of the processor. |
HSALUInstCount | Average number of ALU instructions executed in the HS. Affected by flow control. |
HSALUTexRatio | The ratio of ALU to texture instructions in the HS. This can be tuned appropriately to match the target hardware.
HSPatches | The number of patches processed by the HS.
HSTexBusy | The percentage of GPUTime during which texture instructions are processed by the HS.
HSTexInstCount | Average number of texture instructions executed in the HS. Affected by flow control.
ShaderBusyDS | Percentage of work done by shader units for DS. |
ShaderBusyHS | Percentage of work done by shader units for HS. |
TessellatorBusy | Percentage of time the tessellation engine is busy. |
CSALUBusy | The percentage of GPUTime during which ALU instructions are processed by the CS.
CSALUInsts | The number of ALU instructions executed in the CS. Affected by flow control. |
CSALUPacking | ALU vector packing efficiency. Values below 70 percent indicate that ALU dependency chains may be preventing full utilization of the processor. |
CSALUStalledByLDS | The percentage of GPUTime during which ALU units are stalled because the LDS input queue is full or the output queue is not ready. If there are LDS bank conflicts, reduce them; otherwise, try reducing the number of LDS accesses if possible.
CSALUFetchRatio | The ratio of ALU to fetch instructions. This can be tuned appropriately to match the target hardware. |
CSCacheHit | The percentage of fetches from global memory that hit the L1 cache.
CSCompletePath | The total bytes read and written through the CompletePath. This includes extra bytes needed for addressing, atomics, etc. Higher values indicate a larger performance cost. Reduce it by removing atomics and non-32-bit types, or by moving them into a separate shader.
CSFastPath | The total bytes written through the FastPath (no atomics, 32-bit types only). This includes extra bytes needed for addressing.
CSFetchInsts | Average number of fetch instructions executed in the compute shader per execution. Affected by flow control.
CSTexBusy | The percentage of GPUTime during which texture instructions are being processed by the CS.
CSThreads | The number of CS threads processed by the hardware. |
CSLDSBankConflict | The percentage of GPUTime LDS is stalled by bank conflicts. |
CSLDSFetchInsts | The average number of fetch instructions from local memory executed per thread (affected by flow control).
CSLDSWriteInsts | The average number of write instructions to local memory executed per thread (affected by flow control).
CSPathUtilization | The percentage of bytes read and written through the FastPath or CompletePath compared to the total number of bytes transferred over the bus. To increase path utilization, remove atomics and non-32-bit types.
ShaderBusyCS | Percentage of work done by shader units for CS. |
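Several descriptions above imply simple arithmetic. The Python sketch below illustrates three of them:

- the best-case CSVALUUtilization when the work-group size is not a multiple of 64, assuming a 64-wide wave and ignoring divergence;
- the TexMissRate rule of thumb;
- the per-render-target counting in PSPixelsOut.

The helper names are hypothetical and are not part of any profiler API.

```python
import math

# Assumption: a wave holds 64 threads, as implied by the CSVALUUtilization
# description ("work-group size is not a multiple of 64").
WAVE_SIZE = 64


def max_valu_utilization(work_group_size: int, wave_size: int = WAVE_SIZE) -> float:
    """Upper bound on CSVALUUtilization (percent) for one work-group.

    When the work-group size is not a multiple of the wave size, the last wave
    is only partially filled, so some vector ALU lanes are always idle.
    Thread divergence would lower the measured value further.
    """
    if work_group_size <= 0:
        raise ValueError("work_group_size must be positive")
    waves = math.ceil(work_group_size / wave_size)
    return 100.0 * work_group_size / (waves * wave_size)


def expected_tex_miss_rate(texture_bpp: float, one_to_one: bool = False) -> float:
    """TexMissRate rule of thumb from its description.

    Roughly texture_bpp / 4 for mipmapped textures in typical scenes, and
    texture_bpp for a 1:1 mapping. Uses the same texture_bpp as the description.
    """
    return texture_bpp if one_to_one else texture_bpp / 4


def ps_pixels_out(pixels_written: int, render_targets: int) -> int:
    """PSPixelsOut counts one export per render target for each written pixel."""
    return pixels_written * render_targets


print(max_valu_utilization(96))  # two waves, 96 of 128 lanes active
```

A work-group of 96 threads fills one wave and half of a second, so at most 75% of lanes can be active. Sizing work-groups in multiples of 64 removes that ceiling.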