Performance Profiler
A performance profiler displays performance data in a timeline. It reports how much time each frame spends on updating every aspect of your project: rendering nodes that are in view, updating their states, executing scripts with game logic, calculating physics, and so on.
With the profiler, you can:
- Detect the bottlenecks of your application
- Check whether art assets need to be optimized
- Check whether code needs to be optimized
- Compare the profiling results before and after changes
Activate Profilers
To turn the profiler on, click Optimize -> Performance Profiler and choose the required profiler mode.
The following profiling modes are available:
- Generic profiler shows only the general statistics block.
- Rendering profiler shows the detailed rendering statistics and the timeline chart.
- Physics profiler shows the detailed physics-related statistics (within the Physics radius) and the timeline chart.
- World Management profiler shows the statistics on the whole loaded world.
- Thread profiler shows the statistics on loading threaded resources.
To show profiler statistics in the in-game mode, either exit UnigineEditor with the profiler enabled (by typing editor_quit in the console) or type the show_profiler command followed by a value from 1 to 5 in the console. To disable the profiler in the in-game mode, type show_profiler 0 in the console.
You can also enable the additional Renderer profiler block via the render_profiler console command. This block is shown only when the base profiler (in any mode) is enabled.
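As a rough illustration of the console usage described above, the sketch below builds the show_profiler command string for a given mode. The mapping of the values 1 to 5 onto the five modes in the order they are listed here is an assumption (the text only gives the value range), and the helper name profiler_command is hypothetical. Check the values against your engine version before relying on them.

```python
# Assumed mapping: values 1-5 follow the order in which the modes are listed
# in this article. Only the range (0 disables, 1-5 enable) comes from the text.
PROFILER_MODES = {
    "off": 0,
    "generic": 1,
    "rendering": 2,
    "physics": 3,
    "world_management": 4,
    "thread": 5,
}


def profiler_command(mode: str) -> str:
    """Return the console command that switches the profiler to `mode`."""
    if mode not in PROFILER_MODES:
        raise ValueError(f"unknown profiler mode: {mode!r}")
    return f"show_profiler {PROFILER_MODES[mode]}"


print(profiler_command("rendering"))
print(profiler_command("off"))
```

The resulting string is what you would type in the engine console; the sketch does not talk to the engine itself.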
Generic Profiler
Total | The total time in milliseconds that both calculating and rendering the frame took. This is the duration of the main loop in the application execution sequence. You can calculate the framerate of your application by summing the Total time (preparing data on the CPU) and the Present time (waiting for the GPU to finish its rendering work): FPS = 1000 / (Total + Present), with both times in milliseconds. For example, 5 ms of Total time and 5 ms of Present time result in 100 FPS, and 20 ms in sum means 50 FPS. At 40-50 ms per frame (20-25 FPS) the framerate is too low and the application needs to be optimized. |
---|---|
Update | The time to update application logic. This includes executing all functions in the update() loop of the world script, as well as updating the states of all nodes (for example, updating a skinned animation or spawning new particles in a particle system). In short, this is the duration of the update loop. |
Render | The time it took to prepare all data to be rendered in the current frame and feed rendering commands from the CPU to the GPU. A Render time that is too high signals that art assets may need to be optimized. |
Interface | The time required to render all GUI widgets. |
Physics | The time required to perform all physics calculations. |
Present | The time from the moment all calculations on the CPU are complete to the moment the GPU finishes rendering the frame. (See the illustration.) This counter is useful for finding the bottleneck in your application's performance. |
Heap | The size of all memory pools allocated for the application. The Unigine allocator allocates memory in pools, which makes allocation faster and more efficient (if the USE_MEMORY directive is used, which is the default). Because memory is allocated in pools, the counter value increases stepwise. |
Memory | The size of all memory blocks allocated on demand. This counter reports how much of the memory in the allocated pools the application resources really use. |
System | The size of RAM used by the application. |
Allocations | The number of allocation calls during the frame. (This counter reports an allocation call even if only a few bytes are requested.) |
Meshes | The size of the memory used for mesh geometry. |
Textures | The size of the memory used for textures in materials. |
Samples | The size of the memory used for sound samples. |
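The framerate formula given for the Total counter can be sketched as follows. The function names and the 40 ms default threshold parameter are illustrative choices, not part of the profiler; the arithmetic follows the FPS formula and the examples in the table above.

```python
def frames_per_second(total_ms: float, present_ms: float) -> float:
    """Estimate FPS from the Total and Present counters (both in milliseconds)."""
    frame_ms = total_ms + present_ms
    if frame_ms <= 0:
        raise ValueError("frame time must be positive")
    return 1000.0 / frame_ms


def needs_optimization(total_ms: float, present_ms: float,
                       limit_ms: float = 40.0) -> bool:
    """Flag frames at or above the 40 ms mark that the text calls too slow."""
    return total_ms + present_ms >= limit_ms


print(frames_per_second(5, 5))      # 5 ms Total + 5 ms Present
print(needs_optimization(25, 20))   # 45 ms per frame
```

Reading both counters together matters: a low Total with a high Present still yields a low framerate.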
Rendering Profiler
The following statistics are displayed in addition to the generic ones:
RLights | The number of light passes rendered per frame, that is, the number of light sources that are currently seen illuminating something in the viewport. This value also includes additional passes for rendering lights in reflecting surfaces (if dynamic reflections are used): a plain 2D reflection multiplies the number of rendering passes by two, while a cubemap-based reflection with six faces updated each frame multiplies it by six. Each light redraws the mesh polygons it illuminates, so the more light sources there are, the more polygons the graphics card has to render and the lower the performance. For example, two omni lights can as much as double the rendered geometry they shine on. |
---|---|
RShadows | The number of shadow passes rendered per frame. Each light requires a shadow pass to calculate its shadows. If reflecting surfaces show reflected shadows, the number of shadow passes increases as well. |
RReflections | The number of reflections drawn per frame. In case of cubemap reflections, if all six faces are updated, six reflections are rendered each frame. |
RProcedurals | The number of procedural textures rendered per frame. Procedural textures of post-process materials applied to the other procedural textures are also taken into account. |
RShaders | The number of shaders set per frame. (Shaders are set in each of the rendering passes; hence, even if only one material is used, its shader still needs to be set several times. Even when nothing is visible and the screen is black, the composite shader is still used.) |
RMaterials | The number of materials set per frame. (Materials are set in each of the rendering passes.) |
RTriangles | The number of triangles rendered per frame. This includes all polygons that are currently visible in the viewport. In addition, each light source has to redraw the geometry it illuminates, increasing the overall count of rendered triangles. To avoid a GPU bottleneck, keep the number of dynamic light sources and their radii as low as possible. |
RPrimitives | The number of geometric primitives rendered per frame. This includes points, lines, triangles, and polygons. The visualizer and the profiler itself also add to this counter. The value differs dramatically if tessellation is used: in this case, RTriangles reports the number of triangles in the coarse mesh, while RPrimitives reports the number of tessellated primitives. Primitives statistics are available only under DirectX 10 and 11. |
RSurfaces | The number of surfaces rendered per frame (in all rendering passes). Each light source doubles the number of surfaces if they are lit. |
RDecals | The number of decals rendered per frame (in all rendering passes). |
RDips | The number of draw calls. The more identical mesh surfaces share the same material, the more effective instancing is (it is enabled by default): the number of draw calls is minimized, which takes load off both the CPU and the GPU. You can compare the number of surfaces (RSurfaces) with the number of DIPs used to render them. For example, 30000 surfaces and 1000 DIPs mean that 30 instanced mesh surfaces are rendered per draw call (RSurfaces/RDips). Thus, instancing provides a performance boost. |
RMTris/sec | The number of millions of triangles rendered by the graphics card per second. |
RKSurf/sec | The number of thousands of surfaces rendered by the graphics card per second. |
RKDips/sec | The number of thousands of draw calls made by the graphics card per second. |
RSpawn | The time in milliseconds that the engine spends on loading meshes and textures. |
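The RSurfaces/RDips comparison described for the RDips counter can be sketched as a small helper. The function name is hypothetical and the input values are the ones used in the example above, not measured data.

```python
def surfaces_per_draw_call(rsurfaces: int, rdips: int) -> float:
    """Average number of surfaces rendered per draw call (RSurfaces / RDips)."""
    if rdips <= 0:
        raise ValueError("RDips must be positive")
    return rsurfaces / rdips


# Values from the RDips example: 30000 surfaces rendered with 1000 draw calls.
print(surfaces_per_draw_call(30000, 1000))
```

A ratio close to 1 would suggest that instancing is barely helping, for example because surfaces rarely share both mesh and material.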
Physics Profiler
This profiler shows statistics within the Physics radius.
- PUpdate
- PResponse
- PIntegrate
The following statistics are displayed in addition to the generic ones:
PIslands | The number of physical islands within the physics radius that could be calculated separately. The lower this number, the less efficient multi-threading is, if enabled. |
---|---|
PBodies | The number of bodies within the physics radius. |
PJoints | The number of joints within the physics radius. |
PContacts | The total number of contacts within the physics radius; this includes contacts between the bodies (their shapes) and body-mesh contacts. |
PBroad | The duration of the broad phase of physics simulation, when potentially colliding objects are found. |
PNarrow | The duration of the narrow phase when exact collision tests are performed. |
PUpdate | The duration of the update phase when objects are prepared for their collision response to be calculated. |
PResponse | The duration of the response phase when collision response is calculated and joints are solved. |
PIntegrate | The duration of the integrate phase when physics simulation results are applied to bodies. |
PSimulation | The duration of all simulation phases added together. |
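To see which phase dominates the physics time, the phase counters can be summed and compared, as in the sketch below. It assumes that PSimulation is the sum of exactly the five phase counters listed in the table; the sample readings are made up for illustration and the helper names are hypothetical.

```python
from typing import Mapping

# Assumption: these five counters are all the phases that make up PSimulation.
PHASES = ("PBroad", "PNarrow", "PUpdate", "PResponse", "PIntegrate")


def simulation_time(counters: Mapping[str, float]) -> float:
    """Sum the phase durations (in milliseconds)."""
    return sum(counters[name] for name in PHASES)


def slowest_phase(counters: Mapping[str, float]) -> str:
    """Return the name of the phase with the longest duration."""
    return max(PHASES, key=lambda name: counters[name])


# Hypothetical readings in milliseconds, not measured data.
sample = {
    "PBroad": 0.25,
    "PNarrow": 1.0,
    "PUpdate": 0.25,
    "PResponse": 1.5,
    "PIntegrate": 0.5,
}

print(simulation_time(sample))
print(slowest_phase(sample))
```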
World Management Profiler
This profiler shows statistics on the whole world.
The following statistics are displayed in addition to the generic ones:
WNodes | The total number of nodes in the world (both enabled and disabled). |
---|---|
WBodies | The total number of bodies in the world. |
WJoints | The total number of joints in the world. |
WSpawn | The time in milliseconds that the engine spends on generating content in procedural nodes (such as grass, clutters, world layers). |
Thread Profiler
The following statistics are displayed in addition to the generic ones:
World | The time, in milliseconds, spent on asynchronously loading the current queue of nodes. |
---|---|
Sound | The time, in milliseconds, spent on asynchronously loading sounds. |
PathFind | The time, in milliseconds, spent on asynchronous pathfinding calculations. |
FileSystem | The time, in milliseconds, spent on asynchronously loading files. |
Additional Renderer Profiler
The renderer profiler lets you find out which aspects of art content could be optimized to increase the overall performance. However, enabling the renderer profiler incurs a very large overhead, and the application runs significantly slower while profiling, because the GPU is synchronized with the CPU to measure how long each rendering task takes.
A counter is hidden if the renderer option it reports statistics on is not used.
RPIntersection | The time required to traverse the BSP tree and cull all nodes that are not currently visible in the frustum. |
---|---|
RPReflections | The time required to render reflections. |
RPUpdate | The time required to prepare surfaces for rendering. This includes setting alpha-blend fading and tessellation switches. |
RPSort | The time required to sort all polygons to be rendered in the proper order. |
RPDeferred | The duration of the deferred pass. |
RPQueries | The time required to render nodes with the Query flag on. |
RPDeferredLight | The duration of the deferred light pass. |
RPOpacityAmbient | The duration of the ambient pass for opaque objects. |
RPOpacityLight | The duration of the light passes for opaque objects. |
RPDecalsAmbient | The duration of the ambient pass for decals. |
RPDecalsLight | The duration of the light passes for decals. |
RPDeferredLightProb | The duration of the pass for rendering the global illumination created with the probe light. |
RPTransparent | The duration of the pass for rendering transparent objects. |
RPScattering | The time required to render the light scattering pass. |
RPVolumetric | The time required to render volumetric shadows. |
RPDOF | The time required to render the depth of field effect. |
RPComposite | The time required to compose the final viewport image in the composite shader (before applying postprocesses). |
RPRefraction | The time required to render refractive materials. |
RPOcclusion | The time required to render ambient occlusion and global illumination. |
RPRender | The time required to render Render postprocess materials. |
RPPost | The time required to render Post postprocess materials. |
RPHDR | The time required to render the HDR effect. |
RPGlow | The time required to render the glow effect. |
RPVelocity | The time required to render the velocity buffer with moving physical objects for motion blur. |
RPAuxiliary | The time required to render the auxiliary pass. |
RPShadowWorldIntersection | The time required to find objects that cast shadows from world light sources and that are currently visible in the view frustum. |
RPShadowWorldRender | The time required to render shadow maps from world light sources, if any. |
RPShadowProjIntersection | The time required to find objects that cast shadows from projected light sources and that are currently visible in the view frustum. |
RPShadowProjRender | The time required to render shadow maps from projected light sources, if any. |
RPShadowOmniIntersection | The time required to find objects that cast shadows from omni light sources and that are currently visible in the view frustum. |
RPShadowOmniRender | The time required to render shadow maps from omni light sources, if any. |