Microprofile
UNIGINE has support for Microprofile, an advanced external embeddable CPU/GPU profiler with support for per-frame inspection.
The profiler features the following:
- Profiling operations performed by the engine on CPU and GPU.
- Profiling the engine threads.
- Profiling up to 1000 frames.
- Outputting the performance data to a local web server.
Running Microprofile
The Microprofile tool is available only in Development builds of UNIGINE Engine: it is not compiled into Debug or Release builds. To check whether Microprofile is compiled into your build, use the microprofile_info console command.
The performance data obtained by the Microprofile can be output to a local web server.
Enabling and Disabling Microprofile
Microprofile itself consumes performance, so we recommend enabling it only while you are profiling and keeping it disabled otherwise.
One way to disable Microprofile is to untick the corresponding option before running UnigineEditor or the application.
Another way is to use the microprofile_enabled console command.
Visualization Using Built-In Server
To visualize the performance data using the local web server, perform the following:
- In the console, set the number of frames to be profiled via the microprofile_webserver_frames console command. You can skip this step: by default, 500 frames are profiled.
- On the Menu Bar of UnigineEditor, choose Tools -> Microprofile.
The performance data will be displayed in your Web browser.
- To display only a part of the profiled frames, append /<number_of_frames> to the current URL in the Web browser address bar. For example, if you specify localhost:1337/100, only the first 100 frames will be displayed.
- The page is not refreshed automatically: press F5 to reload it in the Web browser while the profiling data is being collected.
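The frame-limited address described above can also be built from a script, for instance to save a snapshot of the page for later comparison. The following is only a minimal sketch: the host and port come from the localhost:1337 example, while the http:// scheme and the helper names microprofile_url and save_snapshot are assumptions made for illustration, not part of UNIGINE.

```python
from typing import Optional
from urllib.request import urlopen


def microprofile_url(frames: Optional[int] = None,
                     host: str = "localhost",
                     port: int = 1337) -> str:
    """Build the profiler page address, optionally limited to `frames` frames.

    The host and port defaults mirror the localhost:1337 example;
    the http:// scheme is an assumption.
    """
    base = f"http://{host}:{port}"
    if frames is None:
        return base
    if frames <= 0:
        raise ValueError("frames must be a positive integer")
    return f"{base}/{frames}"


def save_snapshot(path: str,
                  frames: Optional[int] = None,
                  timeout: float = 10.0) -> int:
    """Download the current profiler page and write it to `path`.

    Hypothetical helper: it only works while the application is running
    with Microprofile enabled. Returns the number of bytes written.
    """
    with urlopen(microprofile_url(frames), timeout=timeout) as response:
        data = response.read()
    with open(path, "wb") as out:
        out.write(data)
    return len(data)
```

Calling microprofile_url(100) yields the same address as typing localhost:1337/100 into the address bar. Because the page is not refreshed automatically, each call to save_snapshot captures only the data available at that moment.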
Performance Data
The Microprofile visualizes detailed per-frame performance data on the operations performed by the engine on CPU and GPU and on the engine threads. To change the visualization mode, click Mode in the Microprofile main menu and choose the required one. By default, the Detailed mode is set.
In the Detailed mode, each operation (function) and thread is displayed as a separate colored region. The regions are hierarchical: a function called by another function is displayed under its caller. The size of a region is determined by the time the corresponding operation takes.
In the picture below, the Engine::do_render() function calls the RenderRenderer::renderWorld() function and so on:
To view the data on a certain operation or thread, point to the corresponding region. To zoom the displayed regions in or out, scroll the mouse wheel.
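To make the hierarchy of regions more concrete, here is a small sketch of how nested, timed scopes produce this kind of data. It is not Microprofile's or UNIGINE's implementation: ScopeRecorder and the scope names are hypothetical and only illustrate the principle that a callee is recorded one level below its caller, with a duration that defines the size of its region.

```python
import time
from contextlib import contextmanager


class ScopeRecorder:
    """Records nested timing regions (illustrative only, not an engine API)."""

    def __init__(self):
        self.regions = []  # finished regions: (name, depth, start, duration)
        self._depth = 0    # current nesting level

    @contextmanager
    def scope(self, name):
        depth = self._depth
        self._depth += 1
        start = time.perf_counter()
        try:
            yield
        finally:
            duration = time.perf_counter() - start
            self._depth -= 1
            self.regions.append((name, depth, start, duration))

    def report(self):
        """Return one line per region, callers first, callees indented."""
        ordered = sorted(self.regions, key=lambda r: (r[2], r[1]))
        return [f"{'  ' * depth}{name}: {duration * 1000:.3f} ms"
                for name, depth, _start, duration in ordered]


if __name__ == "__main__":
    recorder = ScopeRecorder()
    # Hypothetical stand-ins for an outer function and the function it calls.
    with recorder.scope("do_render"):
        with recorder.scope("render_world"):
            time.sleep(0.001)
    print("\n".join(recorder.report()))
```

In the report, render_world is indented under do_render, and the caller's duration always covers that of the callee, which matches how a parent region spans its children in the Detailed mode.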
CPU Data
The Main group of the performance data displays the call stack of the operations performed by the engine on CPU (for example, update and rendering).
GPU Data
The GPU group of the performance data displays the call stack of the operations performed by the engine on GPU. In addition to the main performance data, the number of DIP calls (draw calls) and rendered triangles is shown for each function (for example, environment rendering or post materials rendering). Depending on the function, the data can also include the number of surfaces, lights, and shadows it rendered, the number of materials and shaders it used, and information on the node or material for which it was called (identifier, name, etc.).
When you point to the region that corresponds to a certain function, the Microprofile displays when this function is called on CPU and how much time it takes to execute.
OpenGL or DirectX commands can be combined into GPU Debug groups, which are created automatically when a profiling scope is defined. All graphics resources loaded from external files, such as textures, shaders, static or skinned meshes, as well as the Engine's internal resources, have their own debug names to simplify identification. This information can be useful when using graphics API debuggers, such as NVIDIA Nsight or RenderDoc.
Engine Threads Data
The performance data on the engine threads is visualized in the CPUThread, SoundThread, AsyncQueueThread, WorldSpawnMeshClutterThread, and WorldSpawnGrassThread groups.