Asynchronous Data Streaming
Data streaming is an optimization technique intended to reduce spikes caused by loading graphic resources and compiling shaders. With this technique, not all the data is loaded into memory at once. Instead, only the required data is loaded, and the rest is loaded progressively on demand.
Resources are loaded and transferred to the GPU in separate asynchronous threads. They are then synchronized and added to the virtual scene on the CPU side.
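The split between worker threads and the main thread can be illustrated with a minimal standard C++17 sketch. Resource, load_resource and AsyncStreamer are hypothetical placeholders, not UNIGINE API. load_resource runs on a worker thread via std::async. AsyncStreamer::sync is called once per frame on the main thread and adds only the finished resources to the scene; unfinished ones keep loading in the background.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for a loaded mesh or texture.
struct Resource {
    std::string name;
    std::size_t bytes = 0;
};

// Hypothetical loader; a real one would read the file and upload data to the GPU.
Resource load_resource(const std::string& name) {
    return Resource{name, name.size() * 1024};
}

class AsyncStreamer {
public:
    // Starts loading on a worker thread; the caller does not wait for it.
    void request(const std::string& name) {
        pending_.push_back(std::async(std::launch::async, load_resource, name));
    }

    // Called once per frame on the main thread: only finished loads are
    // synchronized and added to the scene; the rest keep loading.
    void sync(std::vector<Resource>& scene) {
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (it->wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
                scene.push_back(it->get());
                it = pending_.erase(it);
            } else {
                ++it;
            }
        }
    }

    bool idle() const { return pending_.empty(); }

private:
    std::vector<std::future<Resource>> pending_;
};
```

In a real frame loop, sync would run once per frame rather than in a tight loop. Frames rendered before a resource is ready simply do without it.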
In UNIGINE, asynchronous data streaming is enabled by default. You can disable asynchronous data streaming in UnigineEditor or via the console:
- In UnigineEditor, open the Settings window and go to the Streaming section. Here you can switch the streaming mode for textures and/or meshes.
- In the console, run the corresponding commands that switch the streaming mode for textures and/or meshes.
There are two main streaming modes: asynchronous (Async) and forced (Force). In the Force mode, all resources required for the current frame are force-loaded at once. This is useful for grabbing frame sequences, rendering node previews, warmup, and similar tasks.
Meshes have an additional All mode that disables mesh streaming and loads all meshes available in the project at application start-up. This mode suits small projects with few meshes.
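As an illustration only, the following sketch models how a frame's mesh requests might be handled in each mode. MeshStreamer, StreamingMode and prepare_frame are hypothetical names, and the engine's actual logic is more involved:

- In Async, the frame is rendered with whatever is already resident.
- In Force, missing meshes are loaded before the frame proceeds.
- In All, everything is loaded once at start-up.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical enum mirroring the modes described above.
enum class StreamingMode { Async, Force, All };

struct MeshStreamer {
    StreamingMode mode;
    std::set<std::string> loaded;   // meshes currently resident
    std::set<std::string> pending;  // Async: requests waiting for a worker thread

    // All mode: every mesh of the project is loaded once, at start-up.
    void startup(const std::vector<std::string>& project_meshes) {
        if (mode == StreamingMode::All)
            loaded.insert(project_meshes.begin(), project_meshes.end());
    }

    // Returns how many of the meshes needed by this frame can be rendered now.
    std::size_t prepare_frame(const std::vector<std::string>& needed) {
        std::size_t ready = 0;
        for (const auto& mesh : needed) {
            if (loaded.count(mesh) != 0) { ++ready; continue; }
            if (mode == StreamingMode::Force) {
                loaded.insert(mesh);  // blocking load in the current thread
                ++ready;
            } else {
                pending.insert(mesh); // loaded later; this frame renders without it
            }
        }
        return ready;
    }
};
```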
The streaming system provides asynchronous loading of the following data to RAM:
- All texture runtime files and textures with the Unchanged option enabled, including cubemaps, voxel probe maps, and shadow maps of baked shadows.
- Meshes of ObjectMeshStatic, ObjectMeshClutter, ObjectMeshCluster, ObjectGuiMesh objects, and DecalMesh.
Procedurally generated objects such as ObjectMeshClutter are generated in a separate thread, significantly reducing performance costs.
You can obtain general information on streamed resources by using the render_streaming_meshes_info and render_streaming_textures_info console commands.
It is also possible to print the list of loaded resources and detailed information on them by using the render_streaming_meshes_list and render_streaming_textures_list console commands.
Asynchronous Shader Compilation
In addition to the asynchronous loading of meshes and textures, the streaming system provides asynchronous shader compilation and loading.
Shader compilation also has two modes: asynchronous (Async) and forced (Force). In the Force mode, all shaders required for the current frame are compiled and loaded to RAM at once in the current thread. The Async mode is used by default.
The number of compiled and loaded shaders is available in the Performance Profiler tool.
Common Streaming Settings
To take advantage of multithreading, set the maximum number of threads used for resource streaming via the render_streaming_max_threads console parameter. More threads make streaming faster but may cause spikes if they consume too many GPU resources.
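The effect of a thread cap can be sketched with a plain worker pool in standard C++. Here max_threads stands in for the value of render_streaming_max_threads. stream_all is a hypothetical function, not part of the engine. Up to max_threads workers take jobs from a shared queue, so more workers drain the queue in parallel.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Drains a queue of hypothetical load jobs with at most max_threads workers
// and returns how many jobs were processed.
std::size_t stream_all(std::queue<std::string> jobs, unsigned max_threads) {
    assert(max_threads > 0);
    std::mutex queue_mutex;
    std::atomic<std::size_t> processed{0};

    auto worker = [&] {
        for (;;) {
            std::string job;
            {
                std::lock_guard<std::mutex> lock(queue_mutex);
                if (jobs.empty()) return;
                job = std::move(jobs.front());
                jobs.pop();
            }
            // Placeholder: a real streamer would load `job` here, outside the
            // lock, so that the workers actually run in parallel.
            (void)job;
            processed.fetch_add(1);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < max_threads; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return processed.load();
}
```

The heavy work happens outside the lock. That is what lets extra threads speed up streaming, and it is also why several threads can compete for GPU resources at the same time.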
Memory Limits
You can limit the number of graphics resources loaded and unloaded per frame by specifying the corresponding budgets. Use them to balance loading/unloading speed against performance. A larger budget speeds up streaming, but it also increases memory consumption.
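Per-frame budgets can be thought of as caps on how much queued work is processed each frame. The sketch below assumes budgets are expressed as a number of resources per frame. FrameBudget, StreamingQueues and frames_to_load are hypothetical names. Anything over the budget stays queued for later frames, so a small budget keeps each frame light but takes more frames to finish streaming.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical budgets: maximum loads and unloads processed in one frame.
struct FrameBudget {
    std::size_t load_budget;
    std::size_t unload_budget;
};

struct FrameWork {
    std::size_t loaded = 0;
    std::size_t unloaded = 0;
};

struct StreamingQueues {
    std::deque<std::string> to_load;
    std::deque<std::string> to_unload;

    // Processes one frame's worth of work; the remainder waits for later frames.
    FrameWork tick(const FrameBudget& budget) {
        FrameWork done;
        while (done.loaded < budget.load_budget && !to_load.empty()) {
            to_load.pop_front();   // placeholder for the actual load
            ++done.loaded;
        }
        while (done.unloaded < budget.unload_budget && !to_unload.empty()) {
            to_unload.pop_front(); // placeholder for the actual unload
            ++done.unloaded;
        }
        return done;
    }
};

// Frames needed to drain n pending loads with a given per-frame budget.
std::size_t frames_to_load(std::size_t n, std::size_t load_budget) {
    assert(load_budget > 0);
    return (n + load_budget - 1) / load_budget;
}
```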
Adjustable memory limits and lifetimes help avoid situations where resources remain in RAM or VRAM after they are no longer used. They are set separately for meshes (in RAM and in VRAM), textures, and particles, as a percentage of the total RAM/VRAM.
The memory limit is associated with the lifetime: resources are deleted from memory or video memory only when both values are exceeded.
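The rule that both values must be exceeded can be expressed as a small eviction pass. This is a sketch under stated assumptions, not the engine's implementation:

- The limit is computed as a percentage of total memory.
- Resources are visited from least recently used first (an assumed ordering).
- A resource is unloaded only while usage is above the limit and its own lifetime has expired.

StreamedResource and collect_unloads are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct StreamedResource {
    std::string name;
    std::uint64_t bytes;
    double last_used; // time in seconds when the resource was last needed
};

// limit_percent is a share of total_bytes (RAM or VRAM); lifetime is how long,
// in seconds, an unused resource may stay resident. Unloading stops as soon as
// usage drops to the limit.
std::vector<std::string> collect_unloads(std::vector<StreamedResource>& resident,
                                         std::uint64_t total_bytes, double limit_percent,
                                         double lifetime, double now) {
    const auto limit = static_cast<std::uint64_t>(total_bytes * (limit_percent / 100.0));
    std::uint64_t used = 0;
    for (const auto& r : resident) used += r.bytes;

    // Assumed policy: consider the least recently used resources first.
    std::sort(resident.begin(), resident.end(),
              [](const StreamedResource& a, const StreamedResource& b) {
                  return a.last_used < b.last_used;
              });

    std::vector<std::string> unloaded;
    for (auto it = resident.begin(); it != resident.end() && used > limit;) {
        if (now - it->last_used > lifetime) { // both limit and lifetime exceeded
            used -= it->bytes;
            unloaded.push_back(it->name);
            it = resident.erase(it);
        } else {
            ++it;
        }
    }
    return unloaded;
}
```

For example, with 1000 bytes of total memory and a 50% limit, 700 bytes in use trigger eviction. Eviction proceeds only until usage is back under 500 bytes, and only among resources whose lifetime has expired.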
Memory limits and the RAM/VRAM occupied by streamed resources are displayed in the Performance Profiler tool as slash-separated pairs: the first value is the current RAM/VRAM usage, and the second value is the limit.
Texture Cache
The streaming system uses a texture cache made up of minimized copies of all textures, stored in the data/.cache_textures folder. While the original textures are being loaded, these copies are used in their place.
The texture cache is loaded at Engine startup and remains in memory after loading. The following default loading order ensures smooth loading and rendering of resources:
- Texture cache
- Geometry
- Uncached textures (these cause spikes, as their texture cache is generated on the fly; materials with uncached textures that are not yet loaded are rendered black)
- Full-size textures
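The fallback described above can be summarized as a selection function. TextureState, Sampled and select_texture are hypothetical names used only for illustration:

- A full-size texture is used once it is loaded.
- Until then, its minimized cached copy stands in for it.
- A texture with neither a loaded original nor a cached copy is rendered black.

```cpp
#include <cassert>

// Hypothetical per-texture state; not the UNIGINE API.
struct TextureState {
    bool full_size_loaded = false; // original texture is resident
    bool cached = false;           // a minimized copy exists in data/.cache_textures
};

enum class Sampled { FullSize, CacheCopy, Black };

// What a material samples this frame while streaming is in progress.
Sampled select_texture(const TextureState& t) {
    if (t.full_size_loaded) return Sampled::FullSize;
    if (t.cached) return Sampled::CacheCopy; // minimized copy stands in for the original
    return Sampled::Black;                   // uncached and not yet loaded
}
```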
The textures_cache_preload flag in the boot config file lets you choose the texture cache loading priority: either the cache is preloaded, or it is loaded after geometry data.
The video memory amount currently occupied by the texture cache is available in the Performance Profiler tool.
The render_streaming_textures_cache_load and render_streaming_textures_cache_unload console commands let you load and unload the texture cache. For example, once full-size textures are loaded, you can unload the texture cache from video memory for better performance.
Mesh Streaming
Meshes can be loaded to RAM and VRAM separately for more efficient work with geometry. This avoids wasting video memory: meshes that participate in collisions and intersections but are not currently rendered can be loaded to RAM only.
There are three modes of mesh streaming to RAM/VRAM:
- Asynchronous (Async): meshes are loaded asynchronously.
- Forced (Force): all meshes required for the current frame are force-loaded at once.
- All: all meshes available in the project are loaded at application start-up. This mode effectively disables mesh streaming.
Asynchronous loading to RAM and asynchronous loading to VRAM have different consequences. If a mesh is not loaded to video memory in time, application behavior is unaffected; you may only notice some lag. If a mesh is not loaded to RAM in time, however, objects in the scene may behave incorrectly physically.
We strongly recommend using shapes for collision and intersection detection, as this is faster. If shapes don't suit your needs, use the following methods:
- Load meshes and keep them in memory for as long as the objects exist. The API of some mesh-based objects provides this functionality out of the box. This may partially solve the problem of incorrect behavior; however, only a few meshes can be kept loaded this way.
- Use the prefetch system, which asynchronously preloads meshes participating in collisions and intersections to memory before they are used:
  - Set the Radius prefetch mode.
  - Specify the physics radius (for collisions) and/or the radius within which intersections are calculated.
  - Specify the prefetch radius. It should exceed the collision and intersection radius values.
  You can also preload all meshes for which collisions and intersections are calculated (the Full prefetch mode). However, this significantly increases RAM usage.
- The API of some mesh-based objects, as well as the MeshStatic class API, also provides methods for implementing custom prefetch logic to preload meshes.
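The Radius and Full prefetch modes can be sketched as a distance test around a point of interest. All names here (PrefetchMode, CollisionMesh, meshes_to_prefetch, prefetch_radius_valid) are hypothetical. Using a single center point, such as the player position, is also an assumption. In Radius mode, only meshes within the prefetch radius are selected for loading to RAM. Full mode selects all of them, which is why it uses much more RAM.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

enum class PrefetchMode { Radius, Full };

struct Vec3 { double x, y, z; };

double distance_between(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

struct CollisionMesh {
    std::string name;
    Vec3 position;
};

// The prefetch radius must exceed both the physics and intersection radii, so a
// mesh is already in RAM by the time it enters either of them.
bool prefetch_radius_valid(double prefetch, double physics, double intersection) {
    return prefetch > std::max(physics, intersection);
}

// Meshes that should be preloaded into RAM around `center`.
std::vector<std::string> meshes_to_prefetch(const std::vector<CollisionMesh>& meshes,
                                            const Vec3& center, PrefetchMode mode,
                                            double prefetch_radius) {
    std::vector<std::string> selected;
    for (const auto& m : meshes)
        if (mode == PrefetchMode::Full ||
            distance_between(m.position, center) <= prefetch_radius)
            selected.push_back(m.name);
    return selected;
}
```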
Asynchronously streamed meshes shouldn't be modified. The only way to change such a mesh is to make it procedural. A procedural mesh is a mesh created via code, and such meshes have a specific streaming mode. They are kept in memory after creation and are never unloaded until the object is destroyed via code or the mesh returns to its normal mode (streaming from a source file). The API of mesh-based objects allows switching a mesh to the procedural mode and applying changes.
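The procedural-mode rules can be modeled as a tiny state machine. MeshModel and its methods are hypothetical, not the UNIGINE mesh API. Edits are rejected while the mesh streams from its source file, and the mesh cannot be unloaded while it is procedural. In this model, switching back also drops the in-memory changes so that the mesh streams from the file again.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

enum class MeshMode { Streamed, Procedural };

// Hypothetical model of the rules above, not the UNIGINE mesh API.
class MeshModel {
public:
    explicit MeshModel(std::string source_file) : source_file_(std::move(source_file)) {}

    // Switch to procedural mode: the mesh is kept in memory from now on.
    void make_procedural() { mode_ = MeshMode::Procedural; }

    // Return to normal mode: in this model, in-memory edits are dropped and
    // the mesh is streamed from source_file_ again.
    void make_streamed() {
        mode_ = MeshMode::Streamed;
        vertices_.clear();
    }

    // Edits are accepted only in procedural mode.
    bool modify(const std::vector<float>& vertices) {
        if (mode_ != MeshMode::Procedural) return false;
        vertices_ = vertices;
        return true;
    }

    // The streaming system may unload a mesh only if it can reload it from a file.
    bool can_unload() const { return mode_ == MeshMode::Streamed; }

    const std::string& source_file() const { return source_file_; }

private:
    std::string source_file_;
    MeshMode mode_ = MeshMode::Streamed;
    std::vector<float> vertices_;
};
```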
OpenGL Settings
Settings and workflow for the OpenGL API differ slightly from those for the DirectX API.
Under OpenGL, the Data Streaming System uses two intermediate buffers to transfer data from the CPU to new resources:
- Async Buffer used for mesh and texture streaming
- Async Buffer Indices used for streaming of vertex indices of meshes
The size of the Async Buffer must match the size of the largest resource (mesh or texture). Otherwise, when a larger resource arrives, the buffer is resized, which causes a spike.
The Async Buffer Synchronization parameter controls buffer synchronization. When it is enabled, async buffers are created only once and then synchronized, which reduces the time spent allocating and freeing memory. When it is disabled, both Async Buffer and Async Buffer Indices are created anew for each new resource. This reduces the number of buffer synchronizations but increases the number of memory allocations.
Sometimes memory allocation may be faster than synchronization, depending on the hardware and driver used (e.g., when the main thread is affected by synchronization primitives in other threads). If streaming becomes unacceptably slow in such cases, disable buffer synchronization.
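The trade-off between reusing a synchronized buffer and allocating a new one per resource can be modeled by counting operations. AsyncBufferModel is a hypothetical bookkeeping model, not OpenGL code, and it does not count the buffer's initial allocation.

- With synchronization enabled, each upload reuses the buffer and costs one synchronization. A resource larger than the buffer forces a resize, which is the spike mentioned above.
- With synchronization disabled, every resource gets a fresh allocation and no synchronization.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping model of an intermediate upload buffer.
struct AsyncBufferModel {
    bool synchronization;             // models the Async Buffer Synchronization setting
    std::size_t capacity;             // models the configured Async Buffer size, in bytes
    std::size_t allocations = 0;      // allocations made after start-up
    std::size_t resizes = 0;          // reallocations of the reused buffer (spikes)
    std::size_t synchronizations = 0;

    void upload(std::size_t bytes) {
        if (!synchronization) {
            ++allocations; // a new buffer for every resource
            return;
        }
        if (bytes > capacity) {
            capacity = bytes; // buffer too small for this resource: resize
            ++allocations;
            ++resizes;
        }
        ++synchronizations; // reuse: wait until the previous contents are consumed
    }
};
```

Sizing the buffer for the largest resource up front keeps resizes at zero. Whether synchronizing or allocating is cheaper depends on the hardware and driver, as noted above.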
There are known issues and workarounds for some hardware and driver software:
- Mesa 3D GL: disable buffer synchronization (gl_async_buffer_synchronization 0) for better performance. Updated Open Graphics Drivers are required.
- Intel: keep in mind that the OS limits VRAM to half of the RAM.