There are a number of optimization techniques and practices related to world management. They reduce the rendering load without a significant loss of image quality.
Levels of Detail
Smooth alpha-blended levels of detail (LODs) reduce the geometric complexity of 3D objects as these objects move away from the camera, which lightens the load on the renderer.
UNIGINE offers two mechanisms for switching LODs:
- Disable one LOD and enable another at a specified distance defined by two values: the maximum visibility distance of the first LOD (Max Visibility) and the minimum visibility distance of the second LOD (Min Visibility).
- Smoothly fade one LOD into another over a specified interval defined by two values: the minimum fade distance of the first LOD (Min Fade) and the maximum fade distance of the first LOD, which is also the minimum fade distance of the second LOD (Max Fade).
Visibility Distances
The visibility of a LOD is defined by its visibility range, which is set by two parameters: the minimum and maximum visibility distances measured from the camera. If the surface is within this range, it is displayed; otherwise, it is hidden.
The visibility ranges of surfaces that represent different LODs of the same object should not overlap.
Fade Distances
Discrete LODs tend to produce one noticeable and quite distracting artifact, "popping", when one LOD switches to another. Rather than switching LODs abruptly, they can be smoothly blended into each other over a fade distance, which makes the transition imperceptible.
Smooth LODs with alpha-blend fading are available only when render_alpha_fade is set to 1 (the default).
Just like visibility distances, the fading region is defined by the minimum and maximum fade distances.
Although alpha blending between LODs looks much better, it also means that within the fade region two versions of the same object are rendered at the same time. For this reason, keep the fade region no longer than necessary.
Suppose we have two LODs, a high-poly surface_lod_0 and a low-poly surface_lod_1, and we need a smooth transition between them.
- We want the LODs to switch at 50 units. For that, we need to "dock" the visibility distances of the surfaces to each other:
- The first LOD surface, surface_lod_0, should always be present when the camera is close to the object. So its minimum visibility distance is set to -inf, and its maximum visibility distance is 50 units from the camera.
- Directly after it comes the second LOD surface, surface_lod_1. It is visible from 50 units (its minimum visibility distance) up to infinity (maximum visibility distance = inf).
- Now the LODs switch, but sharply rather than smoothly. To blend them smoothly, set symmetrical fade-out (for the 1st LOD) and fade-in (for the 2nd LOD) distances. Let's say the fading region should be 5 units.
As a result, the LODs are switched as follows:

| Distance | Visible LODs |
|---|---|
| From the bounding box of the object to 50 units | Only the 1st LOD surface, surface_lod_0, is fully visible |
| 50 to 55 units | The 1st LOD fades out, while the 2nd LOD fades in |
| From 55 units and further | Only the 2nd LOD surface, surface_lod_1, is fully visible |
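The fade in this example can be sketched as a function that returns the opacity of a LOD surface at a given camera distance. This is a minimal sketch, not the UNIGINE API: `LodSurface` and `lod_alpha` are hypothetical names, and the fade is assumed to be linear, starting at the visibility bound and extending outward by the fade distance, which reproduces the 50 to 55 unit transition in the table.

```python
import math
from dataclasses import dataclass


@dataclass
class LodSurface:
    """Hypothetical container for one surface's LOD parameters."""
    name: str
    min_visibility: float
    max_visibility: float
    min_fade: float = 0.0  # assumed fade-in length, starting at min_visibility
    max_fade: float = 0.0  # assumed fade-out length, starting at max_visibility


def lod_alpha(surface: LodSurface, distance: float) -> float:
    """Return the surface opacity in [0, 1] at the given camera distance.

    Assumes a linear fade-in over [min_visibility, min_visibility + min_fade]
    and a linear fade-out over [max_visibility, max_visibility + max_fade].
    """
    # Fade-in factor: hidden before min_visibility, opaque after the fade-in region.
    if distance < surface.min_visibility:
        return 0.0
    if surface.min_fade > 0.0 and distance < surface.min_visibility + surface.min_fade:
        fade_in = (distance - surface.min_visibility) / surface.min_fade
    else:
        fade_in = 1.0

    # Fade-out factor: opaque before max_visibility, hidden after the fade-out region.
    if distance < surface.max_visibility:
        fade_out = 1.0
    elif surface.max_fade > 0.0 and distance < surface.max_visibility + surface.max_fade:
        fade_out = 1.0 - (distance - surface.max_visibility) / surface.max_fade
    else:
        fade_out = 0.0

    return min(fade_in, fade_out)


# The two LODs from the example: switching at 50 units with a 5-unit fade region.
lod_0 = LodSurface("surface_lod_0", -math.inf, 50.0, max_fade=5.0)
lod_1 = LodSurface("surface_lod_1", 50.0, math.inf, min_fade=5.0)

for d in (10.0, 52.5, 60.0):
    print(d, lod_alpha(lod_0, d), lod_alpha(lod_1, d))
```

Inside the 50 to 55 unit region both surfaces have a non-zero opacity, which is why both are drawn there and why the fade region should stay short.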
There is one more LOD-related parameter: the reference object to which the distance for switching LODs is measured. It specifies whether the distances are measured to the surface itself or to one of the surfaces or nodes higher up the hierarchy branch. Each surface has two reference objects:

| Parameter | Description |
|---|---|
| Min Parent | The reference object to which the minimum visibility distance is measured: 0 is the surface itself, 1 is its parent, and so on up the hierarchy. |
| Max Parent | The reference object to which the maximum visibility distance is measured. It is counted the same way. |
Let's take a model of a house as an example. When the camera is close, high-poly detail surfaces such as the door arch, stone corners, round window, and roof tiles are visible. When we move away, all these surfaces should be replaced simultaneously by a single low-poly LOD surface.
Low-poly model to be used as distant LOD
The problem is that all these detail surfaces have different bounding boxes. If their distances are measured to the surfaces themselves (Min Parent and Max Parent set to 0), the LODs of different parts of the house may be turned on (or off) at different moments, because some bounding boxes are closer to the camera than others. This may cause z-fighting artifacts: for example, the distant corner has not yet switched to the more detailed LOD, while the close one is drawn twice, as a high-poly corner LOD and, at the same time, as part of the united low-poly house LOD.
If the bounding box of the whole house is used as the reference object (Min Parent and Max Parent set to 1), all surfaces switch simultaneously, no matter which side the house is approached from.
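The effect of the parent setting can be sketched as follows. This is a hypothetical illustration, not the engine's implementation: `Node`, `reference`, and `is_visible` are made-up names, distances are assumed to be measured from the camera to the nearest point of an axis-aligned bounding box, and a parent level of 0 is taken to mean the surface itself and 1 its parent. The sample values model two corners of a 20 x 20 x 10 house seen from one side.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    """Hypothetical surface or node with an AABB and an optional parent."""
    name: str
    bb_min: Vec3
    bb_max: Vec3
    parent: Optional["Node"] = None


def distance_to_aabb(point: Vec3, bb_min: Vec3, bb_max: Vec3) -> float:
    """Distance from a point to the nearest point of an AABB (0 if inside)."""
    d2 = 0.0
    for c, lo, hi in zip(point, bb_min, bb_max):
        if c < lo:
            d2 += (lo - c) ** 2
        elif c > hi:
            d2 += (c - hi) ** 2
    return math.sqrt(d2)


def reference(node: Node, level: int) -> Node:
    """Walk `level` steps up the hierarchy: 0 = the node itself, 1 = its parent."""
    ref = node
    for _ in range(level):
        if ref.parent is None:  # assumption: stop at the root
            break
        ref = ref.parent
    return ref


def is_visible(node: Node, camera: Vec3, min_vis: float, max_vis: float,
               min_parent: int = 0, max_parent: int = 0) -> bool:
    """Check each visibility bound against its own reference object."""
    lo_ref = reference(node, min_parent)
    hi_ref = reference(node, max_parent)
    d_min = distance_to_aabb(camera, lo_ref.bb_min, lo_ref.bb_max)
    d_max = distance_to_aabb(camera, hi_ref.bb_min, hi_ref.bb_max)
    return d_min >= min_vis and d_max < max_vis


house = Node("house", (0, 0, 0), (20, 20, 10))
near_corner = Node("near_corner", (0, 0, 0), (2, 2, 10), parent=house)
far_corner = Node("far_corner", (18, 18, 0), (20, 20, 10), parent=house)
camera = (-45.0, 1.0, 5.0)

# Parents = 0: each corner is measured to itself, so they switch at different times.
print([is_visible(c, camera, -math.inf, 50.0) for c in (near_corner, far_corner)])
# Parents = 1: both corners are measured to the house and switch together.
print([is_visible(c, camera, -math.inf, 50.0, 1, 1) for c in (near_corner, far_corner)])
```

With the camera 45 units from the house, the near corner is within its 50-unit maximum while the far corner (about 65 units away) is not; measuring both to the house box removes that mismatch.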
Another option is to use different reference objects for the two bounds. For example, the lower bound (minimum distance) is checked against the surface itself, and the upper bound (maximum distance) against its parent. This may sound complicated, so take a look at the pictures below. The first picture shows a ring split into surfaces according to levels of detail.
Here, the surfaces in the rightmost column are displayed when the camera is very close to them, and the leftmost surfaces are displayed when the camera is very far from them. Merging several surfaces into one reduces the number of objects to draw and hence the number of DIP calls, which speeds up rendering.
Note that all minimum distances here are measured to the surface itself, but almost all maximum distances are measured to another reference object, a parent. The next picture explains why.
The star is the camera; what exactly the camera is looking at doesn't matter here. On both images, the surfaces are drawn according to the camera position and the distances from the camera to the corresponding reference objects. For example, in the left image, the upper-left part of the ring is a single surface, the upper-right part is split into two separate surfaces, and the bottom part is also a single surface. In the right image, the whole upper part is divided into the smallest possible sectors.
Here, distances are measured to different reference objects so that smaller individual sectors are properly turned off and a larger sector is displayed instead. The maximum distance is measured to the parent sector, because the distances to neighboring subsectors may differ too much. The minimum distance is measured to the current sector, because the sector must be shown when the camera is close to it.
Often, it is enough to create and show the user only part of an artificial world, in the form of a labyrinth-like series of rooms and the passages between them. Open-space areas, if present and confined (that is, not extending to infinity), can also be treated as "rooms". These constraints make such a world ideal for representing as a set of sectors and portals.
The whole space is partitioned into convex areas called sectors ("rooms"). If there is an opening, such as a door or a window, between two adjacent sectors through which one sector can be partially seen from the other, this opening is called a portal. Sectors and portals help the renderer determine which areas and objects are visible from any given point in the world. Also, if a neighboring sector is seen through a portal, the portal acts as a narrowed viewing frustum for the area it leads to, which allows viewing frustum culling.
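Portal traversal can be sketched in a simplified form. In this hypothetical sketch, the view is reduced to a horizontal screen-space interval in [-1, 1], each portal's projected extent on the screen is assumed to be precomputed, and entering a sector through a portal clips the current interval to that extent, which plays the role of the narrowed frustum. `Portal`, `visible_sectors`, and the sector names are illustrative, not engine API.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Interval = Tuple[float, float]


@dataclass
class Portal:
    """Hypothetical opening leading from one sector to another."""
    target: str
    screen_extent: Interval  # projected horizontal extent, assumed precomputed


def clip(a: Interval, b: Interval) -> Interval:
    """Intersect two intervals; the result is empty when left >= right."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def visible_sectors(start: str, portals: Dict[str, List[Portal]],
                    view: Interval = (-1.0, 1.0)) -> Set[str]:
    """Collect sectors seen from `start` through chains of portals."""
    visible = {start}
    queue = deque([(start, view)])
    while queue:
        sector, window = queue.popleft()
        for portal in portals.get(sector, []):
            narrowed = clip(window, portal.screen_extent)
            if narrowed[0] >= narrowed[1]:
                continue  # portal lies outside the current window: cull it
            if portal.target not in visible:  # simplification: visit each sector once
                visible.add(portal.target)
                queue.append((portal.target, narrowed))
    return visible


portals = {
    "hall": [Portal("kitchen", (-0.8, -0.2)), Portal("garage", (1.2, 1.5))],
    "kitchen": [Portal("pantry", (-0.1, 0.3)), Portal("yard", (-0.5, -0.4))],
}
print(sorted(visible_sectors("hall", portals)))
```

The garage portal is off-screen, and the pantry portal falls outside the window narrowed by the kitchen doorway, so neither sector is processed. Visiting each sector only once is a simplification; a fuller version would merge windows when a sector is reachable through several portals.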
Techniques that are appropriate for indoor scenes are not efficient for managing vast landscapes. The rendering speed directly depends on the number of entities and polygons drawn in the scene, as well as on the physics computed for the objects, and outdoor scenes usually contain a very large number of objects. So the main goal of scene management is to render only the regions that are seen and cull all the rest. If the world cannot be narrowed down to a set of closed areas, an approach called space partitioning becomes relevant.
Space partitioning in Unigine is implemented using adaptive axis-aligned BSP trees.
Binary Space Partitioning
Binary space partitioning is a method of dividing the scene along the axes into regions that are processed individually and are therefore easier to manage. A BSP tree is a hierarchical structure obtained by dividing and organizing all the scene data. The adaptive behavior optimizes the BSP algorithm by adjusting the sizes of the regions to the processed geometry and the distribution of objects in the world.
The BSP tree is built in the following way:
- First, the root node is created by simply spanning an axis-aligned bounding box over the whole scene.
- Then the space of the bounding box is recursively subdivided into two regions by a partitioning plane perpendicular to one of the three major axes, which creates two nodes of the tree. Each of these nodes is again enclosed in an axis-aligned bounding box, and this step is repeated for each of them until the whole scene geometry is represented.
- The subdivision stops when the level with the required number of editor nodes is reached. If a partitioning plane at a certain level splits an object, that object stays at the previous level and does not slide down the tree. This often happens with big, extensive objects such as the sky or huge buildings.
At rendering time, the engine traverses the BSP nodes to determine whether their bounding boxes are inside the viewing frustum. If a node passes this test, the same test is repeated for its children until a leaf node or a node outside the viewing frustum is reached. All necessary calculations are performed for visible regions, while the rest of the scene (objects, their lighting, and their physical interactions with each other) is discarded.
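The build and traversal steps above can be sketched as a toy axis-aligned BSP tree. This is a hypothetical sketch, not UNIGINE's implementation: the split plane is assumed to pass through the middle of the longest axis of the node's box, subdivision stops at an assumed per-leaf object count or depth, objects cut by the plane stay at the current node, and the frustum is given as a list of inward-facing planes (n, d), with a point p inside when n·p + d >= 0.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)
Plane = Tuple[Vec3, float]  # inward normal n and offset d: inside if n.p + d >= 0
Item = Tuple[str, Box]  # (object name, bounding box)


@dataclass
class BspNode:
    box: Box
    objects: List[Item] = field(default_factory=list)  # objects kept at this level
    children: List["BspNode"] = field(default_factory=list)


def enclose(boxes: List[Box]) -> Box:
    """Smallest axis-aligned box containing all given boxes."""
    lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
    hi = tuple(max(b[1][i] for b in boxes) for i in range(3))
    return lo, hi


def build(items: List[Item], max_items: int = 2, depth: int = 8) -> BspNode:
    """Recursively split along the longest axis (an assumed heuristic)."""
    node = BspNode(enclose([box for _, box in items]))
    if len(items) <= max_items or depth == 0:
        node.objects = list(items)
        return node
    lo, hi = node.box
    axis = max(range(3), key=lambda i: hi[i] - lo[i])
    split = (lo[axis] + hi[axis]) / 2
    left, right = [], []
    for item in items:
        box_min, box_max = item[1]
        if box_max[axis] <= split:
            left.append(item)
        elif box_min[axis] >= split:
            right.append(item)
        else:
            node.objects.append(item)  # cut by the plane: stays at this level
    for part in (left, right):
        if part:
            node.children.append(build(part, max_items, depth - 1))
    return node


def box_outside(box: Box, plane: Plane) -> bool:
    """True if the whole box lies on the outer side of the plane."""
    (lo, hi), (n, d) = box, plane
    # Corner of the box farthest along the plane normal.
    corner = tuple(hi[i] if n[i] >= 0 else lo[i] for i in range(3))
    return sum(n[i] * corner[i] for i in range(3)) + d < 0


def collect_visible(node: BspNode, frustum: List[Plane], out: List[str]) -> List[str]:
    """Descend only into nodes whose boxes are not outside any frustum plane."""
    if any(box_outside(node.box, plane) for plane in frustum):
        return out  # this node and its whole subtree are culled
    out.extend(name for name, _ in node.objects)
    for child in node.children:
        collect_visible(child, frustum, out)
    return out


scene = [
    ("sky", ((-100, -100, -100), (100, 100, 100))),
    ("rock", ((-90, 0, 0), (-80, 5, 5))),
    ("tree", ((-60, 0, 0), (-55, 10, 2))),
    ("house", ((50, 0, 0), (70, 10, 10))),
    ("car", ((80, 0, 0), (85, 2, 2))),
]
tree = build(scene)
half_space = [((1.0, 0.0, 0.0), 0.0)]  # toy "frustum": everything with x >= 0
print(collect_visible(tree, half_space, []))
```

The sky is cut by the first split plane, so it stays at the root, as described for big objects. A single half-space stands in for the six frustum planes, and objects stored at a visible node are accepted here without a per-object test.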
The tree is regenerated on the fly each time an object is added to or removed from the world, as well as when an object changes its status to collider or clutter object. If nothing changes, the tree stays the same. This makes adaptive BSP efficient not only for rendering static geometry but also for handling dynamic objects.
To provide effective management of the scene on the one hand and good tree balancing on the other, separate trees are created for different editor node types:
- World tree handles all sectors, portals, occluders, triggers and clusters.
- Objects tree includes all objects except for the ones with collider and clutter flags.
- Collider objects form a separate tree to facilitate collision detection and avoid the worst-case scenario of testing every object against every other object. Objects can intersect only if they are located in, and overlap within, the same region of the scene, so this tree drastically reduces the number of pairwise tests and accelerates the calculation.
- Clutter objects are also separated as they are intended to be used in great numbers, which can disturb the balance of the main object tree.
- Light tree handles all light sources.
- Decal tree handles decals.
- Player tree handles all types of players.
- Physical node tree handles all physical forces.
- Sound tree handles all sound sources.
After the editor node level is reached, meshes still need further partitioning. The division follows the same principles: the tree must be binary and axis-aligned. The only difference is that these trees are precomputed (they are generated when the world is loaded), because a mesh is a baked object and its trees do not need to change dynamically. A mesh is divided into the following trees:
- Surfaces tree
- Polygon tree
These two mesh-based trees provide the basis for fast intersection and collision calculations with the mesh.
When the human eye views a scene, objects in the distance appear smaller than objects close by; this is known as perspective. While orthographic projection ignores this effect to allow accurate measurements, perspective projection shows distant objects as smaller to provide additional realism.
A viewing frustum (or view frustum) is the field of view of the virtual camera for perspective projection; in other words, it is the part of the world space that is seen on the screen. Its exact shape depends on the camera being simulated, but often it is simply a frustum of a rectangular pyramid. The planes of the viewing frustum that are parallel to the screen are called the near plane and the far plane.
As the field of view of the camera is not infinite, some objects do not fall inside it. For example, objects that are closer to the viewer than the near plane are not visible. The same is true for objects beyond the far plane (if it is not placed at infinity) and for objects cut off by the side planes. As such objects are invisible anyway, their drawing can be skipped. The process of discarding unseen objects is called viewing frustum culling.
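Viewing frustum culling can be sketched with bounding spheres. This is a hypothetical sketch, not engine code: the camera is assumed to sit at the origin looking down the -Z axis (a common graphics convention), the frustum is described by a vertical field of view, an aspect ratio, and near/far distances, and an object is kept unless its bounding sphere lies entirely outside one of the six planes.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[Vec3, float]  # unit inward normal n and offset d: inside if n.p + d >= 0


def normalized(n: Vec3, d: float) -> Plane:
    length = math.sqrt(sum(c * c for c in n))
    return (tuple(c / length for c in n), d / length)


def frustum_planes(fov_y_deg: float, aspect: float, near: float, far: float) -> List[Plane]:
    """Six inward-facing planes of a camera at the origin looking down -Z."""
    ty = math.tan(math.radians(fov_y_deg) / 2)  # tangent of the vertical half-angle
    tx = ty * aspect                            # tangent of the horizontal half-angle
    return [
        normalized((0, 0, -1), -near),  # near:   -z >= near
        normalized((0, 0, 1), far),     # far:    -z <= far
        normalized((1, 0, -tx), 0),     # left:    x >= z * tx
        normalized((-1, 0, -tx), 0),    # right:   x <= -z * tx
        normalized((0, 1, -ty), 0),     # bottom:  y >= z * ty
        normalized((0, -1, -ty), 0),    # top:     y <= -z * ty
    ]


def sphere_visible(center: Vec3, radius: float, planes: List[Plane]) -> bool:
    """Keep the object unless its bounding sphere is fully outside some plane."""
    for n, d in planes:
        if sum(n[i] * center[i] for i in range(3)) + d < -radius:
            return False
    return True


planes = frustum_planes(90.0, 1.0, 0.1, 100.0)
for center in [(0, 0, -10), (0, 0, 5), (0, 0, -200), (50, 0, -10), (10.5, 0, -10)]:
    print(center, sphere_visible(center, 1.0, planes))
```

The test is conservative: a sphere near a frustum corner can pass all six plane tests while lying outside, which costs drawing an invisible object but never hides a visible one.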
As noted above, objects in the distance normally appear smaller than objects close by. Orthographic projection ignores this effect, which allows creating to-scale drawings for construction and engineering.
With an orthographic projection, the viewing volume is a rectangular parallelepiped or, more informally, a box. Unlike with perspective projection, the size of the viewing volume doesn't change from one end to the other, so the distance from the camera doesn't affect how large an object appears.
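The difference between the two projections can be shown by projecting the same object at two distances. In this hypothetical sketch (camera at the origin looking along the view axis, results in normalized screen units where the screen height is 2), perspective projection divides by the distance, while orthographic projection ignores it; `half_extent` is an assumed half-height of the orthographic box.

```python
import math


def perspective_height(height: float, distance: float, fov_y_deg: float = 90.0) -> float:
    """Projected height in normalized screen units: scales with 1 / distance."""
    focal = 1.0 / math.tan(math.radians(fov_y_deg) / 2)
    return focal * height / distance


def orthographic_height(height: float, distance: float, half_extent: float = 10.0) -> float:
    """Projected height in normalized screen units: the distance plays no role."""
    return height / half_extent


for d in (10.0, 20.0):
    print(d, perspective_height(2.0, d), orthographic_height(2.0, d))
```

Doubling the distance halves the perspective size, while the orthographic size stays the same because the viewing volume is a box.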
Another popular practice is to skip objects that are completely hidden by other objects. For example, there is no need to draw a room behind a blank wall or flowers behind an entirely opaque fence. This technique is called occlusion culling. Particular cases of occlusion culling include the following:
- Portals and sectors (described above)
- Potentially visible sets, which divide the space into a number of regions, each containing the set of polygons that can be visible from anywhere inside that region. At run time, the renderer simply looks up the precomputed set for the current view position. This technique is commonly used to speed up binary space partitioning.
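The run-time side of a potentially visible set is just a lookup. In this hypothetical sketch, regions are assumed to be cells of a uniform 2D grid and the sets were computed offline; `CELL_SIZE`, the region keys, and the object names are all illustrative.

```python
from typing import Dict, FrozenSet, Tuple

Cell = Tuple[int, int]

CELL_SIZE = 10.0  # hypothetical region size in world units

# Precomputed offline: for each region, everything visible from anywhere inside it.
PVS: Dict[Cell, FrozenSet[str]] = {
    (0, 0): frozenset({"lobby", "corridor"}),
    (1, 0): frozenset({"corridor", "office", "stairs"}),
}


def region_of(x: float, y: float) -> Cell:
    """Map a view position to the region (grid cell) that contains it."""
    return (int(x // CELL_SIZE), int(y // CELL_SIZE))


def potentially_visible(x: float, y: float) -> FrozenSet[str]:
    """Look up the precomputed set; unknown regions see nothing in this sketch."""
    return PVS.get(region_of(x, y), frozenset())


print(sorted(potentially_visible(15.0, 2.0)))
```

All the expensive visibility work happens offline, so the per-frame cost is a single dictionary lookup; the trade-off is memory for the stored sets and the need to recompute them when the geometry changes.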
In large, complex environments with many objects that occlude each other, culling the occluded geometry can significantly increase performance. The most appropriate solution in this case is to use occluders, which cull the geometry hidden behind them.
However, using occluders to cull large objects with few surfaces may cause an additional performance loss. Moreover, occluders aren't effective in scenes with flat objects or when the camera looks down on the scene from above. So, when using occluders, take into account the specifics of the objects to be culled.
Hardware Occlusion Queries
Another way to cull geometry that is not visible in the camera viewport is to use hardware occlusion queries. They reduce the number of rendered polygons, thereby increasing performance. To run the hardware occlusion test for the scene before sending data to the GPU, set the Rendering -> Features -> Occlusion query flag. Culling is then performed for all objects that have the Culled by occlusion query flag set in the Node tab of the Parameters window.
When culling is enabled for an object, an occlusion query box is rendered for it. Its size matches the size of the object's bounding box. If the occlusion query box is visible in the camera viewport (that is, not fully hidden by other geometry), the object is rendered; otherwise, it is not.
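The decision made with the query box can be emulated on the CPU to show the idea. In this hypothetical sketch, a tiny depth buffer already contains the occluding geometry, the query box is reduced to its screen-space rectangle and a single nearest depth, and the object is drawn only if at least one sample of the box passes the depth test. Real hardware queries rasterize the box on the GPU and report the number of passed samples; the names and data here are illustrative.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # x0, y0, x1, y1 in pixels, end exclusive


def samples_passed(depth: List[List[float]], rect: Rect, box_depth: float) -> int:
    """Count pixels of the query box that are closer than what is already drawn."""
    x0, y0, x1, y1 = rect
    return sum(
        1
        for y in range(y0, y1)
        for x in range(x0, x1)
        if box_depth < depth[y][x]
    )


def should_render(depth: List[List[float]], rect: Rect, box_depth: float) -> bool:
    """Render the object only if some part of its query box is visible."""
    return samples_passed(depth, rect, box_depth) > 0


# 8x4 depth buffer: a wall at depth 5.0 covers columns 0-5; columns 6-7 are empty.
FAR = float("inf")
depth_buffer = [[5.0] * 6 + [FAR] * 2 for _ in range(4)]

print(should_render(depth_buffer, (1, 0, 4, 4), 9.0))  # box behind the wall
print(should_render(depth_buffer, (4, 0, 8, 4), 9.0))  # box partly beside the wall
```

The query itself costs a draw of the box plus a GPU round trip, which is why it pays off only for objects whose own rendering is much more expensive.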
Hardware occlusion queries should be used only for a few objects with heavy shaders; otherwise, performance will decrease instead of increasing. It is recommended to enable queries for water or objects with reflections.
Asynchronous Data Streaming
Data streaming is an optimization technique in which not all data is loaded into random access memory (RAM) at once. Instead, only the required data is loaded, and the rest is loaded progressively, on demand.
In Unigine, asynchronous data streaming is enabled by default. Thanks to data streaming, data such as meshes and textures is loaded asynchronously to RAM.
Sometimes it might be necessary to force-load all meshes and/or textures required for each frame at once (e.g., when grabbing frame sequences or rendering node previews). In such cases, use the render_manager_create_meshes and render_manager_create_textures console commands.
Keep in mind that these console commands do not force-load all required ObjectMeshStatic, ObjectMeshClutter, and ObjectMeshCluster meshes and textures on world start-up.
Procedurally generated objects such as ObjectMeshClutter and ObjectGrass are generated in a separate thread, which significantly reduces performance costs.
Keep in mind that asynchronous data streaming does not affect the transfer of meshes and textures to the GPU: they are transferred in the main thread.
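On-demand asynchronous loading can be sketched with a thread pool. This hypothetical `StreamingCache` is not the engine's streamer: the loader function, the resource names, and the policy of returning `None` until the data arrives (so the caller can draw a placeholder or skip the object) are all assumptions. `force_load` only mirrors the idea of blocking until required data is in memory; it says nothing about how the console commands above are implemented.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, Optional


class StreamingCache:
    """Hypothetical on-demand loader: resources are read in worker threads."""

    def __init__(self, loader: Callable[[str], bytes], workers: int = 2) -> None:
        self._loader = loader
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending: Dict[str, Future] = {}
        self._lock = threading.Lock()

    def request(self, name: str) -> Optional[bytes]:
        """Return the data if it is already in memory; otherwise schedule it and return None."""
        with self._lock:
            future = self._pending.get(name)
            if future is None:
                self._pending[name] = self._pool.submit(self._loader, name)
                return None
        return future.result() if future.done() else None

    def force_load(self, *names: str) -> None:
        """Block until all named resources are in memory (e.g. before grabbing a frame)."""
        for name in names:
            self.request(name)  # schedules anything not requested yet
        with self._lock:
            futures = [self._pending[name] for name in names]
        wait(futures)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


demo = StreamingCache(lambda name: name.encode())  # stand-in for reading a file
print(demo.request("rock_diffuse"))  # the first request only schedules the load
demo.force_load("rock_diffuse")
print(demo.request("rock_diffuse"))
demo.close()
```

As the text notes, loading into RAM is only half of the path; in this sketch nothing is uploaded to a GPU, which would still happen on the main thread.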
Multi-threaded Update of Nodes#
Multi-threaded update of nodes (when enabled via the world_threaded console command) can substantially increase performance. For example, it can be very handy when a large number of particle systems are rendered in the world.
- All nodes that have one root in the node hierarchy are updated in one thread. To parallelize node updates, make sure that the nodes do not share the same parent.
- Each Node Reference is handled as a root node without any parents (regardless of its position in the node hierarchy). For example, this means that particle systems contained in node references are always optimized for multi-threaded update.
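The grouping rules above can be sketched as follows. In this hypothetical sketch, each node is assigned to the group of its topmost ancestor, a node reference is treated as a group root of its own, and each group is submitted as one job to a thread pool, so nodes within a group are updated sequentially while separate groups may run in parallel. `SceneNode`, `update_root`, and the `is_node_reference` flag are illustrative names, not engine API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)
class SceneNode:
    """Hypothetical scene node; not the engine's Node class."""
    name: str
    parent: Optional["SceneNode"] = None
    is_node_reference: bool = False
    updates: int = 0

    def update(self) -> None:
        self.updates += 1  # stand-in for per-frame work such as particle simulation


def update_root(node: SceneNode) -> SceneNode:
    """Walk up to the topmost ancestor, treating a node reference as a root."""
    while node.parent is not None and not node.is_node_reference:
        node = node.parent
    return node


def group_by_root(nodes: List[SceneNode]) -> Dict[int, List[SceneNode]]:
    """Nodes sharing an update root end up in the same group."""
    groups: Dict[int, List[SceneNode]] = defaultdict(list)
    for node in nodes:
        groups[id(update_root(node))].append(node)
    return dict(groups)


def update_group(group: List[SceneNode]) -> None:
    for node in group:  # sequential inside one group
        node.update()


def threaded_update(nodes: List[SceneNode], workers: int = 4) -> int:
    """Submit one job per group and return the number of jobs."""
    groups = group_by_root(nodes)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(update_group, group) for group in groups.values()]
        for job in jobs:
            job.result()  # re-raise any exception from a worker
    return len(jobs)


world = SceneNode("world")
smoke = [SceneNode(f"smoke_{i}", parent=world) for i in range(3)]
campfire = SceneNode("campfire_ref", parent=world, is_node_reference=True)
sparks = SceneNode("sparks", parent=campfire)
nodes = [world, *smoke, campfire, sparks]
print(threaded_update(nodes))  # one job for "world", one for the node reference
```

The three smoke emitters share a root, so they end up in a single job; the sparks inside the node reference get a job of their own. Under CPython's global interpreter lock, this sketch illustrates the scheduling structure rather than a real speed-up.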