
Instanced Animation Rendering



Hi all,

after this discussion about instanced animation, and my long-standing interest in crowd rendering and simulation in general, I implemented two different approaches to instanced animation rendering in UNIGINE. The main goal was to push all the work to the GPU and keep the CPU load low. I also wanted to support different LODs and multiple different "looks" to get a more diverse crowd. Each instance should be controllable individually, so transformation, animation type and frame update should be available per instance. Finally, the system should be an out-of-the-box solution that relies only on UNIGINE-related content, so using GOLAEM or another middleware integration was not an option.


During the implementation I found two ways of rendering animated meshes with instancing, both with their own pros and cons.


Bone baked instanced animation

This one wasn't really new and is already explained in one of NVIDIA's GPU Gems (Link), so it was "only" a port to the UNIGINE engine. In this solution, all bone transformations for each frame of an animation are baked into a texture. On the GPU side, each vertex is then transformed in the vertex shader according to its bones and weights. Bone-baked animation is pretty flexible and has a few advantages:

  • The texture size depends only on the number of bones in a mesh (and on the number of animation frames), not on its vertex count. Since the maximum number of bones per surface is (for now) 128, the texture is at most 512 x 512.
  • The mesh itself can have any number of triangles, which makes this approach very good for very large meshes.
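To illustrate why the vertex count never enters the texture size, here is a minimal sketch of one possible layout. It assumes one texture row per animation frame and a 3x4 bone matrix stored as three RGBA texels (matching the 3 fetches per bone mentioned below); `BoneTextureLayout` and `TEXELS_PER_BONE` are hypothetical names, not the plugin's actual code.

```python
from dataclasses import dataclass

# Assumption: a 3x4 bone matrix is stored as three RGBA texels (one matrix row each).
TEXELS_PER_BONE = 3


@dataclass(frozen=True)
class BoneTextureLayout:
    """Hypothetical layout: one texture row per animation frame, bones side by side."""
    num_bones: int
    num_frames: int

    @property
    def width(self) -> int:
        # Depends only on the bone count, never on the vertex/triangle count.
        return self.num_bones * TEXELS_PER_BONE

    @property
    def height(self) -> int:
        return self.num_frames

    def texel(self, frame: int, bone: int, row: int) -> tuple[int, int]:
        """(x, y) texel holding matrix row `row` (0..2) of `bone` at `frame`."""
        if not (0 <= frame < self.num_frames
                and 0 <= bone < self.num_bones
                and 0 <= row < TEXELS_PER_BONE):
            raise IndexError("frame, bone or row out of range")
        return bone * TEXELS_PER_BONE + row, frame


# The 45-bone mesh from the measurements below, with an assumed 100-frame animation:
layout = BoneTextureLayout(num_bones=45, num_frames=100)
print(layout.width, layout.height)  # 135 100
```

Under this assumed layout, the 128-bone limit gives rows of 384 texels, which fits within the 512-texel maximum; the actual plugin may pad or arrange the texels differently.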

On the con side we have:

  • A slightly "heavy" shader: for every vertex in every frame, we need to decode the bone transform from the texture and apply it to the vertex. Because the texture stores matrix data, each bone costs 3 texture fetches.
  • On top of that, each vertex may be influenced by up to four bones, so in the worst case we need 12 texture fetches.
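A CPU-side reference of what such a vertex shader does may make the fetch count concrete. This is a sketch of standard linear blend skinning, assuming the same hypothetical layout as before (three RGBA texels per bone, one row per frame); `BakedBoneTexture` is an illustrative stand-in for the GPU sampler that also counts fetches, not a UNIGINE class.

```python
# Each texel is an RGBA float quadruple; three texels form one 3x4 affine bone matrix.
Texel = tuple[float, float, float, float]
Vec3 = tuple[float, float, float]

TEXELS_PER_BONE = 3  # assumption: one texel per matrix row


class BakedBoneTexture:
    """Stand-in for the GPU texture sampler that also counts fetches."""

    def __init__(self, texels: dict[tuple[int, int], Texel]):
        self.texels = texels
        self.fetches = 0

    def fetch(self, x: int, y: int) -> Texel:
        self.fetches += 1
        return self.texels[(x, y)]


def apply_rows(rows: list[Texel], p: Vec3) -> Vec3:
    """Multiply a 3x4 affine matrix (given as three rows) with point p (implicit w = 1)."""
    x, y, z = p
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in rows)


def skin_vertex(tex: BakedBoneTexture, frame: int, position: Vec3,
                bones: tuple[int, int, int, int],
                weights: tuple[float, float, float, float]) -> Vec3:
    """Linear blend skinning with four influences: 4 bones x 3 fetches = 12 fetches."""
    out = [0.0, 0.0, 0.0]
    for bone, weight in zip(bones, weights):
        # Like a branch-free shader, fetch all four bones even if a weight is zero.
        rows = [tex.fetch(bone * TEXELS_PER_BONE + r, frame) for r in range(TEXELS_PER_BONE)]
        moved = apply_rows(rows, position)
        for i in range(3):
            out[i] += weight * moved[i]
    return tuple(out)
```

A real shader may skip zero-weight influences or use fewer than four bones per vertex, which lowers the average cost but not the worst case.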

All in all, this approach is indeed very powerful and saves a lot of performance. For an animated mesh with 2,274 triangles and 45 bones, we can render (on an i9-10850K CPU with an NVIDIA GeForce GTX 1080 Ti) around 10,000 meshes with 10 DIPs at 150 FPS. When rendering 40,000 meshes, we still get around 40-50 FPS with 40 DIPs. The bottleneck here was the GPU: the UNIGINE profiler showed twice as much time waiting for the GPU as spent on CPU work.
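Both measurements work out to roughly 1,000 instances per DIP, so the draw-call count grows linearly with the crowd size. The arithmetic can be sketched as below, assuming a fixed maximum batch size per LOD group; the batch size of 1,000 is only inferred from the numbers above, and how the test crowd was split across LODs is not stated, so the multi-LOD case is purely illustrative.

```python
import math


def draw_calls(instances_per_lod: dict[int, int], max_per_batch: int) -> int:
    """Instanced draw calls when every LOD group is split into batches of at most max_per_batch."""
    return sum(math.ceil(n / max_per_batch) for n in instances_per_lod.values() if n > 0)


def frame_time_ms(fps: float) -> float:
    return 1000.0 / fps


# Assumed batch size of 1,000 instances, inferred from 10,000 meshes / 10 DIPs.
print(draw_calls({0: 10_000}, 1_000))            # 10
print(draw_calls({0: 40_000}, 1_000))            # 40
print(draw_calls({0: 2_500, 1: 7_500}, 1_000))   # 11: partially filled batches add DIPs
print(round(frame_time_ms(150), 2))              # 6.67 ms per frame
```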

Because of this, I tried a second approach:


Vertex baked instanced animation

Instead of baking only the bone information into the texture, I baked the vertex positions directly. This reduces the GPU's job to a simple lookup table (LUT) for the animation data. On the pro side we have:

  • A faster approach on the GPU side, because only a single texture fetch is needed per vertex in the vertex shader stage.

On the con side we have:

  • Textures can get really large, because we store the position of every vertex for every animation frame. A 4K texture, for example, can hold only 167,772 vertices at 100 animation frames.

Still, the simplified shader gives a 30 % performance boost, so 40,000 meshes render at around 60-70 FPS. The CPU is still faster here, so the GPU remains the bottleneck.
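The 167,772 figure follows from one texel per vertex per frame: 4096 x 4096 texels divided by 100 frames. Reaching that capacity means using every texel, so frames cannot simply be one row each (a row holds only 4096 vertices). A minimal sketch, assuming a hypothetical linear layout where frame blocks are stored back to back and wrapped into rows; the actual plugin's layout may differ.

```python
def max_vertices(texture_width: int, texture_height: int, frames: int) -> int:
    """One RGBA texel per vertex per frame, using every texel of the texture."""
    return (texture_width * texture_height) // frames


def vertex_texel(vertex: int, frame: int, num_vertices: int, width: int) -> tuple[int, int]:
    """Hypothetical linear layout: frame blocks stored back to back, wrapped into rows."""
    if not 0 <= vertex < num_vertices:
        raise IndexError("vertex out of range")
    index = frame * num_vertices + vertex
    return index % width, index // width


print(max_vertices(4096, 4096, 100))      # 167772
print(vertex_texel(10, 5, 1000, 4096))    # (914, 1)
```

In the shader, this single fetch is the whole animation lookup: the vertex ID and the instance's current frame give the texel, and its xyz channels are the animated position.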


Generating animation textures


To generate these animation textures, I wrote a basic editor plugin. The plugin loads a mesh, applies an animation file to it and stores the proper animation texture for each instancing approach. For now I have chosen an RGBA32F texture to give maximum precision when storing transform data. In the future, this can be made configurable to allow smaller textures: more compressed data, but less precise.
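To put the RGBA32F choice into numbers: four 32-bit channels take 16 bytes per texel, while four half floats take 8. The sketch below computes the memory of one texture layer and shows the precision cost of half floats on an arbitrary example coordinate (123.456 in assumed mesh units). RGBA16F is used only as an example of a smaller format, not as something the plugin currently supports.

```python
import struct

# Bytes per texel for common four-channel float formats.
BYTES_PER_TEXEL = {"RGBA32F": 16, "RGBA16F": 8}


def texture_bytes(width: int, height: int, layers: int, fmt: str) -> int:
    return width * height * layers * BYTES_PER_TEXEL[fmt]


def roundtrip(value: float, fmt: str) -> float:
    """Store one channel as 32-bit ('f') or 16-bit ('e') float and read it back."""
    code = {"RGBA32F": "f", "RGBA16F": "e"}[fmt]
    return struct.unpack("<" + code, struct.pack("<" + code, value))[0]


print(texture_bytes(4096, 4096, 1, "RGBA32F") // 2**20)  # 256 (MiB)
print(texture_bytes(4096, 4096, 1, "RGBA16F") // 2**20)  # 128 (MiB)
print(roundtrip(123.456, "RGBA16F"))                     # 123.4375
```

At this magnitude, half floats can only represent steps of 0.0625, which is why a smaller format trades memory for visibly less precise vertex positions.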

For each animation, the editor calculates the data to store as well as the texture size. Because the animation data is stored in a 2D texture array on the implementation side, the size must be the same for every animation file. Therefore you can change the size manually if needed; in the future, batch processing with aligned sizes can certainly be implemented.
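Since every layer of a 2D texture array shares one width and height, aligning several animations boils down to taking the largest width and height and padding the rest. A minimal sketch of what such an aligned batch step could compute; the optional power-of-two rounding and the example sizes are assumptions for illustration.

```python
def next_pow2(n: int) -> int:
    return 1 << (n - 1).bit_length()


def common_array_size(sizes: list[tuple[int, int]], pow2: bool = False) -> tuple[int, int]:
    """All layers of a 2D texture array share one size, so take the max width and height."""
    width = max(w for w, _ in sizes)
    height = max(h for _, h in sizes)
    if pow2:
        width, height = next_pow2(width), next_pow2(height)
    return width, height


def wasted_texels(sizes: list[tuple[int, int]], pow2: bool = False) -> list[int]:
    """Padding each animation needs to fill its layer."""
    width, height = common_array_size(sizes, pow2)
    return [width * height - w * h for w, h in sizes]


# Hypothetical per-animation texture sizes (width, height):
anims = [(135, 100), (135, 60), (90, 120)]
print(common_array_size(anims))        # (135, 120)
print(wasted_texels(anims))            # [2700, 8100, 5400]
print(common_array_size(anims, True))  # (256, 128)
```

The padding list makes the cost of mixing very different animation lengths in one array visible before baking.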


Usage in the project

Both implementations use the ObjectExtern class, so working with them is very easy and needs no additional work. The objects handle rendering, LOD calculation and animation frame updates internally, so each instance can be treated as a separate animated object. During setup, the user adds for each LOD a mesh to be drawn, its animation file data, the visibility and the mesh texture. In the update phase, the user can set transform data or a new animation to be played for each instance. The internal system collects all instances into LOD groups and renders them together. Because the animation update is done on the GPU, instances with different animations and different mesh textures are still rendered in one draw call.
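The per-instance bookkeeping described above can be sketched on the CPU side as follows. This is not the ObjectExtern-based implementation: `Instance`, `lod_index`, `group_by_lod` and `current_frame` are hypothetical names, distance-based LOD selection is an assumption, and the frame calculation, which the post does on the GPU, is shown in Python only for clarity.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    """Hypothetical per-instance data."""
    position: tuple[float, float, float]
    animation: int       # e.g. layer index in the animation texture array
    start_time: float    # per-instance offset so instances don't animate in lockstep
    look: int = 0        # index of the mesh texture ("look")


def lod_index(distance: float, lod_distances: list[float]) -> Optional[int]:
    """First LOD whose max distance covers `distance`; None means the instance is culled."""
    for i, max_dist in enumerate(lod_distances):
        if distance <= max_dist:
            return i
    return None


def group_by_lod(instances, camera, lod_distances) -> dict[int, list[Instance]]:
    """Bucket instances so each LOD group can be drawn with instanced draw calls."""
    groups = defaultdict(list)
    for inst in instances:
        lod = lod_index(math.dist(inst.position, camera), lod_distances)
        if lod is not None:
            groups[lod].append(inst)
    return dict(groups)


def current_frame(inst: Instance, time: float, fps: float, num_frames: int) -> int:
    """Looping frame index derived from the instance's own start time."""
    return int((time - inst.start_time) * fps) % num_frames


crowd = [Instance((0.0, 0.0, 5.0), animation=0, start_time=0.0),
         Instance((0.0, 0.0, 50.0), animation=1, start_time=0.5, look=2),
         Instance((0.0, 0.0, 500.0), animation=0, start_time=0.0)]
groups = group_by_lod(crowd, camera=(0.0, 0.0, 0.0), lod_distances=[10.0, 100.0])
print({lod: len(members) for lod, members in groups.items()})  # {0: 1, 1: 1}
print(current_frame(crowd[1], time=2.0, fps=30, num_frames=40))  # 5
```

Because animation index, start time and look are just per-instance values, instances with different animations and textures can share one LOD group and therefore one batch.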

There is one "design limitation": an added mesh must have only one surface for the whole object. Otherwise the number of DIPs increases even more, which I wanted to avoid. This may change in the future once internal frame calculations are done asynchronously. UNIGINE also offers a great feature to merge multiple surfaces into one with a single function call, so I don't see this as a blocking point at the moment.

Future additions

For now, the system is already quite powerful, but it can be improved in various ways. These are things I might work on next:

  • Animation blending. For now, only one animation can be applied to each instance at a time. Blending between two animations is not that hard and could be implemented easily.
  • Shadows and metalness-based materials. Currently only a single albedo texture is applied to the mesh surface. A more complex pipeline for metalness-based material rendering can certainly be added.
  • Asynchronous frame updates to speed up the CPU part even more.
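For the vertex-baked variant, one simple way to implement the blending from the first bullet would be to fetch the vertex position from both animations and interpolate with a crossfade weight, at the cost of a second texture fetch. A CPU-side sketch; the per-instance blend start and duration are assumptions, not part of the current system. For the bone-baked variant, the skinned results of both animations could be blended the same way.

```python
Vec3 = tuple[float, float, float]


def crossfade_weight(time: float, blend_start: float, blend_duration: float) -> float:
    """0 at the start of the blend, 1 once the new animation has fully taken over."""
    if blend_duration <= 0.0:
        return 1.0
    return min(max((time - blend_start) / blend_duration, 0.0), 1.0)


def blend_positions(pos_old: Vec3, pos_new: Vec3, weight: float) -> Vec3:
    """Linear interpolation between the positions sampled from the two animations."""
    return tuple(a + (b - a) * weight for a, b in zip(pos_old, pos_new))


w = crossfade_weight(time=1.5, blend_start=1.0, blend_duration=1.0)
print(w)                                                     # 0.5
print(blend_positions((0.0, 0.0, 0.0), (2.0, 4.0, 6.0), w))  # (1.0, 2.0, 3.0)
```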


What else:

Instanced animation rendering is definitely possible with the current UNIGINE version, and the system can easily be added to other projects where crowd rendering is needed. Performance is already pretty good, but can surely be optimized further. Last but not least, I want to thank @karpych11, @rohit.gonsalves and @sweetluna for pointing me in the right direction several times and for patiently answering my simple (and sometimes stupid) questions about shader programming. Much appreciated!


Dear Christian,

This is a wonderful achievement. Congrats, and keep going with future updates and upgrades.

@Unigine: Please consider a UNIGINE marketplace. If such things become available to the community, either free or paid, it will help the community thrive and add value to using this wonderful engine.

I am trying to push virtual production at indie studios. The answer I keep getting is: we need content. Content is king. But finding ready-made free or paid content for UNIGINE projects is hard at the moment. Please consider this very important aspect and make it possible.

