Rendering 25 Mil particles



I am considering the problems involved in rendering 25 million particles as the output of a simulation.

With my current rig I will not be able to test an emitter with this volume of particles, so I want to ask whether this would be possible with a latest-generation GPU (Ampere/RDNA2), or whether there is some other software limitation.


Hello!

The latest GPUs won't help much in this case, because the whole calculation runs on the CPU. With the current implementation, a regular desktop PC with 4 cores takes about 10 ms per frame to calculate ~1M particles (10 emitters with 100k particles each). So in your case it would take at least approximately 0.25 s per frame to simulate 25M particles.
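
To make the extrapolation above concrete, here is a minimal sketch that scales the quoted figure (about 10 ms per 1M particles on a 4-core desktop) to other particle counts. The helper name `frame_time_ms` and the assumption of strictly linear scaling are illustrative only; real frame times will depend on the emitter setup and the hardware.

```python
def frame_time_ms(particles, ms_per_million=10.0):
    """Estimate CPU simulation time per frame, assuming linear scaling.

    ms_per_million defaults to the ~10 ms per 1M particles quoted above
    for a 4-core desktop; treat it as a rough figure, not a benchmark.
    """
    return particles / 1_000_000 * ms_per_million


t = frame_time_ms(25_000_000)
print(f"{t:.0f} ms per frame, about {1000 / t:.0f} fps")
```

Under these assumptions, 25M particles come out at 250 ms per frame, or about 4 fps.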

You can try using a compute shader to build your own GPU-based particle system. For an example, see SDK -> Samples -> C++ API -> Render -> ComputeShader

May I ask what you need that many particles for? We might be able to suggest another solution, because artists typically need around 10k particles on screen, even when creating a snow scene.

Thanks!


Hello @bmyagkov, we will have a custom simulation system, optimized to run on the GPU, that should be able (stretching to the limit) to simulate the dynamics of 25 million particles with a minimum goal of 10 fps. It will be a physics-driven simulation, not VFX; we have already selected the most promising solutions for our specific domain.

My concern is whether we will be able to use the standard Unigine particle system, receiving input from our simulator, to then render that many particles. We may not need to visualize all 25 million in the end; I am just checking where our problems could lie.

Edited by davide445

I'm sorry, but according to our team that is impossible. You need to write your own renderer for it. The only thing you could reuse is the shader code.


OK, good to know. Based on your previous calculations, if we want to render at 30 fps (the physics sim will be slower), can we say that we could have at most 3 million particles on screen with the current implementation?
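
As a rough check of that estimate, the per-frame figure can be inverted to ask how many particles fit into a given frame budget. This is only a sketch: it assumes the ~10 ms per 1M particles quoted earlier in the thread, linear scaling, and that particle simulation is the only per-frame cost. `max_particles` is a hypothetical helper, not an engine function.

```python
def max_particles(target_fps, ms_per_million=10.0):
    """Particles that fit in one frame at target_fps, assuming linear cost."""
    budget_ms = 1000.0 / target_fps  # time available per frame
    return int(budget_ms / ms_per_million * 1_000_000)


for fps in (10, 30, 60):
    print(fps, "fps ->", max_particles(fps), "particles")
```

At 30 fps the budget is about 33 ms, which gives a little over 3.3 million particles under these assumptions.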


I just checked on our own rig, a Ryzen 3900X (12 cores, 24 threads) with a 2080 Ti, and I was able to get 30 fps with 3M particles in the scene using the current particle system implementation.

A beast like the Threadripper 3990X with 64 cores should push the result even further, but we don't have one at the moment to confirm that guess.


Thanks for helping me set the boundaries of what is possible. Can we say this is more CPU-bound than GPU-bound? What was your load in the test?


Was glad to help! Thanks


PS: Can we say this is more CPU-bound than GPU-bound? What was your load in the test?


CPU-bound for sure: only 20% of the 2080 Ti was loaded at 1080p screen resolution.


Returning to the topic. Apart from the total number of particles to be moved, which we still need to determine, we will in fact only need to render them, with positions and velocities given by the external simulator.

Looking at the APIs, there doesn't seem to be a way to simply set where a single particle should be, so I suppose we have two choices, and I wanted to ask if you agree:

- Use the modifiers via the API to start from a position and then push the particles around to match the simulator results as closely as possible. This loses the fine behavior coming out of the simulator, since a modifier is applied to a single emitter and influences a whole set of particles, not a single one. In this case we will be CPU-bound, since we have to pass through the current particle management system.

- As you wrote before, write our own renderer, reusing the shader code. In this case we will possibly be GPU-bound if we use compute shaders.

The goal is to animate a volume of particles pushed around by an object, starting from a still state. So something different from standard particle usage, I suppose.


The big challenge here would be to find a proper way to transfer 25M transforms per frame to your custom compute shader.

Maybe the best way would be to use a set of 5000x5000 px textures containing the position, color, rotation, and scale parameters. You can transfer them via a fast memcpy (if the texture is in RAM), or use additional graphics API (GAPI) functionality for inter-process texture uploading / downloading without involving much CPU.
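
To illustrate the texture idea, here is a minimal CPU-side sketch of packing particle positions into a flat buffer, one texel per particle, plus an estimate of the resulting upload bandwidth. Everything specific here is an assumption for illustration: the RGBA32F layout (16 bytes per texel), the use of the alpha channel as an "alive" flag, and the helper names. It does not use any Unigine API; the actual texture format and upload path would depend on the engine and graphics API.

```python
from array import array

BYTES_PER_TEXEL = 16  # assumed RGBA32F: four 32-bit floats per texel


def texel_coords(index, width):
    """Map a particle index to (x, y) texel coordinates, row-major."""
    return index % width, index // width


def pack_positions(positions, width, height):
    """Pack (x, y, z) tuples into a zero-filled flat RGBA float buffer."""
    if len(positions) > width * height:
        raise ValueError("texture too small for this many particles")
    buf = array("f", bytes(BYTES_PER_TEXEL * width * height))
    for i, (px, py, pz) in enumerate(positions):
        base = 4 * i  # four floats per texel
        buf[base], buf[base + 1], buf[base + 2] = px, py, pz
        buf[base + 3] = 1.0  # alpha as an "alive" flag (assumption)
    return buf


def upload_mb_per_second(width, height, textures, fps):
    """Raw data rate in MB/s if every texture is re-uploaded each frame."""
    return width * height * BYTES_PER_TEXEL * textures * fps / 1e6


buf = pack_positions([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)], width=2, height=2)
print(list(buf[:8]))
print(texel_coords(5001, 5000))
print(upload_mb_per_second(5000, 5000, textures=1, fps=10))
```

With this assumed layout, one 5000x5000 texture holds exactly 25M texels and weighs 400 MB, so re-uploading it at 10 fps means about 4 GB/s per texture before colors, rotation, or scale are added. A more compact format (for example 16-bit floats) would reduce that proportionally.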


@silent, after discussing it a bit more, 5 million particles might be what we actually need to render (but this is a very rough estimate), so we will need to pass through that amount of data (if not more).

Thanks for the hint, I will check with the engineers.


@ulf.schroeter we have evaluated that multi-GPU method; it does not seem to scale well in terms of speed, but it scales well with the number of particles.
