Unigine stops working on a virtualized graphics card with DirectX



We are currently testing a virtualized system with an NVIDIA A40 card (roughly equivalent to the RTX 3090).

When all graphics card resources are passed to a single virtual machine, Unigine works fine with DirectX.

When the graphics card resources are split across more than one virtual machine, Unigine crashes with the error message "Can't create Texture".

The log output indicates that the DirectX device somehow gets lost.

We encounter no problems with OpenGL.

This happens when we create a large number of texture objects at once at application startup, without calling a frame swap in between. The video RAM size is 24 GB.
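
For illustration, the allocation pattern looks roughly like the D3D11 sketch below (standalone C++, not Unigine code, links against d3d11.lib); the texture size and count are placeholders, the point is just that many textures are created in one burst before any Present()/frame swap:

    // Standalone D3D11 sketch of the allocation pattern: a burst of texture
    // creations on a fresh device, with no frame swap in between.
    #include <d3d11.h>
    #include <wrl/client.h>
    #include <vector>

    using Microsoft::WRL::ComPtr;

    int main()
    {
        ComPtr<ID3D11Device> device;
        ComPtr<ID3D11DeviceContext> context;
        if (FAILED(D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                                     nullptr, 0, D3D11_SDK_VERSION,
                                     &device, nullptr, &context)))
            return 1;

        D3D11_TEXTURE2D_DESC desc = {};
        desc.Width = 2048;                          // placeholder size
        desc.Height = 2048;
        desc.MipLevels = 1;
        desc.ArraySize = 1;
        desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
        desc.SampleDesc.Count = 1;
        desc.Usage = D3D11_USAGE_DEFAULT;
        desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;

        std::vector<ComPtr<ID3D11Texture2D>> textures;
        for (int i = 0; i < 128; i++)               // burst of allocations, no Present() in between
        {
            ComPtr<ID3D11Texture2D> tex;
            if (FAILED(device->CreateTexture2D(&desc, nullptr, &tex)))
                return 1;                           // the "Can't create Texture" failure shows up at this point
            textures.push_back(tex);
        }
        return 0;
    }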

Do you have any idea what could cause this behavior?

Thanks,

Sebastian

Link to post

Hi Sebastian,

Hard to tell. Maybe the amount of VRAM the driver allows the VM to use after splitting is somehow not enough for DX11.

Try launching a debug build with -video_debug 2 to get some additional debug messages from the driver. You can also try to find out at which point the error stops appearing (reduce the texture count / resolution to see if that really changes anything).
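
For example, something along these lines (the executable name is just a placeholder for your application's debug binary):

    your_app_x64d.exe -video_debug 2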

Thanks!


Link to post

Thanks for the hints. I am going to deploy a debug build on the system to get additional information.

After splitting, the virtual machine still has 24 GB of VRAM, and Unigine reports it correctly. We are only using about 2 GB for textures.

When we skip the large scenery file, the application starts.

I will try to shrink the texture resolutions and check whether anything changes.

I have attached the last log output:

2106021251_log.txt

Link to post

I can also see that two identical GPUs are detected:

GPU 0 Active: NVIDIA A40-24Q 23935 MB
GPU 1	    : NVIDIA A40-24Q 23935 MB

Maybe trying to switch to the second one (-video_adapter 1) will also change something?


Link to post

That is one physical GPU split into two. I will test it when I get my hands on the system again, but I think this won't work because that video adapter is assigned to the other VM.

Link to post

Hello silent,

 

Good and bad news: we managed to get things working.

It happens when we bake the light texture for one of our spotlights: the texture is generated from over 450 individual spotlights by rendering them into a 4096x4096 target.

When the number of spotlights exceeds 450, the application crashes.

We added the lights one by one to find out where the limit is.

When we reduce the resolution to 2048x2048 pixels, it works out of the box, but we lose resolution.

I guess the driver has a timeout after which it kicks the Direct3D device and stops working.
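
As a rough estimate of the workload (assuming each light pass touches most of the target): 4096x4096 is about 16.8 million pixels, so 450+ passes add up to roughly 7.5 billion pixel shader invocations in a single submission. At 2048x2048 that drops by a factor of four to about 1.9 billion, which would explain why the smaller target stays under the default 2-second watchdog window.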

Do you know of any setting in the Windows registry or the NVIDIA driver that prevents the driver from killing the Direct3D device when rendering takes a long time?

 

Thanks

Sebastian

Link to post

You can try to increase the TDR (Timeout Detection and Recovery) delay in the registry. Here are the instructions:
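
In short, these are the standard Windows GPU watchdog values, not Unigine settings; both are REG_DWORDs in seconds, and changing them requires admin rights and a reboot:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers
        TdrDelay    = 60    (time a GPU job may run before the watchdog resets the device; default 2)
        TdrDdiDelay = 60    (time a thread may stay inside the driver; default 5)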

However, if you can reproduce it on a regular GPU with a simplified scene, it would be interesting to see what actually happens inside the engine itself.

Maybe there is a way to improve this behavior somehow (especially keeping in mind that OpenGL is working just fine).

Thanks!


Link to post

Hello silent,

 

I set the TDR delay to 1 minute, but the problem is still there.

I also tried disabling it entirely, but that freezes the operating system.

I am trying to reproduce this with a simplified scene, but without any result yet.

 

To work around the problem for now: is it possible to render to a texture without clearing it, so that I can fill the texture in multiple passes?
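
What I have in mind is roughly the following pattern. This is only a D3D11-level sketch of the idea, not Unigine API code; SpotLight and drawLightBatch() are made-up placeholders for our own data and bake pass:

    // Clear the bake target once, then render only a batch of lights per
    // submission so no single GPU workload runs long enough to trigger TDR.
    #include <d3d11.h>
    #include <algorithm>
    #include <vector>

    struct SpotLight { /* position, direction, color, ... (app-specific) */ };

    // Placeholder: issues additive draw calls for `count` lights into the bound target.
    void drawLightBatch(ID3D11DeviceContext *context, const SpotLight *lights, size_t count);

    void bakeLightTexture(ID3D11DeviceContext *context,
                          ID3D11RenderTargetView *bakeTarget,
                          const std::vector<SpotLight> &lights)
    {
        const float black[4] = {0.0f, 0.0f, 0.0f, 0.0f};
        context->ClearRenderTargetView(bakeTarget, black);        // clear only once, up front

        const size_t batchSize = 32;                              // tuned so one batch stays well below the TDR delay
        for (size_t first = 0; first < lights.size(); first += batchSize)
        {
            context->OMSetRenderTargets(1, &bakeTarget, nullptr); // re-bind, but do NOT clear again
            const size_t count = std::min(batchSize, lights.size() - first);
            drawLightBatch(context, &lights[first], count);       // accumulate with additive blending
            context->Flush();                                     // hand each batch to the driver as its own submission
        }
    }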

Thanks

Link to post

We were able to reproduce this on the virtual machine with a minimal sample.

I only had one run on my workstation PC where the device was lost, but it looks like it has something to do with the light settings we use.

Link to post

Hello silent,

 

We changed the attenuation distance of the LightProj object when we create it.

It works for the value 125.f, but still breaks when we double it to 250.f.

We first had a default value of 1000000.f in our configuration.

So I guess there is some calculation in the shader that kills the driver.
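
Presumably the attenuation distance controls how large an area each light covers in the bake: if the projected footprint grows roughly with the square of that distance, doubling it from 125 to 250 means each light shades about four times as many pixels, and the original default of 1000000 would make every one of the 450+ lights cover essentially the whole 4096x4096 target, which could easily push a single submission past the watchdog limit.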

 

I could not reproduce this when using a non-shared graphics adapter.

I don't know whether you have a setup to test graphics adapter sharing on a virtual machine, but I can provide the example.

 

Thanks,

Sebastian

Link to post