sebastian.vesenmayer Posted June 4, 2021

We are currently testing a virtualized system with an NVIDIA A40 (roughly equivalent to the RTX 3090). When all graphics card resources are passed to a single virtual machine, Unigine works fine with DirectX. When the graphics card resources are split across more than one virtual machine, Unigine crashes with the error message "Can't create Texture". The log output states that somehow the DirectX device gets lost. We encounter no problems with OpenGL. This happens when we create many texture objects at once at application startup, without calling a frame swap in between. The VRAM size is 24 GB. Do you have any idea what could cause this behavior?

Thanks, Sebastian
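One generic mitigation for device loss during bulk resource creation is to throttle the work so only a limited number of creations happen per frame. The sketch below is plain C++, not Unigine API; the queued `std::function` jobs stand in for whatever engine texture-creation calls the application actually makes.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Spreads expensive resource creation (e.g. texture uploads) across frames
// instead of issuing everything at once before the first swap.
class CreationQueue {
public:
    void enqueue(std::function<void()> job) { jobs_.push(std::move(job)); }

    // Run at most max_per_frame jobs; call once per frame, before the swap,
    // so the driver gets a chance to finish queued work between batches.
    std::size_t process(std::size_t max_per_frame) {
        std::size_t done = 0;
        while (!jobs_.empty() && done < max_per_frame) {
            jobs_.front()();
            jobs_.pop();
            ++done;
        }
        return done;
    }

    std::size_t pending() const { return jobs_.size(); }

private:
    std::queue<std::function<void()>> jobs_;
};
```

Whether this helps depends on why the shared adapter drops the device, but it keeps each frame's submission small, which is usually friendlier to virtualized GPUs.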
silent Posted June 4, 2021

Hi Sebastian,

Hard to tell. Maybe the VRAM that the driver allows DX11 to use after splitting is somehow not enough. Try launching a debug build with -video_debug 2 to get some additional debug messages from the driver. You can also try to find out when this error stops appearing (reduce the texture count / resolution to see whether that actually changes anything).

Thanks!

How to submit a good bug report
---
FTP server for test scenes and user uploads: ftp://files.unigine.com
user: upload
password: 6xYkd6vLYWjpW6SN
sebastian.vesenmayer Posted June 7, 2021

Thanks for the hints. I am going to deploy a debug build on the system to get additional information. After splitting, the virtual machine has 24 GB of VRAM, and Unigine reports it correctly. We use only about 2 GB for textures. When we skip the large scenery file, the application starts. I will try shrinking the texture resolutions and check whether anything changes. I have attached the last log output.

2106021251_log.txt
silent Posted June 7, 2021

I can also see that two identical GPUs are detected:

GPU 0 Active: NVIDIA A40-24Q 23935 MB
GPU 1 : NVIDIA A40-24Q 23935 MB

Maybe switching to the second one (-video_adapter 1) will also change something?
sebastian.vesenmayer Posted June 7, 2021

That is one physical GPU split into two. I will test it when I get access to the system again, but I don't think it will work, because that video adapter is assigned to the other VM.
sebastian.vesenmayer Posted June 10, 2021

Hello silent,

good and bad news: we managed to narrow it down. It happens when we bake our light texture for one spotlight projector. The light texture is generated from over 450 individual spotlights by rendering into a 4096x4096 texture. When the number of spotlights exceeds 450, the application crashes; we added lights one by one to find where the limit is. When we reduce the resolution to 2048x2048 pixels it works out of the box, but we lose resolution. I guess the driver has a timeout after which it removes the Direct3D device and stops working. Do you know of any Windows registry or NVIDIA driver settings that prevent the driver from killing the Direct3D device when rendering takes a long time?

Thanks
Sebastian
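If the long single submission is what trips the driver's watchdog, one approach is to split the 450+ spotlights into batches and bake each batch in its own pass, so no single submission runs long enough to hit the timeout. The sketch below only computes the index ranges; actually rendering a batch and accumulating it into the target texture is engine-specific and not shown.

```cpp
#include <cstddef>
#include <vector>

// An index range of lights to bake in a single short render submission.
struct Batch {
    std::size_t first;
    std::size_t count;
};

// Split light_count lights into batches of at most batch_size each, so every
// submission stays well under the driver's timeout.
std::vector<Batch> make_batches(std::size_t light_count, std::size_t batch_size) {
    std::vector<Batch> batches;
    for (std::size_t i = 0; i < light_count; i += batch_size) {
        std::size_t remaining = light_count - i;
        batches.push_back({i, remaining < batch_size ? remaining : batch_size});
    }
    return batches;
}
```

For example, 450 spotlights with a batch size of 100 would bake in five passes, the last covering the remaining 50 lights.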
silent Posted June 10, 2021

You can try increasing the TDR delay in the registry. Here are the instructions: https://developer.unigine.com/forum/topic/7154-solved-ig-template-error-plz/?do=findComment&comment=35935

However, if you can reproduce it on a regular GPU in a simplified scene, it would be interesting to see what actually happens inside the engine itself. Maybe there is a way to improve this behavior somehow (especially keeping in mind that OpenGL works just fine).

Thanks!
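For reference, the TDR timeout is controlled by documented registry values under the GraphicsDrivers key. A .reg fragment raising both delays to 60 seconds (0x3c) might look like the following; the values are in seconds, and a reboot is required for them to take effect:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000003c
"TdrDdiDelay"=dword:0000003c
```

TdrDelay is the time the GPU may spend on a single workload before the watchdog recovers the device (default 2 seconds); TdrDdiDelay is how long threads may remain inside the driver during recovery.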
sebastian.vesenmayer Posted June 15, 2021

Hello silent,

I set the TDR delay to 1 minute, but the problem is still there. I also disabled it, but that freezes the operating system. I am trying to reproduce this in a simplified scene, without any result yet.

As a workaround: is it possible to render to a texture without clearing it, so that I can render the texture in multiple passes?

Thanks
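The multi-pass idea rests on the accumulation being additive: if the render target is not cleared between passes, each pass adds its lights' contributions on top of the previous ones, and N small passes produce the same texture as one big pass. Whether Unigine exposes a no-clear render-to-texture path for this target is not confirmed here; the toy sketch below just demonstrates the equivalence, with a float buffer standing in for the light texture.

```cpp
#include <cstddef>
#include <vector>

// Simulates one render pass into a target that is NOT cleared beforehand:
// contributions are blended additively onto whatever is already there.
void accumulate_pass(std::vector<float>& target, const std::vector<float>& contribution) {
    for (std::size_t i = 0; i < target.size(); ++i)
        target[i] += contribution[i]; // additive blend, no clear
}
```

Rendering the full light set in one pass, or the same set split across several passes into an uncleared target, yields the same accumulated result.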
sebastian.vesenmayer Posted June 15, 2021 Author Share Posted June 15, 2021 (edited) We were able to reproduce this on the virtual machine with a minimal sample. I only had one start on my workstation PC were the device has been lost, but it looks like it has something todo with the light settings we make. Edited June 15, 2021 by sebastian.vesenmayer Link to comment
sebastian.vesenmayer Posted June 15, 2021

Hello silent,

we changed the attenuation distance of the LightProj object when we create it. It works with a value of 125.0f, but still breaks when we double it to 250.0f. Our configuration originally used a default value of 1000000.0f. So I guess there is some calculation in the shader that kills the driver. I could not reproduce this on a non-shared graphics adapter. I don't know whether you have a setup to test graphics adapter sharing on a virtual machine, but I can provide the example.

Thanks,
Sebastian
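Until the root cause is known, one pragmatic workaround is to clamp the configured attenuation distance to the value observed to work on the shared adapter. The 125.0f ceiling and the 1000000.0f configuration default come from the observations above; the function name is hypothetical, not part of any Unigine API.

```cpp
#include <algorithm>

// Clamp the attenuation distance read from the configuration to a value
// known not to crash the shared virtual GPU. 125.0f is the empirically safe
// limit from this thread; adjust once the driver issue is resolved.
float safe_attenuation_distance(float configured, float max_safe = 125.0f) {
    return std::min(configured, max_safe);
}
```

With this in place, the old configuration default of 1000000.0f would be silently reduced to the safe limit instead of taking down the Direct3D device.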
silent Posted June 16, 2021

Sebastian,

An example would be great to have. It's obviously some driver issue, but we need to see what is really going on (and report it back to the NVIDIA team).

Thanks!
sebastian.vesenmayer Posted June 22, 2021

Hello silent,

this is the promised reproducer for shared graphics cards on a virtual machine.

reproducer_4K_texture_mapping.zip
silent Posted June 28, 2021

Can you tell us more about the server setup so we can try to build a similar test environment? We found some ancient GRID GPUs in our office; maybe they would be enough for reproduction :)

Could you please specify:
- OS on the host PC (Windows Server 2019 or Linux)?
- Which VM software are you using on the server and clients?

Thanks!
sebastian.vesenmayer Posted June 28, 2021 Author Share Posted June 28, 2021 (edited) Hi silent, the host operating system is Dell VxRail based on vSphere ESXi from VMWare. Guest OS have been Windows 10 operating systems. Edited June 28, 2021 by sebastian.vesenmayer 1 Link to comment