boreas419 Posted February 15, 2022

Hi, we use the AppWall plugin to run Unigine with multiple screens. Using the Microprofiler, I noticed some overhead in Engine::doSwap on the CPU that I cannot explain. This overhead is not present when using AppWall with only one screen; it only appears when using AppWall with more than one screen.

I made two screenshots of the Microprofiler while running Unigine 2.13.0.1 (VSync is off), with Engine::doSwap encircled in both.

[Screenshot: AppWall with one screen]
[Screenshot: AppWall with two screens]

If I use AppWall with only one screen, Engine::doSwap takes only about 0.2 ms per frame, which seems normal. If I use AppWall with two screens, Engine::doSwap takes a long time (about 23 ms per frame). During the first 16 ms, Engine::doSwap seems to be waiting for the GPU to finish RenderRenderer::renderWorld, which is expected. But after RenderRenderer::renderWorld has finished, Engine::doSwap keeps waiting for an extra 7 ms while the GPU appears to be doing nothing.

What is happening during those extra 7 ms? What is the CPU waiting for? Is something happening on the GPU that is not shown in the Microprofiler output? Those extra 7 ms occur every frame, so this is a big performance issue for us. We would really like to know what is happening during that time, and whether we can optimize it in some way to increase the framerate when using AppWall with two screens.

I also tested this with an empty Unigine project created with the Unigine SDK Browser. It can be reproduced as follows:

1. In the Unigine SDK Browser, create an empty new project for Unigine 2 Sim (2.13.0.1) with the Monitor wall (AppWall) plugin.
2. Run the new project with Microprofile enabled and add the arguments: -extern_plugin AppWall -width 2 -height 1
3. Open the Microprofiler in a browser.

With the empty Unigine project, the same problem occurs, only less obviously: with 2 screens, Engine::doSwap takes about 1 ms extra per frame.
With 5 screens, Engine::doSwap takes about 3 ms extra per frame.
silent Posted February 15, 2022

boreas419

With AppWall enabled, the engine needs to recreate the whole scene on the CPU from scratch for each additional screen (and, of course, send this data to the GPU to render). If your scene takes 5 ms on the CPU with a single screen, enabling AppWall with just 2 viewports would increase the CPU time to 10 ms. In the worst case, with heavily CPU-bound scenes, you can get even bigger numbers, and adding more viewports to render increases this difference even further.

Also, since you are submitting more work from the CPU, the GPU also has to render more, so when you are heavy on the CPU and also hitting the GPU limits, you can expect an even bigger framerate difference and additional performance loss. Deferred rendering also puts far more pressure on PCI-E memory bandwidth (compared to the mostly forward rendering of UNIGINE 1), so if you render a lot of viewports at full resolution at the same time, you will probably hit a slowdown at some point, since not all of the data can be transferred at once.

So, overall recommendations to reduce the CPU load:
- Reduce the number of simultaneous outputs from a single PC / GPU to 1-2;
- Reduce the CPU scene complexity (number of objects and shadows);
- Reduce the GPU complexity (disable some post effects or decrease their resolution to reduce GPU memory transfers);
- Use latest-gen CPUs with PCI-E 5 speeds.

Unfortunately, there is currently no way to eliminate this performance gap. Switching to a more powerful GPU would probably not give you any results, since AppWall applications are mostly CPU-bound, so only upgrading the CPU and reducing the CPU scene complexity can give noticeable results.

Thanks!

How to submit a good bug report
---
FTP server for test scenes and user uploads: ftp://files.unigine.com user: upload password: 6xYkd6vLYWjpW6SN
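The linear scaling described in this reply can be sketched as a back-of-the-envelope model. This is a simplified illustration under stated assumptions, not how UNIGINE actually schedules work: it assumes every viewport costs the same CPU time, that CPU and GPU work overlap perfectly so the slower side sets the frame time, and it ignores PCI-E transfer stalls and fixed per-frame costs. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified model (not a UNIGINE API): every extra AppWall
// viewport repeats the full CPU-side scene preparation, so CPU time grows
// linearly with the number of viewports.
double estimateCpuFrameMs(double singleViewportCpuMs, int viewports) {
    assert(viewports >= 1);
    return singleViewportCpuMs * viewports;
}

// Assumption: CPU and GPU work fully overlap, so the slower side sets the
// frame time. Transfer stalls and fixed per-frame costs are ignored.
double estimateFrameMs(double cpuMs, double gpuMs) {
    return std::max(cpuMs, gpuMs);
}

// Converts a frame time in milliseconds to frames per second.
double framesPerSecond(double frameMs) {
    return 1000.0 / frameMs;
}
```

With the 5 ms example from the reply, estimateCpuFrameMs(5.0, 2) gives 10 ms. If the GPU needed 8 ms for both viewports, estimateFrameMs(10.0, 8.0) would still be 10 ms (100 FPS), and halving the GPU time to 4 ms leaves it unchanged, which is why, in this model, a faster GPU does not help a CPU-bound AppWall setup.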
boreas419 Posted February 22, 2022

Thank you for your elaborate answer and recommendations. I understand that adding more viewports will always increase the frame time. However, I still don't understand why the Microprofiler shows a gap in GPU usage in our case, and I would like to know what is happening during that gap. I encircled it in this Microprofiler screenshot:

[Screenshot: Microprofiler output with the GPU gap encircled]

During that gap, the GPU seems to be doing nothing for 7 ms. Maybe something is happening on the GPU there that the Microprofiler does not show? Maybe Unigine is transferring data from CPU memory to GPU memory, and the GPU is waiting for that to finish? If that is the case, I think it would be nice if the Microprofiler showed that in the GPU usage instead of a gap.
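The gap can be quantified from the profiler timeline: it is the part of the Engine::doSwap span that no recorded GPU scope covers. A minimal sketch, assuming you have the start and end times of the recorded GPU scopes for one frame (the Interval type and uncoveredMs helper are hypothetical, not part of the Microprofiler), sorts and merges the intervals and subtracts their covered length from the window. A blank result only means nothing was recorded there; by itself it does not prove the GPU was idle.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical type: a timed interval in milliseconds, relative to frame start.
struct Interval {
    double start;
    double end;
};

// Returns how much of [windowStart, windowEnd] is NOT covered by any of the
// recorded intervals, i.e. the blank time a profiler timeline would show.
double uncoveredMs(double windowStart, double windowEnd,
                   std::vector<Interval> busy) {
    assert(windowEnd >= windowStart);

    // Sort by start time so overlapping intervals can be merged in one pass.
    std::sort(busy.begin(), busy.end(),
              [](const Interval& a, const Interval& b) { return a.start < b.start; });

    double covered = 0.0;
    double cursor = windowStart;  // end of the region counted so far
    for (const Interval& iv : busy) {
        // Clip to the window and skip any part already counted.
        double s = std::max(iv.start, cursor);
        double e = std::min(iv.end, windowEnd);
        if (e > s) {
            covered += e - s;
            cursor = e;
        }
    }
    return (windowEnd - windowStart) - covered;
}
```

With the figures from the first post, a doSwap window of 0 to 23 ms with renderWorld covering 0 to 16 ms gives uncoveredMs(0.0, 23.0, {Interval{0.0, 16.0}}) == 7.0, the 7 ms gap in question.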
silent Posted February 24, 2022

boreas419

Currently, only the default main window is added to the microprofiler, in renderWindow() / swapWindow() (D3D11RenderContext.cpp). The AppWall window functions are not exported to the microprofiler, which is why there is a blank spot in the graph. We are currently in the middle of refactoring the window manager; starting with 2.16, it will be much easier to expose performance counters to external plugins, and we will finally be able to see the whole picture.

Thanks!
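The explanation that only instrumented functions appear in the graph can be illustrated with a minimal RAII scope timer. Everything here is hypothetical: ScopedSample, swapMainWindow(), swapExtraWindow() and swapAllWindows() only mimic the situation described for renderWindow() / swapWindow() and are not the UNIGINE or microprofile API. Work done in the uninstrumented function still takes time but produces no sample, so a timeline would show it as a blank spot.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal profiler (not the UNIGINE or microprofile API):
// only code wrapped in a ScopedSample is recorded.
struct Sample {
    std::string name;
    double durationMs;
};

// Global storage for recorded samples.
std::vector<Sample>& samples() {
    static std::vector<Sample> s;
    return s;
}

// Records the lifetime of the enclosing scope as one named sample.
class ScopedSample {
public:
    explicit ScopedSample(std::string name)
        : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}
    ~ScopedSample() {
        auto end = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(end - start_).count();
        samples().push_back({name_, ms});
    }

private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Instrumented, like the default main window.
void swapMainWindow() {
    ScopedSample sample("swapWindow/main");
    // ... present the main window ...
}

// Not instrumented, like the AppWall windows: its cost is real but unrecorded.
void swapExtraWindow() {
    // ... present an additional AppWall window ...
}

void swapAllWindows(int extraWindows) {
    assert(extraWindows >= 0);
    swapMainWindow();
    for (int i = 0; i < extraWindows; ++i) {
        swapExtraWindow();
    }
}
```

Calling swapAllWindows(1) records a single sample for the main window; the extra window's time is spent but never appears. In this sketch, exposing the extra windows would simply mean wrapping swapExtraWindow() in its own ScopedSample.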