ulf.schroeter Posted March 16, 2011

Problem

While debugging the EngineThreads threading bug, we noticed another issue with the EngineThreads job thread count. Currently EngineThreads allocates (number of CPUs - 1) job threads (e.g. 3 job threads on a quad-core CPU). This can cause inefficient task dispatching in engine.render.renderWorld() because of the parallel physics and pathfinding runUpdate() jobs started earlier from within Engine::do_render(). On a quad-core CPU, the physics and pathfinding tasks each occupy 1 thread, leaving just 1 thread for the parallel multi-threaded WorldOccluderTerrain 4-split rendering. As a result, 3 WorldOccluderTerrain render split jobs are assigned sequentially to this single remaining job thread (1 job is handled by the main thread itself).

When the job thread count is increased, total performance of the test case rises from 160 FPS to 191 FPS, because the 3 jobs can now be distributed across 3 threads. It therefore seems more efficient to allocate more job threads than CPU cores: at least when the physics/pathfinding jobs are short, the physical CPU cores they free up can also be used for the multi-threaded render/update/flush process.

Possible Fix

engine/EngineThreads.cpp

    EngineThreads::EngineThreads() : sound_lock(0), filesystem_lock(0), job_lock(0) {
        ....
        // create 2 x (Num CPU - 1) job threads to increase total throughput
        num_jobs = 2 * (SystemInfo::getCPUCount() - 1);
        ...
    }
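To make the quad-core arithmetic above concrete, here is a minimal, hedged sketch. It is not engine code: the helper names (atLeastOne, jobThreadsOriginal, jobThreadsProposed, sequentialRounds) are hypothetical, clamping to at least one thread is an assumption of this sketch, and the model simply assumes that jobs are spread evenly over whichever job threads are not busy with physics/pathfinding.

```cpp
#include <cassert>

// Hypothetical helper: never report fewer than one thread (assumption of this sketch).
constexpr int atLeastOne(int n) { return n < 1 ? 1 : n; }

// Original policy from the post: (number of CPUs - 1) job threads.
constexpr int jobThreadsOriginal(int cpu_count) { return atLeastOne(cpu_count - 1); }

// Proposed policy from the post: 2 x (number of CPUs - 1) job threads.
constexpr int jobThreadsProposed(int cpu_count) { return atLeastOne(2 * (cpu_count - 1)); }

// Number of sequential rounds needed to run `jobs` on the threads that are
// not already busy, i.e. ceil(jobs / free_threads).
constexpr int sequentialRounds(int jobs, int total_threads, int busy_threads) {
    return (jobs + atLeastOne(total_threads - busy_threads) - 1)
           / atLeastOne(total_threads - busy_threads);
}

// Quad-core case from the post: physics and pathfinding keep 2 job threads busy,
// and 3 of the 4 terrain split jobs go to job threads (the main thread takes 1).
inline void checkQuadCoreExample() {
    assert(sequentialRounds(3, jobThreadsOriginal(4), 2) == 3); // 1 free thread, 3 rounds
    assert(sequentialRounds(3, jobThreadsProposed(4), 2) == 1); // 4 free threads, 1 round
}
```

Under these assumptions the original policy serialises the 3 split jobs on one thread, while the doubled count lets them run side by side, which matches the direction of the measured 160 to 191 FPS change without trying to predict the actual numbers.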
steve3d Posted March 16, 2011

Agreed, +1. None of my scenes have much physics or pathfinding. I have a heavy airport scene with nearly 20k objects in total. If I run it on my dual-core machine, Unigine uses only 40% of the CPU and reaches 15 FPS. On a quad-core, the frame rate rises to 25 FPS, but CPU usage is still below 30%. Creating more threads than actual cores normally gives better performance. On Linux (I use Gentoo), I always set the number of parallel compile jobs to cores * 2 + 1; if I set the job count to the number of cores, compilation is always 40% slower. I think the same applies to these threads.
frustum Posted March 17, 2011

Thanks for the investigation. I will increase the number of threads to CPUCount * 2.
frustum Posted March 17, 2011

I think this code is also required under win32:

    void EngineJobThread::process() {
    #ifdef _WIN32
        SetThreadIdealProcessor(thread,(num + 1) % SystemInfo::getCPUCount());
    #endif
    }
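For readers unfamiliar with the call above: SetThreadIdealProcessor is a Win32 function that gives the scheduler a preferred processor for a thread. It is a hint, not a hard affinity mask. The sketch below shows how the (num + 1) % CPUCount expression spreads job threads round-robin over the cores once there are more job threads than cores. The helper names idealProcessor and hintCurrentThread are hypothetical and not part of the engine, and the zero-core guard is an assumption of this sketch.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper mirroring the expression in the post: job thread `job_index`
// prefers core (job_index + 1) % cpu_count, so job thread 0 starts at core 1.
// Returning 0 for cpu_count == 0 is a guard added for this sketch only.
constexpr unsigned idealProcessor(unsigned job_index, unsigned cpu_count) {
    return cpu_count == 0 ? 0u : (job_index + 1u) % cpu_count;
}

#ifdef _WIN32
// Applies the hint to the calling thread. SetThreadIdealProcessor returns the
// previous ideal processor, or (DWORD)-1 on failure.
inline bool hintCurrentThread(unsigned job_index, unsigned cpu_count) {
    const DWORD previous = SetThreadIdealProcessor(
        GetCurrentThread(), idealProcessor(job_index, cpu_count));
    return previous != static_cast<DWORD>(-1);
}
#endif

// With 6 job threads on 4 cores, the preferred cores wrap around: 1, 2, 3, 0, 1, 2.
inline void checkMapping() {
    assert(idealProcessor(0, 4) == 1);
    assert(idealProcessor(3, 4) == 0);
    assert(idealProcessor(5, 4) == 2);
}
```

The + 1 offset presumably keeps the first job thread off the core most likely used by the main thread, but that reading is an inference from the expression, not something stated in the thread.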
ulf.schroeter Posted March 17, 2011 (Author)

Quote (frustum): I think this code is also required under win32:

    void EngineJobThread::process() {
    #ifdef _WIN32
        SetThreadIdealProcessor(thread,(num + 1) % SystemInfo::getCPUCount());
    #endif
    }

Yep. Could you verify the EngineThreads threading bug report?
frustum Posted March 17, 2011

Quote (ulf.schroeter): Yep. Could you verify the EngineThreads threading bug report?

I will check this issue a little later. I am in crunch time because of additional platform support.
steve3d Posted March 18, 2011

Well, it seems that even with the increased thread count Unigine still uses 20% of the CPU or less on a quad-core computer. As I said, the scene is heavy, with about 20k objects in it. Unigine uses 70% of the CPU on a dual-core computer, but only 20% or below on a quad-core. There is no pathfinding or physics at all.
ulf.schroeter Posted March 18, 2011 (Author)

Quote (steve3d): As I said, the scene is heavy, with about 20k objects in it. Unigine uses 70% of the CPU on a dual-core computer, but only 20% or below on a quad-core.

Maybe you are GPU-bound, or the bottleneck is not the multi-threaded update.
ulf.schroeter Posted March 19, 2011 (Author)

Quote (steve3d): As I said, the scene is heavy, with about 20k objects in it. Unigine uses 70% of the CPU on a dual-core computer, but only 20% or below on a quad-core.

@steve3d: with your high object count, single-threaded node flushing might also be a bottleneck (see screenshot of a comparable scene with a high node count). It might boost render performance if you bake spatially close static objects into a much smaller number of parent WorldCluster objects, which reduces the spatial tree update load.
steve3d Posted March 19, 2011

Yes, I already use WorldCluster. Without WorldCluster the frame rate is only 5-7 FPS; with WorldCluster it increased to 25-28 FPS.