
EngineThreads Job Thread Count




Problem

 

While debugging an EngineThreads threading bug, we noticed another issue with the EngineThreads job thread count.

 

Currently EngineThreads allocates (number of CPUs - 1) job threads (e.g. 3 job threads on a quad-core CPU). This can cause inefficient task dispatching in engine.render.renderWorld(), because the parallel physics and pathfinding runUpdate() jobs started earlier from within Engine::do_render() are still occupying job threads.

 

On a quad-core CPU, the physics and pathfinding tasks each occupy one thread, leaving just one thread for the parallel WorldOccluderTerrain rendering, which is divided into 4 splits. As a result, 3 WorldOccluderTerrain render split jobs are assigned sequentially to this single remaining job thread (the fourth job is handled by the main thread itself).

 

When the job thread count is increased, the performance of the test case rises from 160 FPS to 191 FPS, because the 3 jobs can now be distributed across 3 threads. It therefore seems more efficient to allocate more job threads than CPU cores: at least when the physics/pathfinding jobs are short, the physical CPU cores they free up can also be used for the multi-threaded render/update/flush process.
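The dispatch arithmetic above can be sketched as a small model. This is not engine code: the function name and parameters are hypothetical, and it assumes the physics and pathfinding jobs keep their threads busy for the whole render phase, which is the worst case described in this post.

```cpp
#include <algorithm>

// Hypothetical model, not Unigine code: returns how many render splits the
// busiest thread has to run one after another.
//   total_job_threads: job threads created by the engine
//   busy_job_threads:  job threads already occupied (e.g. physics, pathfinding)
//   render_splits:     parallel render jobs; the main thread takes one itself
int sequentialSplitsOnBusiestThread(int total_job_threads,
                                    int busy_job_threads,
                                    int render_splits) {
    if (render_splits <= 0) return 0;

    int free_threads = std::max(total_job_threads - busy_job_threads, 0);
    int remaining = render_splits - 1; // one split is handled by the main thread

    // No free job thread: the main thread has to run every split itself.
    if (free_threads == 0) return render_splits;

    // Ceiling division: splits per free job thread, at least the one split
    // that the main thread runs anyway.
    int per_thread = (remaining + free_threads - 1) / free_threads;
    return std::max(per_thread, 1);
}
```

With 3 job threads, 2 of them busy and 4 splits, the model gives 3 sequential splits on the single free thread. With 5 or more job threads it gives 1, which matches the "3 jobs to 3 threads" distribution described above.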

 

[Attached screenshots: post-82-0-37636100-1300288116_thumb.jpg, post-82-0-00420500-1300288274_thumb.jpg]

 

 

Possible Fix

 

engine/EngineThreads.cpp

EngineThreads::EngineThreads() : sound_lock(0), filesystem_lock(0), job_lock(0) {
....
// create 2 x (Num CPU - 1) job threads to increase total throughput
num_jobs = 2 * (SystemInfo::getCPUCount() - 1);
...
}
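One detail of the formula that may be worth checking: on a single-core machine, 2 x (1 - 1) evaluates to 0 job threads. The sketch below isolates the calculation in a hypothetical helper, where cpu_count stands in for SystemInfo::getCPUCount(). The lower bound of one job thread is an assumption added here, not part of the proposed fix, and the engine may already handle a zero count on its own.

```cpp
#include <algorithm>

// Hypothetical helper, not part of EngineThreads: cpu_count stands in for
// the value returned by SystemInfo::getCPUCount().
int jobThreadCount(int cpu_count) {
    // 2 x (CPU count - 1) as proposed above; the clamp to at least one
    // job thread is an assumption for the single-core case.
    return std::max(2 * (cpu_count - 1), 1);
}
```

For a quad-core CPU this gives 6 job threads, for a dual-core CPU 2, and for a single core 1 instead of 0.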

Link to comment

agree and +1

 

None of my scenes have much physics or pathfinding. I have a heavy airport scene with nearly 20k objects in total. If I run this scene on my dual-core machine, Unigine uses only 40% of the CPU and reaches 15 FPS. On a quad-core machine the frame rate increases to 25 FPS, but CPU usage is still below 30%.

 

Creating more threads than actual cores will normally give better performance. Under Linux (I use Gentoo) I always set the number of parallel compile jobs to cores * 2 + 1; if I set the job count to the number of cores, compilation is always 40% slower. I think the same applies to these job threads.

Link to comment

I think this code is also required under win32:

void EngineJobThread::process() {
#ifdef _WIN32
 SetThreadIdealProcessor(thread,(num + 1) % SystemInfo::getCPUCount());
#endif
}
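To see how the expression (num + 1) % SystemInfo::getCPUCount() spreads job threads across cores once there are more job threads than cores, the mapping can be tabulated with a small standalone helper. The function below is hypothetical and only mirrors the index arithmetic; it does not call SetThreadIdealProcessor. That the + 1 offset is meant to keep the first job threads off core 0 is an assumption, not something stated in the snippet.

```cpp
#include <vector>

// Hypothetical helper: for each job thread index num, computes the ideal
// processor index using the same expression as the snippet above.
std::vector<int> idealProcessors(int num_jobs, int cpu_count) {
    std::vector<int> result;
    if (cpu_count <= 0) return result; // avoid modulo by zero
    for (int num = 0; num < num_jobs; ++num) {
        result.push_back((num + 1) % cpu_count);
    }
    return result;
}
```

With 6 job threads on 4 cores, the job threads are mapped to cores 1, 2, 3, 0, 1, 2, so the assignment wraps around and core 0 is used as well.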

Link to comment

Well, it seems that even with an increased thread count Unigine still uses 20% CPU or less on a quad-core computer.

 

As I said, the scene is heavy, with about 20k objects in it. Unigine uses 70% CPU on a dual-core computer, but only 20% or less on a quad-core.

 

No pathfinding and physics at all.

Link to comment

As I said, the scene is heavy, with about 20k objects in it. Unigine uses 70% CPU on a dual-core computer, but only 20% or less on a quad-core.

 

Maybe you are GPU-bound, or the bottleneck is not the multi-threaded update.

Link to comment

As I said, the scene is heavy, with about 20k objects in it. Unigine uses 70% CPU on a dual-core computer, but only 20% or less on a quad-core.

 

@steve3d: due to your high object count, single-threaded node flushing might also be a bottleneck (see the screenshot of a comparable scene with a high node count). It might boost render performance if you bake spatially close static objects into a much smaller number of parent WorldCluster objects to reduce the spatial tree update load.

Link to comment