Jump to content

WorldOccluderTerrain Threading Bug


photo

Recommended Posts

Problem 1

 

There seems to be a threading issue in WorldOccluderTerrain in case of multi-threaded render_occlusion execution. Geometry occlusion fails for very short times (maybe 1,2 frames) in irregular intervals even when camera is absolutely static. Full geometry is visible as short flash/flickering (see screenshot sequence with 3 correct frames an one intermediate false unoccluded frame). Tested with WinXP with attached test case.

 

post-82-0-11345100-1291244613_thumb.jpg

 

When forcing single-threaded render_occlusion execution by setting num_threads=1 in renderOcclusion() geometry is always corectly occluded.

 

 

Problem 2

 

There seems to be another side-effect of WorldOccluderTerrain multi-threaded jobs with multi-threaded physics update. In debug builds we get very quickly an assert on line 738 in physics\Physics.cpp due to num_threads = engine.threads->getNumJobs() = 0/1

 

void Physics::simulation_multiple(float ifps) {
....
// number of threads
int num_threads = min(physics_num_threads,min(engine.threads->getNumJobs(),PHYSICS_NUM_THREADS));
assert(num_threads > 1 && "Physics::simulation_multiple(): bad number of threads");

 

As engine.threads->getNumJobs() > 1 has been check before entering simulation_multiple

 

void Physics::update() {
....
if(physics_num_threads < 2 || engine.threads->getNumJobs() < 2) simulation_single(ifps);
else simulation_multiple(ifps);
....

 

number of jobs must have been decremented in the meantime (best guess: maybe due to finished WorldOccluderTerrain jobs). In case of disabled WorldOccluderTerrain node, no physics assert happens.

WorldOccluderTerrain.zip

Link to comment

yep, patch fixes physics assert (Problem 2), but does not fix sporadic failure of occlusion (Problem 1). Irregular "flashing" of full geometry still visible in test case.

Link to comment

Maybe there are 2 more issues for WorldOccluderTerrain

 

Problem 3

 

WorldOccluderTerrain rendering to occluder buffer does not seems to render pixels on right border. See screenshot for visible seam in occluder debug window. Maybe this is the cause for the remaining unoccluded geometry.

 

post-82-0-62108500-1291457697_thumb.jpg

 

Also occluder rendering seems to render top-side occluder geometry, but "front-side" occluder geometry (marked by red outline) seems to be missing.

 

post-82-0-11711900-1291471555_thumb.jpg

 

 

Problem 4

 

Not sure, but there might be a problem for newly introduced WorldManager image un-/re-loading. Cone step data is calculated on WorldOccluderTerrain initialization based on input heightmap image in setHeightsImageName(). Original heightfield image data gets modified to include cone step data in g/b components. Compared to previous release there seems to be no longer a replacement of the input heightfield image with cone step modified image on disk (at least editor no longer ask for file overwrite on new input heightfield image assignment

 

If WorldOccluderTerrain is not visible for some time or too far away WorldManager will clear modified occluder image data. When WorldOccluderTerrain gets "re-activated" original heightfield image data will be automatically reloaded, but most probably no re-calculation of required cone step data will happen.

 

Possible Fix 4

 

At the moment resource unloading/reloading is more or less transparent to the resource "owning" node. Therefore node has no chance to do required re-initialization on resource reload. Brings me back to our discussion about node-level based approach

 

Dynamic Node Load/Unload

 

In case of WorldOccluderTerrain proposed Node::load() callback could be used for re-calculation of cone step image data on re-activation.

 

Dynamic node load/unload could maybe also be used to allocate/free other large CPU-side resources (e.g. ObjectMesh vertex/surface data, ObjectTerrain height data). Keeping just the abolute minimum of node data in memory for far away nodes should allow for much larger/more complex worlds even without full nodefile streaming implementation. Central WorldManager coordinated node load/unload could also limit per frame load/unload operations to avoid render stalls.

Link to comment

Problem 4 follow-up

 

Having a look into data/editor/editor_worlds.h reveals cause for missing editor dialogMessage() for replacement of input image heightfield.

 

void heights_image_pressed() {
...
line 1437 if(image.getFormat() != format) {
              if(dialogMessage("Confirm",::format("Update \"%s\" heights image?",name))) {
	   if(image.save(name) == 0) {
....

 

Updated RGB8 (previously RG8) occluder cone step image only gets saved in case of a format change compared to input heightfield image.

In case of a grayscale RGB8 input heightfield (as in the test case) modified occluder cone step image will NOT be saved. This will lead to previously described image reload problems.

 

Quick-Fix Problem 4

Modified occluder cone step image should be always saved after new input heightfield image assignment. This will solve image reload problem. Dynamic node-level load/unload should nevertheless be considered for more sophisticated large-world management.

Link to comment
  • 1 month later...

yep, patch fixes physics assert (Problem 2), but does not fix sporadic failure of occlusion (Problem 1). Irregular "flashing" of full geometry still visible in test case.

 

 

Ulf, please recheck the problem 1. I have no flashing of the full geometry.

Link to comment

Ulf, please recheck the problem 1. I have no flashing of the full geometry.

 

Hi Serega,

 

problem 1 still exists in latest release (tested under WindowsXP with DX9 with attached test case). The problem shows up more frequently when visualizers are visible (blue voxels for occluder cells).

 

There seems to be some kind of threading issue, as engine sometimes also crashes with message Unigine fatal error: EngineThreads::rumJobs() can't run job thread. Also the sporadic appearance of the problem might be an indicator for a threading related problem.

 

post-82-0-87540000-1296030170_thumb.jpg

WorldOccluderTerrain.zip

Link to comment
  • 1 month later...

There seems to be some kind of threading issue, as engine sometimes also crashes with message Unigine fatal error: EngineThreads::rumJobs() can't run job thread. Also the sporadic appearance of the problem might be an indicator for a threading related problem.

 

 

Problem Cause

 

After some nightly debugging: Calling EngineThreads::runJobs() with number of jobs = 0 can cause false EngineThreads::waitJobs() behaviour. This is caused by returning task ID even in case of job num = 0 for (unnecessary) later waitJobs() call. As no task job/thread are assigned to the returned task instance it is possible that same task instance might also be returned on follow-up call to runJob() to other thread(s).

 

int EngineThreads::runJobs(void *jobs,int stride,int num) {

AtomicLock atomic(&job_lock);

// find free task
int id = -1;
EngineTask *task = NULL;
for(int i = 0; i < tasks.size(); i++) {
	if(tasks[i]->jobs.size()) continue;      // this indicator for 'free' task instance might fail  
	if(tasks[i]->threads.size()) continue;
	task = tasks[i];
	id = i;
	break;
}

 

 

Fix

 

In case of num=0 always return ID=-1 and check for empty task condition in waitThread()

 

engine/EngineThread.cpp

int EngineThreads::runJobs(void *jobs,int stride,int num) {

// empty task
if(num == 0) return -1;

AtomicLock atomic(&job_lock);

// find free task
.....
}

void EngineThreads::waitJobs(int id) {

// empty task
if(id == -1) return;

// wait job
....
}

Link to comment

Thanks for the report. I have slightly refactored EngineThreads class. The bug should disappear. There is no runJobs() function anymore. runSync() and runAsync()/waitAsync() functions instead of runJobs().

Link to comment
×
×
  • Create New...