
Slowdowns with a large number of complex objects with LODs




Good afternoon!

We have a complex, detailed model of a railway car made up of a large number of separate objects (530 of them).

LOD1 versions have been made for the large elements of the car (30 objects).

The total geometry of the detailed model is more than a million polygons.

The total geometry of the LOD is about 2,000 polygons.

The LOD switching distance is 5-12 meters. Small elements that have no simplified geometry are simply "switched off".

All elements of the car are loaded as References.
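The switching rule described above can be sketched roughly as follows. This is a minimal standalone C++ illustration, not Unigine's LOD implementation. `Part`, `LodState` and `selectLod` are hypothetical names, and using a single switch distance per part is an assumption based on the 5-12 m figure. Even a part that ends up `Hidden` still gets a distance test every frame in this sketch. That is one way per-object cost can grow with the number of objects rather than with the number of visible triangles.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical representation of one car part (not a Unigine type).
struct Part {
    float x, y, z;         // world position of the part
    float switchDistance;  // distance at which full detail is dropped (5-12 m in the post)
    bool hasLod1;          // true for the ~30 large parts that have simplified geometry
};

enum class LodState { Detailed, Lod1, Hidden };

// Decide what to draw for one part, given the camera position.
// Note: this runs for every part, including parts that end up hidden.
LodState selectLod(const Part& p, float camX, float camY, float camZ) {
    assert(p.switchDistance > 0.0f);
    const float dx = p.x - camX;
    const float dy = p.y - camY;
    const float dz = p.z - camZ;
    const float d = std::sqrt(dx * dx + dy * dy + dz * dz);
    if (d < p.switchDistance) return LodState::Detailed;  // close: full-detail mesh
    return p.hasLod1 ? LodState::Lod1     // far: simplified LOD1 mesh
                     : LodState::Hidden;  // far and no LOD1: switched off
}
```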

 

When the car fills almost the entire frame and the maximum number of detailed objects is on screen, RTriangles is about 700,000 and we get 80 frames per second.

When the car is far from the camera, RTriangles is about 2,000 and we get 100 frames per second.

 

If we create 40 such cars in the scene, performance drops catastrophically. This happens even at large distances from the cars, when only a small number of simplified objects (LOD1) is displayed.

That is, RTriangles is about 80,000, yet we get only 15-20 fps.

 

Question 1: what causes such a drop in performance?

 

Is it that we have 21,200 objects in the scene (40 cars, each made of 530 parts)?

But at large distances only 1,200 of them are actually displayed on screen; the remaining 20,000 are hidden.
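A back-of-the-envelope check of these numbers, written as a small standalone C++ helper. It assumes that the extra frame time grows roughly linearly with the number of objects in the scene. That is a rough model, not a measurement of where the time actually goes. The helper names are hypothetical. The inputs are 100 fps for one car (530 objects) and 15-20 fps for 40 cars (21,200 objects).

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate.
double frameMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra frame time per extra scene object, in microseconds, going from a scene
// with nOne objects at fpsOne to a scene with nMany objects at fpsMany.
// Assumes (roughly) that the slowdown scales linearly with object count.
double overheadPerObjectUs(double fpsOne, int nOne, double fpsMany, int nMany) {
    assert(nMany > nOne);
    const double extraMs = frameMs(fpsMany) - frameMs(fpsOne);
    return extraMs * 1000.0 / (nMany - nOne);
}
```

With these inputs the estimate is roughly 1.9 µs (at 20 fps) to 2.7 µs (at 15 fps) of extra frame time per added object, even though most of those objects are hidden. That points to a per-object cost rather than a per-triangle cost.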

 

Question 2: do you have any ideas on how to solve this problem?

 

 

PS. An English translation of the question will follow in a day or two.


approximate translation:

 

Title: performance problems with a huge number of complex objects with LODs.

 

Hi!

 

There is a complex detailed model consisting of a huge number of separate objects (530 items).

LOD1s were made for the large elements of the model (30 items).

The overall polygon count is over 1,000,000.

The polygon count of the LOD is about 2,000.

The LOD switching distance is 5-12 meters. Tiny elements that don't have simplified geometry are simply disabled.

Every element of the model is loaded as a Reference.

 

When the model fills the whole window and we have the maximum number of detailed (complex) objects, the RTriangles count is 700,000 and fps is 80.

When the model is placed far from the camera, the RTriangles count is 2,000 and fps is 100.

 

Creating 40 instances of the model leads to a disastrous performance drop, even when the models are far from the camera and only simplified LOD objects are rendered.

I.e. the RTriangles count is 80,000, but fps is only 15-20.

 

Question 1: what is the reason for such a performance drop?

 

Could it be the fact that we have 21,200 objects in the scene (40 models composed of 530 elements each)?

But only 1,200 of them are rendered when the camera is far away (the rest are hidden).

 

Question 2: do you have any ideas how to solve this problem?



 

Thank you very much for your help! While reading your translation, it occurred to me to give Google Translate a try. I have to admit that the result is surprisingly good for a translation robot and just 5 seconds of effort. Maybe this could be a way of keeping English as the primary posting language.

 

Good day!

 

There is a complex detailed model of the car, consisting of a large number of individual objects (530 pieces).

LOD1 was made for the major elements of this car (30 objects).

The total geometry of the detailed model is more than a million polygons.

The total geometry of the LOD is approximately 2,000 polygons.

The LOD switching distance is 5-12 meters. Small items that have no simplified geometry are simply "turned off".

All items of the car are loaded as References.

 

When the car fills virtually the entire frame and the screen shows the maximum number of detailed objects, the RTriangles number is about 700,000, and we have 80 frames per second.

When the car is far from the camera, the RTriangles number is about 2,000, and we have 100 frames per second.

 

If you create 40 of these cars in the scene, we have a catastrophic drop in performance. This happens even at large distances from the cars, when only a small number of simplified objects (LOD1) is displayed.

I.e. the RTriangles number is about 80,000, while we have only 15-20 fps.

 

Question 1: What is the cause of the drop in performance?

 

Is it the fact that we have 21,200 objects in the scene (40 cars, each of 530 parts)?

But at large distances only 1,200 of them are actually displayed on screen; the remaining 20,000 are hidden.

 

Question 2: do you have any ideas how to solve this problem?

 

 

PS. An English translation of the question will follow in a day or two.


We noticed the same problem while testing our trees.

A single complex tree, built similarly to the model mentioned above, had no serious performance impact.

But 20 trees cut performance fourfold (from 80 to 20 FPS in an empty test scene on an 8600GT and Core 2 Duo).

We decided to continue investigating this case after our level designer is back from vacation, so in a few days we'll be ready to provide a test scene.


Hello,

 

Alternatively, you can optimize by setting the physics distance parameter (the distance beyond which physics is not simulated).

Please, see the following function:

 

void engine.physics.setDistance(float distance)


Physics, light, shadow, etc. distances have nothing to do with this. We've tried spawning 200k trees, and even models that are not in sight greatly affect performance, regardless of distance settings. It is the same issue as in the `data\samples\stress\clutter_00.world` sample.

 

As a temporary solution, we organized models into rectangular sectors; a custom scene manager hides sectors that are out of range or behind the camera. The performance increase is about 200-300%. However, rotating and moving can cause some issues with shadows, etc., and overall the method is far from perfect.
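Below is a minimal standalone C++ sketch of the sector idea described above. The grid layout, the bounding-circle test and the simple half-plane "behind the camera" check are assumptions made for illustration. A real scene manager would more likely test against the full view frustum. `SectorGrid` and `visibleSectors` are hypothetical names. The point is that objects inside rejected sectors are never visited at all.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical 2D sector grid on the ground plane (x, y); not the poster's actual code.
struct Vec2 { double x, y; };

struct SectorGrid {
    Vec2 origin;     // corner of sector (0, 0)
    double size;     // edge length of one square sector
    int cols, rows;  // grid dimensions
};

// Returns indices (row * cols + col) of sectors that may be visible: within
// `range` of the camera and not entirely behind it. `forward` must be a unit
// vector. Each sector is approximated by its bounding circle, so the test is
// conservative: it never rejects a sector that overlaps the kept region.
std::vector<int> visibleSectors(const SectorGrid& g, Vec2 cam, Vec2 forward, double range) {
    assert(g.size > 0.0 && range > 0.0);
    const double r = g.size * std::sqrt(0.5);  // half-diagonal = bounding-circle radius
    std::vector<int> out;
    for (int row = 0; row < g.rows; ++row) {
        for (int col = 0; col < g.cols; ++col) {
            const Vec2 c{g.origin.x + (col + 0.5) * g.size,
                         g.origin.y + (row + 0.5) * g.size};
            const double dx = c.x - cam.x;
            const double dy = c.y - cam.y;
            const double dist = std::sqrt(dx * dx + dy * dy);
            if (dist - r > range) continue;                      // out of range
            if (dx * forward.x + dy * forward.y < -r) continue;  // fully behind the camera
            out.push_back(row * g.cols + col);
        }
    }
    return out;
}
```

Take four 10 m sectors along x and a camera at (0, 5) with a 20 m range. Looking along +x, the camera keeps sectors 0-2. Turned around to look along -x, it keeps only sector 0, whose bounding circle overlaps the camera position.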

post-26-094037600 1284022220_thumb.jpg

post-26-071791900 1284022255_thumb.jpg


There is no physics in this model.

It's just loaded geometry and textures.

 

Here's how it looks.

post-93-026496800 1284029822_thumb.jpg

post-93-039019400 1284029828_thumb.jpg


Alexey, could you please make a screenshot of the second scene (the one with many cars), but from an angle where the cars are not in the frame (e.g. located behind the camera)? If FPS is still low even when no objects are displayed, it is the same scene management issue I mentioned above. Otherwise, it is something else, related to LODs and geometry.


What type of GPU do you use?

 

NVIDIA GeForce GTX 260.

Core 2 Duo 3.13 GHz, 8 GB RAM, WinXP 64-bit.

 

This is a GPU bottleneck, judging by the large Present time.

Why?

30 cars: about 16,000 individual meshes (of which about 900 are visible, the rest are hidden), 25,000 triangles per frame, 30 fps.

1 car: about 530 individual meshes (of which about 300-400 are visible), 700,000 triangles per frame, 80 fps.
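One way to read these two measurements is to fit a rough two-term cost model and solve for its two coefficients:

frame time ≈ a × meshes + b × triangles

This standalone C++ sketch rests on assumptions. It ignores fixed per-frame costs, and it treats hidden and visible meshes alike by using the total mesh counts (about 16,000 and 530). `fitCostModel` is a hypothetical helper, and the two equations are solved with Cramer's rule.

```cpp
#include <cassert>
#include <cmath>

// Rough linear cost model (an assumption, not a profiler result):
//   frameMs = msPerMesh * meshes + msPerTriangle * triangles
struct CostModel { double msPerMesh, msPerTriangle; };

// Two measurements give two linear equations in two unknowns; solve with Cramer's rule.
CostModel fitCostModel(double meshes1, double tris1, double fps1,
                       double meshes2, double tris2, double fps2) {
    const double t1 = 1000.0 / fps1;  // frame time of measurement 1, ms
    const double t2 = 1000.0 / fps2;  // frame time of measurement 2, ms
    const double det = meshes1 * tris2 - tris1 * meshes2;
    assert(std::fabs(det) > 1e-9);    // the two measurements must be independent
    return {(t1 * tris2 - tris1 * t2) / det,
            (meshes1 * t2 - t1 * meshes2) / det};
}
```

With the 30-car and 1-car numbers, the fit gives roughly 2 µs per mesh and about 16 ns per triangle. One mesh then costs about as much as 125 triangles, so the 16,000 meshes would account for almost all of the 33 ms frame.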

 

How do I quote two different messages in one post?


 

Successive shots: the camera looking at the sky; a view of the 30 cars from a distance; a view of the 30 cars at close range; a view of the few outermost cars (the rest are outside the camera's field of view).

post-93-047578700 1284036154_thumb.jpg

post-93-015565700 1284036161_thumb.jpg

post-93-070095800 1284036166_thumb.jpg

post-93-094439900 1284036171_thumb.jpg


 

Unfortunately, your case differs from the problem I described in previous posts. As the sky-facing screenshot shows, Update and Render times are almost 0, while in the 'scene manager overload' scenario we get the following timings:

post-26-082677300 1284037989_thumb.jpg

  • 2 weeks later...

That really should be fixed: there are a lot of cases where we need A LOT of meshes.

Take a look at samples/stress/mesh_00.world. Such a scenario is very common, especially when a level designer hopes that instancing will help and avoids duplicating surfaces by reusing the same meshes. E.g. a huge truck could have about 12 wheels; why can't we use a single model for all of them?
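The idea behind the 12-wheels example can be sketched as grouping draw requests by mesh, so that each group becomes one instanced draw call. This is a generic standalone C++ illustration, not Unigine's renderer. `DrawItem` and `buildInstanceBatches` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical draw request: which mesh to draw and which world transform to use.
struct DrawItem { int meshId; int transformId; };

// Group draw items that share a mesh, so each group can be submitted as one
// instanced draw call (one mesh, many transforms) instead of one call per item.
std::map<int, std::vector<int>> buildInstanceBatches(const std::vector<DrawItem>& items) {
    std::map<int, std::vector<int>> batches;  // meshId -> transform ids drawn with it
    for (const DrawItem& it : items) {
        assert(it.meshId >= 0);  // negative ids are treated as invalid in this sketch
        batches[it.meshId].push_back(it.transformId);
    }
    return batches;
}
```

For a truck body plus 12 copies of one wheel mesh, 13 draw items collapse into 2 batches. This only reduces draw submissions. If each wheel is still a separate scene node, it may still have to be culled and updated individually.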


There are far too many meshes in the test scene we've received; every car consists of 100+ unique meshes.


Denis,

I'm pretty sure that merging into a single mesh is only a workaround for this particular model, not a solution to the whole problem. Just recall the "Syndicates of Arkon" team: they built buildings from small blocks (windows, doors, walls, etc.) and ran into the same issue. Your own stress/mesh_00.world scene, with 3375 boxes at 20 fps on a GF9600, also shows unacceptable behavior for a 2010 3D engine.

 

P.S. Of course, we'd much prefer a universal solution to workarounds. But maybe there is a way to group several meshes so they are treated as one big mesh with different surfaces?
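The grouping asked about above could look roughly like this on the CPU side. The parts are concatenated into one vertex/index buffer, and each original part keeps an index range (a "surface") so it can still have its own material. This is a minimal standalone C++ sketch, not an engine feature. `Mesh`, `Surface` and `mergeMeshes` are hypothetical names, and a per-part translation stands in for a full world transform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical CPU-side mesh: vertex positions plus a triangle index list.
struct Mesh {
    std::vector<Vec3> positions;
    std::vector<std::uint32_t> indices;
};

// One surface of the merged mesh: a contiguous index range, so each source
// part can keep its own material while sharing one vertex/index buffer.
struct Surface { std::size_t firstIndex, indexCount; };

struct MergedMesh {
    Mesh mesh;
    std::vector<Surface> surfaces;
};

// Merge parts into one mesh. Each part is moved by its offset (stand-in for a
// full transform) and its indices are rebased onto the shared vertex buffer.
MergedMesh mergeMeshes(const std::vector<Mesh>& parts, const std::vector<Vec3>& offsets) {
    assert(parts.size() == offsets.size());
    MergedMesh out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        const auto base = static_cast<std::uint32_t>(out.mesh.positions.size());
        for (const Vec3& p : parts[i].positions)
            out.mesh.positions.push_back({p.x + offsets[i].x, p.y + offsets[i].y, p.z + offsets[i].z});
        out.surfaces.push_back({out.mesh.indices.size(), parts[i].indices.size()});
        for (std::uint32_t idx : parts[i].indices)
            out.mesh.indices.push_back(base + idx);
    }
    return out;
}
```

The trade-off is that a merged mesh is culled and LOD-switched as a whole. This suits static groups of parts, such as a car's small details, better than parts that need to move or be culled individually.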


 

I am quite sure that you will kill performance in any 2010+ 3D engine (CryEngine, Unreal, UNIGINE, it doesn't matter) by splitting your geometry into hundreds of fine-grained nodes/meshes, as this will always produce high CPU load and driver overhead.


Okay, let's run some tests.

 

Here is a screenshot from data/samples/stress/mesh_00.cpp from the Unigine SDK:

post-26-028398600 1285700195_thumb.jpg

 

As you can see, we have 20 fps here with 3375 static (instanced!!) meshes.

 

Now let's see what would happen if we had a perfectly working occlusion system (none exists even in 2010, but we can imagine one). It hides all geometry that should not be displayed at the moment. A real solution requires a LOT of work: an octree/BSP tree for the visible scene, maybe precomputed occluders and portals, "shadow"-based culling, etc. For this test, though, we'll implement a very simple ray-tracing occlusion algorithm. In this scene the objects are ideally shaped (boxes) and axis-aligned, so it is enough to test two rays: from the lower and upper boundary points of the object to the camera position. If either ray passes, we display the object; otherwise it is hidden.

 

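// Probe points: two opposite corners of the cube of half-size `radius` around the node position
vec3 pos = node.getPosition();
float radius = node.getWorldBoundRadius();

vec3 points[0];
points.append(pos + vec3(radius, radius, radius));
points.append(pos + vec3(-radius, -radius, -radius));

// The node is visible if at least one probe point has an unobstructed line to the camera.
// playerPosition, self and ret are not defined in this excerpt; they come from the surrounding sample code.
int visible = false;
for(int i = 0; i < points.size(); i++)
{
	Node result = engine.editor.getIntersection(points[i], playerPosition, self, ret);

	// No intersection between this probe point and the camera: the node can be seen
	if (result == NULL)
	{
		visible = true;
		break;
	}
}
// ... the node is then shown or hidden according to `visible` (not shown in this excerpt)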

 

The code above is applied to each mesh whenever the camera position changes. As a result, we get 140 fps in a still frame and about 30-50 fps while moving:

post-26-001873200 1285700742_thumb.jpg
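For readers without the engine at hand, the two-ray test above can be reproduced in standalone C++. Here a segment-vs-axis-aligned-box (slab) test stands in for `engine.editor.getIntersection`. This is a brute-force sketch: every probe is tested against every box, which is O(n²) overall. `Aabb`, `segmentHitsBox` and `isVisible` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Vec3 { double x, y, z; };
struct Aabb { Vec3 min, max; };

// Slab test: does the segment p0 -> p1 pass through the axis-aligned box?
bool segmentHitsBox(Vec3 p0, Vec3 p1, const Aabb& b) {
    const double o[3] = {p0.x, p0.y, p0.z};
    const double d[3] = {p1.x - p0.x, p1.y - p0.y, p1.z - p0.z};
    const double lo[3] = {b.min.x, b.min.y, b.min.z};
    const double hi[3] = {b.max.x, b.max.y, b.max.z};
    double tEnter = 0.0, tExit = 1.0;  // parameter range of the segment itself
    for (int a = 0; a < 3; ++a) {
        if (std::fabs(d[a]) < 1e-12) {  // segment parallel to this pair of slabs
            if (o[a] < lo[a] || o[a] > hi[a]) return false;
            continue;
        }
        double t0 = (lo[a] - o[a]) / d[a];
        double t1 = (hi[a] - o[a]) / d[a];
        if (t0 > t1) std::swap(t0, t1);
        tEnter = std::max(tEnter, t0);
        tExit = std::min(tExit, t1);
        if (tEnter > tExit) return false;  // slab intervals do not overlap
    }
    return true;
}

// Same idea as the forum code: the object is visible if at least one probe
// point has an unobstructed segment to the camera. Box `self` is skipped so
// the object does not occlude itself.
bool isVisible(const std::vector<Aabb>& boxes, std::size_t self,
               const std::vector<Vec3>& probes, Vec3 camera) {
    assert(self < boxes.size());
    for (const Vec3& p : probes) {
        bool blocked = false;
        for (std::size_t i = 0; i < boxes.size() && !blocked; ++i)
            if (i != self && segmentHitsBox(p, camera, boxes[i])) blocked = true;
        if (!blocked) return true;
    }
    return false;
}
```

Like the original test, this relies on the occluders being axis-aligned boxes. To avoid testing every box against every probe, a spatial structure (grid, octree) would be needed.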

 

I'm attaching the sample code to this post for you to test. It is very rough code and results may vary on different hardware, but the idea stays the same: such scenes can and should be optimized by the engine. Many algorithms exist for culling invisible geometry. In 2010 we are able to display 100,000+ trees as separate models in Unreal, thousands of plants and boulders in CryEngine, and so on. Even the NVIDIA instancing demo shows several thousand dwarf models at an excellent frame rate.

mesh_00.cpp


 

Nice sample! Have you tried placing a UNIGINE WorldOccluder box object inside the cube instead of your custom ray-casting approach?

 

post-82-031926200 1285743462_thumb.jpg

 

Regarding the vegetation rendering you mentioned in Unreal/CryEngine, please keep in mind that these are special objects (e.g. like UNIGINE ObjectGrass) with highly optimized, but also specialized, rendering code. The NVIDIA instancing demo is also highly specialized for demo purposes. This is not so easy with general mesh objects and general scenes. In these cases you have to use things like occlusion queries, occluder geometry or portals.

 

Also keep in mind that your ray casting relies on special knowledge of how the cubes occlude each other in this test case.


That is strange, because placing a box occluder does almost nothing on my hardware:

post-26-024950500 1285746244_thumb.jpg

 

Also, I see on your screenshot that you have 1331 meshes (11³). That is about 2.5 times fewer than the 3375 (15³) in the original sample, and it may be few enough to stay below some kind of bottleneck on your system.

About special knowledge and ray casting: there are LOTS of occlusion culling methods for any kind of geometry, and some of them are really fast and could do the job. I'm pretty sure the method used in the engine is also very good, but it is unable to handle scenes like this one, with a lot of different objects in view (regardless of instancing).

 

P.S. Unigine has some kind of occlusion query support via IDirect3DQuery9 and its analogs in other APIs. It can be turned on by setting the "Query" flag on each object, but in this particular scene it makes things even worse.


One of my team members created a test scene in the Unreal UDK engine with the same geometry: a large 15x15x15 box composed of small mesh boxes.

He has relatively weak hardware (ATI Radeon 4550 with DDR2), so with Unreal lighting it is not lightning fast, but it still runs at ~40 fps:

post-26-094188400 1285752869_thumb.jpg

 

When physics is involved, FPS drops dramatically:

post-26-029913600 1285752984_thumb.jpg

 

But, hell, it really is able to handle even 10,000 meshes at an acceptable rate:

post-26-096971100 1285753971_thumb.jpg

 

Wireframe image for proof:

post-26-024028800 1285754024_thumb.jpg
