poor performance

foxbat

22-09-2006 11:02:21

Hi.
I'm having some problems with poor engine performance. I'm experiencing a major fps drop after adding only a few simple box geometries with bodies.

After placing 6 boxes on some terrain, my frame rate goes from 160 down to 30 fps. I'm using the quickstepper (I can't find big matrix stepping in OgreODE) which is supposed to be the fastest type.

I tried replacing the terrain with just a large box, and the performance drop isn't as major, but still pretty severe. Frame rate goes from 160 to 90 fps.

From the comparison between terrain and non-terrain collisions, it looks like terrain collision is very expensive performance wise.

Does anyone have any ideas, or is this a problem for the ODE mailing list?

syedhs

22-09-2006 11:08:13

I have about 6 boxes (crates which I stole from OgreOde's demo :lol: ) and the performance drop is small, maybe 1-2 FPS.

If I were you, I would check the following:
1) Is box collision as expected? For example, look for floating boxes.
2) Turn on physics debugging mode and see if the geometry's size is as intended.
3) Consider using "space" to reduce the number of collision & physics computations.
4) Disable far-away resting geometries. Re-enable them when they get close to you.
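
Point 4 can be sketched without any physics library. The block below is only an illustration under assumptions: SimBody, its fields and updateActivation are hypothetical names, and the "enabled" flag stands in for whatever the engine offers (with ODE that would presumably be dBodyEnable / dBodyDisable).

```cpp
#include <cassert>
#include <vector>

// Hypothetical body record; a real project would wrap its physics body here.
struct SimBody {
    float x, y, z;
    bool resting;  // true once the body has come to rest
    bool enabled;  // false means the simulation skips this body
};

// Re-enable everything near the viewer; put far-away resting bodies to sleep.
void updateActivation(std::vector<SimBody>& bodies,
                      float px, float py, float pz, float radius) {
    assert(radius >= 0.0f);
    const float r2 = radius * radius;
    for (SimBody& b : bodies) {
        const float dx = b.x - px, dy = b.y - py, dz = b.z - pz;
        const float d2 = dx * dx + dy * dy + dz * dz;
        if (d2 <= r2) {
            b.enabled = true;        // close to the viewer: always simulate
        } else if (b.resting) {
            b.enabled = false;       // far away and at rest: skip it
        }                            // far away but still moving: leave as is
    }
}
```

Calling this once per frame with the camera or player position is enough; squared distances avoid a square root per body.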

tuan kuranes

22-09-2006 13:11:46

5) make sure the map is not too small, otherwise you end up with many, many heightfield collisions

Indie

22-09-2006 15:47:32

Hm, the performance part is really bothering me. Were there any tests made with OgreODE?

:roll:

syedhs

22-09-2006 16:08:34

6) Combine all static collision geometry into one single trimesh.

I think the performance is okay... well, maybe not up to my expectations, but then I am still new to physics.

foxbat

23-09-2006 07:20:21


1) Is box collision as expected? For example, look for floating boxes.


Box collision works as expected. All the boxes are sitting stationary on the ground.

2) Turn on physics debugging mode and see if the geometry's size is as intended.

Yes, they're at the intended size.

3) Consider using "space" to reduce the number of collision & physics computations.

Are you suggesting that a quadtree space rather than hash space should be used?

5) make sure the map is not too small, otherwise you end up with many, many heightfield collisions
The map is pretty large. The largest box intersects a maximum of 4 or 5 terrain faces, with some of the smaller boxes easily fitting within a single face.

syedhs

23-09-2006 07:40:44

I am not suggesting a different type of space, but adding a new space. Currently there is only one default space, which you can get by calling world->getDefaultSpace();

Having more than one space will greatly increase the performance. Imagine that within one scene you have a city and a jungle where you can test your driving skills (offroading). The city itself has 10 boxes, and jungle has another 15 geometries. For this case, maybe you can create 2 spaces - one for city and another for jungle. So until there is a collision with one of the two spaces, no further collision test is done on the boxes or geometries. So lots of calculations are saved there.
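
The saving described here comes from testing a whole group's bounding box before testing its members. A minimal sketch of that idea follows, assuming 2D boxes for brevity; Aabb, Group and countMemberTests are hypothetical names and no OgreOde or ODE calls are used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Axis-aligned bounding box in the ground plane (enough to show the idea).
struct Aabb {
    float minX, minZ, maxX, maxZ;
};

inline bool overlaps(const Aabb& a, const Aabb& b) {
    return a.minX <= b.maxX && b.minX <= a.maxX &&
           a.minZ <= b.maxZ && b.minZ <= a.maxZ;
}

// Hypothetical stand-in for a "space": member boxes plus the box enclosing them.
struct Group {
    Aabb bounds;
    std::vector<Aabb> members;
};

// Counts the member-level tests a moving object needs when every group is
// first tested as a whole and skipped if its bounds do not overlap.
std::size_t countMemberTests(const Aabb& mover, const std::vector<Group>& groups) {
    std::size_t tests = 0;
    for (const Group& g : groups) {
        if (!overlaps(mover, g.bounds)) continue;  // one test rejects the whole group
        tests += g.members.size();
    }
    return tests;
}
```

With a "city" group of 10 boxes and a "jungle" group of 15, a car driving through the city pays for 10 member tests instead of 25. The gain disappears when all objects sit inside the same group bounds.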

tuan kuranes

23-09-2006 13:15:47

You can even do it by yourself, using ogre specialized query (if using octreescenemanager) and then call collide() only when needed.
That way you can save even more collision. Make sure only moving - static collision is done, and no static-static.
(that's what ogre bspcollision does, I think)

BEST before ANYTHING is to use a CPU profiler to see exactly where the speed bottleneck is, and spend time optimizing the slow part only.

Check the Ogre wiki page about assembling a toolset; there's a list of profilers.
(and btw, make sure you benchmarked in release mode without the OgreOde debug lines)

foxbat

24-09-2006 08:13:31

Each of the boxes is sitting right beside the others, so I don't see how adding extra spaces would work.

At some point in my game, I'm going to need to call collide on at least 6 objects per frame, so using a specialized query isn't going to help much either.

I did a test with returning false from the collision callback (so no contact joints are generated) and the performance drop doesn't change. This suggests that the problem is with the collision detection rather than the physics.

I'm not doing anything special with my code, so I'm really at a loss as to what to profile within OgreOde, or Ode itself.

luis

24-09-2006 11:43:08

I had a similar problem. It was because of the collision detection, exactly what Tuan said:

5) make sure the map is not too small, otherwise you end up with many, many heightfield collisions

I was using a very high resolution in the terrain mesh, about 0.33 world units per pixel -> (PageWorldX/PageSize).

In my first tests I had one car on the ground. For the first 5 seconds I got 30 fps, and after 20 seconds the FPS started to drop until it reached ~5-6 FPS. I found out that it was because every
*QuickStepper function divides the 'elapsed time' by the given _step_size (by repeated subtraction), so when the *QuickStepper::step function takes even more time than the time given in the call, the problem gets worse and worse :(

In the function:

bool QuickStepper::basicStep(const Real time)
{
    if ((!_listener) || (_listener->preStep(time)))
    {
        _world->getDefaultSpace()->collide();
        _world->quickStep(time);
        _world->clearContacts();

        return true;
    }
    return false;
}


My bottleneck was in this call:
_world->getDefaultSpace()->collide();

My advice: to find out whether your problem is a CPU bottleneck in the collision detection, use a smaller heightmap image (for example 65x65 pixels) and see what happens.

My PC (P4 2.4 GHz) can handle 25 boxes and 8 cars perfectly well (around 2 ms per step) with a resolution of ~2.6 world units per pixel and the default timestep (=0.01).
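
The feedback loop described in this post (a slow step makes the next frame's elapsed time larger, which demands even more steps) is what a fixed-step loop does when nothing limits the steps per frame. One common mitigation is to cap the steps and drop the backlog. The sketch below is not OgreOde's stepper code; FixedStepAccumulator and its parameters are hypothetical.

```cpp
#include <cassert>

// Hypothetical fixed-step accumulator with a per-frame step cap.
class FixedStepAccumulator {
public:
    FixedStepAccumulator(double stepSize, int maxStepsPerFrame)
        : stepSize_(stepSize), maxSteps_(maxStepsPerFrame), accumulated_(0.0) {
        assert(stepSize > 0.0 && maxStepsPerFrame > 0);
    }

    // Adds the frame time and returns how many fixed steps to run this frame.
    int advance(double frameTime) {
        accumulated_ += frameTime;
        int steps = 0;
        while (accumulated_ >= stepSize_ && steps < maxSteps_) {
            accumulated_ -= stepSize_;
            ++steps;
        }
        // If the cap was hit, drop the backlog so a slow frame does not
        // queue even more work for the next one.
        if (steps == maxSteps_ && accumulated_ >= stepSize_) {
            accumulated_ = 0.0;
        }
        return steps;
    }

    double leftover() const { return accumulated_; }

private:
    double stepSize_;
    int maxSteps_;
    double accumulated_;
};
```

The trade-off is that the simulation runs slower than real time whenever the cap is hit, instead of the frame rate collapsing.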

foxbat

02-10-2006 09:55:19

I did another test and found that when testing a single sphere geometry against the terrain, ODE makes, on average, about 40 calls to getTerrainHeight() per step. In some cases, I found that it made over 100 calls per step, for a different coordinate each time.

This many calls is completely unnecessary, since my sphere is only 1 unit big, and easily fits into my terrain with 300x300 unit face sizes. I can't think of any reason why any more than 4 or at the most 9 calls would be needed.

Were you experiencing this problem too? Did you simply resize your terrain and the problem disappeared?

tuan kuranes

02-10-2006 12:49:04

Ode heightfield is really, really bad from a performance point of view.
Scaling it should make it call getHeight fewer times.

I'm considering making my own, or more precisely, a "planebounded" geometry, which would look exactly like BSP_collision/refAppcollision, using a plane scene query, which I would optimise in plsm2 and even propose as a patch for TSM.

That way we would have a geometry adapted to any scene manager that has some world geometry.

syedhs

02-10-2006 17:40:43

Ode heightfield is really, really bad from a performance point of view.

Which ODE heightfield... is it Monster's own TerrainGeometry or ODE's native heightfield?

luis

02-10-2006 19:56:25

@foxbat

Did you simply resize your terrain and the problem disappeared?
Yes...
I also tried calling the step manually an arbitrary number of times each frame (not using the timestep), but I had another problem: frame rate dependency...

I did it to avoid this:
*QuickStepper function divides the 'elapsed time' by the given _step_size (by repeated subtraction), so when the *QuickStepper::step function takes even more time than the time given in the call, the problem gets worse and worse :(

@Tuan

I'm considering making my own, or more precisely, a "planebounded" geometry, which would look exactly like BSP_collision/refAppcollision, using a plane scene query, which I would optimise in plsm2 and even propose as a patch for TSM.

I had problems with BSP_collision/refAppcollision with sharp corners in boxes (in the world geometry). I don't know if it was because of my BSPs.

tuan kuranes

02-10-2006 20:15:42

@syedhs: the previous OgreOde ODE terrain patch was an ODE contribution that is now merged into the ODE code. So both are supposed to be more or less the same.

@luis : we'll debug that once it is in ogreode ;)

foxbat

03-10-2006 04:29:20

I did some more testing and found a direct dependence between frame rate and the number of getHeight() calls made. A 6 wheeled vehicle for example requires up to 200 getHeight calls per step, even though it fits within a single terrain quad. With about 200 calls, I get a 25% fps drop, which is unacceptable just for a single vehicle, let alone multiple AI entities making hundreds of ray checks...potentially leading to millions of calls per second!

I'm going to look into other terrain collision techniques and see what I can find.

tuan kuranes

03-10-2006 12:49:20

A 6 wheeled vehicle for example requires up to 200 getHeight calls per step, even though it fits within a single terrain quad
That would be reduced to at most two "getPlane()" calls using the bsp collision idea.

luis

03-10-2006 20:15:36

That would be reduced to at most two "getPlane()" calls using the bsp collision idea.

please! tell me where is that new OgreOde version :D

EDIT:
What about multithread "support" ? for example... a Howto or one of the demos running the simulation in a different thread....

tuan kuranes

03-10-2006 20:43:17

@luis: planeboundedRegionGeometry is a WIP, still not finished. Threading using pthread (only if the user activates a define), with a "worker thread" model and lazy synchronisation, is planned.

luis

03-10-2006 22:24:22

planeboundedRegionGeometry is a wip, still not finished.
I know, it was a joke :)

Threading using pthread (only if the user activates a define), with a "worker thread" model and lazy synchronisation, is planned.

wow, that's great!

foxbat

04-10-2006 02:09:33

The planeboundedRegionGeometry BSP idea sounds very interesting.

I'm actually considering making a modification to the ode heightfield so that it will do ray collisions using quadtrees. First a 2x2 version of the terrain will be checked, and any cells which don't intersect the ray bounds will be rejected. Then the remaining cells will be checked at a higher resolution, and the process repeats with further sub-divisions until only the max resolution cells which intersect the terrain are remaining.

Even in a worst case situation where no low res cells are rejected, this method would still save a massive number of per quad collision checks...by something like a factor of 1000 when checking a ray with a terrain sized AABB against a 1024x1024 terrain. The current heightfield code would handle a situation like this with brute force, checking every single quad in the terrain, making over a million collision checks.
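
The coarse-to-fine rejection described above can be sketched as a pyramid of per-cell maximum heights. This is only an illustration of the idea, not ODE's heightfield code: MaxHeightPyramid is a hypothetical name, the grid is assumed square with a power-of-two size, and the query takes the cell rectangle and lowest Y of a ray segment's AABB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical max-height pyramid over a size x size grid of terrain cells.
// Level 0 is the full-resolution grid; each higher level halves the resolution.
class MaxHeightPyramid {
public:
    MaxHeightPyramid(std::vector<float> cellMax, std::size_t size) {
        assert(size > 0 && (size & (size - 1)) == 0);  // power of two
        assert(cellMax.size() == size * size);
        levels_.push_back(std::move(cellMax));
        sizes_.push_back(size);
        while (sizes_.back() > 1) {
            const std::size_t fine = sizes_.back();
            const std::size_t coarse = fine / 2;
            const std::vector<float>& f = levels_.back();
            std::vector<float> c(coarse * coarse);
            for (std::size_t z = 0; z < coarse; ++z)
                for (std::size_t x = 0; x < coarse; ++x) {
                    const std::size_t i = 2 * z * fine + 2 * x;
                    c[z * coarse + x] = std::max(std::max(f[i], f[i + 1]),
                                                 std::max(f[i + fine], f[i + fine + 1]));
                }
            levels_.push_back(std::move(c));
            sizes_.push_back(coarse);
        }
    }

    // Collects full-resolution cells (index z * size + x) inside the inclusive
    // rectangle [x0,x1] x [z0,z1] whose max height reaches minY.
    // Returns the number of tree nodes visited.
    std::size_t query(std::size_t x0, std::size_t z0, std::size_t x1, std::size_t z1,
                      float minY, std::vector<std::size_t>& out) const {
        std::size_t visited = 0;
        visit(levels_.size() - 1, 0, 0, x0, z0, x1, z1, minY, out, visited);
        return visited;
    }

private:
    void visit(std::size_t level, std::size_t x, std::size_t z,
               std::size_t x0, std::size_t z0, std::size_t x1, std::size_t z1,
               float minY, std::vector<std::size_t>& out, std::size_t& visited) const {
        ++visited;
        const std::size_t span = std::size_t(1) << level;  // fine cells per side
        const std::size_t fx0 = x * span, fz0 = z * span;
        if (fx0 + span - 1 < x0 || fx0 > x1 || fz0 + span - 1 < z0 || fz0 > z1) return;
        if (levels_[level][z * sizes_[level] + x] < minY) return;  // all terrain below
        if (level == 0) { out.push_back(z * sizes_[0] + x); return; }
        for (std::size_t dz = 0; dz < 2; ++dz)
            for (std::size_t dx = 0; dx < 2; ++dx)
                visit(level - 1, 2 * x + dx, 2 * z + dz, x0, z0, x1, z1, minY, out, visited);
    }

    std::vector<std::vector<float>> levels_;
    std::vector<std::size_t> sizes_;
};
```

On a 4x4 grid where only one cell is high enough, a query over the whole grid visits 9 nodes and returns that single cell, where a brute-force pass would test all 16 cells. The extra levels cost roughly one third more memory than the full-resolution grid.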

How is the getPlane() system going to work? Maybe I should hold off on implementing my own quad tree code, if yours is underway.

One interesting piece of advice I got when asking about terrains on the ODE mailing list was that it might be faster to let dHeightfield do its own internal height checks with its own terrain representation, rather than using the callback. Do you think this would provide a speed increase, or is the PLSM getHeight code pretty efficient?

Anonymous

04-10-2006 11:46:29

Maybe ODE's heightfield collider is not fully optimized (certainly not with big objects on a fine-grained heightmap...) but part of the problem is that plsm2 and ODE duplicate work in the getHeight(Vector3) queries. ODE only asks for the height at heightfield vertices, never between - then it makes its own plane... plsm2 on the other hand always gives back an interpolated height, based on the current LOD (you're supposed to be able to turn off LOD, but not interpolation... right?).

BTW, I made a little threaded demo (one physics thread, and one for the rest...) where I duplicate the heightfield data and just return the height at the right heightfield point, and it works beautifully... big performance gain even without threading. With threading I can have ~1000 boxes or ~1500 spheres with a 10ms timestep - and my implementation is quite naive so I am sure there is room for improvement.
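
A minimal sketch of the "duplicate the heightfield data and return the height at the right point" approach, assuming a regular grid with uniform spacing. HeightGrid and its members are hypothetical names, not plsm2 or OgreOde API; the point is only that the lookup is one array read with no interpolation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical private copy of the terrain heights, sampled at grid vertices.
class HeightGrid {
public:
    HeightGrid(std::vector<float> samples, std::size_t width, std::size_t depth,
               float spacing)
        : samples_(std::move(samples)), width_(width), depth_(depth), spacing_(spacing) {
        assert(width_ > 0 && depth_ > 0 && spacing_ > 0.0f);
        assert(samples_.size() == width_ * depth_);
    }

    // Height at the vertex nearest to world position (x, z).
    // Positions outside the grid are clamped to the border.
    float heightAtNearestVertex(float x, float z) const {
        const std::size_t ix = clampIndex(x, width_);
        const std::size_t iz = clampIndex(z, depth_);
        return samples_[iz * width_ + ix];
    }

private:
    std::size_t clampIndex(float world, std::size_t count) const {
        const float idx = std::round(world / spacing_);
        if (idx <= 0.0f) return 0;
        if (idx >= static_cast<float>(count - 1)) return count - 1;
        return static_cast<std::size_t>(idx);
    }

    std::vector<float> samples_;
    std::size_t width_, depth_;
    float spacing_;
};
```

Because the copy is owned by the physics side, it can also be read from a physics thread without touching the scene manager.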

Using quadtrees (or something similar) sounds like a really good idea, though... is this planned as in "next week" or as in "next year"? I am considering my own implementation as well...

tuan kuranes

04-10-2006 15:00:42

@foxbat: They didn't understand the heightfield code. A quadtree won't do anything for performance, and a huge data set doesn't either.
Once profiled, it's clearly an ODE problem. If you ask me, 1500 queries for a single box and 4 spheres... adding 9 boxes in the demo, I get up to 4000 queries.
But I still get 150-300 fps here using CVS plsm2 and CVS OgreOde (AMD 64 3800+, GeForce 6800).

I sent an answer to the mailing list over there

Highly tessellated terrain, 512x512, and a big object on it (a box that can cover up to 150 terrain vertices at once). That can lead to 2000 getHeight calls, but getHeight is not really the bottleneck there; it's rather the plane construction code, which is heavy and does not reuse previous results:

// Collide against all potential collision cells.
for ( i = nMinX; i < nMaxX; ++i )
    for ( j = nMinZ; j < nMaxZ; ++j )
    {
        numTerrainContacts += terrain->dCollideHeightfieldUnit(i,j);
    }

dCollideHeightfieldUnit computes the triangle planes which the geom will collide against, and in that loop that means computing exactly the same plane several times.
As a point shares 6 triangles with its neighbours...
For each point at least two planes are computed...

Best would be a first pass getting all points, computing a BBox and checking whether it intersects the geom.
A second pass would build all planes defined by those points, possibly even building an optimised plane list (why not merge identical planes?).
A third pass would collide those planes against the geom.

That should divide plane computation by at least 6, as each point defines 6 triangles/planes, and divide collision by the same factor.
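
The first two passes can be sketched as follows: sample every vertex of the zone once, then build each triangle plane once from the stored vertices. This is an illustration under assumptions, not the ODE patch itself. Vec3, Plane, planeFrom and buildZonePlanes are hypothetical names, heightAt stands in for the height callback, and the third pass (colliding the planes against the geom) is left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; float d; };  // points p on the plane satisfy n . p = d

// Plane through three points, normal = normalized (p1 - p0) x (p2 - p0).
inline Plane planeFrom(const Vec3& p0, const Vec3& p1, const Vec3& p2) {
    const Vec3 u{p1.x - p0.x, p1.y - p0.y, p1.z - p0.z};
    const Vec3 v{p2.x - p0.x, p2.y - p0.y, p2.z - p0.z};
    Vec3 n{u.y * v.z - u.z * v.y, u.z * v.x - u.x * v.z, u.x * v.y - u.y * v.x};
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    assert(len > 0.0f);
    n.x /= len; n.y /= len; n.z /= len;
    return Plane{n, n.x * p0.x + n.y * p0.y + n.z * p0.z};
}

// Builds the two triangle planes of every cell in the vertex range
// [minX,maxX] x [minZ,maxZ], calling heightAt exactly once per vertex.
std::vector<Plane> buildZonePlanes(const std::function<float(int, int)>& heightAt,
                                   int minX, int minZ, int maxX, int maxZ,
                                   float spacing) {
    assert(maxX > minX && maxZ > minZ && spacing > 0.0f);
    const std::size_t vx = static_cast<std::size_t>(maxX - minX + 1);
    const std::size_t vz = static_cast<std::size_t>(maxZ - minZ + 1);

    // Pass 1: fetch each vertex once.
    std::vector<Vec3> verts(vx * vz);
    for (std::size_t z = 0; z < vz; ++z)
        for (std::size_t x = 0; x < vx; ++x) {
            const int gx = minX + static_cast<int>(x);
            const int gz = minZ + static_cast<int>(z);
            verts[z * vx + x] = Vec3{gx * spacing, heightAt(gx, gz), gz * spacing};
        }

    // Pass 2: two planes per cell, built from the stored vertices.
    std::vector<Plane> planes;
    planes.reserve(2 * (vx - 1) * (vz - 1));
    for (std::size_t z = 0; z + 1 < vz; ++z)
        for (std::size_t x = 0; x + 1 < vx; ++x) {
            const Vec3& a = verts[z * vx + x];
            const Vec3& b = verts[z * vx + x + 1];
            const Vec3& c = verts[(z + 1) * vx + x];
            const Vec3& d = verts[(z + 1) * vx + x + 1];
            planes.push_back(planeFrom(a, c, b));  // winding gives an upward normal
            planes.push_back(planeFrom(b, c, d));
        }
    return planes;
}
```

For a 2x2-cell zone this makes 9 height queries, where fetching the four corners separately for each cell would make 16; the gap widens as the zone grows.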

@martin.enge:
1) in plsm2 and ogreode CVS, you can use

bool noInterpolation = true;
mSceneMgr->setOption("queryNoInterpolation", &noInterpolation);

landscape demo in CVS is compatible with PLSM2 and use that.

2) using PLSM2 events you easily can get heightfield data, but I'm not sure the gain is that much different.

Anonymous

04-10-2006 15:38:24

I thought foxbat meant having a quadtree with progressively finer resolution of max height, so you can disregard collisions between the bounding box (or point) and terrain early for stuff that is not very close to the terrain, without first retrieving the heightfield data and then constructing a bounding box. Anyhow, it requires rewriting the ODE heightfield collider code, of course. Like you say, checking collisions the way they do it seems a bit crazy.

I have not tried the scenequery in CVS, but that sounds good. I still need the copy of the heightfield data for separate thread, though.

tuan kuranes

04-10-2006 16:06:55

terrain early for stuff that is not very close
Not sure that particular case really needs that memory-heavy optimisation.
I still need the copy of the heightfield data for separate thread, though.
We'll be happy if you can contribute a patch for the landscape demo using your method, and/or a threaded "ogreode" (but we'll understand if you can't).
Do you have a multicore or Hyper-Threading processor?

foxbat

05-10-2006 01:10:00

I sent an answer to the mailing list over there

Yes, thanks for that.

I thought foxbat meant having a quadtree with progressively finer resolution of max height

That's correct. If the plane construction bottleneck is improved though, then maybe using a quadtree won't provide much of an improvement, since height lookups are meant to be extremely fast. As I understand it, it's the rest of the collision code which is done per terrain cell which needs improvement, rather than the actual height lookup.

I think a quadtree representation could still provide an improvement though, simply by reducing the total number of cell checks. It also wouldn't be too expensive in terms of memory, using about double that of the highest resolution representation. As far as I can see though, it would only be beneficial on ray checks, since it's easy to get the AABB of any segment of a ray to check against any corresponding segment of terrain.

Maybe ODE's heightfield collider is not fully optimized (certainly not with big objects on a fine-grained heightmap...) but part of the problem is that plsm2 and ODE duplicate work in the getHeight(Vector3) queries. ODE only asks for the height at heightfield vertices, never between - then it makes its own plane... plsm2 on the other hand always gives back an interpolated height, based on the current LOD (you're supposed to be able to turn off LOD, but not interpolation... right?).

Yes, that's a good idea for an improvement. It's pretty unnecessary to do two interpolations like that. By duplicating the data, do you mean that you stored the terrain points in the actual dHeightfield object rather than using the callback? I can see that that would have the effect of removing any unnecessary interpolations.

tuan kuranes

16-10-2006 12:38:22

Seems the ODE mailing list and/or the heightfield author don't need that speed improvement...
Seems a patch has to be done.
Let me know if you want to do this or if I have to add it to my todo list?

Interpolation can be deactivated in plsm2, read posts above.

syedhs

16-10-2006 12:50:29

Maybe I am bit too late, but here is one thought (very simplistic though):

How about simply caching the value returned by getHeight() and invalidating and recalculating it, say, every 1 ms? That way, it is possible to calculate it only once every step or so.

tuan kuranes

16-10-2006 12:55:46

In my test current ode code goes up to 7000 getHeight per frame with a vehicle and 5 boxes... no cache can help that...
If previous solution doesn't work, it still can be a solution, but as said above getHeight is not the CPU bottleneck here, it's plane construction.

foxbat

24-10-2006 01:16:14

Let me know if you want to do this or if I have to add that to my todo list ?

Since you're already working on the plane construction improvement, I think that any other improvements may be unnecessary. As you say, the bottleneck is with the plane construction, and I agree that memory management improvements won't offer much improvement until this is fixed.

Let us know how you go with your planeboundedRegionGeometry code. It sounds great.

tuan kuranes

24-10-2006 07:18:50

Let us know how you go with your planeboundedRegionGeometry code. It sounds great.
Just to make sure nothing is wrong with a "planeboundedRegionGeometry" approach, I rewrote part of the heightfield collider and provided it as a patch for ODE on SourceForge.

That patch should make it a lot more usable.
Have a try meanwhile.

tuan kuranes

25-10-2006 12:32:40

It would be great if someone could test the patch and provide feedback here or on the ODE mailing list.

Patch is downloadable here :

http://sourceforge.net/tracker/index.ph ... tid=382801

luis

25-10-2006 20:16:03

Older Ode:
13ms - 65FPS

Ode + patch (with HEIGHTFIELD_TRIANGLE_BORDER_AS_RAY defined):
2ms - 227FPS


'ms' are milliseconds spent only by these calls:
mStepper->step( timeStep );
mWorld->synchronise();

World geometry setup:
PageSize=257 (HM image is 257x257 pixels)
PageWorldX=170
PageWorldZ=170

Two cars (with bigger tires than the Jeep) and 16 boxes. 1024x768 window mode, AA x2, no shadows.
Intel Core2 D 6600 / Nvidia7950GT






Time-based profile, CodeAnalyst, older ODE (only the first 3 functions):

CS:EIP    Symbol + Offset                         Instructions  Timer samples  % of module samples  % of total session samples
0x4645e0  dNormalize3                             98            2263           28.71                5.64
0x44ee30  dxHeightfield::dCollideHeightfieldZone  344           2159           27.39                5.38
0x466850  dCollideRayBox                          229           1538           19.51                3.84

Time-based profile, CodeAnalyst, NEW ODE (only the first 3 functions):

CS:EIP    Symbol + Offset                         Instructions  Timer samples  % of module samples  % of total session samples
0x3c26d0  dBodyAddRelForceAtRelPos                75            2420           26.01                6.04
0x3aee30  dxHeightfield::dCollideHeightfieldZone  313           1169           12.56                2.92
0x3c2870  dBodyGetTorque                          12            990            10.64                2.47

Note that I made the first CodeAnalyst sample some days ago with a slightly different binary/world geometry, so you can't compare them directly, but I post both samples to show the "ranking" of the function calls.

I think it is a great improvement! :D :D :D

tuan kuranes

26-10-2006 13:41:50

Great, thanks for testing.

Intel Core2 D 6600 / Nvidia7950GT
mmmhhmm I might be glad you're around when threaded OgreODE version will have to be tested ;)
Ode + patch (with HEIGHTFIELD_TRIANGLE_BORDER_AS_RAY defined):
Why did you enable HEIGHTFIELD_TRIANGLE_BORDER_AS_RAY ?
(I'd like to find what it's used for and then find a way not to use it in all other case, as it eats a lot of cpu cycles.)

Time based profile codeAnalyst older ODE (only the first 3 functions):
0x44ee30 dxHeightfield::dCollideHeightfieldZone 5.38 2159


If "dCollideHeightfieldZone" is called, that should mean it's the patched ODE ?

luis

26-10-2006 14:30:22

mmmhhmm I might be glad you're around when threaded OgreODE version will have to be tested ;)

sure !! ;)

Why did you enable HEIGHTFIELD_TRIANGLE_BORDER_AS_RAY ?

i misunderstood the meaning of the define :oops: i should stop reading with mousewheel :)

If "dCollideHeightfieldZone" is called, that should mean it's the patched ODE ?

no, it shouldn't.... maybe it's a mistake in the copy-pasted text.

I'll make a new test undefining HEIGHTFIELD_TRIANGLE_BORDER_AS_RAY and check again the CAnalyst when i get home ;)

thanks for your work Tuan!

luis

26-10-2006 18:52:43

new test:


Older Ode:
20ms - 40FPS (with shadows)
5ms - 140FPS (without shadows)

New Ode: (HEIGHTFIELD_TRIANGLE_BORDER_AS_RAY undefined)
1.8ms - 140FPS (with shadows)
0.4ms - 600FPS (without shadows)

In previous posts I said something about the frame rate and the CPU time spent in physics that could explain the difference in CPU used by the simulation when the FPS goes down. I think that going multithreaded will fix that...



Older Ode - CA - the first 4 calls:

CS:EIP    Symbol + Offset                         Instructions  Timer samples  % of module samples  % of total session samples
0x464470  dNormalize3                             103           3234           40.72                8.08
0x466720  dCollideRayBox                          214           1423           17.92                3.55
0x44ef70  dxHeightfield::dCollideHeightfieldUnit  293           1030           12.97                2.57
0x464a30  dGeomPlanePointDepth                    127           488            6.14                 1.22

New Ode - CA - the first 4 calls:

CS:EIP    Symbol + Offset                         Instructions  Timer samples  % of module samples  % of total session samples
0x44edf0  dxHeightfield::dCollideHeightfieldZone  169           999            40.56                2.49
0x4640c0  dGeomPlanePointDepth                    115           398            16.16                0.99
0x463b00  dNormalize3                             30            390            15.83                0.97
0x433270  dCollideBoxPlane                        115           274            11.12                0.68

luis

26-10-2006 19:28:42

The simulation is a bit unstable now....
some of the boxes are jumping on the terrain :(
(it only happens with the new version)

EDIT:

Ok, undefining HEIGHTFIELD_TRIANGLE_BORDER_AS_RAY adds that new feature to the box (the jump) :(

it's really bad because of the speed improvement...

I didn't see the code, but do you think that there is still a solution?

tuan kuranes

27-10-2006 13:46:31

I do think so.

In theory, it's very wrong. If you have terrain triangles as planes, why would we need the triangle borders as geoms?

anyway, even with HEIGHTFIELD_TRIANGLE_BORDER_AS_RAY you should get some speed improvements, no ?

tuan kuranes

27-10-2006 17:39:10

mmmh, have a try at that :


take the whole

if ( isColliderRayEnabled )
{

}


that is inside the for() for() and put it after the 2 plane collisions... (but still inside the 2 loops)
And... that should limit the use of that costly part of the code to the bad cases.

luis

27-10-2006 20:11:28

anyway, even with HEIGHTFIELD_TRIANGLE_BORDER_AS_RAY you should get some speed improvements, no ?

yes, but not much compared with the original patch: about a 12% vs. 95% speed increase.

that is inside the for() for() and put it after the 2 plane collisions... (but still inside the 2 loops)
And... that should limit the use of that costly part of the code to the bad cases.


unfortunately i get the same results :(

syedhs

28-10-2006 10:16:08

@Tuan,

Just chipping in to say it's great to see the speed improvements reported by Luis. I am on holiday right now, but I will be back this Monday and will get this tested!

foxbat

29-10-2006 11:01:42

I’ve finally had a chance to test the new patch, and I too have noticed some speed improvements, although not to the extent that luis has.

The following tests were taken with a 6 wheeled, 2 doored vehicle (9 objects doing terrain checking per vehicle), and a single vertical player ray check.

Baseline fps (no collision checks): 170 fps
Old ODE with 1 vehicle: 130 fps - approx 1.8ms
New ODE with 1 vehicle: 140 fps - approx 1.3 ms

The difference isn't substantial, although I wasn't able to do any more thorough tests, which may have produced more significant results. I'm also noticing some jittering in the simulation, which I don't remember being there with the old version. Perhaps it's a HEIGHTFIELD_TRIANGLE_BORDER_AS_RAY issue?


I'm sorry to say that despite the new heightfield code, my simulation still slows to a crawl when increasing the number of vehicles. Here are comparisons made between using terrain, and using a simple ground plane to approximate the terrain.

1 vehicle: 1.3 ms for terrain, <1 ms for plane
2 vehicles: 5.3 ms for terrain, <1 ms for plane
3 vehicles: 18.6 ms for terrain, 1.1 ms for plane
6 vehicles: 161 ms for terrain, 1.3 ms for plane
8 vehicles: > 1000 ms for terrain, 4 ms for plane

As you can see, performance is dropping off exponentially (probably due to a CPU bottleneck forming), and using terrain obviously has a phenomenally greater impact on performance than using a simple ground plane.

luis

30-10-2006 15:06:10

I’ve finally had a chance to test the new patch, and I too have noticed some speed improvements, although not to the extent that luis has.

maybe it is because my baseline is different:
5ms - 140FPS (without shadows), and as you said performance is dropping off exponentially...

would be good to have some kind of standard test :)
maybe a modified version of the landscape demo.

I think that having the simulation in another thread will really increase the speed even on PCs with only one CPU.... but of course any optimization is welcome.

luis

07-11-2006 11:25:15

@Tuan
Any news on multi threading version ? :D

Anonymous

10-11-2006 13:47:43

Amazing progress with the planeboundedgeometry! I've been away for a while (disgusting amount of work...) but just thought I would give a late reply to tuan kuranes.


I still need the copy of the heightfield data for separate thread, though.
We'll be happy if you can contribute a patch for the landscape demo using your method, and/or a threaded "ogreode" (but we'll understand if you can't).
Do you have a multicore or Hyper-Threading processor?


I would be honoured to contribute to OgreODE / PLSM. I'm not sure my current code is ready for it, though... I'm basically playing around with some ideas for a cool demo (after getting a dual-core AMD); for use in a generic setting it would have to be rewritten.
Anyway this is my simplistic approach:
I have not made a "thread safe" ODE. In fact I have not modified ODE at all.

Instead I have a "WorldState" object which contains output from OgreODE (position / rotation of all bodies) and input to OgreODE. I have two mutexes: one for output from ODE (WorldStateMutex) and one for input to ODE... Also, I have subclassed OgreODE::Body and OgreODE::QuickStepper.

My subclassed Body::sync() does not update the Ogre SceneNode position, but instead the position/rotation in the WorldState object. It does not lock anything.

My subclassed Stepper::step() performs everything you expect step() to do, but before calling _world->synchronise() it locks the WorldStateMutex. Then sync() will be called on all bodies, which is ok because step() already locked the WorldStateMutex. Oh, and it also checks for input. If it is ahead of time, it goes to sleep for a while...
Then, every frame I sync the Ogre SceneNode to the WorldState, after locking the mutex.

...and that's it. It seems to work reasonably well, but it is not a multi-threaded OgreODE. In fact, I think making OgreODE thread-safe is somewhat risky; thread safety always comes at a price (waiting/overhead of locking a mutex, etc) so I kind of favour the solution of making thread-safe "wrappers" for certain functions you need, and otherwise keeping things fast and thread-dangerous!
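
The output half of this scheme (physics publishes body transforms under a lock, rendering copies them out under the same lock) can be sketched as below. It is not the poster's code: std::mutex is swapped in for boost::threads, only positions are stored, the input mutex is omitted, and BodyState, publish, snapshot and runPhysics are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

struct BodyState { float x, y, z; };

// Shared snapshot of all body positions, guarded by a single mutex.
class WorldState {
public:
    explicit WorldState(std::size_t bodyCount)
        : bodies_(bodyCount, BodyState{0.0f, 0.0f, 0.0f}), stepCount_(0) {}

    // Physics side: publish all bodies after a step, under one lock.
    void publish(const std::vector<BodyState>& bodies) {
        std::lock_guard<std::mutex> lock(mutex_);
        bodies_ = bodies;
        ++stepCount_;
    }

    // Render side: copy out a consistent snapshot once per frame.
    std::vector<BodyState> snapshot(unsigned long* stepCount = nullptr) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (stepCount) *stepCount = stepCount_;
        return bodies_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<BodyState> bodies_;
    unsigned long stepCount_;
};

// Stand-in for the physics loop: moves every body up by dt per step.
// A real stepper would run collision and the physics step where marked.
void runPhysics(WorldState& world, std::size_t bodyCount, int steps, float dt) {
    std::vector<BodyState> local(bodyCount, BodyState{0.0f, 0.0f, 0.0f});
    for (int i = 0; i < steps; ++i) {
        for (BodyState& b : local) b.y += dt;  // placeholder for collide + step
        world.publish(local);                  // lock held only for the copy
    }
}
```

In a threaded setup runPhysics would be started on its own thread, for example with std::thread and std::ref(world), while the render loop calls snapshot() each frame and applies the result to the scene nodes. Working on a local copy keeps the lock short, which is the same goal as the finer-grained locking mentioned in the post.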

If you're interested, I could fix up the code and give it away, but I don't plan to use it myself. For starters, I want to use something like this: http://www.codeproject.com/threads/lwsync.asp
to add some structure, and I want to stop using boost::threads (which uses critical_section on windows - slow) in favour of hardware supported single-instruction calls (InterlockedCompareExchange), at least on Windows (see: http://www.codeproject.com/threads/fast_ipc.asp for a nice example). The idea is that if I have really fast locking, then I can afford to lock smaller amounts of data at a time, thus reducing waiting states.

luis

17-11-2006 08:11:58

martin i would like to test and see your code, i have two machines (with dual core and single core) at home.....

Anonymous

17-11-2006 17:00:42

All right, I'll try and turn it into "human readable" code, then. Might take me a little while, though.

luis

18-11-2006 10:32:06

ok, i'll be around impatiently waiting :)

does anyone know where Tuan is?

foxbat

20-11-2006 23:33:03

I just did another performance test, and I've realised that my bottleneck was actually being caused by the getHeight callback. I switched to using an internal height representation for ODE, and to my amazement, the bottleneck I spoke of in an earlier post disappeared. I can now get over 100fps in situations where the simulation used to completely hang due to the number of bodies present.

This is probably why I didn't see any major benefits when using Tuan's patch.

Anonymous

21-11-2006 08:56:58

This is my experience also (that getHeight() was a bottleneck for ODE, and making my own fast version fixed it), although I never tried the fastest version of getHeight() - without interpolation.

tuan kuranes

29-11-2006 14:48:49

New ODE patch, nearly as fast as possible.
I intend to release a new OgreOde soon that will show those results.

Anonymous

10-03-2007 13:17:20

Seems I will never get around to doing this properly, but in case luis or someone is interested, I posted the important parts of my threaded ogreODE implementation here: http://www.ogre3d.org/phpBB2addons/viewtopic.php?t=3722
The plsm2 specific parts are not there, they are really hacky. Maybe I should post them anyhow?

luis

10-03-2007 17:31:56

i'm porting all my code to NxOgre right now, but thanks any way !

I think any threaded version of OgreOde (especially using heightmaps with high poly density) is a big step ahead in performance ;)

Anonymous

10-03-2007 21:26:47

yeah, I will be trying bullet (maybe through OgreBullet) or OpenTissue. I'd use NxOgre too, if it were only for making a good game, but I am more interested in learning/complicating things.

Tabarn@kdeca.lis

06-06-2007 21:10:04

That way you can save even more collision. Make sure only moving - static collision is done, and no static-static.

I was able to prevent such a problem using this function : OgreOde::TransformGeometry::setCollisionBitfield

I set all map trimesh to 0, and the player capsule to 1 and I get desired collisions. It was fun though to see the debug of 10k contact normals of my environment on itself at 1 fps!
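
Why the 0 / 1 setting works can be illustrated with the pair filter ODE documents for category and collide bits. The sketch assumes that rule, assumes setCollisionBitfield sets the collide bits, and assumes category bits stay at their all-ones default; GeomBits and shouldTestPair are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-geometry bit pair.
struct GeomBits {
    std::uint32_t category;  // which groups this geom belongs to
    std::uint32_t collide;   // which groups this geom wants to collide with
};

// A pair is tested if either geom wants to collide with the other's category.
inline bool shouldTestPair(const GeomBits& a, const GeomBits& b) {
    return (a.category & b.collide) != 0 || (b.category & a.collide) != 0;
}
```

With map geoms at collide = 0 and the player at collide = 1, map-vs-map pairs are skipped while player-vs-map pairs are still tested, which matches the behaviour reported above.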

rewb0rn

06-06-2007 22:39:30

why not just add all static geoms to one space without internal collisions? Is there any disadvantage? Because I do it that way right now.

Tabarn@kdeca.lis

07-06-2007 15:12:53

kpreid_ on mIRC #ODE channel told me to use separate spaces for that, as you are doing, it's more efficient.