[Bleeding] asynchronous simulation and rendering update

weak

10-12-2007 22:38:33

I'm trying to asynchronously update the PhysX simulation and the rendering position, and I'm wondering whether I'm doing it the right way.

I create the world like this:
m_world = new World("FrameListener: No, log: html");
m_scene = m_world->createScene("mainNxScene", m_sceneManager, "gravity: 0 -9.8 0, floor: yes, controller: accumulator, renderer: ogre");


and call simulate() every 1/60 seconds and render() every frame:
const float timeStep = 1.0f/60.0f;
static float timeElapsed = 0;
timeElapsed += m_timeSinceLastFrame;

if( timeElapsed >= timeStep )
{
    m_world->simulate(timeStep);
    timeElapsed -= timeStep;
}

m_world->render(m_timeSinceLastFrame);


Would that correctly interpolate the current rendering position depending on the alpha value?
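For reference, the usual fixed-timestep accumulator (the pattern from Glenn Fiedler's "Fix Your Timestep!" article that comes up later in the thread) uses `while` instead of `if`, so that a slow frame runs several steps, and derives alpha from the time left in the accumulator. This is a minimal standalone sketch in 1D; `Body`, `stepPhysics`, and `FixedStepper` are hypothetical and not part of NxOgre or PhysX.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 1D body standing in for a physics actor (not a PhysX/NxOgre type).
struct Body {
    double position = 0.0;
    double velocity = 1.0;  // units per second
};

// Hypothetical physics step: explicit Euler integration over a fixed dt.
inline void stepPhysics(Body& body, double dt) {
    body.position += body.velocity * dt;
}

// Fixed-timestep accumulator with render interpolation.
class FixedStepper {
public:
    FixedStepper(double timeStep, double initialPosition)
        : timeStep_(timeStep), previous_(initialPosition) {}

    // Call once per rendered frame with the real time since the last frame.
    // Returns the position to draw this frame.
    double advance(Body& body, double frameTime) {
        accumulator_ += frameTime;
        // 'while' rather than 'if': a slow frame runs several fixed steps
        // instead of letting the leftover time pile up.
        while (accumulator_ >= timeStep_) {
            previous_ = body.position;  // state before this step
            stepPhysics(body, timeStep_);
            accumulator_ -= timeStep_;
        }
        // alpha in [0, 1): fraction of a step left over in the accumulator.
        alpha_ = accumulator_ / timeStep_;
        // Blend the last two simulated states; the result lags by up to one step.
        return previous_ * (1.0 - alpha_) + body.position * alpha_;
    }

    double alpha() const { return alpha_; }

private:
    double timeStep_;
    double previous_;
    double accumulator_ = 0.0;
    double alpha_ = 0.0;
};
```

With a 1/60 s step and a 0.04 s frame, this runs two steps, leaves alpha at 0.4, and draws the body at 0.04 - 1/60 (about 0.0233) units, i.e. one step behind real time.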

betajaen

10-12-2007 22:43:10

The accumulator does that for you. All you need to do is inject the deltaTime value: in this case 1.0/60.0f, or the time since the last frame. Both ways are almost the same.

[Edit]

If you're not planning to use VSync and just let Ogre go all out with the FPS, it's probably best to inject the deltaTime; at least then the interpolation methods would work better.

Otherwise 1.0/60.0f it is then. ;)

weak

10-12-2007 23:07:44


If you're not planning to use VSync and just let Ogre go all out with the FPS, it's probably best to inject the deltaTime; at least then the interpolation methods would work better.

Otherwise 1.0/60.0f it is then. ;)


I guess by injecting the time you mean calling simulate() and render() and passing the time? But how do I influence the speed at which the simulation gets updated if I call simulate() and render() every frame?

luis

11-12-2007 08:56:06

I guess by injecting the time you mean calling simulate() and render() and passing the time?
Yes: create the scene with "controller: accumulator" and call simulate() and render() every frame, passing the time since the last frame.
But how do I influence the speed at which the simulation gets updated if I call simulate() and render() every frame?
The Simulate method in NxOgre does more or less exactly what you're doing in the code you posted, and the render method will then interpolate the positions if you enable it on the body:

mBody->setInterpolation(RenderableSource::I_Linear);


If you're not planning to use VSync and just let Ogre go all out with the FPS, it's probably best to inject the deltaTime; at least then the interpolation methods would work better.
I'm calling the render method with TimeSinceLastFrame and I'm still getting a very small jitter :( Should I pass the alpha/delta value?

betajaen

11-12-2007 09:43:25

I guess by injecting the time you mean calling simulate() and render() and passing the time? But how do I influence the speed at which the simulation gets updated if I call simulate() and render() every frame?

Do you mean going slow motion or faster within the Scene? There is a time modifier method on PhysXDriver, but it doesn't work with the accumulator. I didn't bother with it, or didn't remember it, when I wrote the accumulator code. I can try to put it in, but it may take a while.
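Regarding the time modifier: one way a time scale could coexist with an accumulator is to scale the real frame time before it enters the accumulator while keeping the physics step fixed, so slow motion means fewer fixed steps per real second rather than a smaller step. This is a minimal standalone sketch; `ScaledAccumulator` and its methods are hypothetical and are not the PhysXDriver or NxOgre API.

```cpp
#include <cassert>

// Hypothetical accumulator with a time scale (slow motion / fast forward).
class ScaledAccumulator {
public:
    ScaledAccumulator(double timeStep, double timeScale)
        : timeStep_(timeStep), timeScale_(timeScale) {}

    void setTimeScale(double scale) { timeScale_ = scale; }

    // Returns how many fixed steps to simulate for this rendered frame.
    int stepsForFrame(double realFrameTime) {
        // Scale the simulated time that elapses, never the step size itself,
        // so the solver always sees the same dt.
        accumulator_ += realFrameTime * timeScale_;
        int steps = 0;
        while (accumulator_ >= timeStep_) {
            accumulator_ -= timeStep_;
            ++steps;
        }
        return steps;
    }

    // Interpolation factor for rendering, in [0, 1).
    double alpha() const { return accumulator_ / timeStep_; }

private:
    double timeStep_;
    double timeScale_;
    double accumulator_ = 0.0;
};
```

At half speed, one real second yields half as many fixed steps, and interpolation through alpha keeps the motion smooth between them.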

I'm calling the render method with TimeSinceLastFrame and I'm still getting a very small jitter :( Should I pass the alpha/delta value?

Delta = time since the last frame
Alpha = accumulated value for interpolation, in the range 0.0f to 1.0f. It's only used with the accumulator scene controller; otherwise it's always 0.5f.

I get some tiny amounts of jittering too, but it's rare and sometimes random. I expect the interpolation needs to span a few more frames for that.

weak

11-12-2007 10:02:43


I'm calling the render method with TimeSinceLastFrame and I'm still getting a very small jitter :( Should I pass the alpha/delta value?


That's the reason I was asking whether I'm doing it right. There's still some jittering left with VSync on, and without VSync it can get really nasty in certain situations. Actually, the interpolation I used with 0.9 seemed to work better.

@betajaen: I haven't had time yet to look at your implementation, but I guess the PhysX simulation is one frame ahead and you interpolate the rendering position based on the current alpha internally?

betajaen

11-12-2007 10:05:02

That's weird; I more or less copied the code that was posted. I did add my artistic talent to it, though.

Nope. I don't think it is rendering one frame ahead, but the interpolation depends on whether the Renderable is meant to interpolate (by default it's not).

luis

11-12-2007 10:06:10

@betajaen: I haven't had time yet to look at your implementation, but I guess the PhysX simulation is one frame ahead and you interpolate the rendering position based on the current alpha internally?

I want to know that too: is it done internally, or do we have to pass it to the render method?

weak

11-12-2007 10:13:08

That's weird; I more or less copied the code that was posted. I did add my artistic talent to it, though.

Nope. I don't think it is rendering one frame ahead, but the interpolation depends on whether the Renderable is meant to interpolate (by default it's not).


Hmm, how can you interpolate correctly if the simulation is not one step ahead? Fiedler described in his article an interpolation between the previous simulation step and the current one. That should smooth position and orientation (the tradeoff, of course, is the added latency of one step).
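A sketch of the interpolation Fiedler describes, assuming each body stores its pose after the previous and the current step: position is blended linearly and orientation with a shortest-arc quaternion slerp. `Vec3`, `Quat`, `Pose`, and the function names are hypothetical standalone types; in an Ogre application, Ogre's own vector and quaternion classes would normally be used instead.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double w, x, y, z; };  // unit quaternion
struct Pose { Vec3 position; Quat orientation; };

inline Vec3 lerpVec(const Vec3& a, const Vec3& b, double t) {
    return {a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t};
}

// Spherical linear interpolation along the shortest arc.
inline Quat slerpQuat(Quat a, const Quat& b, double t) {
    double cosTheta = a.w * b.w + a.x * b.x + a.y * b.y + a.z * b.z;
    if (cosTheta < 0.0) {  // q and -q are the same rotation: take the shorter path
        a = {-a.w, -a.x, -a.y, -a.z};
        cosTheta = -cosTheta;
    }
    double wa, wb;
    if (cosTheta > 0.9995) {  // nearly identical: linear weights avoid dividing by ~0
        wa = 1.0 - t;
        wb = t;
    } else {
        double theta = std::acos(cosTheta);
        double sinTheta = std::sin(theta);
        wa = std::sin((1.0 - t) * theta) / sinTheta;
        wb = std::sin(t * theta) / sinTheta;
    }
    Quat r{wa * a.w + wb * b.w, wa * a.x + wb * b.x,
           wa * a.y + wb * b.y, wa * a.z + wb * b.z};
    double n = std::sqrt(r.w * r.w + r.x * r.x + r.y * r.y + r.z * r.z);
    return {r.w / n, r.x / n, r.y / n, r.z / n};  // renormalize
}

// alpha = 0 shows the previous step, alpha = 1 the current one, so the drawn
// pose lags the simulation by up to one step (the latency mentioned above).
inline Pose interpolatePose(const Pose& previous, const Pose& current, double alpha) {
    return {lerpVec(previous.position, current.position, alpha),
            slerpQuat(previous.orientation, current.orientation, alpha)};
}
```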

betajaen

11-12-2007 10:15:33

I want to know that too: is it done internally, or do we have to pass it to the render method?

The render method does it. Or at least it's meant to, and will do.

Hmm, how can you interpolate correctly if the simulation is not one step ahead?

By doing the opposite: it interpolates based on the current frame and the last frame.

weak

11-12-2007 10:37:01


By doing the opposite: it interpolates based on the current frame and the last frame.


This can't work, IMHO.
Just think of a body moving at high speed that suddenly gets stopped by a static object. An interpolation of the last and current step might show the moving body behind or inside the static object.
It's hard to accurately predict the future ;)
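To make the failure case concrete, here is a hypothetical 1D sketch (not NxOgre code) of a body at x = 0.9 on the previous step and x = 1.0 on the current step, where a wall at x = 1.0 stopped it. Projecting forward from the last two steps (extrapolation) draws it inside the wall; blending between them (interpolation, as in Fiedler's approach) only ever shows states the simulation has actually produced, at the cost of up to one step of latency.

```cpp
#include <cassert>

// Extrapolation: guess the future by continuing the last step's motion.
inline double extrapolate(double previous, double current, double alpha) {
    return current + (current - previous) * alpha;
}

// Interpolation: blend two states the simulation has already validated.
inline double interpolate(double previous, double current, double alpha) {
    return previous + (current - previous) * alpha;
}
```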

betajaen

11-12-2007 10:46:37

That's true. I'll have a second stab at it after cloth and fluids.

weak

11-12-2007 11:00:26

I guess you're familiar with the 'Asynchronous Stepping' article in the PhysX documentation?
As I said, I haven't had time to take a close enough look at your wonderful lib, but running the simulation in an independent thread should be the preferred method, IMO.

It's an advantage to be able to render at full framerate and update the simulation at a fixed (lower) rate, but it would be even better to use buffered values and let the simulation run completely in parallel.

betajaen

11-12-2007 11:05:00

It's an advantage to be able to render at full framerate and update the simulation at a fixed (lower) rate, but it would be even better to use buffered values and let the simulation run completely in parallel.

See, this is what I want. But some people (Luis) say it's an awful idea and I should be ashamed of thinking up such ideas.

weak

11-12-2007 11:14:11


See, this is what I want. But some people (Luis) say it's an awful idea and I should be ashamed of thinking up such ideas.


Unfortunately, many people are kind of afraid when it comes to parallelizing stuff, but the benefits are surely worth the trouble. With the above simulation method you'd still be blocking rendering, and that sucks for so many reasons.

Nowadays you get multicore CPUs for a few bucks, so a physics simulation that runs in parallel is not only state of the art, it's almost a must. Just think of all the wonderful possibilities...

luis

11-12-2007 11:25:04

Hehe, you're exaggerating!

I don't see the point of creating *another* thread to update the simulation at a fixed timestep. It will not help; it will complicate NxOgre and add synchronization overhead and lots of hard-to-find bugs (race conditions, for example).
And using a time accumulator achieves the same thing without the problems of *another* thread to update physics, and by update I just mean calling the simulate()/fetchResults() methods, which is different from running 'the simulation'.

Also read what the Ageia docs say:

The AGEIA PhysX SDK is multi-threaded; the physics simulation calculations run in their own thread, separate from the application thread. The state of the simulation is updated by calling a sequence of functions that (1) start the simulation, (2) ensure that all necessary data has been sent to the simulation thread, (3) check to see whether the simulation is finished, and if so, update the state data in the buffer, and (4) swap the state data buffers so that the next simulation step will be performed on the alternate buffer, leaving the current results accessible to the application. The function sequence is illustrated in the following snippet of pseudocode:


The AGEIA PhysX SDK is multi-threaded; the physics simulation calculations run in their own thread, separate from the application thread.
So why would we add *another* thread just to update the simulation? It makes no sense... And how are you going to free the CPU while the updating thread waits until it's time to update, given that the time slice under Windows is 10 ms???

Creating another thread to run the simulation (which is different from creating a thread just to call simulate()/fetchResults()) makes sense if you're using ODE, for example, or any other single-threaded physics engine ;)

luis

11-12-2007 11:40:44

With the above simulation method you'd still be blocking rendering, and that sucks for so many reasons.
NxOgre blocks rendering only once every 1/60 seconds, and that isn't so serious. If your app is running at 240 FPS, it means you're blocking (updating physics) in 1 of every 4 frames.

Ageia's advice is to add code between the flushStream() and fetchResults() calls...
A good example is SampleAsyncBoxes.

weak

11-12-2007 11:41:39

See, that's the nice thing: it just needs to be used. Take a look at SceneController::Simulate() in NxOgreSceneController.cpp:

bool SceneController::Simulate(NxReal deltaTime) {
    mDeltaTime = deltaTime;
    mNxScene->simulate(deltaTime);
    mNxScene->flushStream();
    while (!mNxScene->fetchResults(NX_RIGID_BODY_FINISHED, false));
    return true;
}


As you can see, a call to simulate is still blocking and does not use the multithreading capabilities of PhysX.

weak

11-12-2007 11:44:31


NxOgre blocks rendering only once every 1/60 seconds, and that isn't so serious. If your app is running at 240 FPS, it means you're blocking (updating physics) in 1 of every 4 frames.


It doesn't matter how often you're blocking but for how long. If your physics simulation is very complex, you might still introduce serious lag because it blocks long enough to be noticeable.
So a truly parallel approach would allow more complex physics while maintaining a more stable framerate.

luis

11-12-2007 12:05:05

As you can see, a call to simulate is still blocking and does not use the multithreading capabilities of PhysX.

We already have the option to update the simulation in a non-blocking way:

fetchResults(NX_RIGID_BODY_FINISHED, false);
It doesn't matter how often you're blocking but for how long. If your physics simulation is very complex, you might still introduce serious lag because it blocks long enough to be noticeable.
Yes, you're right, but adding threading to NxOgre is not necessary; you can add code that will run in parallel, see:

bool SceneController::Simulate(NxReal deltaTime) {
    mDeltaTime = deltaTime;
    mNxScene->simulate(deltaTime);
    mNxScene->flushStream();
    // <------- here
    while (!mNxScene->fetchResults(NX_RIGID_BODY_FINISHED, false))
    {
        // <------ and here
    }
    return true;
}


It is up to us whether to use it or not, and adding code there will be a lot easier than dealing with the extra complexity of adding 'almost useless threading' to NxOgre...
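The 'here' spots in that code amount to overlapping other work with a running step. A minimal standalone sketch of that pattern, assuming `std::async` as a stand-in for PhysX's own simulation thread; `simulateStep`, `StepOutcome`, and the idle work are hypothetical placeholders, and the non-blocking `wait_for` poll plays the role of `fetchResults(NX_RIGID_BODY_FINISHED, false)`.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for one physics step running on another thread.
inline int simulateStep(int stepIndex) {
    std::this_thread::sleep_for(std::chrono::milliseconds(5));  // pretend work
    return stepIndex + 1;  // pretend "results"
}

struct StepOutcome {
    int result;          // value produced by the step
    int idleIterations;  // how many times other work ran while waiting
};

// Start a step, keep doing other work while it runs, then collect the results.
inline StepOutcome stepWithIdleWork(int stepIndex) {
    std::future<int> pending =
        std::async(std::launch::async, simulateStep, stepIndex);
    int idle = 0;
    // Non-blocking poll, playing the role of fetchResults(..., false).
    while (pending.wait_for(std::chrono::milliseconds(0)) !=
           std::future_status::ready) {
        // Hypothetical idle work (cleanup, AI, audio) that must not touch the
        // state the step is writing; the sleep only simulates that work.
        ++idle;
        std::this_thread::sleep_for(std::chrono::microseconds(200));
    }
    return {pending.get(), idle};
}
```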

luis

11-12-2007 12:06:27

Perhaps some NxOgre calculations could be done in those lines...? betajaen, any opinion?

weak

11-12-2007 12:17:32

Don't get me wrong, I don't want to introduce a separate thread when PhysX already has the necessary capabilities.
They just need to be used.

betajaen

11-12-2007 12:20:40

There is/was a planned Scene::idle() method to run within the while (!mNxScene->fetchResults(NX_RIGID_BODY_FINISHED, false)) loop. It's designed to do some cleanup and other little things while the scene is simulating.

However, PhysX is so fast that I've never seen it actually run that method in a normal-sized scene. So if there is a block, it's tiny.

weak

11-12-2007 12:41:48

What do you think about a loop like this (pseudocode):

loop
{
    if( accumulator >= timestep )
    {
        fetchPhysXResults(); // blocks here
        updateSavedPositionsForInterpolation();
        startNewPhysXSimulationStep();
    }
    interpolateRenderPositions();
    renderStuff();
}


Let's say the timestep is 1/60. If the PhysX simulation completes within that time, the only blocking call would be fetching the results, which should be very fast.
We should be able to squeeze every frame out of Ogre while the simulation runs in parallel, and we should still be able to interpolate the positions for rendering.
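A runnable approximation of the pseudocode loop, under stated assumptions: `std::async` stands in for PhysX's simulation thread, each body is a single 1D position, and `simulate`, `State`, and `PipelinedLoop` are hypothetical names. The next step is launched as soon as the previous results are fetched, so rendering only blocks in `get()` if that step has not finished yet, and each step works on its own copy of the state, which acts as the buffer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <future>
#include <vector>

using State = std::vector<double>;  // hypothetical: one 1D position per body

// Hypothetical physics step operating on its own copy of the state.
inline State simulate(State s, double dt) {
    for (double& x : s) x += 1.0 * dt;  // every body moves at 1 unit/s
    return s;
}

class PipelinedLoop {
public:
    PipelinedLoop(const State& initial, double timeStep)
        : previous_(initial), current_(initial), timeStep_(timeStep),
          pending_(std::async(std::launch::async, simulate, initial, timeStep)) {}

    // One rendered frame; returns the interpolated positions to draw.
    State frame(double frameTime) {
        accumulator_ += frameTime;
        while (accumulator_ >= timeStep_) {
            State next = pending_.get();   // fetchPhysXResults(): blocks only if unfinished
            previous_ = current_;          // updateSavedPositionsForInterpolation()
            current_ = next;
            pending_ = std::async(std::launch::async, simulate, current_,
                                  timeStep_);  // startNewPhysXSimulationStep()
            accumulator_ -= timeStep_;
        }
        double alpha = accumulator_ / timeStep_;
        State render(current_.size());     // interpolateRenderPositions()
        for (std::size_t i = 0; i < current_.size(); ++i)
            render[i] = previous_[i] + (current_[i] - previous_[i]) * alpha;
        return render;                     // renderStuff() would draw these
    }
    // A future from std::async waits for its task when destroyed,
    // so no step is left running after the loop object goes away.

private:
    State previous_, current_;
    double timeStep_;
    double accumulator_ = 0.0;
    std::future<State> pending_;
};
```

With a 0.25 s step and a 0.625 s frame, two steps are fetched, alpha is 0.5, and the body is drawn at 0.375 while the latest fetched step is at 0.5.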

betajaen

11-12-2007 13:06:04

It's a little bit more complicated than that. Interpolation is optional and processed on a per-RenderableSource basis.

weak

11-12-2007 13:15:08

It's a little bit more complicated than that. Interpolation is optional and processed on a per-RenderableSource basis.

The call to interpolate the position could be optional. If the user doesn't want interpolation, updateSavedPositionsForInterpolation() could just update the position after each simulation step.

But of course you know the internals much better than I do, so that may not work as I think.
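A sketch of how optional per-renderable interpolation could look under this suggestion, assuming each renderable keeps its positions from the last two steps. `Interpolation`, `Renderable`, `onStep`, and `renderPosition` are hypothetical names; the enum only mirrors the idea of RenderableSource::I_Linear mentioned earlier and is not NxOgre's API.

```cpp
#include <cassert>

// Hypothetical per-renderable setting (not NxOgre's API).
enum class Interpolation { None, Linear };

struct Renderable {
    Interpolation mode = Interpolation::None;
    double previous = 0.0;  // position after the previous simulation step
    double current = 0.0;   // position after the latest simulation step

    // Called once after each simulation step
    // (the "updateSavedPositionsForInterpolation" part of the loop).
    void onStep(double newPosition) {
        previous = current;
        current = newPosition;
    }

    // Called every rendered frame with the accumulator's alpha.
    double renderPosition(double alpha) const {
        if (mode == Interpolation::None) return current;  // snap to the latest step
        return previous + (current - previous) * alpha;   // linear blend
    }
};
```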

luis

11-12-2007 13:29:39

Don't get me wrong, I don't want to introduce a separate thread when PhysX already has the necessary capabilities.
They just need to be used.

OK, I misunderstood you :)
However, PhysX is so fast that I've never seen it actually run that method in a normal-sized scene. So if there is a block, it's tiny.
Yes, PhysX is amazingly fast, and I'm not sure, but I think very complex things like fluids are calculated in parallel...


Well, anyway, I prefer to fix the jittering issue first :)
I'm simulating 8 cars, and PhysX + my AI code is eating ~150 µs (microseconds) on my dual-core 6600...

weak

11-12-2007 14:21:12


Yes, PhysX is amazingly fast, and I'm not sure, but I think very complex things like fluids are calculated in parallel...

Well, anyway, I prefer to fix the jittering issue first :)
I'm simulating 8 cars, and PhysX + my AI code is eating ~150 µs (microseconds) on my dual-core 6600...


I think currently nothing is simulated in parallel. And you can never have enough speed :D

If I can use twice as many actors or more complex collision models while maintaining the same framerate, I want it ;)

Nazgul

20-12-2007 18:37:15


bool SceneController::Simulate(NxReal deltaTime) {
    mNxScene->fetchResults(NX_RIGID_BODY_FINISHED,true);
    mNxScene->simulate(deltaTime);
    mNxScene->flushStream();
    //while (!mNxScene->fetchResults(NX_RIGID_BODY_FINISHED, false));
    return true;
}

This would make simulation and rendering run in parallel (but physics would be one frame behind, I think).
It would also remove the active waiting for the PhysX results. fetchResults has to call some thread-sync function anyway to check whether the PhysX worker thread is finished, so waiting for the results with
mNxScene->fetchResults(NX_RIGID_BODY_FINISHED,true);
instead of
while (!mNxScene->fetchResults(NX_RIGID_BODY_FINISHED, false));
would be a good idea anyway, because currently on a single-CPU machine both the PhysX thread and the main thread are running (with the main thread consuming cycles in a waiting loop).
Setting fetchResults(..., block = true) would put the main thread into the "WAIT" state with the OS and leave more CPU time for PhysX if it has to wait anyway (which should increase single-CPU performance).
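The active versus passive waiting distinction, illustrated with standard C++ primitives rather than PhysX: spinning on a flag keeps the main thread runnable and burning cycles, while a condition-variable wait lets the OS suspend it until the worker signals. `StepSignal` is hypothetical, and whether PhysX's blocking fetchResults sleeps the same way internally is the assumption made above, not something this sketch demonstrates.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical completion flag for a step running on a worker thread.
class StepSignal {
public:
    // Called by the worker when the step is finished.
    void markDone() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_all();
    }

    // Active wait: keeps polling and burns a core,
    // like looping on fetchResults(..., false).
    void spinWait() const {
        while (!isDone()) {
            // busy loop
        }
    }

    // Passive wait: the OS suspends this thread until notified,
    // the behaviour hoped for from fetchResults(..., true).
    void blockingWait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_; });
    }

    bool isDone() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return done_;
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    bool done_ = false;
};
```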

betajaen

20-12-2007 19:31:13

I've done some tests in the past with that while loop, and only a few times have I ever seen it being "used", in scenes with a high actor count (>700). I don't think it's the bottleneck everyone thinks it is.

Nazgul

20-12-2007 21:41:07

I did some tests myself with the following results:

@400 bodies
- Active wait: 15.8 FPS
- Passive wait: 16.7 FPS
- Parallel passive: 16.4 FPS

@600 bodies
- Active wait: 10.5 FPS
- Passive wait: 11.1 FPS
- Parallel passive: 11.9 FPS

@1000 bodies
- Active wait: 5.9 FPS
- Passive wait: 6.0 FPS
- Parallel passive: 6.0 FPS

The test was conducted on a Toshiba Satellite A110-178:
- Core Duo 1.6 GHz (T2050)
- 1024 MB RAM
- Intel 945 GMA graphics controller

Changes in the viewing perspective suggest that the main performance impact lies in the lack of proper 3D graphics acceleration.

I think it's still safe to say that there is a small gain from passive waiting with fetchResults(..., true) over the current active approach.

There is no evidence in my results for better framerates using the parallel approach with less than 1000 bodies. With the parallel approach, my CPU usage graphs (gkrellm) show activity on both cores, which they don't for the other configurations.
Subjectively, the whole thing feels more responsive with the parallel approach, but I might be biased there.

betajaen

20-12-2007 21:53:55

Very little difference in frame rate, as I expected. The real bottleneck is Ogre, or at least the number of render calls it uses per Actor. If anyone can come up with or point me to some Ogre code that implements instancing or batching, I'll be more than glad to implement it.

weak

21-12-2007 03:05:19

Very little difference in frame rate, as I expected. The real bottleneck is Ogre, or at least the number of render calls it uses per Actor. If anyone can come up with or point me to some Ogre code that implements instancing or batching, I'll be more than glad to implement it.

Do we really have to discuss whether we need the best possible implementation?
Let's just think a bit ahead...