Threaded Game Engine

klauss
Hobgoblin
Posts: 559
Joined: Wed Oct 19, 2005 4:57 pm
Location: LS87, Buenos Aires, República Argentina.

Post by klauss »

One small thing that some people (e.g. Zeal) seem to be concerned about, and others not, is "intraframe consistency".

That's probably because it is or isn't a serious issue, depending on the game type.

What is it? Suppose you have a particle system in which each particle moves at high speed, e.g. a fleet of ships. They're 10 meters apart from each other (wingtip to wingtip, let's say), but they're moving at 30,000 km/h. You just can't feed the graphics engine a partial position update. It would make little sense, and you would break intraframe consistency: a unit that should be in front of another will be rendered behind it, because the other's position hasn't been updated yet and still corresponds to an earlier physics time. You'd want all rendered states to be a snapshot taken at a single physics time.

Some systems, particularly FPS games, don't need to concern themselves with that, because they seldom have coordinated physics or velocities high enough to show significant inconsistency at any sensible "time fragmentation" value.

Some do, however.

You don't design both systems the same way. A consistency-aware implementation has to guarantee consistency, and you need some kind of "snapshot queue" for that, e.g. Zeal's double buffer. You update your "back snapshot", and when it's all updated and consistent, you switch it with the "front snapshot", so that the next graphics update will take it instead. In this case, you wouldn't want your graphics thread to waste time rendering the same state twice, unless you have some form of automatic animation in it (animated textures or things like that). Which is probably the case (hint: that's why you would want a free-running graphics thread).
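
A minimal sketch of the back/front snapshot swap described above, assuming C++11 and a plain mutex. The names (ShipState, SnapshotExchange, publish, fetchIfNewer) are made up for illustration and do not come from OGRE or from anyone's engine in this thread. The version counter is one possible way to let the graphics thread notice that nothing new has been published.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// One object's state at a given physics time (hypothetical layout).
struct ShipState { std::uint32_t id; float x, y, z; };

class SnapshotExchange {
public:
    // Physics thread: scratch snapshot, safe to modify at any time.
    // After publish() it holds the previous front snapshot, so the
    // physics thread should clear and refill it completely.
    std::vector<ShipState>& back() { return back_; }

    // Physics thread: call once the back snapshot is complete and consistent.
    void publish() {
        std::lock_guard<std::mutex> lock(mutex_);
        front_.swap(back_);
        ++version_;
    }

    // Render thread: copies the front snapshot into `out` only if it is
    // newer than `lastSeen`. Returns false when nothing new was published.
    bool fetchIfNewer(std::uint64_t& lastSeen, std::vector<ShipState>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(lastSeen <= version_);  // a reader cannot have seen the future
        if (version_ == lastSeen) return false;
        out = front_;
        lastSeen = version_;
        return true;
    }

private:
    std::mutex mutex_;
    std::vector<ShipState> front_;
    std::vector<ShipState> back_;
    std::uint64_t version_ = 0;
};
```

The lock is held only for a swap or a copy, never while physics is being stepped or a frame is being rendered, so the render thread always sees a whole snapshot or none.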

In a consistency-oblivious implementation, though, you only have to send update packets to the graphics thread. In between renders, the graphics thread would pull all the update packets, apply them to its scenegraph, and render. As simple as that. I wish this could apply to a consistency-concerned application, because it's so elegant... but it can't.
Oíd mortales, el grito sagrado...
Hey! What is it with that that?
Wing Commander Universe
Zeal
Ogre Magi
Posts: 1260
Joined: Mon Aug 07, 2006 6:16 am
Location: Colorado Springs, CO USA

Post by Zeal »

That's a good point. I wasn't even thinking about it in terms of 'intraframe consistency', but you're right. In my current game, I have lots of 'tight moving' objects: they all march in a line, bounce off each other, etc. Thus I can't just update one position (it could interact with another object), I need to update ALL their positions (a physics snapshot, as you mentioned).

So the question is, what's the best way to implement such a 'snapshot' method? I really don't think using two 'scene states' is even possible, so that leaves messaging, right?

And another thing I have been thinking about: since (in this method) you can only send 'complete' snapshots of the scene's state, it might be a good idea to use THREE buffers (buffers of messages).

In other words, you have a 'current' list of messages that the render thread is chewing through (using it to set node positions, etc.), and TWO 'next' lists that the update thread works through in an alternating fashion. The purpose of having two update buffers is that when one is completed but the current render isn't finished, you can start on the next update (i.e. you'll ALWAYS have a COMPLETE buffer ready to pass to the render thread, no matter when it requests it).

And since we're working with 'local buffers' (local to the thread that's writing messages in them), you NEVER have to worry about locking resources.

What am I missing?
xavier
OGRE Retired Moderator
Posts: 9481
Joined: Fri Feb 18, 2005 2:03 am
Location: Dublin, CA, US
x 22

Post by xavier »

Zeal wrote:That's a good point. I wasn't even thinking about it in terms of 'intraframe consistency', but you're right. In my current game, I have lots of 'tight moving' objects: they all march in a line, bounce off each other, etc. Thus I can't just update one position (it could interact with another object), I need to update ALL their positions (a physics snapshot, as you mentioned).
No you don't. Your eye cannot tell the difference between an object rendered in one place in one frame and a slightly different place in the next. In other words, there is no visual difference between rendering only after everything has been updated and rendering somewhere in the middle of the physics world update... unless you are rendering at about 10fps and your objects are moving halfway across the screen between physics updates.
Do you need help? What have you tried?

Angels can fly because they take themselves lightly.

Post by Zeal »

There could easily be a situation where a very fast object could 'overlap' another if you didn't display a complete 'physics state'. Sure, it would be hard to detect at 60fps, but it could be noticeable.

Anyway, I've read several articles on 'messaging' and heard some people say it's better to have a simple 'central array' that both threads can see.

Please critique the following strategy...

Thread 1 - Update (update object positions)

Thread 2 - Render (render object positions)

Now, 'in between' the two threads you would have THREE arrays (vectors, actually). We'll call them 'positionBuffer1-2-3[]'. Each of the two threads (update and render) will maintain a 'currentBuffer' which tells it which positionBuffer[] it is working with. The update thread writes to its buffer, while the render thread reads from its buffer.

So, to kick things off, the update thread 'locks' (via a boost mutex) positionBuffer2[] and begins to write to it (pushing onto the vector a sceneNode* and an xyz position). When it finishes, it unlocks the array and moves on to positionBuffer3[] (this process goes round and round forever, never stopping).

The render thread does the same thing in essence. It locks one of the positionBuffers and begins reading its contents, setting node positions, etc. Now, after it renders the scene, it checks whether the NEXT positionBuffer is locked (i.e. the update thread is writing to it). Even if it is, there should be a third COMPLETE positionBuffer ready to go (no waiting, no matter what).
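
A rough sketch of the three-buffer hand-off, for comparison. Note that it swaps in a different mechanism from the one described above: instead of a boost mutex per buffer, the two threads trade buffer indices through a single atomic value (C++11 std::atomic, which is an assumption here). TripleBuffer, NodePosition and the method names are hypothetical, and node ids stand in for sceneNode pointers.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

struct NodePosition { std::uint32_t nodeId; float x, y, z; };
using Snapshot = std::vector<NodePosition>;

class TripleBuffer {
public:
    // Update thread only: the buffer currently being filled. It may still
    // contain an old snapshot, so clear it before writing a new one.
    Snapshot& writeBuffer() { return buffers_[writeIdx_]; }

    // Update thread only: hand over the finished buffer, take the spare one.
    void publish() {
        const std::uint8_t mine = static_cast<std::uint8_t>(writeIdx_ | kFresh);
        const std::uint8_t old = middle_.exchange(mine, std::memory_order_acq_rel);
        writeIdx_ = static_cast<std::uint8_t>(old & kIndexMask);
    }

    // Render thread only: switch to the newest complete snapshot, if any.
    // Returns false when nothing was published since the last call.
    bool acquireLatest() {
        if (!(middle_.load(std::memory_order_acquire) & kFresh)) return false;
        const std::uint8_t old = middle_.exchange(readIdx_, std::memory_order_acq_rel);
        assert(old & kFresh);  // only this thread ever clears the flag
        readIdx_ = static_cast<std::uint8_t>(old & kIndexMask);
        return true;
    }

    // Render thread only: the snapshot selected by acquireLatest().
    const Snapshot& readBuffer() const { return buffers_[readIdx_]; }

private:
    static constexpr std::uint8_t kFresh = 0x4;      // "middle holds unseen data"
    static constexpr std::uint8_t kIndexMask = 0x3;  // low bits: buffer index

    std::array<Snapshot, 3> buffers_;
    std::uint8_t writeIdx_ = 0;            // touched by the update thread only
    std::uint8_t readIdx_ = 1;             // touched by the render thread only
    std::atomic<std::uint8_t> middle_{2};  // the one shared hand-off slot
};
```

Neither thread ever waits: the update thread always has a buffer to write into, and the render thread either picks up the most recently completed snapshot or keeps the one it already has.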

So please tell me what's wrong with that design. And remember, the whole point of only sending updates for a completed 'scene state' (besides "intraframe consistency") is to limit the amount of resource locking. Sure, it would be nice to send an update to the render thread for EVERY object you position, but it has the potential to be very slow (with all the locking), and is it even really THAT necessary? IMO, no.

*So to sum up, it's a Data Centered System of Threads (just like the Data Centered System of Systems Jeff Plummer talks about, the architecture I'm currently using; it kicks ass). The systems (or threads in this case) are completely independent, as they use centralized data to communicate, rather than a clunky messaging system. And rather than lots of LITTLE central data, you have big groups/buffers of data, to avoid frequent locking/unlocking.
Praetor
OGRE Retired Team Member
Posts: 3335
Joined: Tue Jun 21, 2005 8:26 pm
Location: Rochester, New York, US
x 3

Post by Praetor »

You mention messaging as slow, and I've found the opposite. My messaging system, with quite a complicated data conversion mechanism, is quite fast. It may not be suited for state updates, but I have another system for that. I found that a fully data-centered system was very cumbersome. If you limit it to only certain data, like the position/orientation between physics and graphics it will probably stay manageable. Still, I think a hybrid of messaging and data-centered would end up being the most useful.

Post by Zeal »

Yeah, sure, I would be open to some kind of hybrid approach, assuming messages AREN'T as bad as I have heard. The only thing I would care about would be making sure the interface/nature of the messages was 'simple'. That's my favorite part about a pure data-centered design: the interface between systems is so basic (since you're working with raw data).

So given some kind of central data or simple/fast messaging system, what are the flaws in the above design? Keep in mind it's all very coarse-grained at this point, just to keep things simple. Of course you could 'fork' the update thread and run some of its 'sub parts' in parallel (as long as each sub thread writes to a unique part of the buffer).

Post by Praetor »

There will still probably be waiting involved. The systems would have to be running at exactly the same speed to avoid it. You don't want the renderer to jump from one buffer to whichever buffer is available next; it should go through them in strict sequence. If it renders buffer1, detects buffer2 is locked, then moves on to buffer3, it is conceivable that buffer3 is actually older than buffer1. I understand you want to avoid any waiting, but if you're worried about partial updates then there must be some sort of synchronization somewhere. I think the multiple buffers idea is a good plan in general though.
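
One way to guard against the "buffer3 is older than buffer1" case is to stamp every buffer with the physics frame number that filled it and let the renderer refuse anything that is not newer than what it last drew. The sketch below shows only that selection rule; BufferInfo and pickBuffer are hypothetical names, and in a real two-thread setup the stamps and busy flags would themselves need to be read under whatever synchronization protects the buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct BufferInfo {
    std::uint64_t sequence;  // physics frame stored in this buffer, 0 = never filled
    bool beingWritten;       // true while the update thread is filling it
};

// Returns the index of the newest complete buffer that is strictly newer
// than `lastRendered`, or -1 if the renderer should keep its current frame.
int pickBuffer(const BufferInfo* buffers, std::size_t count,
               std::uint64_t lastRendered) {
    assert(buffers != nullptr || count == 0);
    int best = -1;
    std::uint64_t bestSeq = lastRendered;
    for (std::size_t i = 0; i < count; ++i) {
        if (buffers[i].beingWritten) continue;  // incomplete, never show it
        if (buffers[i].sequence > bestSeq) {
            bestSeq = buffers[i].sequence;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With this rule the renderer in the scenario above would stay on buffer1 rather than step back to buffer3.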
Falagard
OGRE Retired Moderator
Posts: 2060
Joined: Thu Feb 26, 2004 12:11 am
Location: Toronto, Canada
x 3

Post by Falagard »

K, first off, I don't claim to know anything about anything ;-)

Who says you need to send a message per object? You could update your physics state on a separate thread and send one message with all new positions. You're probably going to have a reasonable number of physics bodies anyhow.

Therefore in a single update you could update all positions based on the last snapshot of the physics state of all objects and never have some objects lagging behind others.

I see nothing wrong in particular with your buffering idea, except that it's specific to this problem, is it not? The same buffering could be applied to messages, and work well with sending physics state of all objects in each message.

Post by Zeal »

Praetor wrote:If it renders buffer1, detects buffer2 is locked, then moves onto buffer3, then it is conceivable buffer3 is actually older than buffer1.
Yes, but for that to happen your update thread would have to be running TWO full frames behind your render thread. I think everyone would agree that 'rendering' is the ultimate bottleneck, not the game logic. So if your update thread is performing THAT badly, you have other issues to worry about.

Besides, worst case you would have to pause your rendering thread. But what else can you do? No sense rendering an old frame. And that problem has nothing to do with this particular architecture anyway. You could have the exact same problem with a 'traditional' multithreaded design if your update threads were performing that badly.

Bottom line, rendering should NEVER outperform your logic. That would just be sad :p
Falagard wrote:The same buffering could be applied to messages, and work well with sending physics state of all objects in each message.
Yup, buffering (aka writing to a piece of data) and sending messages in the manner you described are about the same. I just chose the 'centralized data' approach because I assumed it would be faster/easier than messages. In the end I think either would work just dandy.
Last edited by Zeal on Thu Nov 30, 2006 11:07 pm, edited 1 time in total.

Post by xavier »

Zeal wrote:No sense rendering an old frame.
What is the difference to the viewer? They see only a sequence of frames. I think you are putting too much emphasis on temporal cohesion. If your render thread just renders the state of the objects in its world, when it is ready, it should not matter. The render thread will empty the queue of messages before rendering its next frame; whether your other threads have removed "old" messages or your render thread just plows through the waiting messages, it will *always* be rendering the latest available state of the overall application.

Post by Praetor »

Well if you skip rendering a useless frame you could spend the time rendering a new one, or even better, you can use the time to render a prettier current frame!

Post by xavier »

That's my point -- if your render thread is clearing out the waiting messages each time around, there is no "old frame". The render thread will therefore always be rendering the latest state of the app.

Post by Zeal »

xavier wrote:The render thread will empty the queue of messages before rendering its next frame; whether your other threads have removed "old" messages or your render thread just plows through the waiting messages, it will *always* be rendering the latest available state of the overall application.
We need to settle this one once and for all (you might very well be in the right).

It seems we have two opposing points of view.

You say - Update the scene state (object positions, etc.) as fast as you can, and pass the results on an individual basis to the render thread. When the render thread begins rendering a scene, it takes the current state of all objects (even if half of the objects are 'between' update states). I'm sure it would work fine, however...

I say - Only pass a complete set of scene state data (object positions, etc.) to the render thread.

The reason I say that is because it seems (I may be wrong) that your method would require a TREMENDOUS amount of resource locking (relative to mine). And the only benefit you'd get from it would be 'you can render a partial frame' (which I maintain isn't THAT big of a deal). With my method you have MUCH fewer resource locks because A.) you're working with buffers, and B.) the buffers represent a large set of data (resulting in fewer total resources to lock when it comes time to swap the buffers).

Post by xavier »

What resources need locking? You shouldn't be locking anything. If you are talking about locking vertex buffers, that's Ogre's business; in the canonical Ogre world, you are not touching those unless you have a good reason. For example, vertex buffers only get locked if you are deforming a mesh; otherwise, you are just calculating new transforms. If you don't like recalculating transforms because they might be stale, and you are passing raw position, orientation and scale (as opposed to incremental updates such as translations and rotations), you can discard or overwrite redundant messages when the message is inserted (on the logic side, if you are using a priority queue), or do a presort on the render side and discard old messages before processing any of them.
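
As an illustration of the "discard old messages before processing any of them" option, here is a small sketch that keeps only the newest absolute transform per object. It assumes each message carries an object id and a logic tick; TransformMsg and coalesce are made-up names, and the trick only works because the messages hold absolute values rather than incremental translations or rotations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Absolute position for one object at one logic tick (hypothetical layout;
// orientation and scale would be handled the same way).
struct TransformMsg {
    std::uint32_t objectId;
    std::uint64_t tick;
    float px, py, pz;
};

// Render side: collapse a batch so that each object appears once, with the
// values from its highest tick. Output keeps first-seen object order.
std::vector<TransformMsg> coalesce(const std::vector<TransformMsg>& pending) {
    std::unordered_map<std::uint32_t, std::size_t> slotOf;  // objectId -> index in latest
    std::vector<TransformMsg> latest;
    for (const TransformMsg& msg : pending) {
        auto it = slotOf.find(msg.objectId);
        if (it == slotOf.end()) {
            slotOf.emplace(msg.objectId, latest.size());
            latest.push_back(msg);
        } else {
            assert(it->second < latest.size());
            if (msg.tick >= latest[it->second].tick) latest[it->second] = msg;
        }
    }
    return latest;
}
```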

Post by Zeal »

No, I'm not talking about vbuffers or other Ogre-related stuff (that's Ogre's business, as you pointed out).

So you're saying you could pass object positions on an individual basis, without having to worry about what the render system is doing? I guess that would work if you had some kind of intelligent queue... something that would allow you to 'write' to the queue, even while the render thread is 'reading' from another part of the SAME queue.

That's what you're getting at, right? Can you elaborate a bit on how that would work?

Post by Falagard »

I think xavier misunderstood what you meant by locking resources, and buffers.

Zeal is simply talking about the state data he's passing between threads (positions, orientations, possibly velocity of physics bodies, for example).

This has been discussed many times on these forums by the way. I personally agree with Xavier.

A physics simulation running on a separate thread can send physics state updates to the rendering thread at whatever frequency it is running. Chances are that you'll want it running *faster* than the rendering thread anyhow. These state updates can be done as messages, but each message can contain the complete state of the objects - at a minimum all that is required is an id that represents the object, position and orientation, unless other game specific values are needed such as velocity. The physics thread can lock the message queue on the rendering thread, post the message, and then unlock. It's a thread-safe message queue. The other benefit is that such a system could be generalized and used for other systems such as AI, or whatever might be able to process on a separate thread.

The rendering thread, each loop, can look into the queue and process the messages before rendering occurs. Let's say the physics thread is running faster than the rendering thread - there may be multiple updates in the queue from the physics system since the last time the rendering thread checked the queue. Each message can be processed in order - which will simply update each object's position and orientation from the data. Without even removing duplicate messages, this system will work. Let's say you have 3 update messages from the physics system. The overhead of iterating 100 objects and setting position/orientation of those objects 3 times per frame isn't going to be significant. Alternatively, you could look into the queue and only get the latest update, removing any other updates from the queue without processing them.
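
The lock / post / unlock queue and the "only get the latest update" variant from the two paragraphs above could look roughly like this. It is a sketch under assumptions: std::mutex stands in for whatever lock the engine uses, there is a single physics thread posting in tick order, and BodyState, PhysicsUpdate, UpdateQueue and takeLatest are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Minimum per-object payload: an id, a position and an orientation.
struct BodyState { std::uint32_t id; float px, py, pz; float qw, qx, qy, qz; };

// One message = the complete physics state at one tick.
struct PhysicsUpdate { std::uint64_t tick; std::vector<BodyState> bodies; };

class UpdateQueue {
public:
    // Physics thread: lock, post, unlock.
    void post(PhysicsUpdate update) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(pending_.empty() || pending_.back().tick <= update.tick);
        pending_.push_back(std::move(update));
    }

    // Render thread: take everything posted so far in one short critical
    // section, so the messages are processed without holding the lock.
    std::vector<PhysicsUpdate> drain() {
        std::vector<PhysicsUpdate> out;
        std::lock_guard<std::mutex> lock(mutex_);
        out.swap(pending_);
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<PhysicsUpdate> pending_;
};

// Render thread: skip intermediate updates and keep only the newest one.
// Returns false (and leaves `latest` untouched) if nothing was waiting.
bool takeLatest(UpdateQueue& queue, PhysicsUpdate& latest) {
    std::vector<PhysicsUpdate> batch = queue.drain();
    if (batch.empty()) return false;
    latest = std::move(batch.back());
    return true;
}
```

Processing every drained message in order instead of calling takeLatest gives the simpler "apply all three updates" behaviour described above.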

The only reason why I didn't implement this system in my own engine is because I'm hoping that physics engines will end up running their internal systems in a separate thread already, saving me from having to do it on my own. I've read something about PhysX having the ability to do this, so it doesn't make sense to add my own overhead of a threaded system for physics. The same threading architecture would be useful for other systems such as AI, pathfinding, etc. though so it's something I may still look into.
Last edited by Falagard on Fri Dec 01, 2006 12:00 am, edited 1 time in total.

Post by xavier »

Zeal wrote:something that would allow you to 'write' to the queue, even while the render thread is 'reading' from another part of the SAME queue.
That's the basis of message queuing. You don't actually have to lock the queue on either end, but if you wanted to you could. In a truly concurrent system, synchronization objects are not really necessary in this case.

Post by Zeal »

xavier wrote:That's the basis of message queuing. You don't actually have to lock the queue on either end, but if you wanted to you could. In a truly concurrent system, synchronization objects are not really necessary in this case.
OK, I guess I'm starting to see the light :p. And you're sure such a queue wouldn't be any slower than the double buffering technique I've been kicking around?

God, why did they even mention double buffering in that article, if using a simple queue in this situation is as wonderful as you say it is...

Post by xavier »

I wouldn't say it's wonderful -- it's just a common solution to a particular problem, that of inter-process communication. It makes decoupling systems from each other much simpler (possible, actually).

Post by Falagard »

xavier wrote:That's the basis of message queuing. You don't actually have to lock the queue on either end, but if you wanted to you could.
Really? Can I ask how this is possible?
If you're writing to the queue from one thread, and reading from the queue in another, wouldn't a lock be necessary?

Post by Zeal »

Falagard wrote:Really? Can I ask how this is possible?
If you're writing to the queue from one thread, and reading from the queue in another, wouldn't a lock be necessary?
Yes, please elaborate a bit. I've implemented simple messaging/queue systems before, but I don't see how it could be done without any locks.

Post by xavier »

Falagard wrote:
xavier wrote:That's the basis of message queuing. You don't actually have to lock the queue on either end, but if you wanted to you could.
Really? Can I ask how this is possible?
If you're writing to the queue from one thread, and reading from the queue in another, wouldn't a lock be necessary?
Not really. Disclaimer: this is all entirely theoretical, I have not tried this (as it would not really work in a hyperthreaded or single-processing system).

First, I wouldn't have either thread own the queue; it could exist in its own thread, or in a general manager thread that holds all queues. The two threads that need access to the queue then can register with either end of it through the manager.

Since the queue state (really just a pair of pointers and a count) is managed in an uninterested (and uninterrupted) third party, and you have only one thread pushing and another popping, there can be no uncertainty about the state of the queue on the popping side (the rendering thread) if you access the queue in such a way that (a) you check for empty() (which is based on queue element count) before popping, and (b) the queue itself adds/deletes elements before updating the element count.

Since the entire system is timestepped and not truly continuous, and running on the same basis system clock, there can be no uncertainty about the value of the element count at the time that its value is transferred to the rendering queue: it either is zero, or it is not. If it is zero, then the rendering queue does not bother to try to pop an element, even if at the same time (in the next clock cycle) the queue updates the count to reflect a newly added element; that element just gets processed next time the rendering thread comes by. If the count is greater than zero, then it is guaranteed safe to pop an element because the queue has updated the count *after* updating the element pointers. Note that it does not matter what the threads on the other side of the queue are doing in this case; the only number that matters is zero, and only to the rendering thread. Everyone else is just adding stuff to the queue (never deleting) on the other end, which therefore has no effect on the rendering side.

This is one of those instances where the subtleties of truly concurrent processing diverge in unintuitive ways from our normal single-threaded (or even multithreaded) way of thinking.

Of course, I could be entirely off my rocker, but I am confident this would work in practice (on truly concurrent machines like PS3 and XBox360). This probably would not work for single-threaded or hyperthreaded machines simply because each thread is not actually free-running; they get interrupted to allow other threads to run (note that hyperthreaded machines are still single-threaded when it comes to data access, whereas the PS3 and XB360 maintain separate memory for each core).
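
The core of the argument above is: with exactly one pusher and one popper, write the element first and publish the bookkeeping second. A hedged sketch of that idea as a fixed-size ring buffer follows. It is a variant, not what anyone in this thread built: it uses head and tail indices rather than an element count, and it relies on C++11 std::atomic with release/acquire ordering (an assumption, and newer than this thread) to make the "element first, index second" order explicit, because neither the compiler nor the CPU is obliged to preserve that order for plain variables. SpscQueue, tryPush and tryPop are made-up names.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Single-producer / single-consumer queue. One slot is always left empty,
// so it holds at most Capacity - 1 elements.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2, "need at least one usable slot");

public:
    // Producer thread only. Returns false if the queue is full.
    bool tryPush(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) return false;
        slots_[tail] = value;                          // 1. write the element
        tail_.store(next, std::memory_order_release);  // 2. then publish it
        return true;
    }

    // Consumer thread only. Returns false if the queue is empty; a push
    // that lands just after the check is simply picked up on the next call.
    bool tryPop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        assert(head < Capacity);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = slots_[head];                            // 1. read the element
        head_.store((head + 1) % Capacity, std::memory_order_release);  // 2. free the slot
        return true;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to pop, written by consumer
    std::atomic<std::size_t> tail_{0};  // next slot to fill, written by producer
};
```

This only holds for one producer and one consumer; several threads "adding stuff" on the same end would need a lock or a different queue design.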

Post by xavier »

Zeal wrote:
Falagard wrote:Really? Can I ask how this is possible?
If you're writing to the queue from one thread, and reading from the queue in another, wouldn't a lock be necessary?
Yes, please elaborate a bit. I've implemented simple messaging/queue systems before, but I don't see how it could be done without any locks.
To go a bit further on this, you could use locks if you like -- the actual time spent in adding a message to a queue is rather small, and you are not likely to experience any delay in the inserting thread. Remember that you are not waiting for the message to be processed on the other end (at least I hope you are not). It's like UDP in this sense; send-it-and-forget-it.

Post by Zeal »

Hrmm, so you're sure the update thread could push a message (potentially quite large, containing a node ptr, some xyz positions, etc.), and at the EXACT SAME TIME the render thread could read the same queue, no problem? Like you said, it wouldn't really matter if the render thread 'missed' the newly pushed piece of data for one frame (since it would just process it next frame), but you're sure all this could be done without locking?

Post by xavier »

In a concurrent system, yes. Have I tried it? No, I don't have a concurrent system to try it on. It probably would work on a normal hyperthreaded or singlethreaded system; I haven't tried it there yet either. I don't see any reason for huge messages, though, and you absolutely do not want to be passing pointers around -- you have no guarantee that they will have any meaning on the receiving end.
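
One common way to avoid passing pointers is to give every render-side node a small integer id and let the messages carry that id instead. The sketch below assumes such a scheme; RenderNode is a stand-in for a real scene node, and NodeRegistry, PositionMsg and their members are hypothetical names rather than OGRE API.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct RenderNode { float x = 0.f, y = 0.f, z = 0.f; };  // stand-in for a scene node

// Small, pointer-free message: an id plus the new absolute position.
struct PositionMsg { std::uint32_t nodeId; float x, y, z; };

// Lives entirely on the render thread and owns the id -> node mapping.
class NodeRegistry {
public:
    // Creates a node and returns its id (0 is reserved as "invalid").
    std::uint32_t create() {
        assert(nextId_ != 0);  // id space exhausted after wrap-around
        const std::uint32_t id = nextId_++;
        nodes_.emplace(id, RenderNode{});
        return id;
    }

    void destroy(std::uint32_t id) { nodes_.erase(id); }

    // Applies a message. A message for a node that no longer exists is
    // ignored and reported, where a stale pointer would have crashed.
    bool apply(const PositionMsg& msg) {
        auto it = nodes_.find(msg.nodeId);
        if (it == nodes_.end()) return false;
        it->second.x = msg.x;
        it->second.y = msg.y;
        it->second.z = msg.z;
        return true;
    }

    const RenderNode* find(std::uint32_t id) const {
        auto it = nodes_.find(id);
        return it == nodes_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::uint32_t, RenderNode> nodes_;
    std::uint32_t nextId_ = 1;
};
```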