Threaded Game Engine

klauss
Hobgoblin
Posts: 559
Joined: Wed Oct 19, 2005 4:57 pm
Location: LS87, Buenos Aires, República Argentina.

Post by klauss »

It depends on the game. Some games have more renderables than AI. Some have a similar number of them, and some have more AI than renderables.

Some games need full all-against-all collisions, some don't.

If you're in a scene with a few thousand entities, all dynamic (i.e., all moving according to a certain law or AI), and all of them need to be collided against each other, physics can be very demanding. In such a case, physics would have to run slower than graphics. Notice that sometimes particle systems need interparticle collisions and sometimes they don't, so it's not that unlikely to find oneself in the hard case.

If your physics only entail updating the positions of a handful of entities, like with many other kinds of games, it's different.

In any case, I wasn't talking about that exactly, but the idea of updating graphics without waiting for a consistent state - no matter how many times your physics thread updates the structures, if you catch it when it hasn't updated all entities the same number of times (or synchronized them all to a common "game time"), you end up with (minor or major) inconsistencies. That's the point. Of course, if your physics run slow, the discrepancies will be bigger and the issue is bigger - but it still exists even if your physics run fast, it's just a different degree of noticeability.
Oíd mortales, el grito sagrado...
Hey! What is it with that that?
Wing Commander Universe
Falagard
OGRE Retired Moderator
Posts: 2060
Joined: Thu Feb 26, 2004 12:11 am
Location: Toronto, Canada

Post by Falagard »

klauss wrote:if you catch it when it hasn't updated all entities
This could be avoided by having the physics thread send a single update message to the rendering thread once all entities have been updated, containing the entire state of all objects (only a small amount of data per object is needed anyhow), instead of one message per entity.
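
A minimal sketch of the single-message idea, under stated assumptions: the names `EntityState`, `Snapshot` and `SnapshotMailbox` are hypothetical, and a short mutex-protected pointer swap stands in for whatever message transport an engine would really use. The physics thread publishes one complete snapshot per step, so the renderer can never observe entities from two different steps.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical per-object payload: only what the renderer needs.
struct EntityState {
    std::uint32_t id;
    float x, y, z;
};

// One message holding every entity, all at the same physics tick.
struct Snapshot {
    std::uint64_t tick;
    std::vector<EntityState> states;
};

class SnapshotMailbox {
public:
    // Physics thread: call once per step, after ALL entities are updated.
    void publish(Snapshot s) {
        auto fresh = std::make_shared<const Snapshot>(std::move(s));
        std::lock_guard<std::mutex> lock(m_);
        // Snapshots are expected to arrive in increasing tick order.
        assert(!latest_ || fresh->tick > latest_->tick);
        latest_ = std::move(fresh);
    }

    // Render thread: newest complete snapshot, or null before the first one.
    // The returned snapshot is immutable, so it can be read without a lock.
    std::shared_ptr<const Snapshot> latest() const {
        std::lock_guard<std::mutex> lock(m_);
        return latest_;
    }

private:
    mutable std::mutex m_;  // held only for the pointer swap/copy
    std::shared_ptr<const Snapshot> latest_;
};
```

The renderer would call `latest()` once per frame and draw from that snapshot only; if physics runs faster than graphics, intermediate snapshots are simply never seen.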

A few thousand entities require a huge amount of optimization and careful consideration anyhow. You'd probably break your scene into multiple physics spaces, updating the closer bodies at a higher frequency than those further away. Most scenes have a couple hundred physics bodies, not thousands. This is a generalization, but then again, I said "most" ;-)
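
A rough sketch of what distance-based update rates for separate physics spaces could look like. Everything here is an assumption for illustration: the `PhysicsSpace` struct, the distance thresholds and the step divisors are made up, not taken from any physics library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tiers: nearby spaces step every tick, distant ones less often.
// The distances and divisors are placeholder values.
inline std::uint32_t stepDivisor(float distanceToCamera) {
    assert(distanceToCamera >= 0.0f);
    if (distanceToCamera < 100.0f) return 1;    // every tick
    if (distanceToCamera < 1000.0f) return 4;   // every 4th tick
    return 16;                                  // every 16th tick
}

// Hypothetical bookkeeping for one physics space.
struct PhysicsSpace {
    float distanceToCamera;
    std::uint64_t lastSteppedTick;
};

// Returns how many ticks of simulation the space should integrate now,
// or 0 if it is not due yet and should be skipped this tick.
inline std::uint64_t ticksDue(const PhysicsSpace& space, std::uint64_t now) {
    const std::uint64_t elapsed = now - space.lastSteppedTick;
    return elapsed >= stepDivisor(space.distanceToCamera) ? elapsed : 0;
}
```

A caller would loop over its spaces each tick, step those with a non-zero result using a correspondingly larger time step, and record `now` as their `lastSteppedTick`.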

Post by klauss »

Falagard wrote:A few thousand entities requires a huge amount of optimization and careful consideration anyhow. You'd probably break your scene into multiple physics spaces, updating the closer bodies at a higher frequency than those further away. Most scenes have a couple hundred physics bodies, not thousands. This is generalization, but then again, I said "most" ;-)
Exactly. I have a few ideas on how to accomplish that which I'm eager to put to the test, but I have exams every day until Jan 1 20007 (and there isn't an extra digit).

Anyway, multicore is all about the ability to implement those extremes, after all. If you're implementing something for multicore, you have to aim that high, or you're just killing an ant with a tactical ballistic missile.
OvermindDL1
Gnome
Posts: 333
Joined: Sun Sep 25, 2005 7:55 pm

Post by OvermindDL1 »

Wow... I really wish I had seen this thread back when it started. I started making a multi-threaded game engine using non-locking styles a few weeks before this sizable thread started up, and I have not been to this forum since then, due to a drastic lack of time and so forth. Regardless, I have not got nearly as far along as I would have liked by this point (due entirely to the lack of time mentioned above), but I have still apparently got further than what this thread indicates (and I have not worked on it in a few weeks; I was starting back on it today now that I have time again). I was not even searching for this subject here, I just accidentally ran across it.

So, if anyone is still interested in this thread, I have some more (still rather green) experiences to add to this. First of all, I use boost rather heavily in this test engine. I have not yet used even a single mutex or other synchronization primitive, and I have been testing it equally on both single-core and dual-core Athlons (I do not have direct access to quad-cores or Intel CPUs). It dynamically creates threads based on the number of cores detected (using some assembly I probably should not use), or this can be overridden by passing a thread count to the engine's constructor. Threads can be dynamically created and destroyed during run-time. Threads are boost::threads and I use the thread group to keep them together. The main thread runs the graphics engine (OGRE of course) and parses through the renderThread's non-locking message queue. There is also a global message queue.
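
For readers who want to try the thread-count part without hand-written assembly, here is a small sketch using the C++11 standard library instead of boost::thread (Boost.Thread offers a similar `boost::thread::hardware_concurrency()`). The function names `workerCount` and `runWorkers` are hypothetical and not part of the engine described above.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Number of threads to create: an explicit override wins, otherwise use the
// detected hardware concurrency. hardware_concurrency() may return 0 when
// the value is unknown, so fall back to at least one thread.
inline unsigned workerCount(unsigned overrideCount = 0) {
    if (overrideCount != 0) return overrideCount;
    return std::max(1u, std::thread::hardware_concurrency());
}

// Starts `count` threads running body(index) and waits for all of them,
// roughly what a thread group's create/join pair does.
inline void runWorkers(unsigned count,
                       const std::function<void(unsigned)>& body) {
    assert(count > 0);
    std::vector<std::thread> group;
    group.reserve(count);
    for (unsigned i = 0; i < count; ++i) group.emplace_back(body, i);
    for (std::thread& t : group) t.join();
}
```

An engine constructor could then call something like `runWorkers(workerCount(requested), workerLoop)`, where `requested` is 0 for auto-detection.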

The global message queue is the more primitive one as it was the first, but it will be changed to work the way the renderThread message queue does, probably by this evening, so I will describe how it will work. First, pushing data. It accepts a boost::function reference for the function signature (which is usually a functor in this case). First, it atomically gets a new boost::function from a pool and sets it equal to the passed-in boost::function reference. Then it gets a copy of the head and tail (just size_t's currently; each holds an index into the data array, which is obviously quite a bit smaller than 2^32 cells in size, thus all index lookups are taken % the actual size, which is hard-coded as a const var right now), computes head minus tail, and tests whether the result is less than the data_length. If not, it returns false; if so, it continues.

If it continues, it atomically sets the slot in the data array (data is defined as an array of pointers to my boost::function typedef) to the location of the boost::function I got out of the pool, with the atomic test being zero. If that fails, another thread got ahold of the slot before we could set it, so it repeats the above (first incrementing the real head only if it still equals the local copy, then copying the real head/tail struct into a local copy and testing to make sure the length is less than data_length), tries the new index location, and keeps repeating until the data_length is exceeded or it finds a slot that succeeds. If it succeeds, it atomically sets the real head to the local head plus 1 and returns true. If at any time it returns false, the boost::function it got from the pool is returned to the pool.
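
To make the push/pop mechanics concrete, here is a sketch of a bounded, mutex-free queue. Note that it is a swapped-in technique, not the scheme described above: instead of testing a slot pointer against zero, each slot carries a sequence counter (the bounded MPMC design usually attributed to Dmitry Vyukov), which is one way to rule out a slot being reused between reading an index and the compare-and-swap. It uses C++11 `std::atomic` rather than whatever atomic primitives the engine uses, and the class name `BoundedQueue` is hypothetical. "head" is where producers push and "tail" is where consumers pop, matching the naming in the post.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity)
        : slots_(new Slot[capacity]), mask_(capacity - 1), head_(0), tail_(0) {
        // A power-of-two capacity turns "index % capacity" into "index & mask".
        assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
        for (std::size_t i = 0; i < capacity; ++i)
            slots_[i].seq.store(i, std::memory_order_relaxed);
    }

    // Returns false when the queue is full.
    bool push(T value) {
        std::size_t pos = head_.load(std::memory_order_relaxed);
        for (;;) {
            Slot& slot = slots_[pos & mask_];
            const std::size_t seq = slot.seq.load(std::memory_order_acquire);
            const std::ptrdiff_t diff = static_cast<std::ptrdiff_t>(seq) -
                                        static_cast<std::ptrdiff_t>(pos);
            if (diff == 0) {
                // Slot is free for this lap: try to claim index pos.
                if (head_.compare_exchange_weak(pos, pos + 1,
                                                std::memory_order_relaxed)) {
                    slot.value = std::move(value);
                    slot.seq.store(pos + 1, std::memory_order_release);
                    return true;
                }
                // A failed CAS reloaded pos; loop and retry.
            } else if (diff < 0) {
                return false;  // slot still holds an unconsumed item: full
            } else {
                pos = head_.load(std::memory_order_relaxed);  // we were stale
            }
        }
    }

    // Returns false when the queue is empty.
    bool pop(T& out) {
        std::size_t pos = tail_.load(std::memory_order_relaxed);
        for (;;) {
            Slot& slot = slots_[pos & mask_];
            const std::size_t seq = slot.seq.load(std::memory_order_acquire);
            const std::ptrdiff_t diff = static_cast<std::ptrdiff_t>(seq) -
                                        static_cast<std::ptrdiff_t>(pos + 1);
            if (diff == 0) {
                // Slot holds the item for index pos: try to claim it.
                if (tail_.compare_exchange_weak(pos, pos + 1,
                                                std::memory_order_relaxed)) {
                    out = std::move(slot.value);
                    // Mark the slot free for the producers' next lap.
                    slot.seq.store(pos + mask_ + 1, std::memory_order_release);
                    return true;
                }
            } else if (diff < 0) {
                return false;  // nothing has been published here yet: empty
            } else {
                pos = tail_.load(std::memory_order_relaxed);  // we were stale
            }
        }
    }

private:
    struct Slot {
        std::atomic<std::size_t> seq;
        T value;
    };
    std::unique_ptr<Slot[]> slots_;
    const std::size_t mask_;
    std::atomic<std::size_t> head_;  // next index to push
    std::atomic<std::size_t> tail_;  // next index to pop
};
```

With `T` set to a pointer to a pooled function object, `push` returning false corresponds to the "queue is full" case above, where the caller hands the function object back to the pool or spins and retries.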

When popping data from the end of the message queue, it copies the real head/tail into a local copy; if head minus tail is less than or equal to 0, it returns a null (0) pointer. If it continues, it gets the data item at the local tail index and then tries to set that location to zero, with the test being the value we just pulled. If that fails, something had perfect/awful timing and got it right before we did (a window on the order of 1 clock tick), so it increments the real tail if it still equals the local copy of the tail, then repeats the above. If it succeeds, I have my pointer, so I try to increment the real tail if it still equals the local tail. I then dereference the pointer, call the function ( (*workPtr)(); ) and test the return value: if 0, I return the boost::function pointer to the atomic pool; if 1, I push it back on the queue; if 2, I push it onto a priority pool (which is designed to only have things pushed onto it when the work needed to do something that it could not yet do due to some condition or what-not, so it will be called again rather quickly).
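
The return-code handling after a pop can be sketched on its own. This is only an illustration under assumptions: plain `std::deque`s stand in for the non-locking queue and the priority pool (so this version is single-threaded), `std::function<int()>` stands in for the boost::function typedef, and serving the priority pool before the normal queue is a guess at what "called again rather quickly" means. The names `WorkResult`, `Queues` and `runOne` are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Return codes as described in the post: 0, 1 and 2.
enum WorkResult { Done = 0, Requeue = 1, RetrySoon = 2 };

using Work = std::function<int()>;

struct Queues {
    std::deque<Work> normal;    // stand-in for the global message queue
    std::deque<Work> priority;  // stand-in for the priority pool
};

// Runs one work item and routes it by its return value.
// Returns false when there was nothing to run.
inline bool runOne(Queues& q) {
    // Assumption: priority items are served first.
    std::deque<Work>& source = !q.priority.empty() ? q.priority : q.normal;
    if (source.empty()) return false;

    Work work = std::move(source.front());
    source.pop_front();

    switch (work()) {
        case Requeue:
            q.normal.push_back(std::move(work));    // run again later
            break;
        case RetrySoon:
            q.priority.push_back(std::move(work));  // blocked, retry quickly
            break;
        default:
            break;  // Done: drop it (the post returns it to a pool instead)
    }
    return true;
}
```

A worker thread's main loop would then be little more than calling `runOne` repeatedly and yielding or sleeping briefly when it returns false.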

If I want the length of the current queue, I just compute head minus tail. Overall it has seemed to work perfectly: in testing, all the pushed messages have been called the number of times they should have been called, and there have been no drops. My current method of pushing is to spin: keep trying to push while the queue is full until it is no longer full and the push succeeds, so I assume *all* work is important for now.

The graphics queue is a little different in that it does not operate on function pointers, but just takes a pointer to an Actor class. If that class pushes itself on the renderThread message queue, that means that the graphical version of that object has changed since the last frame (since physics will most likely update far faster than graphics, there may be multiple updates, but this basically coalesces them all into one message). The render thread will try to empty that queue, but if it is too long (too much time taken, to be exact) it will stop going through it (it will always parse a minimum amount, however), render a frame, run the OS message loop, and continue where it left off. If there is extra time in the render thread that is being wasted (no point rendering at 300fps if the screen only updates at 75fps or what-not), it will parse some engine message workers as well (while continuing to check whether the renderThread message queue has any messages and dealing with those too) until enough time has elapsed; then it renders a frame, runs the OS message loop again, and repeats.

The network thread is currently bound to the first generated thread (which cannot be killed) or, if single core, will run in the render thread as well. I am looking for a multi-threaded networking library I can more easily control (preferably without locking; it really is easy once you deal with it, just different thinking compared to normal multi-threaded programming), but I am currently using Newton, just updating it at a fixed rate as much as I can...
My current test game design on this engine is moving toward needing multiple physics worlds (a space sim test game spanning multiple solar systems) so that I can spread the physics over multiple threads (I really *really* hope Newton does not internally use any globals or singletons that are not thread-safe across different physics worlds...).
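
The coalescing render queue and the time-budgeted drain described above can be sketched roughly as follows. The assumptions are significant: this `Actor` is a hypothetical stand-in with only a flag and a counter, `std::deque` replaces the non-locking queue (so the queue itself is not thread-safe here), and `markDirty` and `drain` are made-up names.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for the engine's Actor class.
struct Actor {
    std::atomic<bool> queuedForRender{false};
    int renderSyncs = 0;  // touched only by the render thread
};

// Stand-in for the renderThread message queue.
using RenderQueue = std::deque<Actor*>;

// Called whenever an actor's visual state changes. Only the first change
// since the last sync enqueues the actor, so many physics updates
// coalesce into a single render message.
inline bool markDirty(Actor& actor, RenderQueue& queue) {
    if (actor.queuedForRender.exchange(true)) return false;  // already queued
    queue.push_back(&actor);
    return true;
}

// Render thread: always process at least minItems, then stop once the time
// budget is spent. Whatever is left stays queued for after the next frame.
inline std::size_t drain(RenderQueue& queue, std::size_t minItems,
                         std::chrono::microseconds budget) {
    const auto deadline = std::chrono::steady_clock::now() + budget;
    std::size_t done = 0;
    while (!queue.empty()) {
        if (done >= minItems && std::chrono::steady_clock::now() >= deadline)
            break;
        Actor* actor = queue.front();
        queue.pop_front();
        // Clear the flag before syncing so a change that arrives during the
        // sync re-queues the actor instead of being lost.
        actor->queuedForRender.store(false);
        assert(actor->renderSyncs >= 0);
        ++actor->renderSyncs;  // placeholder for copying state into the scene
        ++done;
    }
    return done;
}
```

A frame loop built on this would call `drain`, render, pump the OS message loop, and spend any leftover time on general engine work before draining again.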

So, thoughts about this design? Anything glaring that I should change before I start working on it too heavily again? I have made some multi-threaded apps before (business apps for work, using the normal locking primitives), but never with a non-locking pattern...

Also, I had a heck of a time trying to post this, the site kept timing out at weird times...

EDIT: Do not want emoticons appearing in code chunks...