Multithreading Ogre3D in Kromaia

A place for users of OGRE to discuss ideas and experiences of utilising OGRE in their games / demos / applications.
Antodologo
Halfling
Posts: 46
Joined: Tue Jul 13, 2010 4:05 pm
x 5

Multithreading Ogre3D in Kromaia

Post by Antodologo »

Hi!

We have written a first draft covering our motivations and a general description of the implementation we chose for multithreading in our game Kromaia.

http://www.krakenempire.com/blog/?p=39

We hope it helps anyone looking for the basics who wants to try it themselves :)

Feel free to comment with whatever you think about it.
TheSHEEEP
OGRE Retired Team Member
Posts: 972
Joined: Mon Jun 02, 2008 6:52 pm
Location: Berlin
x 65

Re: Multithreading Ogre3D in Kromaia

Post by TheSHEEEP »

Looking very good!

Nice explanation, really.
I'm sure this will help a lot of people understand some basic multithreading problems and find solutions.
dark_sylinc
OGRE Team Member
Posts: 5292
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278

Re: Multithreading Ogre3D in Kromaia

Post by dark_sylinc »

Hi!

I've done a similar implementation, but I don't even sync. Position & orientation are read in the graphics thread. Race conditions aren't a problem as long as the next frame's calculations don't depend on the previous frame. Any error will be corrected in the next frame, so there is (almost) no need to sync.

I say almost because there are still moments when I need to sync (loading time and when destroying an object). Object creation is handled completely in parallel using lockless algorithms.

I strongly do NOT recommend manipulating Ogre in any form from threads other than the graphics thread. Changing nodes while the graphics thread is inside renderOneFrame can leave the Octree (assuming you're using an OctreeSceneManager) in an invalid/incoherent state. Symptoms vary from crashes (rare) to progressive slowdowns over time (more common). In other words, the game becomes slower the longer you play.
To solve this, just manipulate the nodes (creation, deletion, setPosition, setOrientation, set***Etc) in the graphics thread. You can read the position & orientation data from the physics thread.
You could sync, but I don't. If, for example, the physics position is read in an invalid state due to a race condition, it will hopefully be fixed in the next frame, but the Octree will always remain valid.
Alternatively, you can manipulate the nodes from other threads as long as you have some way to ensure the graphics thread isn't inside renderOneFrame.

Another reason not to manipulate Ogre from other threads is OpenGL: it will just crash in all operations where creating a hardware buffer is involved (textures, vertices, indices, render targets), whereas Direct3D will work OK (as long as you're not hitting a race condition inside the D3D API).

I've found many AMD users need the dual-core optimizer fix, as they do with many other games. (Note: the fix is actually just a change to the Windows boot.ini file, adding the option "/usepmtimer" to tell the OS to use an alternative HW timer.)

Cheers
Dark Sylinc
Antodologo
Halfling
Posts: 46
Joined: Tue Jul 13, 2010 4:05 pm
x 5

Re: Multithreading Ogre3D in Kromaia

Post by Antodologo »

@TheSHEEEP Thanks, we just hope it is understandable :D

@dark_sylinc

Interesting points and explanations. By the way, we are not using the octree scene manager.

As we say in our post, we tried modifying nodes while renderOneFrame is being executed on another thread and it is clearly something to avoid. We experienced random crashes (the crash has a reason, but the moment is random) after (but not immediately after) modifying or deleting nodes.

At the moment we modify graphic nodes from the logic thread, but only during synchronization, when there is no render in progress, and it works perfectly. No crashes (ever) and no slowdown over time.

What I haven't understood from your comment is what you have done in your implementation. I mean, if you get and set positions and orientations on the graphic thread, what are you executing in the other thread? We wanted the logic to be executed at, at least, 100 FPS and independently from graphics. We don't mind if graphics are being rendered at 10 or 100 FPS, the game logic should be fluent always, even if the rendering is not. That is why we tried the single threaded solution we explain first, and a multithreaded one later to make the logic thread work as fast as possible and at a stable framerate without the graphic framerate dependency.

We have no invalid positions or states because the logic info (including complete physical state) is updated on the logic thread and the graphic info is synchronized at the start of a graphic frame. We can't afford an invalid position even if it is fixed on the next frame. Lucky you if you can ;)

I hadn't read about the AMD problem before, we are using Intel, but now I am curious. I will make some tests as soon as I get my hands on an AMD :P

Thank you for your comments

Greetings
dark_sylinc
OGRE Team Member
Posts: 5292
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278

Re: Multithreading Ogre3D in Kromaia

Post by dark_sylinc »

Hi!
Antodologo wrote: What I haven't understood from your comment is what you have done in your implementation. I mean, if you get and set positions and orientations on the graphic thread, what are you executing in the other thread?
I'm using Havok to run the physics, so the physics thread is heavy on physics integration. Furthermore, my own stuff such as velocity, acceleration, "if( playerInsideXXRegion )", "if( enemyDead )", AI, etc. is handled on that thread. The physics thread then caches a position & an orientation; the graphics thread reads them and calls SceneNode::setPosition & setOrientation. The unsafe part is that the physics thread may be writing to this cache while the graphics thread is trying to read from it. But this is unlikely to happen very often; if it does, the SceneNode will likely be fixed in the next frame.
Note that the physics thread (which is the one that matters) always has the correct values. What you see may not be what it really is.
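A minimal sketch of such a cache, for illustration only. The names TransformCache, Position and loadForRender are hypothetical, and only the position is shown. One deliberate difference from the description above: plain unsynchronized floats would be a data race in standard C++, so this sketch swaps in std::atomic<float> with relaxed ordering. Each component is then written indivisibly, but nothing keeps x, y and z consistent with each other, which matches the "may be wrong for one frame" behaviour.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical cache shared between the physics and graphics threads.
// Relaxed atomics make each component write indivisible, but they do NOT
// make the three components update together.
struct TransformCache {
    std::atomic<float> x{0.0f}, y{0.0f}, z{0.0f};

    // Called from the physics thread after each simulation step.
    void store(float nx, float ny, float nz) {
        x.store(nx, std::memory_order_relaxed);
        y.store(ny, std::memory_order_relaxed);
        z.store(nz, std::memory_order_relaxed);
    }
};

struct Position { float x, y, z; };

// Called from the graphics thread before rendering. The result may mix
// components from two different physics steps.
inline Position loadForRender(const TransformCache& cache) {
    return { cache.x.load(std::memory_order_relaxed),
             cache.y.load(std::memory_order_relaxed),
             cache.z.load(std::memory_order_relaxed) };
}
```

On the graphics thread the returned values would then be passed to SceneNode::setPosition, so Ogre itself is only ever touched from that thread.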
Antodologo wrote:We wanted the logic to be executed at, at least, 100 FPS and independently from graphics. We don't mind if graphics are being rendered at 10 or 100 FPS, the game logic should be fluent always, even if the rendering is not. That is why we tried the single threaded solution we explain first, and a multithreaded one later to make the logic thread work as fast as possible and at a stable framerate without the graphic framerate dependency.
Indeed, I was driven by the same idea. With thousands of objects I keep getting a steady 60 fps on the logic/physics side, while graphics runs in the 20-40 fps range depending on what you're looking at (steady 60 fps when using the minimum quality settings).
Antodologo wrote:We can't afford an invalid position even if it is fixed on the next frame. Lucky you if you can ;)
While I understand there may be cases when this is true; I'll repeat myself that it's what the user will see that is invalid and is corrected in the next frame, but internally nothing is invalid.
The user will just see (if he notices it) some flicker or shake.
Antodologo wrote: I hadn't read about the AMD problem before, we are using Intel, but now I am curious. I will make some tests as soon as I get my hands on an AMD :P
I've heard the latest Phenom & Athlon models have fixed the issue. Athlon X2 models between 4000 & 6000 have this bug for sure (running in Windows XP; IIRC Windows 7 defaults to the PM timer now on those systems).
If you're using QueryPerformanceCounter somewhere in your code, it will behave erratically on those AMDs. Otherwise it's hard to tell and you'll have to try it.
Symptoms vary: timeSinceLastFrame becoming negative, stuttering, or values lower/higher than usual (experienced as a kind of "fast forward" followed by sudden "slow motion").
Even using a fixed timestep in the logic won't protect you from those bugs, because the logic/physics thread needs to know how much time to wait before calling updateLogic() & updatePhysics() again, and therefore ends up calling updateLogic( 1 / 60 ) far more or far fewer than 60 times per second.
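To make the dependency on the measured time concrete, here is a rough sketch of a fixed-step accumulator. FixedStepClock and its parameters are hypothetical names, not part of Ogre or Havok. The number of logic steps per call is driven entirely by the measured delta, so a timer that jumps forwards or backwards changes how often the logic runs. Clamping the delta, as done here, does not fix a faulty timer; it is only an assumed way to limit the damage.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: accumulates measured frame time and reports how
// many fixed-size logic steps are due.
class FixedStepClock {
public:
    explicit FixedStepClock(double stepSeconds, double maxDeltaSeconds = 0.25)
        : step_(stepSeconds), maxDelta_(maxDeltaSeconds) {}

    // 'measuredDelta' is the time since the previous call, as reported by
    // whatever timer is in use. Negative or very large values are clamped
    // before they reach the accumulator.
    int advance(double measuredDelta) {
        const double delta = std::max(0.0, std::min(measuredDelta, maxDelta_));
        accumulator_ += delta;
        int steps = 0;
        while (accumulator_ >= step_) {
            accumulator_ -= step_;
            ++steps;
        }
        return steps;  // call updateLogic(step) this many times
    }

private:
    double step_;
    double maxDelta_;
    double accumulator_ = 0.0;
};
```

A negative delta produces no steps instead of running time backwards, and a huge delta is capped at maxDeltaSeconds worth of steps instead of a long "fast forward".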

Even single-threaded games may experience this problem if the process ends up ping-ponging between the different cores, which can be prevented by locking the process to only one core. In fact MSDN recommends this (see its remarks; it is done by calling SetThreadAffinityMask). Note their example code is broken: SetThreadAffinityMask should use 1, not 0.

Normally AMD owners are already aware of this problem and they'll probably have the fix installed on their machines. Nevertheless there's always someone who hasn't, and if your target audience isn't hardcore gamers (e.g. casual players) they're less likely to be aware of the cause of the problem.
You'll notice many games mention in their readme that AMD users may experience some problems during gameplay and will need to install the Dual Core optimizer (which is a lot friendlier than saying "manually edit your boot.ini file").
For example Assassin's Creed games include this notice.
Here you can find a user talking about the problem and different approaches to fix it.

Cheers
Dark Sylinc
Antodologo
Halfling
Posts: 46
Joined: Tue Jul 13, 2010 4:05 pm
x 5

Re: Multithreading Ogre3D in Kromaia

Post by Antodologo »

Hey

I understand your implementation now. You are just executing the two threads without synchronization and the problem you can face is that some objects can be rendered on the "current" physics frames and others on the "last" physics frame. Doesn't sound like a big problem :P

We synchronize the positions and orientations that have changed since the last graphics frame before allowing the next renderOneFrame. In exchange, we avoid pushing position, orientation, scale... to the graphics side on every physics frame; nothing is set until the next graphics frame is reached. I can't say it is better or more efficient than your solution, we just prefer this implementation as I suppose you prefer yours :wink:
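For readers who want something concrete, a sketch of one way to collect "only what changed" and hand it over once per graphics frame. This is not the Kromaia code: PendingTransforms, Transform and the integer object ids are hypothetical, and a plain mutex is assumed as the synchronization point.

```cpp
#include <cassert>
#include <mutex>
#include <unordered_map>

struct Transform { float px, py, pz; float qw, qx, qy, qz; };

// Hypothetical hand-off buffer. The logic thread records only the objects
// whose transform changed; the buffer is drained once, right before a
// graphics frame is allowed to start.
class PendingTransforms {
public:
    // Logic thread: may be called many times between two graphics frames.
    // Later values overwrite earlier ones for the same object.
    void record(int objectId, const Transform& t) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_[objectId] = t;
    }

    // Synchronization point: takes everything recorded so far in one swap,
    // so the lock is held only briefly.
    std::unordered_map<int, Transform> drain() {
        std::unordered_map<int, Transform> out;
        std::lock_guard<std::mutex> lock(mutex_);
        out.swap(pending_);
        return out;
    }

private:
    std::mutex mutex_;
    std::unordered_map<int, Transform> pending_;
};
```

Whoever drains the buffer would apply each entry to its scene node while no renderOneFrame is in progress. An object that moved during ten physics frames still costs a single node update.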
dark_sylinc wrote:I'll repeat myself that it's what the user will see that is invalid and is corrected in the next frame, but internally nothing is invalid.
The user will just see (if he notices it) some flicker or shake.
Yeah, I was referring to the fact that we can't afford any flicker or shake. We are trying to convince people that we have the most fluent game in the world and... :wink:


Thanks for the AMD info, we will have to consider adding it to the manual or something :P

Greetings
dark_sylinc
OGRE Team Member
Posts: 5292
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278

Re: Multithreading Ogre3D in Kromaia

Post by dark_sylinc »

Antodologo wrote:I understand your implementation now. You are just executing the two threads without synchronization and the problem you can face is that some objects can be rendered on the "current" physics frames and others on the "last" physics frame. Doesn't sound like a big problem :P
Cool, yeah that's it.
Nevertheless take into account race conditions are a bit more complex.

Suppose object X goes from (0,0,0) to (1000,1000,1000) in only one frame. The position is being updated at the same time the graphics thread is trying to read it.
The execution flow could go as follows:

write x = 1000
read x (=1000)
read y (=0)
read z (=0)
write y = 1000
write z = 1000

The object is rendered at (1000,0,0) although it was never supposed to be there. This is an extreme case where the change over one frame was very abrupt and both threads just happened to access the same variables at the same time. But it is good to be aware of the drawback of my implementation (and of how race conditions work in general).
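The interleaving above can be replayed deterministically in a single thread, which makes the result easy to check. This is only an illustration of the ordering, not real concurrent code; Vec3 and replayTornRead are hypothetical names.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Single-threaded replay of the interleaving listed above: the writer's
// three stores and the reader's three loads run in that exact order.
inline Vec3 replayTornRead() {
    Vec3 shared{0.0f, 0.0f, 0.0f};   // position before the physics step
    Vec3 seen{0.0f, 0.0f, 0.0f};     // what the graphics thread ends up with

    shared.x = 1000.0f;              // write x = 1000
    seen.x = shared.x;               // read x (=1000)
    seen.y = shared.y;               // read y (=0)
    seen.z = shared.z;               // read z (=0)
    shared.y = 1000.0f;              // write y = 1000
    shared.z = 1000.0f;              // write z = 1000

    return seen;                     // (1000, 0, 0)
}
```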
Antodologo wrote: We synchronize the positions and orientations that have changed since the last graphics frame before allowing the next renderOneFrame. In exchange, we avoid pushing position, orientation, scale... to the graphics side on every physics frame; nothing is set until the next graphics frame is reached. I can't say it is better or more efficient than your solution, we just prefer this implementation as I suppose you prefer yours :wink:
Yeah, I was just talking about my similar experience. My approach is more scalable due to lack of locks, but at the cost of visual quality ("fluent" graphics) and adds the burden of profiling the code because two cores accessing the same memory regions at the same time with read & write access can potentially cause a lot of cache misses; hurting performance.

As an advantage of not locking, I found that when locking like you do (my original implementation used locking almost exactly the way you do), depending on how the lock is implemented, one thread's framerate is tied to the other's because it has to wait. Most likely your graphics thread can't go faster than the physics one (or vice versa, depending on how & where the lock is placed). I don't think that will be a problem for you, but it's good to be aware of it.

None of us invented this technique though, it's called "render split" model, which became famous because Microsoft advised using it to take advantage of the 2 cores present in the original XBox (can't find the article now, it was old); but the way they presented it used almost double the memory (i.e. twice the position variables), which was a lot to ask considering the amount of RAM a console has.
Antodologo wrote: Yeah, I was referring to the fact that we can't afford any flicker or shake. We are trying to convince people that we have the most fluent game in the world and... :wink:
That's completely understandable ;)

Good luck with your game!
Cheers
Dark Sylinc
Antodologo
Halfling
Posts: 46
Joined: Tue Jul 13, 2010 4:05 pm
x 5

Re: Multithreading Ogre3D in Kromaia

Post by Antodologo »

dark_sylinc wrote:Nevertheless take into account race conditions are a bit more complex...
Good example. That "impossible" position shouldn't be too far away from the real position but, of course, it is a problem for graphic fluency. Personally, I would be more worried about writing each of the floats. Is writing a float an atomic operation? Can it be interrupted halfway? And for doubles? They should be atomic, but... it is so scary... :lol:
dark_sylinc wrote:As an advantage of not locking, I found that when locking like you do (my original implementation used locking almost exactly the way you do), depending on how the lock is implemented, one thread's framerate is tied to the other's because it has to wait. Most likely your graphics thread can't go faster than the physics one (or vice versa, depending on how & where the lock is placed). I don't think that will be a problem for you, but it's good to be aware of it.
Yeah, usually you implement all this taking one of the threads as "master". Of course our main thread is the logic/physics one, as it is the one that needs to run at, at least, 100 fps. Anyway, you could make both threads independent by using a third one as "master" to control synchronization of the other two. Take into account that, in our case for example, this would be useless, because if you render twice before a new physics frame has finished, you are rendering the same image as the last time you rendered. There are no changes on the screen if there are no "logic" changes :D
dark_sylinc wrote:None of us invented this technique though, it's called "render split" model
I assume I will never invent anything new, I am sure this idea comes from a time without computers like many others :lol: But you can always reinvent it if you don't know it exists :P I settle for choosing the best and easiest solution possible ;) (If only I could know all the existing and already invented solutions for every problem... :D)
dark_sylinc wrote:Good luck with your game!
Thank you! Good luck with yours too ;)
dark_sylinc
OGRE Team Member
Posts: 5292
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278

Re: Multithreading Ogre3D in Kromaia

Post by dark_sylinc »

Antodologo wrote:Is writing a float an atomic operation? Can it be interrupted halfway? And for doubles? They should be atomic, but... it is so scary... :lol:
All loads and stores of the architecture's native word size are guaranteed to be atomic as long as they're aligned.
In plain English, this means that on a 32-bit system writing a float to memory is atomic as long as the float is aligned to 4 bytes in memory. If it's not aligned, the behavior is undefined (no guarantees).
I can't remember what happens with doubles when running in 32-bit mode, since they're 64-bit. When compiled for 64-bit, doubles are guaranteed to be atomic too (but they have to be 8-byte aligned).

Note that doing simple stuff like "x = x + 1" is not atomic. Since the actual operation is "load X" "sum 1" "store X". That's three operations. Your "X" won't be interrupted halfway if it's aligned (so you won't hit a shocking NaN or something) but it's possible that if two threads run the same operation, the result is X+1 instead of the expected X+2.
For example:
Thread 1: Load X = 0
Thread 1: Sum 1
Thread 2: Load X = 0
Thread 2: Sum 1
Thread 1: Store X (X = 1)
Thread 2: Store X (X = 1)

The result is X = 1; the expected value would have been X = 2. At least you won't get a NaN or other weird values from X being stored while another thread is trying to read it (unless you're working with unaligned memory).

If you want to do such stuff, use the Interlocked functions which are guaranteed to be atomic as long as the memory is aligned (even with 64-bit values).
They operate at a HW level using a "lock" prefix in the instructions, which makes them incredibly fast. Around 50x faster than using a mutex.
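As a portable illustration of the same idea, the sketch below uses std::atomic from standard C++ instead of the Win32 Interlocked functions; that substitution is mine, and countWithAtomics is a hypothetical name. fetch_add performs the load, the add and the store as one indivisible step, so the lost update from the example above cannot happen.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs two threads that each increment a shared counter 'perThread' times.
// With fetch_add no increment is lost, whatever the interleaving.
inline int countWithAtomics(int perThread) {
    std::atomic<int> counter{0};
    auto work = [&counter, perThread] {
        for (int i = 0; i < perThread; ++i)
            counter.fetch_add(1, std::memory_order_relaxed);
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter.load();
}
```

With a plain int and "x = x + 1" in the loop, the final value could come out lower than 2 * perThread (and the program would formally have a data race).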

If you're interested, I wrote an article about it. The article was revised by Intel and they added some trademark legal stuff (added a "*" when mentioning a registered trademark) and in the meanwhile the formatting was corrupted. Hope you can still understand it :)
Antodologo wrote: Take into account that, in our case for example, this would be useless, because if you render twice before a new physics frame has finished, you are rendering the same image as the last time you rendered. There are no changes on the screen if there are no "logic" changes :D
Unless you're using interpolation to smooth things out. Most games actually run their logic/physics at 25-30 fps and try to render at 60 fps, masking the low framerate through interpolation. Fast games like car racing & fighting (e.g. Street Fighter) need a higher framerate (usually 60 fps).
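A rough sketch of what such interpolation could look like for positions. The names Vec3, interpolate and blendFactor are hypothetical, and orientations would need a quaternion slerp or nlerp that is not shown. The renderer keeps the last two physics positions and blends between them according to how far into the current physics step the render happens.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Linear blend between the two most recent physics positions.
// alpha = 0 gives 'previous', alpha = 1 gives 'current'.
inline Vec3 interpolate(const Vec3& previous, const Vec3& current, float alpha) {
    return { previous.x + (current.x - previous.x) * alpha,
             previous.y + (current.y - previous.y) * alpha,
             previous.z + (current.z - previous.z) * alpha };
}

// Fraction of a physics step that has elapsed at render time, kept in [0, 1].
inline float blendFactor(float timeSinceLastStep, float stepLength) {
    const float a = timeSinceLastStep / stepLength;
    return a < 0.0f ? 0.0f : (a > 1.0f ? 1.0f : a);
}
```

With this, two renders between the same pair of physics frames no longer produce the same image, because alpha differs between them.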

Of course, when your game internally runs at 100 fps, rendering that fast probably doesn't make sense, since most monitors are 60 Hz; the exception is 3D LCDs, which are 120 Hz, but that's because they need to render the same frame twice (redrawn for the right eye).

One tip though: it's probably wiser to update your logic at round multiples of the graphics update. So if you're typically drawing at 60 fps, your logic should be updated at 15/30/60/90/120/240.
Updating at 100 fps, you may experience some missed or jagged response between user input (e.g. a keyboard) and what the user sees. "It feels" better when rounded.
Antodologo
Halfling
Posts: 46
Joined: Tue Jul 13, 2010 4:05 pm
x 5

Re: Multithreading Ogre3D in Kromaia

Post by Antodologo »

Interesting references and advice :wink:

Our game is from another galaxy ( :lol: ). The core is closer to a simulation game (with a focus on fluency and physics) while we are looking for arcade gameplay (in a physical world). That is one of the reasons why we have made so many different and strange decisions :P Our physics are updated 100 times per second (they could be updated as often as possible, but Bullet has some problems with variable timesteps), but our input is updated "always" and that can be thousands of times per second. We have a special control for that too.

Anyway, we hope all these comments will help someone else. Thanks for your contributions :wink:
Valentin Perrelle
Halfling
Posts: 54
Joined: Thu Sep 15, 2011 4:14 pm
x 2

Re: Multithreading Ogre3D in Kromaia

Post by Valentin Perrelle »

dark_sylinc wrote:Another reason not to manipulate Ogre from other threads is OpenGL: it will just crash in all operations where creating a hardware buffer is involved (textures, vertices, indices, render targets), whereas Direct3D will work OK (as long as you're not hitting a race condition inside the D3D API).
I have a problem related to this. I generate some geometry dynamically and it requires a lot of CPU time to do so. Are you saying that if I want to do this in another thread (for parallelism and the ability to process a variable number of geometry chunks between frames), I first have to create a copy of the buffer in system memory and then let the graphics thread (or wait for some synchronisation) actually create the vertex buffer from it?
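The arrangement the question describes could be sketched roughly as follows. Nothing here is Ogre API: GeometryChunk and ChunkQueue are hypothetical names, and the actual hardware buffer creation is left out on purpose. Worker threads only fill plain system-memory vectors, and the graphics thread picks up finished chunks between frames and is the only one that would create the vertex buffer.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical CPU-side chunk: plain vertex data in system memory.
struct GeometryChunk {
    std::vector<float> vertices;
};

// Queue filled by worker threads and emptied by the graphics thread.
class ChunkQueue {
public:
    // Worker thread: hand over a finished chunk. No graphics API call here.
    void push(GeometryChunk chunk) {
        std::lock_guard<std::mutex> lock(mutex_);
        ready_.push(std::move(chunk));
    }

    // Graphics thread: fetch one finished chunk, if any, between frames.
    // The hardware buffer would be created from 'out' by the caller.
    bool tryPop(GeometryChunk& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (ready_.empty()) return false;
        out = std::move(ready_.front());
        ready_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<GeometryChunk> ready_;
};
```

Popping at most a fixed number of chunks per frame would be one way to keep the upload cost per frame bounded.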