New InstanceManager: Instancing done the right way

dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278
Contact:

Re: New InstanceManager: Instancing done the right way

Post by dark_sylinc »

Oops, remove the assertion "assert( !(meshReference->hasSkeleton() && !indexToBoneMap) );" at file OgreInstanceBatch.cpp, line 54
I'll fix the patch later.

Anyway "300" is an arbitrary number. Higher values without flags may work better/worse for you.

Cheers
Dark Sylinc
LBDude
Gnome
Posts: 389
Joined: Mon Jul 26, 2010 10:53 pm
x 22

Post by LBDude »

Hey thanks. I will try that.

I modified the little motion update code in your test, changing it to translate (keeping the sine-based oscillation) and iterating through the instances. What I found is that the performance is on par with my results, give or take 10 fps. For example, when rendering 50x50 I get around 20fps while your test code gets around 30fps. Well, my "crowd" code is only partially parallelized, so I expect its performance to go up.

Either way I'm cool with this. If I keep the number down to around 1600 entities I get 40-60fps, which is plenty for messing around with gameplay at this stage of development. I will release when I'm done implementing a little bit of gameplay.

Thanks!
My blog here.
Game twitter here

Post by dark_sylinc »

I've done extensive testing with many moving scene nodes, with and without animation.

Animating the 10,000 entities caused a major FPS difference (2fps vs 6fps VTF)
Moving the 10,000 entities (but no animation) caused a CPU bottleneck (6fps vs 6fps VTF)
Moving and animating the 10,000 entities gave 2fps vs 6fps VTF

When instancing is on, whether moving and/or animating, there's a lot of time spent at _updateAnimation. It can't be faster than that (it's SSE optimized, memory is aligned and no cache misses!!!)

I'm afraid the main bottleneck is at updating the scene nodes. There's not much more to it.
OGRE can't handle that many objects being updated and visible at the same time with current hardware.
The only solution would be to update these scene nodes in parallel (and also _updateAnimation is a prime candidate) but unfortunately Ogre hasn't been built with much (scalable) multi threading in mind.

If you're going to use my Instancing code, one thing you can try is this, which gets me a couple more FPS (especially when I'm not looking at the units, where it's around a +15fps boost, which is something). It works because the SceneNodes used for InstancedEntity are actually dummy nodes:

Code: Select all

// Don't use the SceneManager to create them
SceneNode *pseudoRootNode = OGRE_NEW SceneNode(0);
pseudoRootNode->_notifyRootNode(); // Instancing needs this so the InstancedEntity thinks it is in the scene graph after attaching

/* init code around here */
SceneNode *myNode = OGRE_NEW SceneNode(0); // Don't use createChildSceneNode
pseudoRootNode->addChild( myNode );
myNode->attachObject( myInstancedEntity );

/* later, when updating all your positions */
myNode->setPosition( newPos );

myNode->_getDerivedPosition(); // Forces an _updateFromParent()
myNode->_updateBounds(); // Needed by Instancing
A bit hacky, but it might even be possible to update them from multiple threads in parallel out of the box with that, since there's no SceneManager (i.e. octree) whose access is being shared. It's just updating derived positions and bboxes, which ought to be local and thread safe.
Also you'll need to take care of deallocation manually. With this you may be able to squeeze out much more performance, at the expense of babysitting the scene nodes.
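To illustrate why this kind of update could be split across threads, here is a minimal sketch using stand-in types (DummyNode, Vec3 and updateRange are hypothetical and not part of Ogre). It assumes each node only needs the pseudo root's position plus its own local data, ignores rotation and scale, and leaves the actual threading out:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in types for illustration only; these are NOT Ogre classes.
struct Vec3 { float x, y, z; };

struct DummyNode {
    Vec3 localPos{0, 0, 0};          // position relative to the pseudo root
    Vec3 halfSize{0.5f, 0.5f, 0.5f}; // half extents of the attached object
    Vec3 derivedPos{0, 0, 0};        // cached world position
    Vec3 boundsMin{0, 0, 0};         // cached world-space bounding box
    Vec3 boundsMax{0, 0, 0};
};

// Updates the nodes in [begin, end). Every iteration reads the constant root
// position and writes only to its own node, so two calls on disjoint ranges
// never touch the same memory. That is the property that would allow each
// range to be handed to a different worker thread.
void updateRange(std::vector<DummyNode>& nodes, const Vec3& rootPos,
                 std::size_t begin, std::size_t end)
{
    assert(begin <= end && end <= nodes.size());
    for (std::size_t i = begin; i < end; ++i) {
        DummyNode& n = nodes[i];
        n.derivedPos = { rootPos.x + n.localPos.x,
                         rootPos.y + n.localPos.y,
                         rootPos.z + n.localPos.z };
        n.boundsMin = { n.derivedPos.x - n.halfSize.x,
                        n.derivedPos.y - n.halfSize.y,
                        n.derivedPos.z - n.halfSize.z };
        n.boundsMax = { n.derivedPos.x + n.halfSize.x,
                        n.derivedPos.y + n.halfSize.y,
                        n.derivedPos.z + n.halfSize.z };
    }
}
```

Calling updateRange once for the first half of the vector and once for the second half gives the same result as a single call over the whole vector. Whether the real SceneNode update is equally free of shared state would have to be checked against the Ogre source.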

If you can't put them in parallel and you insist on using a large number of entities which are visible all the time, then I'm afraid Ogre is not well suited for you.
Don't ask me how to thread them; I'm just pointing out directions. Good luck. I can't do more than this for you.

Cheers
Dark Sylinc

PS: Fixed a bug where the bbox was not being updated, causing some entities to disappear when they move under certain conditions. I haven't updated the patch yet.

Post by LBDude »

Yeah, you are right. I had initially thought, from looking at some Nvidia and AMD "crowd" instancing demos, that it would be possible to render fewer than 5000 entities at a decent frame rate in Ogre. I had also hoped, from messing with InstancedGeometry (without skinning), that I would get at least similar performance with VTF and animation. I guess it's hard to do unless I do something similar to those demos. My original plan was to see if I could use Ogre. If not, maybe implement my own animation and update completely in the shader (for animation, maybe I could store complete animation states on the GPU and update them there). To do that, though, I have to learn the animation system.

I may even scale back my ambitions a bit and just go with an ISOMETRIC game. Performance wise from that perspective with 800 something rendered it's definitely playable. And still looks "cool" in terms of the crowd factor.

Thanks!

P.S: I'm working on gameplay right now, but as soon as I make some headway with that I will look into ways to speed up the update and animation. My "crowd" update is pretty fast. I plan to upgrade its pseudo-random number generator to use crypto hashing to make it even faster. So I have a lot of room to work from. If I could somehow make the animation and update faster, say by skipping scene nodes and animation states, I could probably make it work without even parallelizing that. If not, then I will look into the feasibility of doing it all on the GPU. I may still have problems even with that, because for my crowd stuff I use radix sorting, which I can't test on the GPU because I don't have the hardware for it (on the CPU it's already pretty fast). I digress.

Update:

I probably don't even need that much. Really it depends on what type of game I make. I could probably do with far fewer entities active at a time and just send them in waves.

Post by LBDude »

Sorry about hijacking your thread; this will be the last post I make on the subject. So I suppose Ogre's animation system does the forward kinematic transformation of the bone hierarchy on the GPU. It is probably possible to do this transformation using OpenCL. After that I would maybe do a copy to a texture buffer, which can then be used by the shader. This would effectively make the animation update parallel. Also, for this step I can directly use my world transform buffer, which I computed earlier, and thus completely sidestep Ogre. The only problem with this is culling. Perhaps I can do some culling by sorting the entities based on some key. I can then cull away entities this way and send the rest. Speed here is dependent on the size of the batches I send. Or just brute force it all.

Another thing I'm looking into is dual quaternions. I was thinking maybe I could skip the forward kinematic step completely (I was also looking into spline-based skinning, but I want something that will play nice with the existing pipeline). I started reading about this just now, and my hope is that I can do the forward kinematic part completely in the shader. I don't know, maybe it's all wishful thinking. The dual quaternion paper mentions converting joint matrices C1...Cn to dual quaternions and doing the blending in the shader. Not sure if this implies forward kinematics in the shader or not, through the use of dual quaternions. It's a bit too much math for me to wade through on a Sunday morning lol.
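For reference, a minimal CPU-side sketch of the blending step, assuming the common "dual quaternion linear blending" formulation: each joint's rigid transform (a unit rotation quaternion plus a translation) is converted to a dual quaternion, the dual quaternions are summed with the vertex weights, and the result is normalized. All type and function names here are made up for illustration. Note that the conversion takes the joint transforms as input, so in this sketch the bone hierarchy has already been evaluated before blending starts:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; };
struct DualQuat { Quat real; Quat dual; };

// Hamilton product a * b.
Quat mul(const Quat& a, const Quat& b)
{
    return { a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z,
             a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y,
             a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x,
             a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w };
}

Quat add(const Quat& a, const Quat& b) { return { a.w+b.w, a.x+b.x, a.y+b.y, a.z+b.z }; }
Quat scale(const Quat& q, float s)     { return { q.w*s, q.x*s, q.y*s, q.z*s }; }

// Rigid transform (unit rotation q, translation t) to a dual quaternion:
// real = q, dual = 0.5 * (0, t) * q.
DualQuat fromRotationTranslation(const Quat& q, const Vec3& t)
{
    const Quat tq{0.0f, t.x, t.y, t.z};
    return { q, scale(mul(tq, q), 0.5f) };
}

// Weighted blend of two joints (weights are expected to sum to 1).
DualQuat blend(const DualQuat& a, float wa, const DualQuat& b, float wb)
{
    // Flip b when it lies on the opposite hemisphere, so the blend takes
    // the shorter rotation path.
    const float dot = a.real.w*b.real.w + a.real.x*b.real.x
                    + a.real.y*b.real.y + a.real.z*b.real.z;
    if (dot < 0.0f) wb = -wb;

    const Quat real = add(scale(a.real, wa), scale(b.real, wb));
    const Quat dual = add(scale(a.dual, wa), scale(b.dual, wb));
    const float len = std::sqrt(real.w*real.w + real.x*real.x
                              + real.y*real.y + real.z*real.z);
    assert(len > 0.0f);
    return { scale(real, 1.0f / len), scale(dual, 1.0f / len) };
}

// Translation part of a unit dual quaternion: t = 2 * dual * conjugate(real).
Vec3 translationOf(const DualQuat& dq)
{
    const Quat conj{dq.real.w, -dq.real.x, -dq.real.y, -dq.real.z};
    const Quat t = mul(dq.dual, conj);
    return { 2.0f*t.x, 2.0f*t.y, 2.0f*t.z };
}
```

For two joints with identity rotation, blending with equal weights yields the average of the two translations, which is an easy sanity check. In a shader the same sum and normalization would run per vertex.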

Then again, maybe I don't need any of this. My vision for the game is simple graphics a la Multiwinia/Darwinia. So I probably don't even need complicated animation. I could probably even do without skinned animation.

Thanks.

Post by dark_sylinc »

Hi!

I've updated the patch with some fixes....... and a sample for the sample browser!!
It's not 100% done yet, but it allows a fast, interactive comparison of the instancing techniques, as well as showing clean C++ on how to do it.

You'll see some techniques are better when CPU culling kicks in, others are better when you're looking at the whole batch.
Also watch which one achieves greater performance when nothing is in view.
It also shows how defragmenting with the "optimize culling" option can help increase performance (mainly VTF gets the max benefit out of it)

Here are some screenshots:
Image

Image

Cheers
Dark Sylinc

Post by LBDude »

That's cool. I will check it out.

Hey, I ran into an ATI SDK demo on render to vertex buffer which does the animation interpolation in the pixel shader. It looks like they are able to render 10,000 skinned characters at decent frame rates.

http://www.lynxengine.com/old-site/ati.htm

Earlier I was thinking that I had to do forward kinematics; now I realize I don't have to (it comes into play when baking the skeleton animation, I think). So one only has to store the bone matrices for the animation frames in a texture, interpolate them there, and render to texture. Then feed the result back into the vertex shader.
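A CPU-side sketch of that lookup, under assumed conventions: the baked animation is a flat float array laid out frame by frame, each bone is a 3x4 matrix (12 floats), and neighbouring frames are blended element by element, which is roughly what a linearly filtered texture fetch would return. BakedAnimation and sampleBones are hypothetical names, not taken from the ATI demo or from Ogre:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kFloatsPerBone = 12; // one 3x4 matrix per bone

// Assumed layout: texels[(frame * numBones + bone) * 12 + element].
struct BakedAnimation {
    std::size_t numFrames;
    std::size_t numBones;
    std::vector<float> texels;
};

// Returns numBones * 12 floats: the bone matrices at a fractional frame
// index, linearly interpolated between the two neighbouring baked frames.
// The animation loops, so the last frame blends back into the first.
std::vector<float> sampleBones(const BakedAnimation& anim, float frameTime)
{
    assert(anim.numFrames > 0 && frameTime >= 0.0f);
    assert(anim.texels.size() == anim.numFrames * anim.numBones * kFloatsPerBone);

    const std::size_t whole = static_cast<std::size_t>(frameTime);
    const float t = frameTime - static_cast<float>(whole); // blend factor in [0, 1)
    const std::size_t f0 = whole % anim.numFrames;
    const std::size_t f1 = (f0 + 1) % anim.numFrames;
    const std::size_t stride = anim.numBones * kFloatsPerBone;

    std::vector<float> out(stride);
    for (std::size_t i = 0; i < stride; ++i) {
        const float a = anim.texels[f0 * stride + i];
        const float b = anim.texels[f1 * stride + i];
        out[i] = a + (b - a) * t;
    }
    return out;
}
```

Element-wise blending of matrices is not an exact rotation interpolation, so it only looks right when the baked frames are close together. On the GPU the loop body would be the shader, with one texel fetch per matrix row.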

Post by dark_sylinc »

LBDude wrote:That's cool. I will check it out.

Hey, I ran into an ATI SDK demo on render to vertex buffer which does the animation interpolation in the pixel shader. It looks like they are able to render 10,000 skinned characters at decent frame rates.
What they're doing is exactly what I'm doing with VTF, plus computing the animation matrices.
Regarding the animation matrices, it boils down to the same thing: it's the same problem as threading it on the CPU (which I talked about already). So I won't bother with it; it goes beyond the scope of my work (instancing).

Furthermore, threading animation updates is something that could even work without instancing.

On the other hand, that demo uses R2VB, which:
a. Is only available to D3D 9 ATI as a nasty hack. Here's Sinbad's opinion on the subject (note: I completely agree)
b. It could find some use in D3D 10, but the render system plugin still lacks stability. Plus, even after 4 years, the Dx10 market share is still low. I don't know about OGL.

Edit: Also, what they're doing can be done with VTF. R2VB made sense when shader architectures weren't unified. R2VB now has other uses, where you want to reuse vertex buffers multiple times after processing them the first time with R2VB, but not something like this.

Cheers
Dark Sylinc

Post by LBDude »

No I didn't mean to say to use render to vertex buffer in your instancing code. I think you can accomplish the same thing with vertex texture fetch. I was just pointing out that it is possible to render lots of animated characters on screen at a decent frame rate. Specifically for Ogre it would mean skipping updating the scene nodes and animation state and to do application specific things. Also I don't mean to have you implement this with instancing at all. I will do this myself. I have most of the things in place already.

On the other hand, you would think one of the major applications of instancing would be rendering 1000+ animated entities at 60+ fps. I know it is out of the scope of instancing, but all I'm saying is: if instancing + scene node updates + animation state updates slow everything down, then what is the point? It would be better to go with a customized solution that does animation and updates + instancing and does it at 60fps. (And clearly it is possible with Ogre if one skips scene node updates and animation state updates.)
Praetor
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 3335
Joined: Tue Jun 21, 2005 8:26 pm
Location: Rochester, New York, US
x 3
Contact:

Post by Praetor »

So you just want a way to exclude things from the internal scene node and animation update systems?
Game Development, Engine Development, Porting
http://www.darkwindmedia.com

Post by LBDude »

Update: I wouldn't even worry about what I'm saying here about performance; I don't even have any gameplay to speak of. Who knows, I may run into problems where I can't even have as many units as I want. So just like...ignore me. LOL.

Well, I'm not sure :). I don't think just skipping the scene nodes and updating them directly does the trick, because there are N entities that need to have their animation interpolated, and N more that need to have their positions updated (not to mention CPU bottlenecks, etc.). It's still O(n), but maybe the constant matters here due to the really tight constraints we're working with. And this is all done on a single thread. Basically, what I see is that currently, when you update the positions/orientations and update the animation (note that disabling either one only increases the FPS by 2 or 3 frames), this slows everything down (at 2500 entities it averages 20fps when you have lots of entities in view). The updates seem to be the bottleneck. That's all I'm saying. I think that if I break them down and parallelize them somehow I can speed up this part. The way I want to do it is to simply pass my positions and orientations directly to the GPU and do all the transformations there, which I think is possible.

Other than that I think I may still be confused. Perhaps the performance I'm seeing with these Nvidia and AMD demos is because they are using models with fewer polygons than robot.mesh, which is the reason for the observed speed increase. But I don't know, because we're not pushing that many triangles. Anyway I need to work this out myself :).

Basically I don't have a clue :).

BTW: From what I'm seeing, if you work with around 1500 entities it's fine (this is on a quad core machine with an ATI 4800x2 and 4gb ram). I can definitely see myself working with that and just dividing them into waves. It's all about what games you are trying to make, I guess. So it's very application specific, I think.

Post by dark_sylinc »

I am about to post an updated patch on SF.Net in about 30 minutes.

The patch is very polished now, with a nice sample for the sample browser. I am now fixing some GCC compiler errors, and then will test on a wider variety of HW.

After this testing, if all goes OK, I would like to push these changes into the repository.
Does anyone have anything to say about it? (i.e. a reason not to push it)


Cheers
Dark Sylinc

Edit:
LBDude wrote:Other than that I think I may still be confused. Perhaps the performance I'm seeing with these Nvidia and AMD demos is that they are using models with lower polygons than robot.mesh, which is the reason for the observed speed increase. But I don't know because we're not pushing that much triangles. Anyway I need to work this out myself
Because:
a. Those demos don't do any kind of scene graph, CPU culling, render queuing or anything whatsoever. It's pure raw processing, which works well to show specific stuff on very high end systems, but doesn't scale well to real world applications and/or with lower end hardware. Also Ogre is general purpose.
b. They use HW instancing, which we don't (yet)

Edit 2: Patch updated. Those who want to test it are welcome to try :D

Edit 3: Found a couple errors after trying on ATI HW. Under D3D9, a pixel shader 3.0 is used with a VS 2.0 for the grass plane. This is wrong. I'm fixing it.
Under OGL, looks like ATI doesn't like mixing GLSL with arbfp1 although this is allowed by the standard. Grrr....

Edit 4: Ok, all bugs fixed. Patch updated. I'm now waiting for confirmation to upload this to the repository
CABAListic
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 2903
Joined: Thu Jan 18, 2007 2:48 pm
x 58
Contact:

Post by CABAListic »

Before you commit it, I think this should get some testing from other people. Especially if you have changed Ogre internals. If not, it's less of an issue, but I would like confirmation of a few people that Ogre still compiles fine and the SampleBrowser still runs ;) If you give me until Tuesday, I'll test it, too.

Post by dark_sylinc »

CABAListic wrote:Before you commit it, I think this should get some testing from other people. Especially if you have changed Ogre internals. If not, it's less of an issue, but I would like confirmation of a few people that Ogre still compiles fine and the SampleBrowser still runs ;) If you give me until Tuesday, I'll test it, too.
The only "internal" that has changed is the SceneManager, which has added functions (none were changed).
However it's fair enough, since there are new files which may not compile on all compilers (for example, 4 hours ago I fixed a tiny non-standard-compliant issue that prevented GCC from compiling).

I'll be impatiently waiting for feedback :lol:
Cheers
Dark Sylinc

Post by LBDude »

I was able to run the new instancing sample from the sample browser with the last patch. Haven't tried the new patch yet, I will do so in a bit and let you know how that goes. I'm sure it still works :).

Post by dark_sylinc »

No more news regarding testing? :(

Tomorrow I'll try the code on an Intel GMA 950 (doesn't support SM 3.0). If SampleBrowser runs on that crap, I think it's safe to say it's stable.

As a precaution, the Instancing sample isn't included in the statically linked version of SampleBrowser.

Post by CABAListic »

Ouch, sorry, I totally forgot :oops:
I gave it a run now, compiled and ran fine on Ubuntu for me.
cyrfer
Orc
Posts: 424
Joined: Wed Aug 01, 2007 8:13 pm
Location: Venice, CA, USA
x 7

Post by cyrfer »

dark_sylinc wrote:Tomorrow I'll try the code in an Intel GMA 950 (doesn't support SM 3.0) if SampleBrowser runs on that crap, I think it's safe to say it's stable.
Hey, great to finally find this thread and someone who cares about the GMA 950 (LOL I have one too). I submitted a limited patch a while back to enable hardware instancing for some systems. I hope any new interfaces reflect all the options nicely. Looking back, supporting all the options was out of my scope, but now I want each of them (hardware repeat, software repeat, choosing between vertex data synthesis, attributes, or VTF, and various RenderSystems, oh my).

Post by dark_sylinc »

cyrfer wrote:Hey, great to find this thread finally and someone who cares about the GMA 950 (LOL I have one too)
I know I said "tomorrow", but I guess it will wait until this weekend.
The GMA 950 sucks (as do all Intels), sorry; but unfortunately too many PCs out there have it, so we can't ignore it. They need to be supported due to the broad Ogre audience.
The only thing those Intels are really good for is that if something runs on that crap, it will run pretty much anywhere with Shader Model 2.0.

Mind you, the shader performance is so bad, you usually want to do software skinning (and no instancing) instead of using large vertex shaders.
Also VTF uses too much bandwidth for an integrated GPU.

Furthermore, there's a whole article on STALKER: Call of Pripyat describing how they used software skinning to improve overall performance by 340% on integrated GPUs (mainly Intel), because shaders were a huge bottleneck while bus transfers were not an issue.
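For readers unfamiliar with the term, software skinning means the CPU computes each vertex's final position from its bone matrices and uploads the result, instead of the vertex shader doing it. A minimal sketch of the per-vertex math with made-up types (this is not Ogre's actual skinning code, which is SSE optimized and works on whole buffers):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// 3x4 affine bone matrix: rotation/scale in the left 3x3, translation in
// the last column.
struct Mat3x4 { float m[3][4]; };

// One bone influence on a vertex (illustrative names, not Ogre's).
struct Influence { std::size_t bone; float weight; };

Vec3 transformPoint(const Mat3x4& b, const Vec3& p)
{
    return { b.m[0][0]*p.x + b.m[0][1]*p.y + b.m[0][2]*p.z + b.m[0][3],
             b.m[1][0]*p.x + b.m[1][1]*p.y + b.m[1][2]*p.z + b.m[1][3],
             b.m[2][0]*p.x + b.m[2][1]*p.y + b.m[2][2]*p.z + b.m[2][3] };
}

// Linear blend skinning of one vertex: transform the bind-pose position by
// every influencing bone and sum the results scaled by the weights (which
// are expected to add up to 1).
Vec3 skinVertex(const Vec3& bindPos,
                const std::vector<Influence>& influences,
                const std::vector<Mat3x4>& bones)
{
    Vec3 out{0.0f, 0.0f, 0.0f};
    for (const Influence& inf : influences) {
        assert(inf.bone < bones.size());
        const Vec3 p = transformPoint(bones[inf.bone], bindPos);
        out.x += inf.weight * p.x;
        out.y += inf.weight * p.y;
        out.z += inf.weight * p.z;
    }
    return out;
}
```

The trade-off described above follows from this: the loop costs CPU time and the skinned vertices have to cross the bus every frame, but the vertex shader stays tiny, which is what weak integrated GPUs need.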
cyrfer wrote:I submitted a limited patch a while back to enable hardware instancing for some systems. I hope any new interfaces reflect all the options nicely.
Any link to that patch?
Has it been accepted/rejected/ignored?
I would be interested in seeing it

Cheers
Dark Sylinc

Post by dark_sylinc »

Hi all!

In case no one noticed, it's been pushed into the main repository (default branch, of course).
My next big addition will be supporting more than just the first submesh, but note that (exactly as happens with SubEntities) you won't be able to manipulate its positions/orientations independently.

But it will take a while, I won't start on it right away.
And after that (or before??), true HW instancing.

Enjoy the upgrade ;)

Cheers
Dark Sylinc
Klaim
Old One
Posts: 2565
Joined: Sun Sep 11, 2005 1:04 am
Location: Paris, France
x 56
Contact:

Post by Klaim »

In case no one noticed, it's been pushed into the main repository (default branch, of course).
Congrats ;)
Or good work. Whichever you prefer.


Anyway following the discussions around this feature was entertaining and I'm sure a lot of people will use it with love.
boyamer
Orc
Posts: 459
Joined: Sat Jan 24, 2009 11:16 am
Location: Italy
x 6

Post by boyamer »

Tested, and looks great :)
Nice job you've done. I think you should implement as many instancing methods as possible.

Thanks
Assaf Raman
OGRE Team Member
OGRE Team Member
Posts: 3092
Joined: Tue Apr 11, 2006 3:58 pm
Location: TLV, Israel
x 76

Post by Assaf Raman »

I am working on adding "true HW instancing" support (d3d9 - SetStreamSourceFreq, gl - glVertexAttribDivisor).
Watch out for my OGRE related tweets here.

Post by dark_sylinc »

Hi Assaf!
Honestly I haven't yet had the time to write a single line of it.
I was thinking of writing the frequencies somewhere in the VertexDeclarations or RenderOperations.
As long as you're planning something similar, I'm fine with it :)

As a note on a very D3D9 implementation-specific detail:
A naive approach in the RenderSystem would look like this:

Code: Select all

if( instancing )
{
    for(...)
        d3dDevice->SetStreamSourceFreq( i, freq );
}

Draw();

if( instancing )
{
    //Disable instancing
    for(...)
        d3dDevice->SetStreamSourceFreq( i, 1 );
}
A more efficient approach is to do nothing at the end and save a bool indicating whether our last render was instanced.
That way, on the next object to render:
* If the previous object was instanced and this one isn't, disable instancing.
* If this object is instanced, set the corresponding frequencies.
* If neither the previous object nor this one is instanced, do nothing.

This way, when rendering multiple objects in a row which make use of instancing, we don't make useless calls "disabling" instancing. Maybe something similar can be done with OGL.
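The three rules can be sketched with a mock device that only counts calls. MockDevice, RenderOp and LazyInstancingState are hypothetical stand-ins: the real call is IDirect3DDevice9::SetStreamSourceFreq, whose frequency argument is a combination of flags and counts that this sketch treats as an opaque number.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the D3D9 device; it only counts how often it is called.
struct MockDevice {
    int freqCalls = 0;
    int drawCalls = 0;
    void SetStreamSourceFreq(unsigned stream, unsigned freq)
    {
        (void)stream; (void)freq;
        ++freqCalls;
    }
    void Draw() { ++drawCalls; }
};

// Hypothetical render operation: per-stream frequencies, used only when
// the operation is instanced.
struct RenderOp {
    bool instanced;
    std::vector<unsigned> freqs;
};

class LazyInstancingState {
    bool mLastWasInstanced = false; // the bool described above
    unsigned mStreamsInUse = 0;     // streams whose frequency is not 1
public:
    void render(MockDevice& dev, const RenderOp& op)
    {
        const unsigned count = static_cast<unsigned>(op.freqs.size());
        assert(!op.instanced || count > 0);
        if (op.instanced) {
            // This object is instanced: set its frequencies.
            for (unsigned i = 0; i < count; ++i)
                dev.SetStreamSourceFreq(i, op.freqs[i]);
            // Reset streams the previous object used but this one doesn't.
            for (unsigned i = count; i < mStreamsInUse; ++i)
                dev.SetStreamSourceFreq(i, 1);
            mStreamsInUse = count;
        } else if (mLastWasInstanced) {
            // Previous object was instanced, this one isn't: disable now.
            for (unsigned i = 0; i < mStreamsInUse; ++i)
                dev.SetStreamSourceFreq(i, 1);
            mStreamsInUse = 0;
        }
        // Neither is instanced: nothing to do before drawing.
        mLastWasInstanced = op.instanced;
        dev.Draw();
    }
};
```

For two instanced objects with two streams each followed by two plain objects, this makes 6 frequency calls, where the naive version above would make 8. The saving grows with the number of consecutive instanced objects.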

Maybe what you're planning is completely different, but there aren't many possible ways of doing it, so it may end up looking very similar.

Cheers and good luck
Dark Sylinc

Post by Assaf Raman »

I hope to commit in the next hour.