Frame killer in compositors

Discussion area about developing or extending OGRE, adding plugins for it or building applications on it. No newbie questions please, use the Help forum for that.
Post Reply
Sairon
Halfling
Posts: 94
Joined: Fri Apr 20, 2007 5:50 pm

Frame killer in compositors

Post by Sairon »

As we've noticed Ogre likes to eat quite a bit of CPU time we've started doing some profiling, we've found a real killer in the compositor code. CompositorManager::_getTexturedRectangle2D uses a Rectangle2D for the full screen quad, which it updates for every pass. The problem is that Rectangle2D uses 3 vertex buffers flagged with HBU_STATIC_WRITE_ONLY, causing huge stalls when the buffers are updated. As we don't make use of Rectangle2D we simply changed all the buffers from HBU_STATIC_WRITE_ONLY to HBU_DYNAMIC_WRITE_ONLY_DISCARDABLE and shaved of 8 ms for our setup. :)
User avatar
sinbad
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 66
Contact:

Re: Frame killer in compositors

Post by sinbad »

Patch please :)
Sairon
Halfling
Posts: 94
Joined: Fri Apr 20, 2007 5:50 pm

Re: Frame killer in compositors

Post by Sairon »

I don't know if it's general enough to warrant a patch though since HBU_STATIC_WRITE_ONLY is valid in case you use an Ogre::Rectangle2D which you don't intend to update often. Here's the patch I use: http://dl.dropbox.com/u/849507/OgreRectangle2D.patch, it might not be possible to apply it directly to the ogre svn though since we keep our own branch on our own servers.
User avatar
Praetor
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 3335
Joined: Tue Jun 21, 2005 8:26 pm
Location: Rochester, New York, US
x 3
Contact:

Re: Frame killer in compositors

Post by Praetor »

I'm interested in this but can't look into it right now. If you can't submit the patch to the tracker could you make an entry in the bug database and link back to this thread? That way it doesn't get lost.
Game Development, Engine Development, Porting
http://www.darkwindmedia.com
Sairon
Halfling
Posts: 94
Joined: Fri Apr 20, 2007 5:50 pm

Re: Frame killer in compositors

Post by Sairon »

Done
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278
Contact:

Re: Frame killer in compositors

Post by dark_sylinc »

Whoa! Wait a sec...
Why is the rectangle buffer being updated every frame???

This is what's inside _getTexturedRectangle2D:

Code: Select all

	if(!mRectangle)
	{
		/// 2D rectangle, to use for render_quad passes
		mRectangle = OGRE_NEW Rectangle2D(true);
	}
	RenderSystem* rs = Root::getSingleton().getRenderSystem();
	Viewport* vp = rs->_getViewport();
	Real hOffset = rs->getHorizontalTexelOffset() / (0.5f * vp->getActualWidth());
	Real vOffset = rs->getVerticalTexelOffset() / (0.5f * vp->getActualHeight());
	mRectangle->setCorners(-1 + hOffset, 1 - vOffset, 1 + hOffset, -1 - vOffset);
	return mRectangle;
Can't we pass this as a world matrix and multiply in the vertex shader? We already need the mul() against the projection matrix anyway for cross compatibility between OpenGL & Direct3D9.

Updating it this way seems inneficient. Well, it's just 4 vertices after all, but if there are too many compositors chained, this will get called often per frame, and the bus is somethnig not to be bothered. And we don't get overhead with the matrix multiplication method I'm suggesting because we already do that.

Also, with the current patch, I would prefer Rectangle2D constructor to have an extra parameter, so that we it looks like this:

Code: Select all

Rectangle2D( bool includeTextureCoordinates = false, HardwareBuffer::Usage = HardwareBuffer::HBU_DYNAMIC_WRITE_ONLY )
And then override the default value when using OGRE_NEW to create mRectangle

Cheers
Dark Sylinc
User avatar
lf3thn4d
Orc
Posts: 478
Joined: Mon Apr 10, 2006 9:12 pm
x 12

Re: Frame killer in compositors

Post by lf3thn4d »

Hmm wow, I knew this was an issue but never knew fixing it will get up to 8ms gain. How many compositors are you using per frame?

I do agree with dark_sylinc. The best way here is to set it in the shader. However, I do also understand that there's also the other complication where when user wants to utilize the normals for depth based compositor effects. This will probably not work in the shader? Unless of course we provide more parameter type to generate the value in the vertex shader. The next best thing I can see now is the second suggestion of allowing a different hardware buffer usage parameter when instantiating Rectangle2D. Infact, one could even optimize it away by checking if the rectangle needs update at all. Since viewport doesn't change size within compositor chain anyways.
Sairon
Halfling
Posts: 94
Joined: Fri Apr 20, 2007 5:50 pm

Re: Frame killer in compositors

Post by Sairon »

We use only 1 compositor at the moment, but we have several full screen passes. I'm sure dark_sylincs solution would save even more frame time, as even with HBU_DYNAMIC_WRITE_ONLY the update takes considerable time.
User avatar
lf3thn4d
Orc
Posts: 478
Joined: Mon Apr 10, 2006 9:12 pm
x 12

Re: Frame killer in compositors

Post by lf3thn4d »

I think what we could do here is to have a flag to let shader handle the value. Probably a setUseShaderManagedTexturedRectangle2D(bool) to turn it on. This way we can still keep old compatibility.

For basic quad layout, we just have to get the default rect's vertex to be set with a rect of -1, 1, 1, -1. Then we just +- the value accordingly in the shader. As for normals for depth based compositors, we can probably feed the value in as 4 custom vector3 parameter.
User avatar
Noman
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 714
Joined: Mon Jan 31, 2005 7:21 pm
Location: Israel
x 2
Contact:

Re: Frame killer in compositors

Post by Noman »

Interesting ideas.
dark_sylinc's modification of Rectangle2D's constructor (extra param) is fine by me. I'll add this (and maybe other things) when the discussion finishes.
As for the idea of not updating the rectangle's buffer every frame, we can't just change this behavior.

A) We need to make sure that the calculations are possible in the shader, easily. Probably AutoParams. viewport height/width are accessible, but the texel offsets aren't. Shaders can work around this (profile-specific functions etc) but maybe the correct solution is to add the texel offsets as an autoparam (vector2 with x,y components). Or, setting the projection matrix accordingly (is it enough?)

B) We aren't allowed to assume that the material that is rendering the quad has a vertex shader. We can either :
1 - Have a flag in the render_quad pass whether the rectangle needs recalculation (this is a bit redundant, its a very minor setting)
2 - Start enforcing that render_quad passes have a vertex shader.

If a projection matrix solution is possible, it does allow us to ignore B, as shader-less materials will work correctly since the standard transform would do the trick. So it is the preferred way.
Is this possible?
If so, I would like to understand why it was done differently (I assume that the quad gets rendered with identity view/projection). Does anyone know why?
User avatar
lf3thn4d
Orc
Posts: 478
Joined: Mon Apr 10, 2006 9:12 pm
x 12

Re: Frame killer in compositors

Post by lf3thn4d »

Actually texel offset is already in auto param. In fact they can come predivided with viewport's dimensions. So shader code could simply be:
pos.xy += texOffsetParam.zw * float2(2.0f, -2.0f);

I understand that we cannot assume material will provide proper shader to handle it. This is why I suggested a flag to turn it on for users who will provide proper shader for the quad. I suppose we could make it a compositor script feature too.

As far as I can tell, building the quad in shader is easy. But the generation of the camera view distance value for depth based effect is not so straight forward. I'm sure there is a proper universal formula that can work on the vertex shader level. I just can't think of it right now though.

As for why it's in identity view projection, I have no clue. Probably because it's much easier to build the quad that way?
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278
Contact:

Re: Frame killer in compositors

Post by dark_sylinc »

When I was thinking of doing it in the shader, the easiest thing that came to mind was using a new auto binding parameter, i.e. "compositor_projection_matrix" which contains the projection matrix (for cross-renderer compatibility) and a matrix that would scale and move to the right offsets.

The problem may be, as pointed out, with the normals. That would requiere an extra matrix multiplication which I believe could beat the performance gain. Not to mention unnecessary complexity to the shader. (off-topic, how are normals enabled, and furthermore, how to reconstruct position with depth using normals? AFAIK the method involves Far Plane - Near Plane, not the normals)
We aren't allowed to assume that the material that is rendering the quad has a vertex shader
Shader Model 3.0 requieres a vertex shader 3.0 present if a pixel shader 3.0 is used.
So unless the developer is targeting SM 2.0 or doesn't care about cross platform compatibility (i.e. OpenGL only) this isn't much of a problem to assume already.
Though, there could be some compositors using fixed function functionality (using ROP operations).
I think a function similar to setUseShaderManagedTexturedRectangle2D could work here.
I'm sure there is a proper universal formula that can work on the vertex shader level. I just can't think of it right now though.
Personally, I use my own quad and add index numbers, then use it to index a parameter I pass to the shader.
Cg's vp40 profile doesn't like it though (shader compiles, assembly looks fine, result is all screwed up; looks like a driver bug)

Nevertheless I'm more inclined to building the Rectangle2D with the DISCARDABLE flag (but using the constructor override, instead of defaulting it for every Rectangle2D as the original patch is doing) and checking in C++ whether we need to update the offsets; since I think this falls in the "micro optimization" category.
The reason there was an 8ms gain was because there obviously was a botleneck in the CPU or the bus, and we were ensuring the driver something we shouldn't. Not because the actual operation is inherently slow.
I suggested modifying the matrix and doing it in the shader because it seemed like a simple efficient solution, but seems it's more complex than it looks.

Cheers
Dark Sylinc
User avatar
Noman
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 714
Joined: Mon Jan 31, 2005 7:21 pm
Location: Israel
x 2
Contact:

Re: Frame killer in compositors

Post by Noman »

OK, two things :

1) It seems like discussion #2 is bigger than originally thought, so I'll go ahead and add dark_sylinc's version of the patch (extra param, used in the right place) soon.

2) In the compositor demo, we could use the same Rectangle2D instance all the time (since there is only one viewport size, methinks). Can someone try to hardwire a possible solution to discussion #2 so we can understand the magnitude of the performance gain?
User avatar
lf3thn4d
Orc
Posts: 478
Joined: Mon Apr 10, 2006 9:12 pm
x 12

Re: Frame killer in compositors

Post by lf3thn4d »

dark_sylinc wrote: The problem may be, as pointed out, with the normals. That would requiere an extra matrix multiplication which I believe could beat the performance gain. Not to mention unnecessary complexity to the shader. (off-topic, how are normals enabled, and furthermore, how to reconstruct position with depth using normals? AFAIK the method involves Far Plane - Near Plane, not the normals)
The normal part is simply an exploit on vertex semantic to pass in the view space far corner points of the view rect in the far clip plane. Thanks to the vertex interpolation, finding the view space position with a depth map in the fragment shader is simply norm * depth where norm is the interpolated far view point at the given fragment in screen space. Knowing this, I believe it should be quite possible to generate them in the vertex shader.
dark_sylinc wrote:
We aren't allowed to assume that the material that is rendering the quad has a vertex shader
Shader Model 3.0 requieres a vertex shader 3.0 present if a pixel shader 3.0 is used.
So unless the developer is targeting SM 2.0 or doesn't care about cross platform compatibility (i.e. OpenGL only) this isn't much of a problem to assume already.
Though, there could be some compositors using fixed function functionality (using ROP operations).
I think a function similar to setUseShaderManagedTexturedRectangle2D could work here.
I think there shouldn't be any worries about this one. As long as there's a flag to use shader managed quad material, the rest should be up to the user to define. Basically what this means is that the user is on their own when using this mode.
dark_sylinc wrote: Nevertheless I'm more inclined to building the Rectangle2D with the DISCARDABLE flag (but using the constructor override, instead of defaulting it for every Rectangle2D as the original patch is doing) and checking in C++ whether we need to update the offsets; since I think this falls in the "micro optimization" category.
The reason there was an 8ms gain was because there obviously was a botleneck in the CPU or the bus, and we were ensuring the driver something we shouldn't. Not because the actual operation is inherently slow.
I suggested modifying the matrix and doing it in the shader because it seemed like a simple efficient solution, but seems it's more complex than it looks.
Yes, I vote for this as well. Sometimes this is better compared to having more complex shader to handle it for every quad rendering. If you have loads of quad compositor techniques, doing a pre-cached calculation is definitely a better choice. In most situation the quad doesn't really change. The only time the quad need to change it's vertex position is when viewport size changes. The only time it needs to change it's "normal" for view depth effects is when camera changed/moved. In-fact, if you're using view space technique (which for the most part is what we all really need when doing depth based technique), the "normal" only requires updating when the frustum changed. To do this, we will need to keep track of the viewport dimensions and camera frustum.
User avatar
Noman
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 714
Joined: Mon Jan 31, 2005 7:21 pm
Location: Israel
x 2
Contact:

Re: Frame killer in compositors

Post by Noman »

Added the first part to 1.7 branch.
While this is theoretically an API change, the default value makes old code compile cleanly and behave the same, making this a safe change.
cyrfer
Orc
Posts: 424
Joined: Wed Aug 01, 2007 8:13 pm
Location: Venice, CA, USA
x 7

Re: Frame killer in compositors

Post by cyrfer »

Hi, I have to admit I am new to the compositor framework, so this might not be a good idea at all. I was wondering if the problem could be solved, allowing more flexibility, by using a new parameter in the compositor script, something like 'override_source_mesh' and you specify a 'mesh_resource' ? Then client applications could supply the exact mesh they need for the compositor pass.
TheSHEEEP
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 972
Joined: Mon Jun 02, 2008 6:52 pm
Location: Berlin
x 65

Re: Frame killer in compositors

Post by TheSHEEEP »

What is the status of this in 1.8? :)
My site! - Have a look :)
Also on Twitter - extra fluffy
Post Reply