Performance bug: Skeletal Animation on Macintosh

hellcatv
Gremlin
Posts: 163
Joined: Thu Dec 14, 2006 2:11 am

Post by hellcatv »

I have a dual-core Power Book with:
Processor Name: Intel Core Duo
Processor Speed: 2.16 GHz
Graphics: ATI Radeon X1600

I ran the jaiqua demo with both the Ogre 1.4.2 SDK (the downloadable build) and my own hacked build of Ogre 1.4.3. It works, but the performance is abysmal at 6 fps.
If I hack the material so that the hardware skinning technique is invalid, the software skinning fallback is plenty fast.
Alternatively, if I take out the bone index lookup in the shader and have it reference bone [0] every time (so jaiqua is essentially one big bone [0] transformation), it also runs smoothly at 60 fps.
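
For anyone unfamiliar with what the bone index lookup does, here is a minimal CPU-side sketch of two-weight skinning in C++. It is not the actual skinningTwoWeightsVp.glsl source; the names Vec3, Mat3x4, translation and skinTwoWeights are made up for illustration. Each vertex picks two bone matrices by index and blends the two transformed positions by weight.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical 3x4 affine bone matrix, row-major (rotation/scale plus translation column).
struct Mat3x4 { float m[3][4]; };

// Helper for the example only: a pure translation matrix.
Mat3x4 translation(float tx, float ty, float tz) {
    Mat3x4 r = {{{1.0f, 0.0f, 0.0f, tx},
                 {0.0f, 1.0f, 0.0f, ty},
                 {0.0f, 0.0f, 1.0f, tz}}};
    return r;
}

// Transform a point by an affine matrix.
Vec3 transformPoint(const Mat3x4& b, const Vec3& p) {
    Vec3 r;
    r.x = b.m[0][0] * p.x + b.m[0][1] * p.y + b.m[0][2] * p.z + b.m[0][3];
    r.y = b.m[1][0] * p.x + b.m[1][1] * p.y + b.m[1][2] * p.z + b.m[1][3];
    r.z = b.m[2][0] * p.x + b.m[2][1] * p.y + b.m[2][2] * p.z + b.m[2][3];
    return r;
}

// Blend the position through two bones chosen per vertex.
// Forcing both indices to 0 gives the "one big bone [0]" case.
Vec3 skinTwoWeights(const std::vector<Mat3x4>& bones,
                    const Vec3& pos,
                    const std::array<std::uint8_t, 2>& indices,
                    const std::array<float, 2>& weights) {
    Vec3 out = {0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < 2; ++i) {
        assert(indices[i] < bones.size());  // per-vertex bone index must be valid
        const Vec3 t = transformPoint(bones[indices[i]], pos);
        out.x += t.x * weights[i];
        out.y += t.y * weights[i];
        out.z += t.z * weights[i];
    }
    return out;
}
```

In the hardware path the indices and weights arrive as per-vertex attributes, so this lookup runs once per vertex on the GPU.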

Has anyone had this problem or have any recommended solutions?
This is vanilla jaiqua; nothing has been changed.

More detailed specs below:
Chipset Model: ATY,RadeonX1600
Type: Display
Bus: PCIe
VRAM (Total): 256 MB
Vendor: ATI (0x1002)
Device ID: 0x71c5
Revision ID: 0x0000
EFI Driver Version: 01.00.068
ATI Radeon X1600 OpenGL Engine
Vendor Name ATI Technologies Inc.
Version 2.0 ATI-1.4.56
GL Shading Language Version 1.10
Renderer Name ATI Radeon X1600 OpenGL Engine
OpenGL Extensions
OpenGL Limits
Display Mask 1 (0x00000001)
Renderer ID 137473 (0x00021901)
Off Screen No
Full Screen Yes
Hardware Accelerated Yes
Robust No
Backing Store No
MP Safe Yes
Window Yes
Multi Screen No
Compliant Yes
Buffer Modes 15 (0x0000000f)
Color Buffer Modes 176194560 (0x0a808400)
Accum Buffer Modes 8421376 (0x00808000)
Depth Buffer Modes 7169 (0x00001c01)
Stencil Buffer Modes 129 (0x00000081)
Max Aux Buffers 2 (0x00000002)
Max Sample Buffers 1 (0x00000001)
Max Samples 6 (0x00000006)
Sample Modes 3 (0x00000003)
Alpha Sampling Yes
Total Video Memory 268435456 (0x10000000)
Total Texture Memory 262803456 (0x0faa1000)


Framebuffers
MAX_COLOR_ATTACHMENTS_EXT 4
MAX_RENDERBUFFER_SIZE_EXT 4096
MAX_VIEWPORT_DIMS {4096, 4096}
MIN_PBUFFER_VIEWPORT_DIMS_APPLE {32, 32}
SUBPIXEL_BITS 3
Points and Lines
ALIASED_LINE_WIDTH_RANGE {1, 64}
ALIASED_POINT_SIZE_RANGE {1, 64}
SMOOTH_LINE_WIDTH_GRANULARITY 0.125000
SMOOTH_LINE_WIDTH_RANGE {1.000000, 64.000000}
SMOOTH_POINT_SIZE_GRANULARITY 0.125000
SMOOTH_POINT_SIZE_RANGE {1.000000, 64.000000}
Textures
MAX_3D_TEXTURE_SIZE 512
MAX_CUBE_MAP_TEXTURE_SIZE 4096
MAX_RECTANGLE_TEXTURE_SIZE_EXT 4096
MAX_TEXTURE_SIZE 4096
MAX_TEXTURE_LOD_BIAS 16.000000
MAX_TEXTURE_MAX_ANISOTROPY_EXT 16
MAX_TEXTURE_UNITS 8
Compression Types
COMPRESSED_RGB_S3TC_DXT1_EXT
COMPRESSED_RGBA_S3TC_DXT1_EXT
COMPRESSED_RGBA_S3TC_DXT3_EXT
COMPRESSED_RGBA_S3TC_DXT5_EXT
COMPRESSED_LUMINANCE_ALPHA_3DC_ATI
Stacks
MAX_ATTRIB_STACK_DEPTH 16
MAX_CLIENT_ATTRIB_STACK_DEPTH 16
MAX_COLOR_MATRIX_STACK_DEPTH 5
MAX_MODELVIEW_STACK_DEPTH 32
MAX_NAME_STACK_DEPTH 100
MAX_PROGRAM_MATRIX_STACK_DEPTH_ARB 2
MAX_PROJECTION_STACK_DEPTH 5
MAX_TEXTURE_STACK_DEPTH 5
Vertex Programs
MAX_PROGRAM_ADDRESS_REGISTERS_ARB 2
MAX_PROGRAM_ATTRIBS_ARB 32
MAX_PROGRAM_ENV_PARAMETERS_ARB 256
MAX_PROGRAM_INSTRUCTIONS_ARB 262144
MAX_PROGRAM_LOCAL_PARAMETERS_ARB 1024
MAX_PROGRAM_MATRICES_ARB 8
MAX_PROGRAM_NATIVE_ADDRESS_REGISTERS_ARB 1
MAX_PROGRAM_NATIVE_ATTRIBS_ARB 18
MAX_PROGRAM_NATIVE_INSTRUCTIONS_ARB 256
MAX_PROGRAM_NATIVE_PARAMETERS_ARB 256
MAX_PROGRAM_NATIVE_TEMPORARIES_ARB 32
MAX_PROGRAM_PARAMETERS_ARB 1024
MAX_PROGRAM_TEMPORARIES_ARB 65535
MAX_PROGRAM_EXEC_INSTRUCTIONS_NV 0
MAX_PROGRAM_CALL_DEPTH_NV 0
MAX_VERTEX_ATTRIBS_ARB 16
Fragment Programs
MAX_PROGRAM_ALU_INSTRUCTIONS_ARB 512
MAX_PROGRAM_ATTRIBS_ARB 10
MAX_PROGRAM_ENV_PARAMETERS_ARB 128
MAX_PROGRAM_INSTRUCTIONS_ARB 1024
MAX_PROGRAM_LOCAL_PARAMETERS_ARB 1024
MAX_PROGRAM_NATIVE_ALU_INSTRUCTIONS_ARB 512
MAX_PROGRAM_NATIVE_ATTRIBS_ARB 10
MAX_PROGRAM_NATIVE_INSTRUCTIONS_ARB 1024
MAX_PROGRAM_NATIVE_PARAMETERS_ARB 64
MAX_PROGRAM_NATIVE_TEMPORARIES_ARB 64
MAX_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB 4
MAX_PROGRAM_NATIVE_TEX_INSTRUCTIONS_ARB 512
MAX_PROGRAM_PARAMETERS_ARB 64
MAX_PROGRAM_TEMPORARIES_ARB 64
MAX_PROGRAM_TEX_INDIRECTIONS_ARB 4
MAX_PROGRAM_TEX_INSTRUCTIONS_ARB 512
MAX_PROGRAM_EXEC_INSTRUCTIONS_NV 0
MAX_PROGRAM_CALL_DEPTH_NV 0
MAX_PROGRAM_IF_DEPTH_NV 0
MAX_PROGRAM_LOOP_DEPTH_NV 0
MAX_PROGRAM_LOOP_COUNT_NV 0
MAX_TEXTURE_COORDS_ARB 8
MAX_TEXTURE_IMAGE_UNITS_ARB 16
Shaders
MAX_COMBINED_TEXTURE_IMAGE_UNITS_ARB 16
MAX_FRAGMENT_UNIFORM_COMPONENTS_ARB 4096
MAX_TEXTURE_COORDS_ARB 8
MAX_TEXTURE_IMAGE_UNITS_ARB 16
MAX_VARYING_FLOATS_ARB 32
MAX_VERTEX_ATTRIBS_ARB 16
MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB 0
MAX_VERTEX_UNIFORM_COMPONENTS_ARB 4096
Other
MAX_CLIP_PLANES 6
MAX_CONVOLUTION_HEIGHT 11
MAX_CONVOLUTION_WIDTH 11
MAX_ELEMENTS_INDICES 150000
MAX_ELEMENTS_VERTICES 2048
MAX_EVAL_ORDER 10
MAX_GENERAL_COMBINERS_NV 0
MAX_LIGHTS 8
MAX_LIST_NESTING 64
MAX_PIXEL_MAP_TABLE 256
MAX_PN_TRIANGLES_TESSELATION_LEVEL_ATI 0
MAX_SHININESS_NV 128
MAX_SPOT_EXPONENT_NV 128
MAX_VERTEX_ARRAY_RANGE_ELEMENT_APPLE 65535
MAX_VERTEX_UNITS_ARB 4
QUERY_COUNTER_BITS_ARB 32
Display Mask 1 (0x00000001)
Renderer ID 137473 (0x00021901)
Off Screen No
Full Screen Yes
Hardware Accelerated Yes
Robust No
Backing Store No
MP Safe Yes
Window Yes
Multi Screen No
Compliant Yes
Buffer Modes 15 (0x0000000f)
Monoscopic Yes
Stereoscopic Yes
Single Buffer Yes
Double Buffer Yes
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands

Post by sinbad »

Yes, I've observed this too, and I suspect there's a problem somewhere in the GLSL implementation of the Apple drivers. No other platform displays this behaviour, but it occurs on both the X1600 and 8600Go versions of the MBP. I haven't found a solution yet.

I think it's limited to GLSL because the Cg hardware skinning (robot) seems OK.
hellcatv
Gremlin
Posts: 163
Joined: Thu Dec 14, 2006 2:11 am

Post by hellcatv »

I've gotten a response from Apple on the issue:
"You may be hitting unintentional attribute aliasing. If you use either all generic or all builtin attributes, this issue should disappear. The alternative is to avoid the minefield of builtin attribute bindings -- in your case gl_Normal conflicts with generic 2, as listed in the ARB_vertex_program spec table."

So I replaced the use of gl_Normal on line 37 of skinningTwoWeightsVp.glsl with vec3(0,1,0), and the jaiqua demo ran at full speed (with incorrect lighting).

Is there any way to coerce Ogre to pass blend weights so that they do not conflict with the builtin attributes?
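
To make the aliasing Apple mentions concrete, here is a small C++ sketch of the conventional-to-generic attribute mapping as I understand it from the ARB_vertex_program table (position 0, normal 2, primary color 3, secondary color 4, fog coordinate 5, texture coordinate n at 8 + n). The function names are my own, and it only covers the GLSL builtin names listed.

```cpp
#include <cassert>
#include <string>

// Returns the generic attribute slot that a GLSL builtin vertex attribute
// aliases under the ARB_vertex_program table, or -1 if the name is not one
// of the builtins handled here. Function name is hypothetical.
int aliasedGenericSlot(const std::string& builtin) {
    if (builtin == "gl_Vertex")         return 0;
    if (builtin == "gl_Normal")         return 2;
    if (builtin == "gl_Color")          return 3;
    if (builtin == "gl_SecondaryColor") return 4;
    if (builtin == "gl_FogCoord")       return 5;

    // gl_MultiTexCoord0..7 map to generic slots 8..15.
    const std::string prefix = "gl_MultiTexCoord";
    if (builtin.size() == prefix.size() + 1 &&
        builtin.compare(0, prefix.size(), prefix) == 0) {
        const char digit = builtin[builtin.size() - 1];
        if (digit >= '0' && digit <= '7') {
            return 8 + (digit - '0');
        }
    }
    return -1;
}

// True if binding a custom attribute to genericSlot would collide with
// the given builtin.
bool aliasesBuiltin(const std::string& builtin, int genericSlot) {
    assert(genericSlot >= 0);  // slots are non-negative indices
    return aliasedGenericSlot(builtin) == genericSlot;
}
```

With this table, a custom attribute placed in generic slot 2 collides with gl_Normal, which matches what Apple describes.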
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands

Post by sinbad »

The way we pass the weights & indexes is the most appropriate way to do it, and it requires extended attributes. If we were to use solely builtins, we'd have to use texture coordinates, which would not only be larger (since they would be floats, whereas indexes are packed as a UBYTE4) but would also make it much more difficult to be generic, since different models have different numbers of existing texture coordinates.
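
As a rough illustration of the size difference, here is a C++ sketch comparing four indexes packed as unsigned bytes against the same four indexes stored as floats in a texture coordinate set. The struct and function names are invented for the example, and it assumes the usual 4-byte float.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Four bone indexes packed into one UBYTE4-style element (4 bytes).
struct PackedBlendIndices { std::uint8_t index[4]; };

// The same four indexes carried as a 4-component float texture coordinate.
struct FloatBlendIndices { float index[4]; };

// Total bytes needed for the index stream of a mesh.
std::size_t indexStreamBytes(std::size_t vertexCount, std::size_t bytesPerVertex) {
    assert(bytesPerVertex > 0);  // a zero-sized element would be meaningless
    return vertexCount * bytesPerVertex;
}
```

For a hypothetical 1000-vertex mesh that is 4000 bytes packed versus 16000 bytes as floats, so the float version is four times the size.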

It's clearly a GLSL driver bug on Apple because no other platform has this problem, even with the same hardware. I can run the very same shader under Vista on the Macbook Pro and it runs fine.

You simply must be able to mix builtins and generics. Neither of the generic attributes we use conflicts with the builtins here, because we look them up after linking the GLSL, just as the spec says we should. So, for example, we do this:

Code:

    // Ask the linked GLSL program which generic attribute slot it assigned
    // to each custom attribute; -1 means the attribute is not active.
    attrib = glGetAttribLocationARB(mGLHandle, "blendIndices");
    mBlendIndicesAttrib = (attrib == -1) ? NO_ATTRIB : (GLuint)attrib;

    attrib = glGetAttribLocationARB(mGLHandle, "blendWeights");
    mBlendWeightsAttrib = (attrib == -1) ? NO_ATTRIB : (GLuint)attrib;
We then bind to those attributes when we need to. All I can think of is that the GLSL compiler on Apple is making a serious error, such as allocating attribute slots from 0 upward without considering which builtins are in use. Their response suggests they think we're assigning an arbitrary attribute index and thus conflicting, but we're not: we're asking the GLSL subsystem which index it has assigned to the attribute. I'm pretty sure it's supposed to make sure it doesn't overwrite a builtin attribute here.

So unless I'm missing something, I'm pretty damn sure this is a GLSL compiler bug on Apple. We could potentially support the builtins as custom attributes as an option, e.g. by looking for custom attributes named 'normal' etc., so that this shader used only custom attributes. But that's a fairly major change that I don't think we should have to make (nor should we have to ask our shader writers to use only builtins or only custom attribs for entire shaders), especially as it's not necessary on other drivers. I don't think the standard says you have to do that at all; you just have to make sure your attrib numbers don't overlap, and I think we're doing the right thing in asking the GLSL compiler / linker to ensure that doesn't happen.

It would be useful if you could continue to discuss this with them. I don't have a support contract with them, and I've had little response on the Apple forums to my other queries so far.
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands

Post by sinbad »

I've confirmed that the nVidia Windows GL driver behaves correctly.

When building a GLSL shader that uses only gl_Vertex (== attrib 0), the custom indices and weights attribs get bound to 1 and 2 respectively, as reported by glGetAttribLocationARB. When building a GLSL shader that uses both gl_Vertex and gl_Normal (== attrib 2), glGetAttribLocationARB reports that the indices and weights get bound to 1 and 3 respectively, hopping over the builtin as expected. So the Windows driver correctly avoids aliasing during the compile / link stage, as I believed it should.
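
The behaviour I'm describing can be modelled with a few lines of C++. This is only a toy model of what I'd expect a linker to do, not any driver's real allocator; assignCustomSlots is a made-up name, and the default limit of 16 is the MAX_VERTEX_ATTRIBS_ARB value from the specs posted above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy model: give each custom attribute the lowest generic slot that is
// not already taken by a builtin in use, in declaration order.
std::vector<int> assignCustomSlots(const std::set<int>& builtinSlots,
                                   std::size_t customCount,
                                   int maxAttribs = 16) {
    std::vector<int> result;
    int slot = 0;
    for (std::size_t i = 0; i < customCount; ++i) {
        // Hop over any slot aliased by a builtin the shader uses.
        while (builtinSlots.count(slot) != 0) {
            ++slot;
        }
        assert(slot < maxAttribs);  // ran out of attribute slots
        result.push_back(slot);
        ++slot;
    }
    return result;
}
```

Feeding it slot 0 (gl_Vertex) with two custom attributes yields 1 and 2; feeding it slots 0 and 2 (gl_Vertex and gl_Normal) yields 1 and 3, which is what the nVidia Windows driver reports.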

I'm not on OS X right now, but I suspect glGetAttribLocationARB is coming back with '2' for the weights there, which is wrong since that slot is used by gl_Normal. So I'm even more convinced this is an Apple driver bug.
hellcatv
Gremlin
Posts: 163
Joined: Thu Dec 14, 2006 2:11 am

Post by hellcatv »

Apple responded that "We are working to resolve this bug in future updates."

So I guess it will be handled. I'm hoping the bug fix will also be applied retroactively to OS X 10.4, so Ogre can support all the Intel Mac users who won't upgrade.
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands

Post by sinbad »

Let's hope so!

In the meantime, Cg seems unaffected, although that does limit you to simpler shaders. Provided you use only builtins to pass data between the vertex and fragment programs, it's possible to use Cg for the vertex shader (and hence the skeletal animation) and GLSL for the fragment shader, where you might need higher profiles.

Thanks for being the go-between on this. I assume you have a support contract? Or is there some other bug reporting path I'm not aware of?