Direct3D11 shader compile takes forever

Post by **TaaTT4** » Thu Jun 25, 2015 6:05 pm

Hi,

I'm experiencing a tedious delay (between 5 and 10 seconds on average) on the start of my application when I use the Direct3D11 render system.
This doesn't happen when I switch to the GL3+ render system.

Investigating with a profiler, I discovered this slowness is related to the D3DCompile call of the D3D11HLSLProgram::compileMicrocode method.
Adding some time measurements around it, this is the result:

17:53:42: Shader: ./1610645504VertexShader_vs.hlsl
17:53:42: Compile time: 558ms
17:53:42: Shader: ./1610612737VertexShader_vs.hlsl
17:53:42: Compile time: 27ms
17:53:44: Shader: ./1610612737PixelShader_ps.hlsl
17:53:44: Compile time: 1860ms
17:53:44: Shader: ./536870913VertexShader_vs.hlsl
17:53:44: Compile time: 40ms
17:53:48: Shader: ./536870913PixelShader_ps.hlsl
17:53:48: Compile time: 3750ms
17:53:48: Shader: ./536936449VertexShader_vs.hlsl
17:53:48: Compile time: 41ms
17:53:52: Shader: ./536936449PixelShader_ps.hlsl
17:53:52: Compile time: 4642ms
17:53:52: Shader: ./536969217VertexShader_vs.hlsl
17:53:52: Compile time: 46ms
17:53:56: Shader: ./536969217PixelShader_ps.hlsl
17:53:56: Compile time: 3933ms
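
For anyone wanting to reproduce measurements like these, here is a minimal sketch of a timing wrapper. It is not the code used for the log above; `timeMs` is a hypothetical helper name, and the callable passed to it stands in for whatever wraps the D3DCompile call.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Runs `work` once and returns the elapsed time in whole milliseconds.
// `timeMs` is a hypothetical helper, not part of Ogre or Direct3D.
template <typename Work>
std::int64_t timeMs(Work&& work)
{
    // steady_clock is monotonic, so wall-clock adjustments cannot skew the result.
    using Clock = std::chrono::steady_clock;

    const Clock::time_point start = Clock::now();
    std::forward<Work>(work)();
    const Clock::time_point end = Clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}
```

The call to measure goes inside a lambda, e.g. `timeMs([&] { /* compile call here */ })`, and the returned value can be written to the log next to the shader name.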

The source code of the "heaviest" shader (536936449PixelShader_ps.hlsl), which has been automatically generated by the HLMS system, is here.

These delays aren't acceptable for me, especially considering that the application I used for testing had a very minimal set (six) of HLMS materials.

Thanks,
Raffaele.

Post by **Crystal Hammer** » Thu Jun 25, 2015 6:41 pm

I agree. It's horribly long.
We had this issue some time ago in Ogre 1.9, I think. It suddenly happened after some shader edits: those shaders would take a few seconds to compile.
I stepped through the code back then and it was just spending that much time in a DirectX shader compile method, so it was completely not Ogre's fault.
I forgot the exact thing, but it was trying to unroll some for loop (or was it a tex sampler or some complex operation? IDK) and that unrolling operation took so long.
After replacing that for loop (or other instruction) with something simpler, it went back to normal.
I'll try to find that commit, but it's difficult.

Post by **dark_sylinc** » Thu Jun 25, 2015 6:42 pm

Go complain to Microsoft because it's their compiler that takes so damn long even when optimisations are disabled.

We offer an easy workaround though: saveMicrocodeCache & loadMicrocodeCache.
Use these functions to save compiled shaders to disk (e.g. write on exit, load on startup) and these long compile times will go away for the next time.

This works for both D3D11 and GL3+, though the HLSL cache is system agnostic (it will work for any GPU on any driver version) while the GLSL one is not (it will only work for the specific GPU and driver version it was compiled for).

Be sure when you write the cache to disk that the operation is atomic (e.g. two instances trying to write to the same file can corrupt the cache)
We will soon implement an integrity check, but until then, be aware of that.
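
One common way to get close to an atomic write is "write to a temporary file, then rename it over the target". The sketch below illustrates that idea with plain C++17 and does not go through Ogre's streams; `writeFileAtomically` and the `.tmp` suffix are hypothetical. Whether the final rename is truly atomic is an assumption about the platform: POSIX guarantees it, while on Windows it depends on the filesystem and the standard library implementation. Two instances sharing the same temporary name could still collide, so a per-process suffix would be needed for that case.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Writes `bytes` to a temporary sibling of `target`, then renames it over
// `target`. Readers see either the old file or the complete new one, as long
// as the platform's rename is atomic (an assumption, see the note above).
bool writeFileAtomically(const std::filesystem::path& target, const std::string& bytes)
{
    std::filesystem::path temp = target;
    temp += ".tmp"; // hypothetical suffix for the temporary file

    {
        std::ofstream out(temp, std::ios::binary | std::ios::trunc);
        if (!out)
            return false;
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
        out.flush();
        if (!out)
            return false;
    } // the stream is closed here, before the rename

    std::error_code ec;
    std::filesystem::rename(temp, target, ec); // replaces an existing target
    if (ec)
    {
        std::filesystem::remove(temp, ec); // best-effort cleanup
        return false;
    }
    return true;
}
```

With Ogre, the same pattern would mean saving the microcode cache to the temporary path first and renaming it only after the stream has been closed.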

Post by **Crystal Hammer** » Thu Jun 25, 2015 6:57 pm

dark_sylinc wrote:Go complain to Microsoft because it's their compiler that takes so damn long even when optimisations are disabled.

Like that's gonna do anything

Surely somebody did find something already.

I just searched for "directx long compile times hlsl"
Here's something useful:
http://www.gamedev.net/topic/624349-spe ... try4936754

But IMO one good reason to drop DX and go with GL

Post by **TaaTT4** » Thu Jun 25, 2015 8:00 pm

dark_sylinc wrote: Go complain to Microsoft because it's their compiler that takes so damn long even when optimisations are disabled.

I'm not blaming OGRE for this.
I know it's totally a Microsoft fault.

dark_sylinc wrote: We offer an easy workaround though:saveMicrocodeCache & loadMicrocodeCache.
Use these functions to save compiled shaders to disk (e.g. write on exit, load on startup) and these long compile times will go away for the next time.

What is the workflow for loadMicrocodeCache?
Do I have to call it before starting to parse and load the HLMS material scripts (and, in some magical way, the D3DCompile function will never be called again)?

Can I save and then load different sets of caches (to have, for example, one cache for metal materials, another one for concrete materials and so on)?

Crystal Hammer wrote: I forgot the exact thing but it was trying to unwind some for loop (or was it a tex sampler or some complex operation? IDK) and that unwinding operation took so long.

I guess it's something more related to the texture sampler, or maybe to branches, since the HLMS shader we're talking about doesn't contain any loops.

Crystal Hammer wrote: Here's something useful:
http://www.gamedev.net/topic/624349-spe ... try4936754

Thanks, I will investigate further.

Crystal Hammer wrote: But IMO one good reason to drop DX and go with GL

My target platform is Windows, and Direct3D11 has better driver support and performance on it (at least on my configuration).

Post by **dark_sylinc** » Thu Jun 25, 2015 8:50 pm

TaaTT4 wrote:
dark_sylinc wrote: We offer an easy workaround though:saveMicrocodeCache & loadMicrocodeCache.
Use these functions to save compiled shaders to disk (e.g. write on exit, load on startup) and these long compile times will go away for the next time.
What is the workflow of the loadMicrocodeCache?
I have to call it before starting to parse and load the HLMS material scripts (and, in some magical way, the D3DCompile function will never be called anymore)?

When loading the microcode cache, one of the best places to do it is right after you've registered the Hlms implementations. Perform:

GpuProgramManager::getSingleton().setSaveMicrocodesToCache( true ); //Make sure it's enabled.
DataStreamPtr shaderCacheFile = root->openFileStream( "D:/MyCache.cache" );
GpuProgramManager::getSingleton().loadMicrocodeCache( shaderCacheFile );

When saving (at exit, before the RenderSystems are shut down):

DataStreamPtr shaderCacheFile = root->createFileStream( "D:/MyCache.cache", ResourceGroupManager::DEFAULT_RESOURCE_GROUP_NAME, true );
GpuProgramManager::getSingleton().saveMicrocodeCache( shaderCacheFile );

Note that these calls can throw if there are IO errors (a folder in the path doesn't exist, the file doesn't exist, you don't have read or write access, etc.), so make sure to wrap the calls in a try/catch block.
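
A minimal sketch of such a wrapper follows. `tryCacheIo` is a hypothetical helper, not an Ogre API; it relies only on the fact that Ogre::Exception derives from std::exception, so catching the base class covers it.

```cpp
#include <exception>
#include <iostream>
#include <string>
#include <utility>

// Runs one cache I/O step and reports failure instead of letting the
// exception escape. `tryCacheIo` is a hypothetical helper, not an Ogre API.
template <typename Step>
bool tryCacheIo(const std::string& what, Step&& step)
{
    try
    {
        std::forward<Step>(step)();
        return true;
    }
    catch (const std::exception& e) // Ogre::Exception derives from std::exception
    {
        std::cerr << what << " failed: " << e.what() << '\n';
        return false;
    }
}
```

The load or save snippet goes inside a lambda passed as `step`. A `false` result on load is expected on the very first run, when no cache file exists yet, so it should probably be treated as "start with an empty cache" rather than as a fatal error.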

TaaTT4 wrote: Can I save and then load different set of caches (to have, for example, one cache for metal materials, another one for concrete materials and so on)?

Technically you could. But there is only one cache, and loading a new cache destroys the current one. It is easier to treat it like a big can where you throw all your shaders. The cache is not specific to Hlms shaders; it's for all types of shaders (i.e. low level materials included).
The microcode cache computes a 128-bit hash of the shader's source code and looks it up in the cache to retrieve the bytecode instead of compiling. Every time you change a shader's source code, a new entry is added, so the cache keeps growing; the stale entries will probably never be used again, but they do no harm other than increasing the file size (except if you have hundreds of thousands of shaders, where hash collisions could happen). You can clear the cache by simply deleting the file.
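
To illustrate the lookup, here is a toy model of a source-keyed bytecode cache. It is not Ogre's implementation: every name is hypothetical, `std::hash` stands in for the 128-bit hash, and `compile` is whatever callable produces bytecode from source.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

using Bytecode = std::vector<unsigned char>;

// Toy model of a cache keyed by a hash of the shader source.
// Hypothetical names; std::hash stands in for the real 128-bit hash.
class SourceKeyedCache
{
public:
    // Returns the cached bytecode for `source`, calling `compile` only on a miss.
    template <typename Compile>
    const Bytecode& getOrCompile(const std::string& source, Compile compile)
    {
        const std::size_t key = std::hash<std::string>{}(source);
        auto it = mEntries.find(key);
        if (it == mEntries.end())
        {
            // Miss: pay the compile cost once and remember the result.
            it = mEntries.emplace(key, compile(source)).first;
            mDirty = true;
        }
        return it->second;
    }

    std::size_t size() const { return mEntries.size(); }
    bool isDirty() const { return mDirty; } // true once something new was added

private:
    std::unordered_map<std::size_t, Bytecode> mEntries;
    bool mDirty = false;
};
```

Because the key depends only on the source text, an edited shader gets a fresh entry while the entry for the old text stays behind, which is why the cache only ever grows until the file is deleted.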

As for the reason why it takes so long, I only have guesses. As you can Google, lots of industry people have complained about this, and they all resort to the same solution (saving the microcode/bytecode cache). Btw I did not know that the latest D3D compilers (the ones shipped with the Windows 8 SDK) are faster. That's cool to know!

My guess is our array sizes. The HLSL compiler takes particularly long when unrolling loops and with big arrays (probably some quadratic O(N^2) behavior triggered when parsing the syntax, because the issue is there whether you use the array or not, and whether optimizations are turned on or off).
The shaders generated by the Hlms don't have loops (we unroll them ourselves with the Hlms script parser), but we do have huge arrays, like Material materialArray[273].
Unfortunately these arrays are key to our AZDO optimizations so they can't go away.
The microcode cache solved our problems in production.

Post by **Jayray** » Sat Jun 27, 2015 8:18 pm

Hi!

I have tried the load/save method, and it is indeed great for avoiding the very long compilation time of the D3D11 shaders, but... saving only works 50% of the time.

In fact, there are 2 cases:
1/ No shaders cache => shaders are compiled at runtime => saving works
2/ Shaders cache loaded from file => no shader is compiled at runtime => saving produces an empty file

I am pretty sure the issue comes from these 2 lines in the GpuProgramManager::saveMicrocodeCache function:

        if (!mCacheDirty)
            return;

Shouldn't these lines be removed to always save the shaders, even if no new shader has been compiled?

Post by **dark_sylinc** » Sat Jun 27, 2015 8:52 pm

Good catch. You can check if the shader cache is dirty by calling isCacheDirty before saving. If it's not dirty, do not save the file.
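
A rough sketch of that guard, using a stand-in type rather than the real GpuProgramManager (`FakeCache` and `saveIfDirty` are hypothetical; only the `isCacheDirty` name comes from this thread). The point is that the file is opened, and therefore truncated, only when there is something new to write.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Stand-in for the real cache; `FakeCache` is a hypothetical type.
struct FakeCache
{
    std::string bytes;  // pretend serialized microcode
    bool dirty = false; // set when a shader was compiled this run
    bool isCacheDirty() const { return dirty; }
};

// Writes the cache only when it is dirty, so a clean cache never
// replaces the file from a previous run with an empty one.
bool saveIfDirty(FakeCache& cache, const std::filesystem::path& file)
{
    if (!cache.isCacheDirty())
        return false; // nothing new: leave the existing file untouched

    std::ofstream out(file, std::ios::binary | std::ios::trunc);
    if (!out)
        return false;
    out << cache.bytes;
    if (!out)
        return false;

    cache.dirty = false;
    return true;
}
```

In the real application the equivalent would be to check isCacheDirty first and only then create the file stream and call saveMicrocodeCache.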

Post by **Jayray** » Sat Jun 27, 2015 9:03 pm

Good to know, thanks!