Hoo boy! This report was scheduled for January but couldn’t be released on time for various reasons.
This is old news already, and we have another report coming that mostly talks about VCT (Voxel Cone Tracing), a very interesting feature that has been in development during this year.
But until then, let’s talk about all the other features that have been implemented so far!
Fake and LTC Area lights
We implemented area lights!
We added LT_AREA_APPROX & LT_AREA_LTC types for Light.
LT_AREA_APPROX is a fake approximation to area lights, but for many cases looks convincing enough, supports fully RGB coloured textured lights, and is cheaper in terms of performance compared to its LTC variant.
LT_AREA_LTC uses LTC (Linearly Transformed Cosines) and is the real deal: a physically correct implementation of area lights. It does not currently support textures though.
Area lights do not support shadows. This isn’t laziness on our part: shadow maps are not enough to accurately represent shadows of an area light, unless we had an infinite number of shadow maps (or at least, a very high number of them, scattered across the light’s surface).
The latest developments in raytracing (e.g. DXR, RTX) may solve this issue in the future though. We are also looking into potential VCT (Voxel Cone Tracing) solutions.
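To make the "very high number of shadow maps" point concrete, here is a minimal, self-contained sketch. It is not Ogre code, and every name in it (HorizontalSegment, isBlocked, visibleFraction) is hypothetical. It estimates how much of a flat 2D area light is visible from a receiver on the ground by testing N points spread across the light, where each point stands in for one shadow map.

```cpp
#include <cstddef>

// A horizontal segment at a fixed height: spans [minX, maxX] at height y.
struct HorizontalSegment
{
    double minX;
    double maxX;
    double y;
};

// Returns true if the straight line from the receiver at (receiverX, 0) to the
// light sample at (lightX, lightY) is blocked by the occluder.
// Assumes 0 < occluder.y < lightY.
bool isBlocked( double receiverX, double lightX, double lightY,
                const HorizontalSegment &occluder )
{
    // Fraction of the way along the ray at which it reaches the occluder's height.
    const double t = occluder.y / lightY;
    const double xAtOccluder = receiverX + ( lightX - receiverX ) * t;
    return xAtOccluder >= occluder.minX && xAtOccluder <= occluder.maxX;
}

// Fraction of the light visible from the receiver, estimated with numSamples
// points spread evenly across the light (each one plays the role of a shadow map).
double visibleFraction( double receiverX, const HorizontalSegment &light,
                        const HorizontalSegment &occluder, std::size_t numSamples )
{
    if( numSamples == 0u )
        return 0.0;

    std::size_t numVisible = 0u;
    for( std::size_t i = 0u; i < numSamples; ++i )
    {
        // Sample at the centre of each of the numSamples equal slices of the light.
        const double u = ( static_cast<double>( i ) + 0.5 ) / static_cast<double>( numSamples );
        const double lightX = light.minX + ( light.maxX - light.minX ) * u;
        if( !isBlocked( receiverX, lightX, light.y, occluder ) )
            ++numVisible;
    }
    return static_cast<double>( numVisible ) / static_cast<double>( numSamples );
}
```

With a light spanning x in [-1, 1] at height 2, an occluder spanning x in [0, 2] at height 1 and a receiver at x = 0, a single sample reports full shadow, while four or more samples report a half-lit penumbra. One shadow map can only ever give the all-or-nothing answer.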
The differences between Fake and LTC area lights are most noticeable at high roughness.
Added Screen Space Decals
At long last!
A highly requested feature finally lands. It requires Forward Clustered. Because we used Forward Clustered to implement this technique, it does not suffer from the edge artifacts common in Deferred solutions.
Diffuse, normal mapped and emissive decals are supported. Note however that enabling any of these settings affects performance globally: if emissive decals are enabled, it does not matter if just 1 decal out of 50 uses emissive. Performance-wise it’s the same as if all 50 decals had emissive.
ShadowNodeHelper for configuring shadows programmatically
Generating a shadow node via script is easy. Generating a shadow node via C++ was absurdly hard.
A very likely reason one would want to do it via C++ is to implement custom quality settings: increasing/lowering resolution, changing the number of PSSM splits, etc.
The class ShadowNodeHelper makes this task much easier.
See ShadowMapFromCode sample. Visually, it’s the same demo as ShadowMapDebugging. However the shadow node was created from C++ using this new class, instead of loading it from a compositor script.
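As an illustration of the custom quality settings use case, here is a minimal sketch of mapping a user-facing quality level to shadow settings. ShadowQuality, ShadowSettings and settingsFor are hypothetical names, and the numbers are only examples. ShadowSettings is a simplified stand-in, not Ogre's own structure; a real application would pass equivalent values on to ShadowNodeHelper, as the ShadowMapFromCode sample does.

```cpp
#include <cstdint>

// Hypothetical quality levels exposed in an application's options menu.
enum class ShadowQuality
{
    Low,
    Medium,
    High
};

// Hypothetical, simplified stand-in for the values an application would hand
// over to ShadowNodeHelper. This is NOT Ogre's own structure.
struct ShadowSettings
{
    std::uint32_t resolution;     // Width and height of each shadow map, in pixels
    std::uint32_t numPssmSplits;  // Number of PSSM splits for the directional light
};

// Maps a quality level to concrete shadow settings (values are illustrative).
ShadowSettings settingsFor( ShadowQuality quality )
{
    switch( quality )
    {
    case ShadowQuality::Low:
        return ShadowSettings{ 1024u, 2u };
    case ShadowQuality::Medium:
        return ShadowSettings{ 2048u, 3u };
    case ShadowQuality::High:
        return ShadowSettings{ 4096u, 4u };
    }
    return ShadowSettings{ 2048u, 3u };  // Fallback for out-of-range values
}
```

Keeping the mapping in one function means the shadow node can be rebuilt from a single source of truth whenever the user changes the quality option.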
Hlms Disk Cache
Compiling shaders at runtime would often manifest as either long loading times in Ogre or stutter, which was particularly bad in the D3D11 RenderSystem and macOS GL3+.
We already provided the shader microcode cache to greatly alleviate this problem, and it worked particularly well for D3D11.
But we took a step further and added the HlmsDiskCache which is meant to complement (not replace!) the microcode cache. The HlmsDiskCache is of particular importance on systems where the API does not support microcode caching (Metal, macOS’ GL3+, some older Linux Mesa drivers), and will be very important in the future for Vulkan and D3D12 for caching PSOs.
The documentation explains in detail what HlmsDiskCache is for and what it does.
The HlmsDiskCache is API agnostic and OS agnostic, which means you can create it on your system and deploy it on other machines. You’d get this guarantee with D3D11’s microcode cache, but not with the rest of the APIs.
We’ve updated the samples to create & use both caches, and they are enabled by default.
[2.2] Per Pixel Cubemap probes
Another highly requested feature! And this one is my favourite because, alongside Decals, it makes the most visual impact.
PCC (Parallax Corrected Cubemaps) are very pretty and were already implemented in Ogre 2.1.
However having only one probe is not very useful. Ogre 2.1 offered a few ways to blend multiple probes, but they were quite suboptimal and difficult to handle.
Thanks to Ogre 2.2 having good support for GPU -> GPU texture copies, support for cubemap arrays, and easy handling of mipmaps, it became possible to support per pixel cubemap probes! Now multiple cubemaps can affect the same area and blending between them will be correct.
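To give an idea of what blending several probes per pixel means, here is a minimal sketch of a normalized weighted blend. It is not Ogre's implementation. The spherical area of influence, the linear falloff used as the weight and all the names (Vec3, Probe, influence, blendProbes) are assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3
{
    double x, y, z;
};

// Hypothetical probe: a spherical area of influence plus the colour its
// cubemap would return for the pixel being shaded.
struct Probe
{
    Vec3 centre;
    double radius;  // Assumed to be greater than zero
    Vec3 reflection;
};

// Assumed weight: 1 at the probe's centre, falling linearly to 0 at its radius.
double influence( const Probe &probe, const Vec3 &pos )
{
    const double dx = pos.x - probe.centre.x;
    const double dy = pos.y - probe.centre.y;
    const double dz = pos.z - probe.centre.z;
    const double dist = std::sqrt( dx * dx + dy * dy + dz * dz );
    return std::max( 0.0, 1.0 - dist / probe.radius );
}

// Blends every probe that reaches the pixel, normalizing by the total weight
// so overlapping probes neither brighten nor darken the result.
Vec3 blendProbes( const std::vector<Probe> &probes, const Vec3 &pos )
{
    Vec3 sum{ 0.0, 0.0, 0.0 };
    double totalWeight = 0.0;
    for( const Probe &probe : probes )
    {
        const double w = influence( probe, pos );
        sum.x += probe.reflection.x * w;
        sum.y += probe.reflection.y * w;
        sum.z += probe.reflection.z * w;
        totalWeight += w;
    }
    if( totalWeight <= 0.0 )
        return Vec3{ 0.0, 0.0, 0.0 };  // No probe reaches this pixel
    return Vec3{ sum.x / totalWeight, sum.y / totalWeight, sum.z / totalWeight };
}
```

Because the weights are evaluated per pixel, a point halfway between a red and a blue probe of equal radius gets an even mix of both, while a point at the red probe's centre gets pure red.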
The class’ name is ParallaxCorrectedCubemapAuto, which in retrospect is perhaps not the most intuitive name.
If cubemap arrays are not supported (e.g. iOS with pre-A11 GPUs and DX10-level hardware), a fallback using dual paraboloid 2D array textures is used instead. Note the quality is inferior, particularly for high roughness reflections (i.e. the higher mipmaps), and the scene’s brightness tends to be different (due to the highest mip being 1×1 vs 1×1×6) unless SceneManager::setAmbientLight was called with EnvFeatures_DiffuseGiFromReflectionProbe unset.
Forward Clustered must be active for this to work.
The samples have been updated and default to using Per Pixel Cubemap to show how to do it, and compare it with the old solutions.
Backward/Forward compatibility is very high, which means that it is very easy to toggle between per-pixel cubemaps and the old solution.
[2.2] Refactored Shaders
Perhaps it went unnoticed by the community, since there was no big fuss about it, but Ogre 2.2 refactored its shaders.
It wasn’t a rewrite, but rather “moving around” snippets of shader code into API-agnostic .any files.
Like 90% or more of our shader code was an almost exact copy-paste repeated 3 times, once per API: GLSL, HLSL and Metal. This often caused bugs when one of them got out of sync, and was hard to maintain.
The shared parts were moved into centralized files, and only the highly divergent parts (usually texture and uniform argument declarations) were kept in separate per-API files, while the subtle differences are addressed via macros or @property Hlms evaluations.
Another minor change is that several variables that lived throughout the entire execution of the shader and were very important for calculating the pixel’s value were moved into a single variable, pixelData.
Most of these variables used to live with the same name outside of pixelData, with a few exceptions.
As a result, the code is much easier to read, handle and maintain.
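The idea behind pixelData can be sketched in C++ rather than shader code. The fields below are illustrative assumptions, not a copy of the actual shader struct, and Float3, makePixelData and grazingFactor are hypothetical names. The point is only that values needed throughout the shader travel together in one bundle instead of as loose variables.

```cpp
#include <algorithm>

struct Float3
{
    float x, y, z;
};

float dot( const Float3 &a, const Float3 &b )
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Illustrative only: the field names are assumptions, not Ogre's shader struct.
struct PixelData
{
    Float3 normal;
    Float3 viewDir;
    float roughness;
    float NdotV;
};

// Fills the bundle once, near the start of the "shader".
PixelData makePixelData( const Float3 &normal, const Float3 &viewDir, float roughness )
{
    PixelData pixelData;
    pixelData.normal = normal;
    pixelData.viewDir = viewDir;
    pixelData.roughness = roughness;
    // Clamp to [0, 1] so later stages can rely on the range.
    pixelData.NdotV = std::max( 0.0f, std::min( 1.0f, dot( normal, viewDir ) ) );
    return pixelData;
}

// Later stages take the whole bundle instead of a long list of parameters.
float grazingFactor( const PixelData &pixelData )
{
    return 1.0f - pixelData.NdotV;
}
```

A custom stage that needs one more value only has to read another field of the bundle; no function signature along the way has to change.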
Another benefit is that the shader snippets became more modular. This allows reuse by custom Hlms implementations. For example, Terra now derives from HlmsPbs (more on this down below).
[2.2] Implemented Texture metadata cache
We mentioned last year that a big problem we had with background texture streaming was that many shaders were unnecessarily being generated while we tried to load the textures, causing severe stutter.
And we also mentioned that a texture metadata cache would solve these problems.
Well guess what got implemented! The texture metadata cache is a very simple, human readable JSON file, and makes a ton of difference.
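Here is a deliberately simplified model of why knowing a texture's metadata up front helps. It rests on an assumption the report does not spell out: that setup work depending on a texture's properties has to be redone when those properties turn out to differ from what was assumed before loading. All names (TextureMetadata, MetadataCache, setupPasses) are hypothetical, and reading or writing the JSON file is left out.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical subset of what could be remembered about a texture.
struct TextureMetadata
{
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t numMipmaps;
    std::string pixelFormat;
};

bool sameMetadata( const TextureMetadata &a, const TextureMetadata &b )
{
    return a.width == b.width && a.height == b.height &&
           a.numMipmaps == b.numMipmaps && a.pixelFormat == b.pixelFormat;
}

// In-memory cache keyed by texture name (serialization is omitted here).
class MetadataCache
{
public:
    void store( const std::string &name, const TextureMetadata &metadata )
    {
        mEntries[name] = metadata;
    }

    // Returns nullptr when the texture has never been seen before.
    const TextureMetadata *find( const std::string &name ) const
    {
        const auto it = mEntries.find( name );
        return it == mEntries.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, TextureMetadata> mEntries;
};

// Counts how often metadata-dependent setup runs for one texture load:
// once up front, plus once more if the up-front assumption was wrong.
int setupPasses( const MetadataCache &cache, const std::string &name,
                 const TextureMetadata &placeholder, const TextureMetadata &actual )
{
    const TextureMetadata *cached = cache.find( name );
    const TextureMetadata &assumed = cached ? *cached : placeholder;
    return sameMetadata( assumed, actual ) ? 1 : 2;
}
```

On a first run the cache is empty, so the placeholder guess is wrong and the setup runs twice. Once the entry has been stored, later runs get it right the first time.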
Additionally, some users spotted we were very inefficient with our DescriptorSetTexture, causing multiple shaders to be generated unnecessarily.
We also fixed a lot of bugs regarding TextureGpu and implemented all the features that were missing.
[2.2] Ported Terra to 2.2
Another roadblock towards adoption of 2.2 was that Terra did not work on Ogre 2.2. Fear no more: Terra has been ported!
As we mentioned, the shaders were refactored and the snippets became more modular. In 2.1, HlmsTerra fell a little behind compared to HlmsPbs, as the latest features had not yet been implemented there (for example, planar reflections).
Now HlmsTerra derives from HlmsPbs and tries to reuse as much as possible, including its C++ and shader code. This increases the likelihood of Terra automatically being up to date whenever HlmsPbs is updated. And if something still breaks, it should be much easier to fix.
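The benefit of deriving instead of duplicating can be shown with a tiny sketch. PbsLike and TerraLike are hypothetical stand-ins, not the real HlmsPbs and HlmsTerra classes, and the piece names are made up for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the base implementation.
class PbsLike
{
public:
    virtual ~PbsLike() = default;

    // The shader pieces this implementation contributes (names are made up).
    virtual std::vector<std::string> shaderPieces() const
    {
        return { "lighting", "shadows", "planar_reflections" };
    }
};

// Hypothetical stand-in for the terrain implementation: it reuses everything
// from the base and only adds what is specific to terrain.
class TerraLike : public PbsLike
{
public:
    std::vector<std::string> shaderPieces() const override
    {
        std::vector<std::string> pieces = PbsLike::shaderPieces();
        pieces.push_back( "heightmap_vertex" );
        return pieces;
    }
};
```

Any piece later added to PbsLike::shaderPieces shows up in TerraLike without touching the derived class, which is the kind of automatic catch-up described above.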
Another side effect of the Pbs refactor is that Terra got ported to Metal (both macOS and iOS) with very little effort.
Further discussion in the forum post.