The variable depthDeviation is in range [0; 2] (which we store in the alpha channel) and thus 8-bit should be enough.
Technically this could introduce a few artifacts because we only store the depth in range [0; maximum_distance * 2].
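To get a feel for how much precision 8 bits gives over the [0; 2] range, here is a minimal C++ sketch of such a quantization. The function names and the clamping behaviour are assumptions made for illustration; they are not taken from the actual shader code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: map a depth deviation in [0; 2] to an 8-bit
// UNORM value, the way an RGBA8 alpha channel would store it.
inline uint8_t encodeDepthDeviation( float depthDeviation )
{
    assert( !std::isnan( depthDeviation ) );
    // Values outside [0; 2] cannot be represented, so they get clamped.
    const float clamped = std::min( std::max( depthDeviation, 0.0f ), 2.0f );
    return static_cast<uint8_t>( std::lround( clamped * 0.5f * 255.0f ) );
}

// Hypothetical helper: recover the (quantized) deviation from the alpha value.
inline float decodeDepthDeviation( uint8_t alpha )
{
    return static_cast<float>( alpha ) / 255.0f * 2.0f;
}
```

With this scheme one step of the alpha channel is 2 / 255 of a deviation unit, so a round trip is off by at most about 0.004. Anything beyond twice the maximum distance simply saturates.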
Storing the depth deviation dramatically improves the hybrid’s quality!
But let’s not rush.
HW texture filtering issues
VCT reconstructs the reflected position by cone tracing along the reflection dir. This means we have to perform a lot of texture samples, and trilinear filtering helps a lot in improving quality. Except, HW texture filtering is good enough for reconstructing colour, but not so much for reconstructing 3D positions.
You see, GPUs are cheap. They interpolate in 8-bit. That means between pixel 0 and pixel 1, you only get 256 different steps of interpolation, even if the texture is stored in RGBA32_FLOAT.
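The effect can be simulated on the CPU. The sketch below compares a full-precision lerp against one whose weight is snapped to 256 steps. Rounding the weight down is an assumption; real hardware may round differently, but the number of distinct steps is what matters here.

```cpp
#include <cassert>
#include <cmath>

// Full-precision linear interpolation between two texel values.
inline float lerpExact( float a, float b, float t )
{
    return a + ( b - a ) * t;
}

// Same interpolation, but with the weight snapped to 256 steps,
// mimicking (as an assumption) fixed-point HW filtering weights.
inline float lerpQuantized( float a, float b, float t )
{
    assert( t >= 0.0f && t <= 1.0f );
    const float tq = std::floor( t * 256.0f ) / 256.0f;
    return a + ( b - a ) * tq;
}
```

If two neighbouring texels store positions 0 and 100, a weight of 0.501 yields roughly 50.1 with lerpExact but exactly 50 with lerpQuantized. The error is bounded by the difference between the two texels divided by 256, which is invisible for colours but adds up when the values are positions.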
By combining the error mask with our existing hybrid (quantized) code, we can mask many of the artifacts we were having.
We must combine the mask with the existing code, rather than replacing it because this error mask isn’t perfect as it has false negatives.
If the probe is built like this with the probe camera at the specified location:
…then during rendering reflections would go through the separating wall straight into the other wall. The depthDeviation mask would not notice anything wrong because the reflected wall matches exactly the probe’s shape, thinking the error variation is minimum. But it’s blatantly wrong:
Thus this mask isn’t perfect and needs to be combined with extra information. Of course, a simple solution in this example would be to use two probes, but more probes mean more RAM consumption and lower performance, and I fear not every example could be fixed this way.
Furthermore, we can’t rely on the artist knowing all this so he can make the perfect probe placement. The algorithm needs to be robust enough to handle this case.
After using the compressed depth to improve accuracy, and also using it as a mask to hide errors, these are the results:
But there’s still a problem. Sometimes pictures need to be manually tweaked in brightness to match.
PCC and VCT need to match
Although quality was an important goal, our VCT implementation wasn’t rigorously tested against reference raytracers. It’s brand new and besides, Global Illumination is already a huge improvement, right? No one’s gonna notice.
Except that now that VCT and PCC are being combined, this little detail becomes too obvious. When they don’t match, the contrast change when transitioning between the two is quite stark (exaggerated for this picture):
Some contrast differences are always going to be expected because VCT tends to have blurrier reflections (due to its limited resolution) than PCC.
But right now we have differences in brightness we can fix.
I suspect part of this problem is that we’re building PCC probe data with specular reflections, and VCT has no such info. So probably if we want them to match, we would need to make PCC generation diffuse only. This is a theory though.
Another big reason is that clearly VCT needs more rigorous benchmarking against reference material, such as raytracers.
So that’s what we will probably have to focus on next: working towards making our PCC and/or VCT implementations more exact so that they both match reality, and more importantly… each other.
Another issue is that right now I force-disabled the use of VCT while generating the PCC probes, and that takes away GI from the probes! That can cause huge differences in lighting. I need to reevaluate this decision. At the very least, diffuse VCT needs to be present during PCC generation.
Last but not least, if the probe is blocked by an object, and the reflection comes from behind, the hybrid will think the reflection is OK when it’s not.
For example this causes reflections in the ground to reflect what’s on the other side of a table:
The only solution I see to this is to detect & discard reflections that come from behind, or to use more probes (i.e. one below the table).
Unrelated to all this, it seems a bug crept into per-pixel PCC where influence areas are not always being respected. It seems either ForwardClustered from C++ or the shader side is using the wrong area (i.e. using the probe’s shape instead of the area of influence). That’s a bug in need of fixing.
Over the last couple months we have been working on Voxel Cone Tracing (VCT), a Global Illumination solution.
Voxel Cone Tracing could be seen as an approximated version of ray marching.
The scene is voxelized and three voxel textures are generated, containing albedo, normals and emissive properties. In other words, we build a Minecraft-looking representation of the world:
We run a compute shader (‘VCT/Voxelizer’) to voxelize the scene. The voxelization process isn’t cheap. If we were to try it every frame, it could run anywhere between 0.5-10 fps depending on scene complexity, voxel resolution and GPU performance.
Voxelization isn’t fast enough to do it at real time framerate, but it is fast enough for interactive scene editing in a level, for example.
Once we have the scene voxelized, we need to inject light into another 3D texture. Our ‘VCT/LightInjection’ and ‘VCT/LightVctBounceInject’ compute shaders take care of that, for the first and then multiple bounces respectively.
After they have run, the lighting 3D texture looks like this:
This texture is then used directly for Voxel Cone Tracing during rendering, so that scenes can have GI on them.
Voxel Cone Tracing is similar to Ray Marching in the sense that we trace a few rays, and march along them until we hit obstacles. But instead of sending hundreds of rays, we only trace 4 to 6 ‘rays’ (called cones) and use 3D mipmapping to detect obstacles further away.
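The link between a cone and the 3D mipmaps can be sketched as follows. This is the commonly used formulation (pick the mip whose voxels are about as wide as the cone at the current distance), not necessarily what our shaders do line by line; the function names and the voxelSize parameter are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Diameter of a cone with the given full aperture angle (in radians)
// at a given distance from its apex.
inline float coneDiameter( float apertureRadians, float distance )
{
    assert( apertureRadians >= 0.0f && distance >= 0.0f );
    return 2.0f * std::tan( apertureRadians * 0.5f ) * distance;
}

// Mip level whose voxels are roughly as large as the cone is wide.
// voxelSize is the edge length of a mip 0 voxel (hypothetical parameter).
inline float coneMipLevel( float apertureRadians, float distance, float voxelSize )
{
    assert( voxelSize > 0.0f );
    const float diameter = coneDiameter( apertureRadians, distance );
    // Each mip doubles the voxel size, hence the log2. Never go below mip 0.
    return std::max( 0.0f, std::log2( diameter / voxelSize ) );
}
```

The further the cone travels, the higher the mip it samples, so each step can cover more distance. A cone with an aperture of 0 never leaves mip 0, which is plain ray marching.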
The results are amazingly good, particularly because the technique adapts very well to all sorts of situations:
It should be noted that injecting the light is very fast. If you change your lighting setup (e.g. dynamic time of day) without updating your voxelized scene, you can change the lighting dynamically in real time.
While it isn’t free, you shouldn’t have trouble reaching 60 fps even if you update the lighting every frame.
But it doesn’t stop there. Specular reflections are also captured by VCT:
It can even make mirror-like reflections, but the blockiness of voxels becomes more apparent:
Another reason why you may think there’s something wrong, is because we don’t support multiple bounces on specular reflections. If you look carefully, the house reflection on the wall looks like the original house, but the house itself is also a mirror. If these reflections were accurate, the wall should reflect whatever the house is reflecting.
Note that specular reflections at low roughness take a much higher toll on performance, because for the lowest roughness values, cone tracing effectively becomes raymarching as the cone aperture is 0.
We only trace one cone/ray though, and at higher roughness cone tracing can skip many pixels by sampling the higher mip levels.
VCT is also ready for HDR rendering, by using HDR multipliers to keep the RGBA8 buffers from under/overflowing.
Anisotropic vs Isotropic
VCT supports Anisotropic and Isotropic modes.
VCT relies heavily on voxel’s mips. The problem with mipmaps is that they are an average in all directions. That means higher mips lose a lot of information. If the left wall is blue and the right wall is red, eventually they get averaged into a dark pink!
Anisotropic fixes this problem by storing directional information. That means we need to store mips for all 6 directions (+X, -X, +Y, -Y, +Z, -Z).
This increases our memory consumption by roughly 80% and incurs a slight to moderate performance hit, but the quality differences are very noticeable.
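The difference between the two modes can be illustrated with a tiny C++ sketch. The types, the function names and the weighting by squared direction components are assumptions chosen for illustration (a common way of sampling anisotropic voxels), and which stored direction corresponds to which ray direction is purely a convention here.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Colour { float r, g, b; };

// Isotropic mip: a single average of everything inside the cell.
inline Colour averageIsotropic( const Colour &a, const Colour &b )
{
    return { ( a.r + b.r ) * 0.5f, ( a.g + b.g ) * 0.5f, ( a.b + b.b ) * 0.5f };
}

// Anisotropic mip: one colour per direction, ordered +X, -X, +Y, -Y, +Z, -Z.
using AnisoVoxel = std::array<Colour, 6>;

// Blend the three stored colours that match the sign of each component of
// the (normalized) direction, weighted by the squared components.
inline Colour sampleAnisotropic( const AnisoVoxel &voxel, float dx, float dy, float dz )
{
    assert( std::fabs( dx * dx + dy * dy + dz * dz - 1.0f ) < 1e-3f );
    const Colour &cx = voxel[dx >= 0.0f ? 0 : 1];
    const Colour &cy = voxel[dy >= 0.0f ? 2 : 3];
    const Colour &cz = voxel[dz >= 0.0f ? 4 : 5];
    const float wx = dx * dx, wy = dy * dy, wz = dz * dz; // weights sum to 1
    return { cx.r * wx + cy.r * wy + cz.r * wz,
             cx.g * wx + cy.g * wy + cz.g * wz,
             cx.b * wx + cy.b * wy + cz.b * wz };
}
```

Averaging a red and a blue wall isotropically gives ( 0.5, 0, 0.5 ) no matter where you look from. With the anisotropic voxel, sampling along +X still returns pure red and sampling along -X still returns pure blue.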
This is a Sibenik cathedral comparison (unfortunately I was running out of battery when I took these screenshots so FPS counter is all over the place due to throttling):
Parallax Corrected Cubemaps give gorgeous reflections… when the reflected surfaces match the defined rectangle, that is.
The following diagram illustrates the problem:
When looking up, the table should be reflected on the ceiling. However PCC calculates the reflection to be at the red point (wrong) instead of being at the green cross (correct), because PCC can only deal with rectangular shapes.
The same happens with reflections outside of a room:
In this case, the reflection should be outside the rectangle defined by PCC, but PCC incorrectly calculates the reflection at the border, e.g. where the window is. This can cause incorrect parallax when looking at reflections.
Sometimes these errors are not noticeable. Sometimes they’re very noticeable. It depends on what objects were included in the PCC probe, and the shape of the scene. The more rectangular and indoor it looks, the less likely the reflection errors are going to be noticed.
Enter PCC / VCT hybrid
VCT can also perform reflections, as already shown earlier. But VCT reflections have the problem that they look blocky / Minecraft-like. And it gets worse for large scenes and/or low resolution.
But VCT is much more than that! VCT does not only know what colour the reflection should be. It also knows where the reflection happens.
This means we can know whether the PCC rectangular approximation is correct!
If the PCC error is within threshold, we use PCC reflections. If the error is large enough, we fall back to VCT.
The hybrid simply boils down to:
w = distance( pccBoxIntersection.xyz, vctReflectionPos.xyz );
finalReflection = lerp( pccReflectionColour, vctReflectionColour, w );
Please note we’re doing this to increase quality. It is not a performance optimization.
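The blend above can be sketched on the CPU in C++ as follows. The Vec3 type, the helper names, the errorScale tuning parameter and the clamping of the weight to 1 are all assumptions added for illustration; the two-line snippet above does not show them.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline float distanceBetween( const Vec3 &a, const Vec3 &b )
{
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt( dx * dx + dy * dy + dz * dz );
}

inline Vec3 lerpColour( const Vec3 &a, const Vec3 &b, float w )
{
    return { a.x + ( b.x - a.x ) * w, a.y + ( b.y - a.y ) * w, a.z + ( b.z - a.z ) * w };
}

// The further PCC's box intersection is from where VCT says the reflection
// really happens, the more we trust the VCT colour.
// errorScale (hypothetical) converts world units into a blend weight.
inline Vec3 hybridReflection( const Vec3 &pccBoxIntersection, const Vec3 &vctReflectionPos,
                              const Vec3 &pccReflectionColour, const Vec3 &vctReflectionColour,
                              float errorScale )
{
    assert( errorScale > 0.0f );
    const float error = distanceBetween( pccBoxIntersection, vctReflectionPos );
    const float w = std::min( error * errorScale, 1.0f ); // clamp: an assumption
    return lerpColour( pccReflectionColour, vctReflectionColour, w );
}
```

When both techniques agree on the position, the result is the sharp PCC colour. When they disagree by 1 / errorScale units or more, the result is entirely the VCT colour.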
This is the original PCC reflections from the LocalCubemapsManual sample:
The room is mostly rectangular, except for the red wall which is laid out diagonally. Note the red wall’s reflection on the blue wall.
In contrast, these are the reflections produced by VCT:
VCT is producing visible aliasing (staircase effect) due to the voxelization, but the red wall is correctly reflected onto the blue one.
When we run the hybrid code and merge both, the red wall’s reflection is taken from VCT, while the rest of the reflections are from PCC:
Looking from another angle, we can see the same effect with the red wall:
We’re still working on improving this hybrid. These pictures don’t really do it justice because it does a really good job on complex scenes at fixing extremely bad looking reflections (reflections which are blatantly wrong to the naked eye).
Right now we are fixing two problems:
VCT precision errors due to interpolation
Right now there are gross errors in the VCT position reconstruction. It appears to be a floating point error at first but it is too large to be just that. We strongly suspect it has to do with the fact that raymarching relies on sampling a 3D texture, and GPUs only offer 8 bits of precision for interpolation. We still need to test that theory.
Because of this issue, we only use the Hybrid for roughness <= 0.02, as the errors are quite visible: specular reflections are high frequency and thus very noticeable.
Storing distance to center in alpha channel
We know the VCT position, and we also know the projected PCC position in the defined rectangle. But we don’t know whether the object that was rendered to the PCC probe is actually close to the probe’s bounds (it could be a chair inside the probe, or a tree outside the probe).
By storing the distance to the center of the probe in the alpha channel, we could reject additional PCC reflections that are inaccurate.
The hybrid isn’t perfect, but we have high hopes that it can make good quality compromises to achieve very good realtime reflections.
Once we’ve tested our interpolation precision theory and stored the distance to the probe’s center in the alpha channel, we may have more updates.
You can think of DDGI as a 3D grid array of two very low resolution cubemaps (e.g. as low as 4×4).
The technique itself is not entirely new, the novelty is that it uses Raytracing to build the cubemaps (because rasterization is very inefficient for rendering to low resolution targets) which speeds up the process a lot, enough to be performed in real time, and that they introduced a second cubemap with depth information to fight light and shadow leaks.
We have been extremely impressed by the presentation.
The technique only handles diffuse GI, and strictly speaking it’s very likely it is of slightly lower quality than diffuse VCT. However DDGI has O(1) complexity during rendering, which makes it ideal from a performance standpoint. DDGI also consumes far less memory (but has no specular reflections).
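The O(1) claim comes from the probes living on a regular grid: finding the probe for a given point is just arithmetic, with no search involved. A minimal C++ sketch of that lookup follows; the ProbeGrid layout and the function names are hypothetical, and a real implementation would typically blend the surrounding probes rather than pick a single one.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical description of a regular grid of probes.
struct ProbeGrid
{
    float originX, originY, originZ;  // world position of probe (0, 0, 0)
    float spacing;                    // distance between neighbouring probes
    int   countX, countY, countZ;     // number of probes along each axis
};

// Grid coordinate along one axis, clamped so points outside still map to a probe.
inline int gridCoord( float coord, float origin, float spacing, int count )
{
    const float cell = std::floor( ( coord - origin ) / spacing );
    const float clamped = std::min( std::max( cell, 0.0f ), static_cast<float>( count - 1 ) );
    return static_cast<int>( clamped );
}

// Flat array index of the probe at the lower corner of the cell containing
// the point. Constant time, regardless of how many probes there are.
inline int probeIndex( const ProbeGrid &grid, float x, float y, float z )
{
    assert( grid.spacing > 0.0f && grid.countX > 0 && grid.countY > 0 && grid.countZ > 0 );
    const int ix = gridCoord( x, grid.originX, grid.spacing, grid.countX );
    const int iy = gridCoord( y, grid.originY, grid.spacing, grid.countY );
    const int iz = gridCoord( z, grid.originZ, grid.spacing, grid.countZ );
    return ix + grid.countX * ( iy + grid.countY * iz );
}
```

Contrast this with VCT, where every pixel has to march several cones through the voxel volume.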
We plan on rolling DDGI in two stages:
DDGI built from VCT. Instead of using raytracing to build the DDGI probes, we use cone tracing.
DDGI built from raytracing, whether it’s using open solutions such as RadeonRays, or HW accelerated RTX.
This plan allows us to work on DDGI much sooner (i.e. without having to prepare for raytracing) and also compare the two techniques.
Later, once raytracing is in place, we can focus on moving DDGI generation to raytracing.
You have probably heard that RTX (aka NVIDIA’s raytracing, ‘DXR’ in D3D12 lingo) has been making a lot of fuss lately.
It’s not exactly clear whether the current generation of HW accelerated raytracing is the way to go (the acceleration tree structures used by vendors are currently a black box, which could result in wild performance variations in the future depending on the scenes), but it is undeniable the industry will be moving towards raytracing hybrids in the future.
At the very least, raytracing succeeds in performing specular reflections where the current generation of techniques fails, while we can use VCT for rough specular reflections.
This is why we will be moving towards raytracing, starting with the integration of RadeonRays and Metal Raytracing in the near future; later we’ll integrate RTX/DXR as the Vulkan and/or D3D12 RenderSystems are implemented.
Our current plan is to implement the following algorithm:
When raytracing is not available:
For every pixel on screen:
1. If roughness is above threshold, use VCT for spec reflections.
2. If roughness is below threshold, raymarch through the VCT until we find a cell with alpha != 0 and test against the triangles in that cell.
    1. If hit, stop.
    2. If miss, continue with the raymarch and repeat.
    3. Unless roughness is 0, as we raymarch the cone aperture will keep getting higher.
    4. Past a certain threshold, stop raytracing and only use the VCT result.
When raytracing is available:
For every pixel on screen:
1. If roughness is above threshold, use VCT for spec reflections.
2. If roughness is below threshold but not 0:
    1. Raymarch through the VCT until we find a cell with alpha != 0 and test against the triangles in that cell.
    2. Separately run RTX query.
    3. Blend the result based on cone aperture (we may even discard the RTX result if the aperture got too big).
3. If roughness is 0, only run RTX.
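The top-level decision in both lists can be summarized in a few lines of C++. This is only a sketch of the plan as described: the enum, the function name and the parameters are hypothetical, and treating a roughness exactly equal to the threshold as "below" is an arbitrary choice.

```cpp
#include <cassert>

enum class ReflectionPath
{
    VctCone,            // cone trace the voxels only
    VctRaymarch,        // raymarch the voxels, test triangles in occupied cells
    VctRaymarchPlusRt,  // same as above, plus a HW ray query blended by aperture
    RtOnly              // HW raytracing only
};

// Pick the specular reflection path for one pixel.
inline ReflectionPath chooseReflectionPath( float roughness, float roughnessThreshold,
                                            bool raytracingAvailable )
{
    assert( roughness >= 0.0f && roughnessThreshold >= 0.0f );
    if( roughness > roughnessThreshold )
        return ReflectionPath::VctCone;  // rough surfaces: VCT is good enough
    if( !raytracingAvailable )
        return ReflectionPath::VctRaymarch;
    if( roughness == 0.0f )
        return ReflectionPath::RtOnly;  // perfect mirror
    return ReflectionPath::VctRaymarchPlusRt;
}
```

For instance, with a threshold of 0.1, a surface with roughness 0.5 always takes the VctCone path, while a perfect mirror takes RtOnly when raytracing is available and VctRaymarch otherwise.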
Furthermore, we are interested in further expanding VCT solutions rather than just being locked into DXR. Building an SDF (Signed Distance Field) could accelerate traversal of the VCT probe.
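To show why an SDF speeds up traversal, here is a generic sphere tracing sketch in C++. It is a textbook illustration, not our implementation: the scene is a single analytic sphere instead of a voxel-derived distance field, the ray is fixed along +Z, and all names and constants are assumptions.

```cpp
#include <cassert>
#include <cmath>

// Signed distance from a point to a sphere centred at the origin.
inline float sdfSphere( float x, float y, float z, float radius )
{
    return std::sqrt( x * x + y * y + z * z ) - radius;
}

// March a ray from ( 0, 0, startZ ) along +Z. Returns the distance travelled
// when the surface is reached, or -1 if nothing is hit within maxDistance.
// outSteps receives the number of steps taken.
inline float sphereTrace( float startZ, float radius, float maxDistance, int &outSteps )
{
    assert( radius > 0.0f && maxDistance > 0.0f );
    const float epsilon = 1e-3f;
    float travelled = 0.0f;
    for( outSteps = 0; outSteps < 128 && travelled < maxDistance; ++outSteps )
    {
        const float d = sdfSphere( 0.0f, 0.0f, startZ + travelled, radius );
        if( d < epsilon )
            return travelled;  // close enough to the surface: hit
        // The SDF guarantees nothing is closer than d, so we can safely
        // skip that far in a single step.
        travelled += d;
    }
    return -1.0f;
}
```

Starting 10 units away from a sphere of radius 1, the ray reaches the surface after a single step of 9 units. Marching voxel by voxel would need many more samples to cross the same empty space.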
Another interesting solution is to store geometry information in the voxels, which would allow us to perform raytracing on current-gen HW.
A VCT implementation that internally stores indices to sets of geometry would allow us to reduce the VCT resolution (thus reducing memory footprint) and build SDF acceleration structures. There could also be the possibility of storing indices to per-mesh voxels.
It is unclear which approaches will be a definitive win because this is mostly uncharted territory, and we don’t have the resources to pursue all possibilities. We’ll try to focus on what appears to be most promising.
Unzip them and run the script that matches your platform and OS!
For example if you’re on Windows and have Visual Studio, run either:
depending on the architecture you want to build for (e.g. 32-bit vs 64-bit).
The scripts will automatically start building, but you will also find the Visual Studio solution files under Ogre/build/OGRE.sln.
If you’re on Linux, run either:
Which one you need depends on the C++ version you’re targeting. C++98 compiles much faster than the rest, but may have incompatibilities (particularly with std::string) if mixed in a project built for C++11 or newer.
There are currently no build scripts for the Apple ecosystem. For building for iOS, refer to the Setting up Guide. The instructions for Linux should work for building for macOS, but may require additional manual intervention.
We hope this makes your life easier! And let us know if you encounter problems with these scripts! The goal is that building Ogre from scratch becomes as simple as a few clicks.
Hoo boy! This report was scheduled for January but couldn’t be released on time for various reasons.
This is old news already! We have another report coming, mostly talking about VCT (Voxel Cone Tracing), which is a very interesting feature that has been in development during this year.
But until then, let’s talk about all the other features that have been implemented so far!
Fake and LTC Area lights
We implemented area lights!
We added LT_AREA_APPROX & LT_AREA_LTC types for Light.
LT_AREA_APPROX is a fake approximation to area lights, but for many cases looks convincing enough, supports fully RGB coloured textured lights, and is cheaper in terms of performance compared to its LTC variant.
LT_AREA_LTC, i.e. LTC (Linearly Transformed Cosines), is the real deal: a physically correct implementation of area lights. It does not currently support textures though.
Area lights do not support shadows. This isn’t laziness on our part: shadow maps are not enough to accurately represent shadows of an area light, unless we had an infinite number of shadow maps (or at least, a very high number of them, scattered across the light’s surface). The latest developments in raytracing (i.e. DXR, RTX) may solve this issue in the future though. We are also looking into potential VCT (Voxel Cone Tracing) solutions.
The differences between Fake and LTC area lights are most noticeable at high roughness.
Added Screen Space Decals
At long last!
A highly requested feature finally lands. It requires Forward Clustered. Because we used Forward Clustered to implement this technique, it does not suffer from the edge artifacts common in Deferred solutions.
Diffuse, normal mapped and emissive decals are supported. Note however, that if you enable one of these settings then this setting affects the performance globally: if emissive decals are enabled, it does not matter if it’s just 1 Decal out of 50 that uses emissive. Performance-wise it’s the same as if all 50 decals had emissive.
ShadowNodeHelper for configuring shadows programmatically
Generating a shadow node via script is easy. Generating a shadow node via C++ was absurdly hard.
A very likely reason one would want to do it via C++ is to implement custom quality settings: increasing/lowering resolution, changing the number of PSSM splits, etc. The class ShadowNodeHelper makes this task much easier.
See ShadowMapFromCode sample. Visually, it’s the same demo as ShadowMapDebugging. However the shadow node was created from C++ using this new class, instead of loading it from a compositor script.
Hlms Disk Cache
A common problem with Ogre is shader compilation times. This problem is not exclusive to us.
Often this would manifest as either long loading times in Ogre, or stutter, which was particularly bad in the D3D11 RenderSystem and macOS GL3+.
We already provided the shader microcode cache to greatly alleviate this problem, and it worked particularly well for D3D11.
But we took a step further and added the HlmsDiskCache which is meant to complement (not replace!) the microcode cache. The HlmsDiskCache is of particular importance on systems where the API does not support microcode caching (Metal, macOS’ GL3+, some older Linux Mesa drivers), and will be very important in the future for Vulkan and D3D12 for caching PSOs.
The HlmsDiskCache is API agnostic and OS agnostic, which means you can create it on your system and deploy it on other machines. You’d get this guarantee with D3D11’s microcode cache, but not with the rest of the APIs.
We’ve updated the samples to create & use both caches, and they’re enabled by default.
[2.2] Per Pixel Cubemap probes
Another highly requested feature! And this one is my favourite because, alongside Decals, it makes the most visual impact.
PCC (Parallax Corrected Cubemaps) are very pretty and were already implemented in Ogre 2.1.
However having only one probe is not very useful. Ogre 2.1 offered a few ways to blend multiple probes, but they were quite suboptimal and difficult to handle.
Thanks to Ogre 2.2 having good support for GPU -> GPU texture copies, support for cubemap arrays, and easy handling of mipmaps, it became possible to support per pixel cubemap probes! Now multiple cubemaps can affect the same area and blending between them will be correct. The class’ name is ParallaxCorrectedCubemapAuto, which in retrospect is perhaps not the most intuitive name.
If cubemap arrays are not supported (i.e. iOS with pre-A11 GPUs and DX10-level Hardware), a fallback using dual paraboloid 2D Array textures is used instead. Note the quality is inferior particularly for high roughness reflections (i.e. the higher mipmaps) and the scene’s brightness tends to be different (due to the highest mip being 1×1 vs 1x1x6) unless SceneManager::setAmbientLight was called with EnvFeatures_DiffuseGiFromReflectionProbe unset.
Forward Clustered must be active for this to work.
The samples have been updated and default to using Per Pixel Cubemap to show how to do it, and compare it with the old solutions. Backward/Forward compatibility is very high, which means that it is very easy to toggle between per-pixel cubemaps and the old solution.
[2.2] Refactored Shaders
Perhaps it went unnoticed by the community, since there was not a big fuss about it, but Ogre 2.2 refactored its shaders. It wasn’t a rewrite, but rather “moving around” snippets of shader code into API-agnostic .any files.
Like 90% or more of our shader code was an almost exact copy-paste, 3 times per API: GLSL, HLSL and Metal. This often caused bugs when one of them got out of sync, and was hard to maintain.
The shared parts were moved into centralized files, and only the highly divergent parts (usually texture and uniform argument declarations) were kept in separate per-API files, while the subtle differences are addressed via macros or @property Hlms evaluations.
Another minor change is that several variables that lived throughout the entire execution of the shader and were very important for calculating the pixel’s value were moved into a single variable:
Most of these variables used to live with the same name outside of pixelData, with a few exceptions. As a result, the code is much easier to read, handle and maintain.
Another benefit is that the shader snippets became more modular. This allows reuse by custom Hlms implementations. For example Terra now derives from HlmsPbs (more on this down below)
[2.2] Implemented Texture metadata cache
We mentioned last year that a big problem we had with background texture streaming was that many shaders were unnecessarily being generated while we tried to load the textures, causing severe stutter. And we also mentioned that a texture metadata cache would solve these problems.
Well guess what got implemented! The texture metadata cache is a very simple, human readable JSON file, and makes a ton of difference.
Additionally, some users spotted we were very inefficient with our DescriptorSetTexture, causing multiple shaders to be generated unnecessarily. We also fixed a lot of bugs regarding TextureGpu and implemented all the features that were missing.
[2.2] Ported Terra to 2.2
Another roadblock towards adoption of 2.2 was that Terra did not work on Ogre 2.2. Fear no more: Terra has been ported!
As we mentioned, the shaders were refactored and the snippets became more modular. In 2.1, HlmsTerra fell a little behind compared to HlmsPbs, as the latest features had not yet been implemented there (for example, planar reflections).
Now HlmsTerra derives from HlmsPbs and tries to reuse as much as possible, including its C++ and shader code. This increases the likelihood of Terra automatically being up to date whenever HlmsPbs is updated. And if something still breaks, it should be much easier to fix.
Another side effect of the Pbs refactor is that Terra got ported to Metal (both macOS and iOS) with very little effort.
It means the branch is stabilizing. Back when it was v2-2-WIP, it was very unstable. Checking out that branch meant you could find crashes, memory leaks, missing or broken features; and the API interface was changing very quickly, thus updating the repository could mean your code would no longer compile and required adapting.
Over the last couple of months, the API interfaces on 2.2 have begun to settle down, bugs were fixed and there are no apparent leaks. In fact some teams have already started using it.
Now that it is no longer WIP, while there could still be API breakage on the 2.2 branch or accidental crashes/regressions, it shouldn’t be anything serious or that requires intensive porting.
We still have a few missing features (such as saving textures as DDS) but they’re not used frequently.
Coming up next
We still owe you a Progress Report of what’s been going on in 2.1 and 2.2 in the past year and a half; we have that written down but it still needs a few reviews.
Coming up next is:
More real time GI improvements
VR performance optimizations
We are planning on a Vulkan/D3D12 backend
Additionally we have a community member working on GPU-driven rendering and GPU particle FXs, while another community member just contributed Morph animations to 2.1.
Yes, Morph animations are finally HW accelerated again! We are evaluating porting this contribution to 2.2; it shouldn’t take long, but we’re evaluating whether it can be improved with the use of Compute Shaders.
What about Ogre 2.1?
If someone wants to teach Matias aka dark_sylinc a quick automated way to create installer SDKs, that is welcomed! (he never liked handling that!!!)
Ogre 2.1 has been very stable. Eugene ported several improvements from the 1.x branch, and we are currently dealing with a regression caused by how the PF_BYTE_LA / PF_L8 format is treated in D3D11, but other than that 2.1 is ready for release.
The morph animation contribution is brand new so that may need a bit more testing.
If you don’t see an SDK, that is mostly due to a lack of time and knowledge to package one.
If someone else wants to step in and maintain packaging, that is welcomed!