Upcoming Global Illumination improvements in Ogre-Next

Note: This work is being sponsored by Open Source Robotics for the Ignition Project

Ogre-Next offers a wide range of Global Illumination solutions.

Some are better than others, but VCT (Voxel Cone Tracing) stands out for its high quality at acceptable performance (on high-end GPUs).

However, the main problem with our VCT implementation right now is that it’s hard to use and needs a lot of manual tweaking:

  1. The voxelization process is relatively slow. 10k triangles can take 10ms to voxelize on a Radeon RX 6800 XT, which makes it unsuitable for realtime voxelization (only load time or offline)
  2. Large scenes / outdoors need a very large resolution (e.g. 1024x32x1024), or else you have to accept a large quality degradation
  3. It works best with static geometry in a relatively small scene, like a room or a house.

If your game is divided into small sections that are paged in/out (e.g. PS1-era games like Resident Evil, Final Fantasy 7/8/9, Grim Fandango), VCT would be ideal.

But in the current generation of games, with continuous movement over large areas, VCT falls short unless you pull off an insane amount of tricks.

So we’re looking to improve this and that’s where our new technique Cascaded Image VCT (CIVCT… it wouldn’t be a graphics technique if we didn’t come up with a long acronym) comes in:

  1. Voxelizes much faster (10x to 100x), enabling real time voxelization. Right now we’re focusing on static meshes but it should be possible to support dynamic stuff as well
  2. User friendly
  3. Works out of the box
  4. Quality settings easy to understand
  5. Adapts to many conditions (indoor, outdoor, small, large scenes)

That would be pretty much the holy grail of real time GI.

Step 1: Image Voxelizer

Our current VctVoxelizer is triangle based: It feeds on triangles, and outputs a 3D voxel (Albedo + Normal + Emissive). This voxel is then fed to VctLighting to produce the final GI result:

Right now we’re using VctVoxelizer to voxelize the entire scene. This is slow.

Image Voxelizer is image-based and consists of two steps:

  1. Reuse VctVoxelizer to voxelize each mesh separately, either offline (saving the results to disk) or at load time. At 64x64x64, each mesh needs between 2MB and 3MB of VRAM (and disk space), depending on whether the object contains emissive materials. Some meshes need much lower resolution though. This is user tweakable. You’d want to dedicate more resolution to important/big meshes, and less to the unimportant ones.
    • This may sound like a lot, but bear in mind it is a fixed cost independent of triangle count. A mesh with a million triangles and a mesh with 10,000 triangles will both occupy the same amount of VRAM.
    • Objects are rarely cube-shaped. For example, a desk is often wider than it is tall or deep. Hence it could need just 64x32x32, which is between 0.5MB and 0.75MB
  2. Each frame, we stitch together these 3D voxels of meshes via trilinear interpolation into a scene voxel. This is very fast.
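The memory figures above can be double-checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes 4 bytes per voxel for each of the albedo, normal and emissive volumes, which is a layout inferred from the 2MB/3MB figures rather than taken from the Ogre-Next source, and `voxel_mesh_bytes` is a hypothetical helper, not an Ogre-Next function.

```python
MIB = 1024 * 1024

def voxel_mesh_bytes(width, height, depth, has_emissive):
    """Estimate the VRAM needed by one voxelized mesh.

    Assumption (not from the Ogre-Next source): albedo, normal and
    emissive are each stored with 4 bytes per voxel, and the emissive
    volume only exists when the mesh has emissive materials.
    """
    volumes = 3 if has_emissive else 2
    return width * height * depth * 4 * volumes

# The cost depends only on the resolution, never on the triangle count.
cube_plain = voxel_mesh_bytes(64, 64, 64, has_emissive=False) / MIB
cube_emissive = voxel_mesh_bytes(64, 64, 64, has_emissive=True) / MIB
desk_plain = voxel_mesh_bytes(64, 32, 32, has_emissive=False) / MIB
desk_emissive = voxel_mesh_bytes(64, 32, 32, has_emissive=True) / MIB
print(cube_plain, cube_emissive, desk_plain, desk_emissive)
```

Under that assumption the numbers line up with the ones quoted in the list: halving two of the three dimensions cuts the cost to a quarter.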

This step is fast thanks to Vulkan, which allows us to dynamically index an arbitrary number of bound textures in a single compute dispatch.

OpenGL, Direct3D 11 & Metal* will also support this feature but may experience degraded performance as we must perform the voxelization in multiple passes. How much of a degradation depends on the API, e.g. OpenGL actually will let us dynamically index the texture but has a hard limit on how many textures we can bind per pass.
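To illustrate why a bind limit forces multiple passes, here is a minimal sketch of how the per-mesh voxel textures could be split into batches. `split_into_passes` and the limit of 4 are hypothetical and only for illustration; the real limits depend on the API and driver, and this is not how Ogre-Next necessarily schedules its passes.

```python
def split_into_passes(mesh_textures, max_bound_textures):
    """Group textures so that no pass binds more than the allowed count."""
    if max_bound_textures < 1:
        raise ValueError("max_bound_textures must be at least 1")
    return [
        mesh_textures[i:i + max_bound_textures]
        for i in range(0, len(mesh_textures), max_bound_textures)
    ]

# Ten mesh voxel textures with a made-up limit of 4 bindings per pass.
passes = split_into_passes(list(range(10)), max_bound_textures=4)
print(len(passes), passes)
```

With dynamic indexing and no practical bind limit, the same ten textures would fit into a single dispatch, which is where the performance difference comes from.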

(*) I’m not sure if Metal supports dynamic texture indexing or not. Needs checking.

Therefore this is how it changed:

This step is done offline or at loading time:

This step can be done every frame or when the camera moves too much, or if an object is moved

Downside

There is a downside to this (aside from VRAM usage): we need to voxelize each mesh + material combo. That means that if you have a mesh and want to apply different materials to it, we need to consume 2-3MB per material.

This is rarely a problem though, because most meshes only use one set of materials. And for those that use more, you may be able to get away with baking a single material set that is either white or similar; the end result after calculating GI may not vary enough to be worth the extra VRAM cost.

Non-researched solutions:

  • For simple colour swaps (e.g. RTS games, FPS with multiplayer teams), it should be possible to work around this by adding a single multiplier value, rather than voxelizing the mesh per material
  • It should be possible to apply some sort of BC1-like compression, given that the mesh opacity and shape are the same. The only thing that changes is colour; thus a delta colour compression scheme could work well

Trivia

At first I panicked a little while developing the Image Voxelizer because the initial quality was far inferior to that of the original voxelizer.

The problem was that the original VCT is a ‘perfect’ voxelization. i.e. if a triangle touches a single voxel, then that voxel adopts the colour of the triangle. Its neighbour voxels will remain empty. Simple.
That results in a ‘thin’ voxel result.

However, in IVCT the mesh voxels are interpolated into a scene voxel that will not match in resolution and may be offset by an arbitrary sub-voxel amount. It’s not aligned either.

The result is that certain voxels have 0.1, 0.2 … 0.9 of the influence of the mesh. This generates ‘fatter’ voxels.

In 2D we would say that the image has a halo around the contours

Once I understood what was going on, I tweaked the math to thin out these voxels by looking at the alpha value of the interpolated mesh and applying an exponential curve to get rid of this halo.
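The effect is easy to reproduce in one dimension. The sketch below resamples a 'thin' row of occupancy values at a fractional offset, which produces the partial values described above, and then applies a power curve to the alpha. The exponent of 4 and both function names are assumptions made up for illustration; the actual curve used in Ogre-Next may differ.

```python
import math

def resample_row(row, offset):
    """Linearly resample a 1D row of occupancy values at a fractional offset."""
    out = []
    for i in range(len(row)):
        pos = i + offset
        lo = math.floor(pos)
        frac = pos - lo
        # Samples that fall outside the row count as empty space.
        a = row[lo] if 0 <= lo < len(row) else 0.0
        b = row[lo + 1] if 0 <= lo + 1 < len(row) else 0.0
        out.append(a * (1.0 - frac) + b * frac)
    return out

def thin_alpha(alpha, exponent=4.0):
    """Push partial alpha values towards 0 while leaving 0 and 1 untouched."""
    alpha = min(max(alpha, 0.0), 1.0)
    return alpha ** exponent

thin = [0.0, 0.0, 1.0, 0.0, 0.0]      # 'perfect' voxelization: one solid voxel
fat = resample_row(thin, 0.3)         # misaligned copy: two partial voxels
thinned = [thin_alpha(a) for a in fat]
print(fat)
print(thinned)
```

A single solid voxel turns into two partially covered ones after resampling; the curve then suppresses the weaker of the two much more strongly than the stronger one, which is what removes most of the halo.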

And now it looks very close to the reference VCT implementation!

Step 2: Row Translation

We want to use cascades concentric around the camera. This is a similar concept to Cascaded Shadow Maps in shadow mapping (in Ogre we call them Parallel Split Shadow Maps, but it’s the same thing).

That means that once the camera has moved far enough, we must move the cascades and re-voxelize.

But we don’t need to voxelize the entire thing from scratch. We can translate everything by 1 voxel, and then revoxelize the new row:

As the camera moves, once it has moved far enough, we must translate the voxel cascade

When we do that, there will be a region that is no longer covered (it will be covered by a higher-level, lower-quality cascade), marked in grey, and a row of missing information we must revoxelize, marked in red.

Given that we only need to partially update the voxels after camera movement, supporting cascades becomes very fast.

Right now this step is handled by VctImageVoxelizer::partialBuild
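The idea can be sketched in 2D with a small grid that follows the camera. This is not the code of VctImageVoxelizer::partialBuild; `Cascade2D`, `translate_x` and the `voxelize` callback are hypothetical names used only to show that a one-voxel move costs one column of work instead of a full rebuild.

```python
class Cascade2D:
    """A 2D stand-in for one voxel cascade that tracks a moving origin."""

    def __init__(self, width, height, voxelize):
        self.width = width
        self.height = height
        self.voxelize = voxelize      # callback: (world_x, world_y) -> value
        self.origin_x = 0             # world x coordinate of column 0
        self.cells_voxelized = 0      # how much voxelization work we did
        self.rows = [
            [self._build(x, y) for x in range(width)] for y in range(height)
        ]

    def _build(self, local_x, y):
        self.cells_voxelized += 1
        return self.voxelize(self.origin_x + local_x, y)

    def translate_x(self, step):
        """Move the cascade by one voxel along x, refilling only the new column."""
        if step not in (-1, 1):
            raise ValueError("this sketch only moves one voxel at a time")
        self.origin_x += step
        for y, row in enumerate(self.rows):
            if step > 0:
                row.pop(0)                                   # leaves the cascade
                row.append(self._build(self.width - 1, y))   # newly exposed column
            else:
                row.pop()
                row.insert(0, self._build(0, y))

def scene(x, y):
    return x * 10 + y                 # fake 'scene' so results are checkable

cascade = Cascade2D(4, 3, scene)
work_before = cascade.cells_voxelized
cascade.translate_x(+1)
print(cascade.cells_voxelized - work_before, "cells revoxelized after the move")
```

After the move the grid holds exactly what a full rebuild at the new origin would produce, but only `height` cells were voxelized instead of `width * height`.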

Step 3: Cascades

This step is currently a work in progress. The implementation is planned to have N cascades (N user defined). During cone tracing, after we reach the end of a cascade we move on to the next cascade, which covers more ground but has coarser resolution, hence lower quality.
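Since this part is still work in progress, the following is only a rough sketch of the lookup described above, not the planned implementation. It assumes that the cascades are boxes centred on the camera, that they all share the same voxel resolution, and that each one covers twice the extent of the previous one; the doubling factor and the names `make_cascades` and `pick_cascade` are assumptions.

```python
def make_cascades(count, base_half_extent, resolution):
    """Build `count` camera-centred cascades, each twice as large as the last."""
    cascades = []
    for i in range(count):
        half_extent = base_half_extent * (2 ** i)
        cascades.append({
            "half_extent": half_extent,
            # Same voxel count over a larger box means coarser voxels.
            "voxel_size": 2.0 * half_extent / resolution,
        })
    return cascades

def pick_cascade(cascades, offset_from_camera):
    """Return the index of the finest cascade containing the sample point."""
    reach = max(abs(c) for c in offset_from_camera)
    for index, cascade in enumerate(cascades):
        if reach <= cascade["half_extent"]:
            return index
    return None  # beyond the last cascade: no GI data available

cascades = make_cascades(3, base_half_extent=8.0, resolution=64)
# Walk a cone axis away from the camera and see which cascade serves each sample.
along_ray = [pick_cascade(cascades, (t, 0.0, 0.0)) for t in (4.0, 12.0, 24.0, 40.0)]
print(along_ray)
```

Samples close to the camera land in the finest cascade; further along the cone the tracer falls through to coarser ones until it leaves the last cascade entirely.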

Wait, isn’t this what UE5’s Lumen does?

AFAIK Lumen is also a Voxel Cone Tracer. Therefore it’s normal there will be similarities. I don’t know if they use cascades though.

As far as I’ve read, Lumen uses an entirely different approach to voxelizing which involves rasterizing from all 6 sides, which makes it very user hostile as meshes must be broken down to individual components (e.g. instead of submitting a house, each wall, column, floor and ceiling must be its own mesh).

With Ogre-Next you just provide the mesh and it will just work (although with manual tuning you could achieve greater memory savings if e.g. the columns are split and voxelized separately).

Wait, isn’t this what Godot does?

Well, I was involved in SDFGI advising Juan on the subject, thus of course there are a lot of similarities.

The main difference is that Godot generates a cascade of SDFs (signed distance fields), while Ogre-Next is generating a cascade of voxels.

This allows Godot to render on slower GPUs (and it is especially better at specular reflections), but at the expense of accuracy: there’s a significant visual difference when toggling between Godot’s own raw VCT implementation and its SDFGI, although they both look pretty. I believe these quality issues could be improved in the future.

Having an SDF of the scene also offers interesting potential features such as ‘contact shadows’ in the distance.

In the future Ogre-Next may generate an SDF as well, since it offers many potential features (e.g. contact shadows) and speed improvements. Please understand that VCT is an actively researched topic and we are all trying and exploring different methods to see what works best and under what conditions.

The underlying techniques aren’t new; what made them possible are the new APIs and the raw power of the current generation of GPUs, which can keep up with them (although the current GPU shortage might delay the widespread adoption of these techniques).

Since this technique will be used in Ignition Gazebo for simulations, I had to err on the side of accuracy.

When is it coming?

CIVCT isn’t done yet but hopefully it should be ready in 1-2 more weekends (I can only work on this during the weekends). Maybe 3? (I hope not!). I want to release Ogre-Next 2.3 RC0 in the meantime, and when CIVCT is ready, a proper Ogre-Next 2.3 release.

The reason it’s taking so little time is that we’re improving on our existing technology and reusing lots of code. We’re just changing a few details to make it faster and more user friendly now that Vulkan gives us that freedom (but again, we plan on supporting this feature on all our API backends).

These improvements currently live in the vct-image branch, but there is no sample showcasing them yet as they are WIP.

Btw! Remember there is an active poll to decide on Ogre-Next 2.3 name. Don’t forget to vote!

Vulkan RenderSystem in Ogre 13

The Vulkan RenderSystem backport from Ogre-Next has now landed in the master branch and will be available with Ogre 13.2. See the screenshot below for the SampleBrowser running on Vulkan.

The code was simplified during backporting, which shows by the size reduction from ~33k loc in Ogre-Next to ~9k loc that are now in Ogre.

The current implementation pretends to have Fixed Function capabilities, which allows operating with one default shader – similarly to what I did for Metal. This shader only supports using a single 2D texture without lighting. E.g. vertex color is not supported. This is why the text is white instead of black in the screenshot above.
Nevertheless, it already runs on Linux, Windows and Android.

Proper lighting and texturing support will require some adaptations to the GLSL writer in the RTSS, as Vulkan GLSL is slightly different to OpenGL GLSL. This, and the other currently missing features, will hopefully come together during the 13.x development cycle. If you are particularly keen on using Vulkan, consider giving a hand.
Right now, the main goal is to get Vulkan feature-complete first, so don’t expect it to outperform any of the other RenderSystems. Due to being incomplete, the Vulkan RenderSystem is tagged EXPERIMENTAL.

Ogre ecosystem roundup #8

Following the last post about what is going on around Ogre, here is another update. With the Ogre 13.1 release, mainly the usability of Ogre was improved with the following additions.

Ogre 13.1 release

The per-pixel RTSS stage gained support for two sided lighting. This is useful if you want to have a plane correctly lit from both sides or for transparency effects, as shown below:

single sided/ two-sided lighting

Furthermore, PCF16 filtering support was added to the PSSM RTSS stage. This gives you softer shadows at the cost of 4x the texture lookups. The images below show crops from the ShaderSystem sample at 200% highlighting the effect
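The cost/softness trade-off can be illustrated with a toy percentage-closer filter. This is a sketch only: it assumes PCF4 and PCF16 correspond to 2x2 and 4x4 grids of depth comparisons, and `pcf_lit_fraction` is a hypothetical helper, not the RTSS shader code.

```python
def pcf_lit_fraction(depth_map, x, y, receiver_depth, taps_per_side):
    """Average taps_per_side**2 shadow-map depth comparisons around (x, y)."""
    height, width = len(depth_map), len(depth_map[0])
    first = -(taps_per_side // 2)
    lit = 0
    for dy in range(first, first + taps_per_side):
        for dx in range(first, first + taps_per_side):
            # Clamp to the map edges, like a clamped texture lookup.
            sx = min(max(x + dx, 0), width - 1)
            sy = min(max(y + dy, 0), height - 1)
            if receiver_depth <= depth_map[sy][sx]:
                lit += 1
    return lit / (taps_per_side * taps_per_side)

# Shadow map with a hard edge: an occluder on the left half, nothing on the right.
depth_map = [[0.2] * 4 + [1.0] * 4 for _ in range(8)]
pcf4 = pcf_lit_fraction(depth_map, 5, 4, receiver_depth=0.5, taps_per_side=2)
pcf16 = pcf_lit_fraction(depth_map, 5, 4, receiver_depth=0.5, taps_per_side=4)
print(pcf4, pcf16)
```

One texel away from the edge, the 2x2 kernel already reports the point as fully lit, while the 4x4 kernel still sees part of the occluder, so the transition is spread over a wider band at the price of 16 lookups instead of 4.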

PCF4/ PCF16

blender2ogre improved even further

Thanks to the continued work by Guillermo “sercero” Ojea Quintana, blender2ogre gained some exciting new features.

The first is support for specifying Static and Instanced geometry like this. You might wonder whether you should be using that and if yes, which variant. Therefore, he also collected the respective documentation which is available here.

The second notable feature is support for .mesh import, which might come in handy if you are modding some Ogre-based game or just lost the source .blend file. This feature is based on the respective code found in the Kenshi Blender Plugin (which in turn is based on the Torchlight plugin).

Then, old_man_auz chimed in and fixed some bugs when exporting to Ogre-Next, while also cleaning up the codebase and improving documentation.

Finally, yours truly added CI unit-tests, which make contributing to blender2ogre easier.

OpenAL EFX support in ogre-audiovideo

Again contributed by sercero are some important additions to the audio part of the ogre-audiovideo project, which drastically improve its usability.

The first one is that you no longer need boost to enable threading. OgreOggSound will now follow whatever Ogre is configured with.

The second one is being able to use EFX effects with openal-soft instead of the long-dead Creative implementation. This enables effects like reverb or bandpass filters.

Read more in the release-notes. This release, too, was done by sercero, who kindly took on the burden of co-maintaining the project.

Ogre 13 released

We just tagged the Ogre 13 release, making it the new current and recommended version. We would advise you to update wherever possible, to benefit from all the fixes and improvements that made their way into the new release.
This release represents 2.5 years of work from various contributors when compared to the previous major 1.12 release. Compared to the last Ogre minor release (1.12.12), however, we are only talking about 4 months. Here, you will mainly find bugfixes and the architectural changes that justify the version number bump.

For source code and pre-compiled SDKs, see the downloads page.

Ogre ecosystem roundup #7

Following the last post about what is going on around Ogre, here is another update. With the Ogre 1.12.12 release, mainly the usability of Ogre was improved with the following additions.

Ogre 1.12.12 release

The last 1.12 release had some serious regressions in D3D9 and GL1, therefore I scheduled one more release in the 1.12.x series.

Updated release notes

As the Ogre 1.12 series was an LTS release, many important features landed after the initial 1.12.0 release. To take this into account and to give an overview which version you need, the “New and Noteworthy” document was updated with the post .0 additions. (search for “12.” to quickly skim through them).

Nevertheless, there are also some new features in the 1.12.12 release itself:

Cubemap support in compositors

Compositor render targets can now be cubemaps by adding the cubic keyword to the texture declaration – just like with material texture_units.

To really take advantage of this, you can now also specify the camera to use when doing render_scene passes. This way any camera in your scene can be used as an environment-probe for cube mapping.

Finally, to really avoid touching any C++, there is now the align_to_face keyword which automatically orients the given camera to point to the targeted cubemap face.

See this commit on how these things can simplify your code and this for further documentation.

Terrain Component in Bindings

Thanks to a contribution by Natan Fernandes there is now initial support of the Terrain Component in our C#/ Java/ Python bindings.

Python SDK as PIP package

Python programmers can now obtain an Ogre SDK directly from PyPI, as they are used to, with:

pip install ogre-python

Just like the MSVC and Android SDKs, it includes the assimp plugin, which allows loading almost any mesh format, and ImGui, so you can create a UI in a breeze.
For now only Python 3.8 is covered – but on all platforms. This means you can now have a pre-compiled package for OSX and Linux too.

Improved blender2ogre

Thanks to some great work by Guillermo “sercero” Ojea Quintana, the blender2ogre export settings are much more user friendly now:

On top of giving some context on what an option might do, the exporter can now also let Ogre generate the LOD levels. This gives you the choice to

  • Iteratively apply blender “decimate” as in previous releases. This will generate one .mesh file per LOD level, but may result in a visually better LOD
  • Use the Ogre MeshLOD Component. This will store all LOD levels in one .mesh file, only creating a separate index-buffer per LOD. This will greatly reduce storage compared to the above method.

SceneNode animations

But he did not stop there: blender2ogre now also exports NodeAnimationTrack based animations. To this end it follows the format introduced by EasyOgreExporter, so both exporters are compatible with each other.

To formalise this, he even extended the .scene type definition, so other exporters implementing this function can validate their output.

Needless to say, he also extended the DotScene Plugin shipped with 1.12.12 to actually load these animations.

.scene support in ogre-meshviewer

Picking up the work by Guillermo, I extended ogre-meshviewer to load .scene files – in addition to .mesh files and whatever formats assimp knows about.

However, for now it will merely display the scene – there are no inspection tools yet.