With default settings on Clang on Linux, there’s no errors (except on OgreMeshTool).
But we haven’t taken a look at deprecated warnings yet.
On Apple/XCode and MSVC there’s very few warnings now
GCC needs Wconversion off because it’s incredibly dumb. It’s got too many false positives. Fixing them would hurt readability too much, causing more harm than good.
Some CMake settings may cause warnings, e.g. we generate lots of warnings if OGRE_CONFIG_DOUBLE is on.
We also applied clang format to everything, and added C++11 override everywhere.
We’ve also removed dead code and added an option to turn off custom memory allocators off which is the default.
We’ve removed Boost, thus CMake won’t waste its time (and that was a lot) looking for an optional dependency we didn’t really need anymore.
Overall in Debug builds we’re seeing a 20% reduction in binary size (incl. symbols)!! While Release builds have between 1% and 3% reductions.
The only drawback is that merging code 2.3 -> 2.4 is now harder as it’s almost guaranteed to cause a merge conflict, since nearly every line got touched.
Nonetheless it seems that merging by “merge using always theirs”, then applying clang format, then seeing the diff of what’s going to be merge is a good approach to prevent bad merges and fix accidentally introducing bugs
The numbers are in MBs
Linux Clang 9 – Debug
Linux Clang 9 – Release
MSVC 2019 – Debug
MSVC 2019 – Release
That’s all for now. Ogre 2.3 was released just a week ago and we wanted to share such an exciting development already happening on 2.4.
First of all, Merry Christmas to all those who celebrate it on behalf of the OGRE Team! (and if you don’t, have a nice day too!)
Second, after a bit more than a year in development, Ogre-Next 2.3.0 is released!
Magnificent work on Device Lost handling by Eugene Golushkov!
Most games don’t care too much about device lost because games can assume they own almost the entire computer while they’re running, and nothing else will be happening. A device lost is considered a critical failure and very uncommon, typically because of a Hardware or Software malfunction. Or a Windows Update in the middle of a gaming session, in which case the gaming experience is already interrupted anyway.
Switching from power saving mode to performance or viceversa (mostly on laptops or other mobile devices)
Due to these two reasons, device lost becomes an almost certainty for long-running applications that could encounter a graphics driver suddenly upgrading; or for mobile/laptop-oriented applications where power mode switching can be very frequent.
Recovering from device lost can range from very easy to very difficult; depending on the complexity of an application and what the application was doing at the time the device was lost.
Eugene’s work goes to great lengths to try to gracefully recover from a Device Lost.
Switch importV1 to createByImportingV1
In 2.2.2 and earlier we had a function called Mesh::importV1 which would populate a v2 mesh by filling it with data from a v1 mesh, effectively importing it.
In 2.2.3 users should use MeshManager::createByImportingV1 instead. This function ‘remembers’ which meshes have been created through a conversion process, which allows device lost handling to repeat this import process and recreate the resources.
Aside from this little difference, there are no major functionality changes and the function arguments are the same.
Shadow’s Normal Offset Bias
We’ve had a couple complaints, but it wasn’t until user SolarPortal made a more exhaustive research where we realized we were not using state of the art shadow mapping techniques.
We were relying on hlmsManager->setShadowMappingUseBackFaces( true ) to hide most self-occlussion errors, but this caused other visual errors.
Normal Offset Bias is a technique from 2011 (yes, it’s old!) which drastically improves self occlussion and shadow acne while improving overall shadow quality; and is much more robust than using inverted-culling during the caster pass.
Therefore this technique replaced the old one and the function HlmsManager::setShadowMappingUseBackFaces()has been removed.
Users can globally control normal-offset and constant biases per cascade by tweaking ShadowTextureDefinition::normalOffsetBias and ShadowTextureDefinition::constantBiasScale respectively.
You can also control them via compositors scripts in the shadow node declaration, using the new keywords constant_bias_scale and normal_offset_bias
Users porting from 2.2.x may notice their shadows are a bit different (for the better!), but may encounter some self shadowing artifacts. Thus they may have to adjust these two biases if they need to.
Unlit vertex and pixel shaders unified
Unlit shaders were still duplicating its code 3 times (one for each RenderSystem) and all of its vertex & pixel shader code has been unified into a single .any file.
Although this shouldn’t impact you at all, users porting from 2.2.x need to make sure old Hlms shader templates from Unlit don’t linger and get mixed with the new files.
Pay special attention the files from Samples/Media/Hlms/Unlit match 1:1 the ones in your project and there aren’t stray .glsl/.hlsl/.metal files from an older version.
If you have customized the Unlit implementation, you may find your customizations to be broken. But they’re easy to fix. For reference look at Colibri’s twocommits which ported its Unlit customizations from 2.2.x to 2.3.0
It is now possible to toggle Depth Clamp on/off. Check if it’s supported via RSC_DEPTH_CLAMP. All desktop GPU should support it unless you’re using extremely old OpenGL drivers. iOS supports it since A11 chip (iPhone 8 or newer)
Users upgrading from older Ogre versions should be careful their libraries and headers don’t get out of sync. A full rebuild is recommended.
The reason being is that HlmsMacroblock (which is used almost anywhere in Ogre) added a new member variable. And if a DLL or header gets out of sync, it likely won’t crash but the artifacts will be very funny (most likely depth buffer will be disabled).
Added shadow pancaking
With the addition of depth clamp, we are now able to push the near plane of directional shadow maps in PSSM (non-stable variant). This greatly enhances depth buffer precision and reduces self-occlusion and acne bugs.
This improvement may make it possible for users to try using PFG_D16_UNORM instead of PFG_D32_FLOAT for shadow mapping, halving memory consumption.
Shadow pancaking is automatically disabled when depth clamp is not supported.
Vulkan is ready!
In Ogre-Next 2.3, Vulkan is considered stable. If you find a bug, please report it.
Most notable known issue is that it appears there are some issues when integrating with Qt we haven’t looked into yet.
Old timers may remember that Ogre could crash if the latest DirectX runtimes were not installed, despite having an OpenGL backend as a fallback.
This was specially true during the Win 9x and Win XP eras which may not have DirectX 9.0c support. And stopped being an issue in the last decade since… well everyone has it now.
This problem came back with the Vulkan plugin, as laptops having very old drivers (e.g. from 2014) with GPUs that were perfectly capable of running Vulkan would crash due to missing system DLLs.
Furthermore, if the GPU cannot do Vulkan, Ogre would also crash.
We added the keyword PluginOptional to the Plugins.cfg file. With this, Ogre will try to load OpenGL, D3D11, Metal and/or Vulkan; and if these plugins fail to load, they will be ignored.
Make sure to update your Plugins.cfg to use this feature to provide a good experience to all of your users, even if they’ve got old HW or SW.
Rather than rendering features, Ogre 2.4 will be focusing on robusting its source code base. There is a lot of code debt which needs to be addressed.
We will change the project from “Ogre” to “Ogre-Next”. The PR is already on its way and has been sitting in the backburner because we didn’t want to risk such a potentially breaking change so close to 2.3’s release. This change will allow installing Ogre 1.x and Ogre-Next side by side at the same time
Move to C++11 and up
Users may remembers my stance on C++11 adoption. Since then, while sadly the bloat is still there (literally compiling with C++98 is just faster because std headers bring in a lot of unnecessary baggage) HW has become faster, compilers did make some marginal improvements on build speeds, and most importantly we’re seeing more trouble maintaining C++98/03 support than just moving to C++11.
Additionally, we’ve long been wanting to use some of the C++11 (and up) built in features such as override keyword which help improve code quality.
Remove dead and deprecated code
Remove Boost (all Boost functionality we depended on can be found on the STL in C++11)
As for features, we will work on those needed by CIVCT:
Metal will start using Root Layouts, just like Vulkan. This will allow us to support a lot more textures and UAVs per shader.
Hlms implementations have a lot of duplicate Samplers for per-pass resources. We must merge them because on D3D11 CIVCT runs out of the limit of 16 samplers.
Some better than others, but VCT (Voxel Cone Tracing) stands out for its high quality at an acceptable performance (on high end GPUs).
However the main problem right now with our VCT implementation is that it’s hard to use and needs a lot of manual tweaking:
Voxelization process is relatively slow. 10k triangles can take 10ms to voxelize on a Radeon RX 6800 XT, which makes it unsuitable for realtime voxelization (only load time or offline)
Large scenes / outdoors need very large resolution (i.e. 1024x32x1024) or just give up to large quality degradations
It works best on setting up static geometry on a relatively small scene like a room or a house.
If your game is divided in small sections that are paged in/out (i.e. PS1 era games like Resident Evil, Final Fantasy 7/8/9, Grim Fandango) VCT would be ideal.
But in current generation of games with continous movement over large areas, VCT falls short, not unless you do some insane amount of tricks.
So we’re looking to improve this and that’s where our new technique Cascaded Image VCT (CIVCT… it wouldn’t be a graphics technique if we didn’t come up with a long acronym) comes in:
Voxelizes much faster (10x to 100x), enabling real time voxelization. Right now we’re focusing on static meshes but it should be possible to support dynamic stuff as well
Works out of the box
Quality settings easy to understand
Adapts to many conditions (indoor, outdoor, small, large scenes)
That would be pretty much the holy grail of real time GI.
Step 1: Image Voxelizer
Our current VctVoxelizer is triangle based: It feeds on triangles, and outputs a 3D voxel (Albedo + Normal + Emissive). This voxel is then fed to VctLighting to produce the final GI result:
Right now we’re using VctVoxelizer voxelizes the entire scene. This is slow.
Image Voxelizer is image-based and consists in two steps:
Reuse VctVoxelizer to separately voxelize each mesh offline and save results to disk (or during load time). At 64x64x64 a mesh would need between 2MB and 3MB of VRAM per mesh (and disk space) depending on whether the object contains emissive materials. Some meshes require much lower resolution though. This is user tweakable. You’d want to dedicate more resolution to important/big meshes, and lower resolution for the unimportant ones.
This may sound too much, but bear in mind it is a fixed cost independent of triangle count. A mesh with a million triangles and a mesh with a 10.000 triangles will both occupy the same amount of VRAM.
Objects are rarely square. For example desk table is often wider than it is tall or deep. Hence it could just need 64x32x32, which is between 0.5MB and 0.75MB
Each frame, we stitch together these 3D voxels of meshes via trilinear interpolation into a scene voxel. This is very fast.
This feature has been fast thanks to Vulkan, which allows us to dynamically index an arbitrary number of bound textures in a single compute dispatch.
OpenGL, Direct3D 11 & Metal* will also support this feature but may experience degraded performance as we must perform the voxelization in multiple passes. How much of a degradation depends on the API, e.g. OpenGL actually will let us dynamically index the texture but has a hard limit on how many textures we can bind per pass.
(*) I’m not sure if Metal supports dynamic texture indexing or not. Needs checking.
Therefore this is how it changed:
This step is done offline or at loading time:
This step can be done every frame or when the camera moves too much, or if an object is moved
There is a downside of this (aside from VRAM usage): We need to voxelize each mesh + material combo. Meaning that if you have a mesh and want to apply different materials, we need to consume 2-3MB per material
This is rarely a problem though because most meshes only use one set of materials. And for those that do, you may be able to get away with baking a material set that is either white or similar; the end results after calculating GI may not vary that much to worth the extra VRAM cost.
For simple colour swaps (e.g. RTS games, FPS with multiplayer teams), this should be workaroundeable by adding a single multiplier value, rather than voxelizing the mesh per material
It should be possible to apply some sort of BC1-like compression, given that the mesh opaqueness and shape is the same. The only thing that changes is colour; thus a delta colour compression scheme could work well
At first I panicked a little while developing the Image Voxelizer because the initial quality was far inferior than that of the original voxelizer.
The problem was that the original VCT is a ‘perfect’ voxelization. i.e. if a triangle touches a single voxel, then that voxel adopts the colour of the triangle. Its neighbour voxels will remain empty. Simple. That results in a ‘thin’ voxel result.
However in IVCT, voxels are interpolated into a scene voxel that will not match in resolution and may be arbitrarily offsetted by subpixels. It’s not aligned either.
The result is that certain voxels have 0.1, 0.2 … 0.9 of the influence of the mesh. This generates ‘fatter’ voxels.
In 2D we would say that the image has a halo around the contours
Once I understood what was going on, I tweaked the math to thin out these voxels by looking at the alpha value of the interpolated mesh and applying an exponential curve to get rid of this halo.
And now it looks very close to the reference VCT implementation!
Step 2: Row Translation
We want to use cascades (a similar concept from shadow mapping, i.e. Cascaded Shadow Maps. In Ogre we call it Parallel Split Shadow Maps but it’s the same thing) concentric around the camera.
That means when the camera moves, once the camera has moved too much, we must move the cascades and re-voxelize.
But we don’t need to voxelize the entire thing from scratch. We can translate everything by 1 voxel, and then revoxelize the new row:
As the camera wants to move, once it moved far enough, we must translate the voxel cascade
Given that we only need to partially update the voxels after camera movement, it makes supporting cascades very fast
Right now this step is handled by VctImageVoxelizer::partialBuild
Step 3: Cascades
This step is currently a work in progress. The implementation is planned to have N cascades (N user defined). During cone tracing, after we reach the end of a cascade we move on to the next cascade, which covers more ground but has coarser resolution, hence lower quality.
Wait isn’t this what UE5’s Lumen does?
AFAIK Lumen is also a Voxel Cone Tracer. Therefore it’s normal there will be similarities. I don’t know if they use cascades though.
As far as I’ve read, Lumen uses an entirely different approach to voxelizing which involves rasterizing from all 6 sides, which makes it very user hostile as meshes must be broken down to individual components (e.g. instead of submitting a house, each wall, column, floor and ceiling must be its own mesh).
With Ogre-Next you just provide the mesh and it will just work (although with manual tunning you could achieve greater memory savings if e.g. the columns are split and voxelized separately).
Wait isn’t this what Godot does?
Well, I was involved in SDFGI advising Juan on the subject, thus of course there are a lot of similarities.
The main difference is that Godot generates a cascade of SDFs (signed distance fields), while Ogre-Next is generating a cascade of voxels.
This allows Godot to render on slower GPUs (and is specially better at specular reflections), but at the expense of accuracy (there’s a significant visual difference when toggling between Godot’s own raw VCT implementation and its SDFGI; but they both look pretty) but I believe these quality issues could be improved in the future.
Having an SDF of the scene also offers interesting potential features such as ‘contact shadows’ in the distance.
Ogre-Next in the future may generate an SDF as well as it offers many potential features (e.g. contact shadows) or speed improvements. Please understand that VCT is an actively researched topic and we are all trying and exploring different methods to see what works best and under what conditions.
The underlying techniques aren’t new, but what made it possible are the new APIs and the raw power provided by current generation of GPUs that can keep up with them (although the current GPU shortage might delay the widespread adoption of these techniques).
Since this technique will be used in Ignition Gazebo for simulations, I had to err on the side of accuracy.
When is it coming?
CIVCT isn’t done yet but hopefully it should be ready 1-2 more weekends (I can only work on this during the weekends). Maybe 3? (I hope not!). I want to release Ogre-Next 2.3 RC0 in the meantime, and when CIVCT is ready a proper Ogre-Next 2.3 release.
The reason it’s taking so little time is because we’re improving on our existing technology and reusing lots of code. We’re just changing a few details to make it faster and more use friendly now that Vulkan gives us that freedom (but again, we plan on supporting this feature on all our API backends).
These improvements are currently living in vct-image branch but has no sample yet showcasing it as it is WIP.