A slightly late report. We know we skipped April & May, but don’t worry: we’ve been busy!
So…what’s new in the Ogre 2.1 development branch?
1. Added depth texture support! This feature has been requested many times for a very long time. It was about time we added it!
Now you can write directly to depth buffers (aka depth-only passes) and read from them. This is very useful for Shadow Mapping. It also allows us to do PCF filtering in hardware with OpenGL.
But you can also read the depth buffers from regular passes, which is useful for reconstructing position in Deferred Shading systems, and post-processing effects that need depth, like SSAO and Depth of Field, without having to use MRT or another emulation to get the depth.
We make the distinction between “depth buffer” and “depth textures”. In Ogre, “depth textures” are buffers that have been requested to be read from as a texture at some point in time. If you want to ever use it as a texture, you’ll want to request a depth texture (controlled via RenderTarget::setPreferDepthTexture).
A “depth buffer” is one that you will never read from as a texture and that can’t be used as such. The distinction exists because some hardware can apply certain optimizations or offer more precise depth formats that would be unavailable if you asked for a depth texture.
For most modern hardware though, there’s probably no noticeable performance difference in this flag.
2. Added tutorial sample on how to render skies (was a popular enough request to already warrant it at this point in time) and added DynamicGeometry and CustomRenderable samples. There were a lot of requests on how to manipulate v2 vertex buffer objects for dynamically changing vertex information in real time (or baking your own custom solutions). These samples now show how to do it.
3. Added HDR! Be sure to check out our new HDR sample!
HDR combined with PBR lets us use real world values as input for our lighting and a real world scale such as lumen, lux and EV Stops (photography). Or you can just use it for artistic expression. By default we perform an anamorphic lens flare effect as bloom instead of regular bloom.
4. Added ambient lighting and hemisphere-based ambient lighting. This feature was missing from our PBS system and had been requested multiple times. It’s not Global Illumination, but hemisphere lighting is a pretty cheap fake.
Check out SceneManager::setAmbientLight and HlmsPbs::setAmbientLightMode.
Hemisphere lighting works by simulating a sphere with a different colour at each of its two poles. We interpolate between the two colours based on dot(hemisphereDir, normal) for diffuse and dot(hemisphereDir, reflection) for specular, and lastly combine them into the final colour using the same formulas we use for cubemap IBLs.
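To make the interpolation concrete, here is a minimal standalone sketch of the blend for a single direction. It is not the actual HlmsPbs shader code: Vec3, lerp and hemisphereAmbient are hypothetical helpers, and the linear remap of the dot product from [-1, 1] to [0, 1] is an assumption made for illustration. The final combination with the cubemap IBL formulas is left out.

```cpp
#include <cassert>
#include <cmath>

// Minimal 3-component vector; stands in for a real math library type.
struct Vec3 { float x, y, z; };

inline float dot(const Vec3 &a, const Vec3 &b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline Vec3 lerp(const Vec3 &a, const Vec3 &b, float t)
{
    return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
}

// Blends between the lower and upper hemisphere colours.
// 'dir' is the surface normal (diffuse) or the reflection vector (specular).
// Both 'hemisphereDir' and 'dir' are expected to be unit length.
// ASSUMPTION: the dot product is remapped linearly from [-1, 1] to [0, 1].
inline Vec3 hemisphereAmbient(const Vec3 &upperColour, const Vec3 &lowerColour,
                              const Vec3 &hemisphereDir, const Vec3 &dir)
{
    const float d = dot(hemisphereDir, dir);
    assert(std::fabs(d) <= 1.001f && "inputs must be normalised");
    const float weight = 0.5f * d + 0.5f; // 1 = fully upper, 0 = fully lower
    return lerp(lowerColour, upperColour, weight);
}
```

Calling hemisphereAmbient once with the normal and once with the reflection vector gives the diffuse and specular ambient terms. With a blue upper colour and an orange lower colour, a normal pointing along hemisphereDir returns pure blue, the opposite direction returns pure orange, and a perpendicular one returns an even mix.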
Ambient lighting became necessary with the addition of HDR. Having a scene lit & burnt by the power of a 97,000-lumen sun next to a shadowed pixel with 0 luminance (that’s darker than a black hole!) causes the auto exposure system to go haywire. Who would’ve guessed?
The following pictures show the difference. The effect was strengthened to show the differences:
- Everything is blueish due to the sky (effect pronounced on purpose in the picture).
- The sphere’s lower-left border has an orange outline. This is the fake simulated bounce of the sunlight.
- Not easy to see in the picture: The ambient colour depends on the fresnel setting. Mirror-like materials (high fresnel) show much more ambient lighting, and the light reflected depends on the camera. Non-mirrors (low fresnel) exhibit less ambient lighting, and the ambient light reflection doesn’t depend on the camera.
5. Working on UAVs (Unordered Access Views), memory barriers, asynchronous shaders and preparing the ground work for D3D12 & Vulkan. Whoa! What a title! There’s a lot chunked together, but it’s all related. Let’s take it in parts:
5.a UAVs: Unordered Access Views were introduced in D3D11 and in two stages in OpenGL 4.2 (via GL_ARB_shader_image_load_store) and 4.3 (via GL_ARB_shader_storage_buffer_object). Therefore we require GL 4.3 to actually get UAV functionality. UAVs allow a shader to write or read in any random order without hazard checking. It enables advanced users to implement powerful techniques like Order Independent Transparency, Bokeh, Depth Of Field, Forward+ lighting (aka Clustered Forward), Raytraced shadows, etc.
This is a great addition we have been looking forward to for a while. This is already working.
5.b Memory barriers and asynchronous shaders: D3D11 forces an implicit memory barrier between each command that uses a UAV. However, OpenGL, Vulkan & D3D12 require explicit memory barriers.
This means that if compute shader A is writing to the UAV and compute shader B (which was launched afterwards) reads from that same UAV, it is not guaranteed that B will see the writes from A. It is possible that B reads from the UAV before A has finished. It is even possible that A has already finished but the writes are still in the cache and not flushed to the main memory, hence B will still not see the writes. To guarantee that B sees the writes from A and gets executed afterwards, a memory barrier is needed.
Making barriers explicit enables a lot of parallel performance optimizations (if one compute shader writes to UAV X and another reads from UAV Y, they can both execute at the same time in the GPU as they do not depend on each other).
Our new compositor system requires you to declare the access flags every time a UAV is used (read, write, read & write) so that the compositor can calculate where memory barriers are needed, then bake & place them at the exact moments that are required to avoid hazards & race conditions.
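As a rough illustration of how declared access flags can drive barrier placement, here is a small standalone sketch. It is not the compositor’s real code: Access, UavUse and bakeBarriers are hypothetical names, each pass uses a single UAV, and a barrier is assumed to flush every pending access. It conservatively flags a barrier before any pass that touches a UAV written since the last barrier, or that writes a UAV read since the last barrier.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Access flags a pass declares for a UAV (hypothetical names).
enum class Access { Read, Write, ReadWrite };

// One declared use of a UAV by a pass; 'uavId' is an illustrative handle.
struct UavUse { int uavId; Access access; };

inline bool reads(Access a)  { return a != Access::Write; }
inline bool writes(Access a) { return a != Access::Read; }

// Returns the indices of the passes that need a memory barrier before they run.
std::vector<std::size_t> bakeBarriers(const std::vector<UavUse> &passes)
{
    struct Pending { bool read = false; bool written = false; };
    std::map<int, Pending> pending; // accesses made since the last barrier
    std::vector<std::size_t> barriers;

    for (std::size_t i = 0; i < passes.size(); ++i)
    {
        const UavUse &use = passes[i];
        const Pending before = pending[use.uavId]; // copy: the map may be cleared
        // Read-after-write, write-after-write or write-after-read on the same UAV.
        const bool hazard = before.written || (before.read && writes(use.access));
        if (hazard)
        {
            barriers.push_back(i);
            pending.clear(); // ASSUMPTION: one barrier flushes all pending accesses
        }
        Pending &now = pending[use.uavId];
        now.read = now.read || reads(use.access);
        now.written = now.written || writes(use.access);
    }
    return barriers;
}
```

Because the whole chain of passes is known up front, this pass runs once and the result can be reused every frame. Passes that touch unrelated UAVs never trigger a barrier, which is what leaves room for them to overlap on the GPU.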
5.c Preparing the ground work for D3D12 & Vulkan: In D3D12 & Vulkan, it’s not only UAVs that require a barrier. Everything does. Every time a chunk of GPU memory is bound to the pipeline as something different (e.g. as a render target, as a texture, as a vertex buffer, as a UAV), we need not only an explicit barrier but also a “Resource Transition” to tell the driver we will no longer be using the resource e.g. as a render target but rather as a texture.
A naive approach would be to check if we need to create a “Resource Transition” right before we bind the resource. But that’s just what D3D11 & OpenGL do…which pretty much eliminates the performance optimization advantages that can be taken in D3D12 and Vulkan in this area.
Fortunately, the compositor sits at the best place to determine most (if not all) of the resource transitions that will be needed and evaluate once and bake the resource transitions into the chain of nodes & passes executions, while removing redundant transitions that would otherwise be hard to detect.
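The idea of evaluating once and dropping redundant transitions can be sketched as follows. This is an illustrative standalone example under simplifying assumptions, not Ogre’s implementation: ResourceState, PassBinding, Transition and bakeTransitions are hypothetical names, and each entry binds a single resource in a single state.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// How a pass binds a resource (hypothetical, simplified set of states).
enum class ResourceState { Undefined, RenderTarget, Texture, VertexBuffer, Uav };

// A pass binding one resource in a given state; ids are illustrative handles.
struct PassBinding { std::size_t passIndex; int resourceId; ResourceState state; };

// A transition to execute right before 'beforePass' runs.
struct Transition
{
    std::size_t beforePass;
    int resourceId;
    ResourceState from;
    ResourceState to;
};

// Walks the bindings in execution order and emits a transition only when a
// resource's state actually changes; repeated bindings in the same state
// produce nothing.
std::vector<Transition> bakeTransitions(const std::vector<PassBinding> &bindings)
{
    std::map<int, ResourceState> current; // last known state per resource
    std::vector<Transition> baked;

    for (const PassBinding &b : bindings)
    {
        const auto it = current.find(b.resourceId);
        const ResourceState previous =
            it == current.end() ? ResourceState::Undefined : it->second;
        if (previous != b.state)
        {
            baked.push_back({ b.passIndex, b.resourceId, previous, b.state });
            current[b.resourceId] = b.state;
        }
    }
    return baked;
}
```

For a resource rendered to in passes 0 and 1 and then sampled as a texture in passes 2 and 3, this yields two transitions (before pass 0 and before pass 2) instead of a check before every bind.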
Most of this ground work code has already been written, but it’s not ready yet. It is still a work in progress.
Well, that’s all for today. /Signing off!