Hi everybody! It’s me, Matias aka dark_sylinc!
Time to give a progress report:
At the time of writing, I am making the final preparations for publicly releasing the AZDO branch aka “Ogre 2.1” (still unnamed as of yet). This doesn’t mean it’s ready or finished, but rather that it becomes public, as a lot of development has been done behind closed doors.
You can also follow my current ToDo list on my trello board.
As I blogged on my personal website, there is a new material system called “HLMS” (High Level Material System) which is intended to be used for most of your entities.
Let me get this straight: You should be using the HLMS. The usual “materials” are slow. Very slow. They’re inefficient and not suitable for rendering most of your models.
However, old materials are still useful for:
- Quick iteration: You need to write a shader? Just define the material and start coding. Why would you deal with the template’s syntax or a C++ module when you can just write a script and start coding? The HLMS though comes with a command line tool to know how your template translates into a final shader (which is very handy for iteration since it’s fast and will check for syntax errors!), but it’s most useful when you want to write your own C++ module or change the template, not when you want to just experiment. Besides, old timers are used to writing materials.
- Postprocessing effects: Materials are much better suited for this. Materials are data driven and easy to write. Postprocessing FXs don’t need an awful lot of permutations (e.g. having to deal with shadow mapping, instancing, skeleton animation, facial animation). And they’re at no performance disadvantage compared to HLMS: Each FX is a fullscreen pass that needs different shaders, different textures, its own uniforms. Basically, API overhead we can’t optimize. But it doesn’t matter much either, because it’s not like there are 100 fullscreen passes. Usually there are fewer than 10.
Under the hood there is an HLMS C++ implementation (HLMS_LOW_LEVEL) that acts just as a proxy to the old material system. The HLMS is an integral part of Ogre 2.1, not just a fancy add-in.
We’re introducing the concept of blocks, most of which are immutable. Being immutable means you can’t change Macroblocks, Blendblocks and Samplerblocks after they have been created. If you want to make a change, you have to create a new block and assign the new one. The previous one won’t be destroyed until you ask for it explicitly.
Macroblocks are like rasterizer states in D3D11; they contain depth read/write settings and the culling mode. Blendblocks are like blend states in D3D11, containing the blending modes and their factors. Samplerblocks are like sampler objects in GL3+ or sampler states in D3D11, containing filtering information, texture addressing modes (wrap, clamp, etc), mipmap settings, etc.
Technically on OpenGL render systems (GL3+, GLES2) you could const_cast the pointers, change the block’s parameters (mind you, the pointer is shared by other datablocks, so you will be changing them as well as a side effect) and it would probably work. But it will definitely fail on the D3D11 render system.
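To make the "create a new block and assign it" workflow concrete, here is a minimal standalone sketch. It does not use Ogre's real classes: `BlockManager`, `Datablock`, the `Macroblock` fields and every method name below are hypothetical stand-ins, chosen only to show why shared read-only pointers force you to request a new block instead of editing one in place.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <tuple>

// Hypothetical stand-ins for illustration; these are NOT Ogre's real classes.
enum class CullMode { None, Clockwise, AntiClockwise };

struct Macroblock {
    bool depthCheck = true;
    bool depthWrite = true;
    CullMode cullMode = CullMode::Clockwise;
};

// Ordering so identical descriptions map to the same stored block.
inline bool operator<(const Macroblock& a, const Macroblock& b) {
    return std::tie(a.depthCheck, a.depthWrite, a.cullMode) <
           std::tie(b.depthCheck, b.depthWrite, b.cullMode);
}

class BlockManager {
public:
    // Returns a shared, read-only block. Asking twice with the same
    // description yields the same pointer.
    const Macroblock* getMacroblock(const Macroblock& desc) {
        return &*mMacroblocks.insert(desc).first;
    }

    // Blocks live until they are destroyed explicitly.
    void destroyMacroblock(const Macroblock* block) {
        assert(block != nullptr);
        const Macroblock key = *block;  // copy first: erasing invalidates `block`
        mMacroblocks.erase(key);
    }

    std::size_t numMacroblocks() const { return mMacroblocks.size(); }

private:
    std::set<Macroblock> mMacroblocks;  // node-based, so pointers stay stable
};

// A material-like object that only ever holds a pointer to a shared block.
struct Datablock {
    const Macroblock* macroblock = nullptr;
};
```

In this sketch, "changing" a datablock's depth write setting means copying the description out of the current block, editing the copy, calling `getMacroblock` with it and storing the returned pointer. Other datablocks that share the old block are unaffected, which is exactly what the const_cast shortcut would break.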
Because of the big overhaul that Ogre 2.1 went through, there were a lot of optimizations that were just incompatible with how the old code worked. On the other hand, removing the old code was unwise as there are a lot of features that haven’t been ported to v2 interfaces yet.
The solution was to build a parallel system that runs alongside the old one, enclosing the old ones in the v1 namespace.
As a result, ‘Entity’ got replaced by ‘Item’, and to access them you now need to write v1::Entity and Item respectively.
Some objects may have the same name but live in a different namespace and thus are not the same: v1::Mesh (defined in OgreMesh.h) and Mesh (defined in OgreMesh2.h).
The Overlay system for example now lives in v1 namespace.
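As a small illustration of how two classes with the same name can coexist, here is a standalone sketch. The `Engine` namespace and the `describe` helpers are hypothetical; only the `v1` namespace and the class names mirror what is described above.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// `Engine` is a hypothetical stand-in for the library's own namespace.
namespace Engine {

namespace v1 {
struct Mesh {};    // the old mesh class
struct Entity {};  // the old renderable object
}  // namespace v1

struct Mesh {};  // the new mesh class: same name, different type
struct Item {};  // the new counterpart of v1::Entity

// Overload resolution picks the right function because the types differ.
inline std::string describe(const v1::Mesh&) { return "v1 mesh"; }
inline std::string describe(const Mesh&) { return "v2 mesh"; }

}  // namespace Engine

// The compiler treats the two Mesh classes as unrelated types.
static_assert(!std::is_same<Engine::v1::Mesh, Engine::Mesh>::value,
              "v1::Mesh and Mesh must be distinct types");
```

The practical consequence is that a pointer to one kind of Mesh cannot be passed where the other is expected, so mixing them up is caught at compile time.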
The RenderQueue has been completely refactored. It sorts based on a 64-bit hash which is calculated in RenderQueue::addRenderable. Mesh ID, material ID, texture hash, depth, macro and blend block IDs are taken into account.
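To show the general idea of sorting by a packed 64-bit key, here is a minimal standalone sketch. The bit layout (which field goes where and how many bits it gets) is purely an assumption for illustration, as are the names `makeSortKey`, `QueuedRenderable` and `sortQueue`; the real hash computed in RenderQueue::addRenderable may be laid out differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Masks `value` to `bits` bits and moves it to bit position `shift`.
inline std::uint64_t packField(std::uint64_t value, unsigned bits, unsigned shift) {
    assert(bits > 0 && bits < 64 && shift + bits <= 64);
    return (value & ((std::uint64_t{1} << bits) - 1u)) << shift;
}

// Assumed layout, most significant first:
// macroblock 8 | blendblock 8 | material 12 | mesh 12 | texture hash 8 | depth 16
inline std::uint64_t makeSortKey(std::uint32_t macroblockId, std::uint32_t blendblockId,
                                 std::uint32_t materialId, std::uint32_t meshId,
                                 std::uint32_t textureHash, std::uint32_t depth) {
    return packField(macroblockId, 8, 56) | packField(blendblockId, 8, 48) |
           packField(materialId, 12, 36)  | packField(meshId, 12, 24) |
           packField(textureHash, 8, 16)  | packField(depth, 16, 0);
}

struct QueuedRenderable {
    std::uint64_t hash;  // the packed sort key
    int renderableId;    // hypothetical handle to the thing being drawn
};

// One integer comparison per pair orders by every field at once.
inline void sortQueue(std::vector<QueuedRenderable>& queue) {
    std::stable_sort(queue.begin(), queue.end(),
                     [](const QueuedRenderable& a, const QueuedRenderable& b) {
                         return a.hash < b.hash;
                     });
}
```

Fields placed in the higher bits dominate the ordering, so in this layout objects sharing a macroblock end up adjacent, and depth only breaks ties between otherwise identical state.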
Each RenderQueue ID can run in any of the following modes of operation:
V1_LEGACY: It runs as closely as possible to how Ogre 1.x ran. Only low level materials and mobile HLMS materials can work in this mode and only v1 objects can be stored in RenderQueue IDs operating in this mode. This mode of operation is not recommended for a large number of objects.
V1_FAST: Certain obscure features from Ogre 1.x won’t be available (e.g. the HW GlobalInstance buffer). The RenderQueue will first iterate through all objects, auto-instancing when possible and updating the shader’s constant and texture buffers, then use the Command Buffer for all necessary state changes and draw calls. Only v1 objects can live in these RenderQueue IDs, and they must use desktop HLMS materials.
FAST: The new system. It’s similar to V1_FAST. However, only v2 objects can be used and they must be using desktop HLMS materials. The API overhead is extremely low, and this mode is more multicore-friendly because RenderQueue::addRenderable is executed in parallel.
You cannot mix v1 and v2 objects in the same RQ ID, however you can store them in different RQ IDs.
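The compatibility rules for the three modes can be summed up in a few lines. This is a standalone sketch, not Ogre code: the `RqMode` and `ObjectKind` enums and the `QueueModes` class are hypothetical, and the choice to reject objects for IDs with no configured mode is an assumption of this example.

```cpp
#include <cassert>
#include <map>

// Hypothetical names for illustration; not Ogre's actual declarations.
enum class RqMode { V1Legacy, V1Fast, Fast };
enum class ObjectKind { V1, V2 };

// True when an object of the given kind may live in a queue ID using `mode`.
inline bool modeAccepts(RqMode mode, ObjectKind kind) {
    switch (mode) {
    case RqMode::V1Legacy:
    case RqMode::V1Fast:
        return kind == ObjectKind::V1;  // both legacy modes take v1 objects only
    case RqMode::Fast:
        return kind == ObjectKind::V2;  // the new mode takes v2 objects only
    }
    return false;
}

class QueueModes {
public:
    void setMode(unsigned rqId, RqMode mode) { mModes[rqId] = mode; }

    // IDs without a configured mode reject everything in this sketch.
    bool canAdd(unsigned rqId, ObjectKind kind) const {
        const auto it = mModes.find(rqId);
        return it != mModes.end() && modeAccepts(it->second, kind);
    }

    // Mixing v1 and v2 objects in one ID is treated as a programming error.
    void addObject(unsigned rqId, ObjectKind kind) {
        assert(canAdd(rqId, kind) && "v1 and v2 objects cannot share an RQ ID");
        ++mCounts[rqId];
    }

    unsigned objectCount(unsigned rqId) const {
        const auto it = mCounts.find(rqId);
        return it == mCounts.end() ? 0u : it->second;
    }

private:
    std::map<unsigned, RqMode> mModes;
    std::map<unsigned, unsigned> mCounts;
};
```

A scene that still needs v1 objects would configure one ID as V1_FAST and another as FAST, then route each object to the ID that matches its kind.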
What works very well
- OpenGL 3+ RenderSystem: It has been through a thorough overhaul (except for texture image loading). Even in fast legacy mode (there are three modes: slow legacy, fast legacy, and fast) it is very fast. Faster than D3D9, which has been the king so far in all other Ogre versions. Both real world and synthetic benchmarks show that CPU time is between 7% and 10% of total frame time (culling, updating the scene graph, preparing the draw commands, etc.), shifting the bottleneck towards the GPU.
In retrospect, the CPU time spent could be even lower, but I won’t focus on that for now, since the elephant in the room is now the GPU (e.g. setting up LOD needs to be easy!).
I have verified that GL3+ is rendering stall-free on my AMD Radeon HD 7770. That is an awesome achievement. (Except when particles are used, or any other system that updates dynamic vertex buffers on a per-frame basis using the v1 buffer interfaces, e.g. SW Pose animation.) It runs well on both Windows and Linux. On AMD Linux, though, there seems to be a nasty bug to report (it causes severe graphical corruption and/or occasional crashes, and the corruption manifests itself inconsistently); if it weren’t for that, I would say that, surprisingly, the Linux version works better than the Windows one.
I’ve got reports that it runs well on NVIDIA. But I have yet to try Intel cards, which are usually the black sheep of OpenGL support.
- The HLMS: The materials “just work”. Define the material properties (diffuse colour, specular, its textures, etc.), and just assign it to your objects. No need for shader knowledge. HW Skeleton animation, normal mapping, detail blending, alpha testing and shadow mapping are taken care of for you automatically.
- RenderDoc Debugging: We worked together with the maker of RenderDoc and it was a very satisfying relationship. We helped him fix bugs/crashes in his tool and his tool helped us figure out multiple issues in our code. You must use the nightly builds though; and on AMD use the latest drivers, otherwise crashes may happen. qapitrace doesn’t work. It can trace the application fine, but examining the replay with qapitrace starts leaking memory very quickly until it crashes. GPUPerfStudio works, but not completely. This has been a longstanding issue. Forcing monolithic shaders helps a lot with compatibility.
- Billboards & particle effects: They’re screaming for a performance refactoring. Particle effects use the old hardware buffer interface and often stall inside glMapBufferRange, literally halving the performance.
Sometimes they may crash when assigning the HLMS material via a pointer because HLMS will try to analyze the object’s structure (= executing checks such as: Does it use HW skeleton animation? Does it have normals? Vertex colour?) but the vertexData pointer isn’t created yet, so you have to work around it by specifying just the name of the material, and Ogre will delay setting the material until the right moment.
It works, but it isn’t elegant.
- Overlays: They seem to work very well. But the Overlay component is complex and obscure functionality may not work as expected. Additionally, I would like to refactor it in a way that the Compositor doesn’t need an “includeOverlays” parameter, and rather associate a RenderQueue ID with overlay rendering.
- The v2 mesh format: I wrote a new format that is friendly to load for the v2 interfaces. It isn’t thoroughly tested. Furthermore, I need to write the serializer that loads v2 meshes into v1 entities. This is quite complicated right now and would need a longer post.
Long story short, if you’re just starting with Ogre and have the luxury of only using the new interfaces, go for the v2 mesh format. But otherwise my recommendation is to use the v1 format and use an import function we provide that will import v1 meshes into v2 meshes.
- Tools: I haven’t tested the Tools. They should be working though (XMLSerializer, MeshUpgrader, etc.).
- HlmsTextureManager: The manager packs together multiple textures into one (texture atlas in GLES2, texture arrays in GL3+) but it needs work. Right now the algorithm is a bit naive and a messy asset folder can quickly cause out of GPU memory exceptions in GL3+.
The reason behind this is that the manager will preallocate X number of entries per texture format. So if you’ve got textures of all different resolutions and formats, a lot of room is reserved that won’t be put to use.
We either make a stronger algorithm for deciding when to preallocate, or build tools that help tidying up the asset folder.
The system is designed to allow defining manual texture packs (and in the future automatic packing could perform a two-pass loading) so that no space goes to waste. But right now this is very WIP.
The HlmsTextureManager is also very picky with normal map formats right now. By default they will be converted to R8G8_SIGNED, but it expects source textures to be in uncompressed format (either signed or unsigned). This is the “safest” approach because most artists save their normal maps in R8G8B8 format.
R8G8 is still a fat format. The best and recommended format is BC5S (signed BC5), but there won’t be any automatic conversion done for you yet, so you will have to ensure all the normal maps are BC5S dds textures.
Fail to follow these rules with normal maps (use R8G8_S and provide sources uncompressed, or use BC5S and provide all textures in BC5S format) and you can expect Ogre to crash, trigger asserts and log warnings all over the place.
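To get a feel for how quickly preallocation adds up with a messy asset folder, here is a rough standalone estimate. It is not the HlmsTextureManager's actual algorithm: it assumes one texture array per distinct (width, height, format) combination, a fixed number of preallocated slices per array, and it ignores mipmaps. `TextureInfo`, `PoolEstimate` and `estimatePools` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical description of a source texture (mipmaps ignored).
struct TextureInfo {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t bytesPerPixel;  // stands in for the pixel format
};

struct PoolEstimate {
    std::uint64_t reservedBytes;  // memory preallocated for all arrays
    std::uint64_t usedBytes;      // memory actually holding texture data
};

// Assumes every distinct (width, height, format) needs its own texture array,
// and that arrays are allocated in chunks of `slicesPerArray` entries.
inline PoolEstimate estimatePools(const std::vector<TextureInfo>& textures,
                                  std::uint32_t slicesPerArray) {
    assert(slicesPerArray > 0);
    typedef std::tuple<std::uint32_t, std::uint32_t, std::uint32_t> Bucket;
    std::map<Bucket, std::uint64_t> counts;
    for (const TextureInfo& t : textures)
        ++counts[std::make_tuple(t.width, t.height, t.bytesPerPixel)];

    PoolEstimate result = {0, 0};
    for (const auto& entry : counts) {
        const std::uint64_t sliceBytes =
            static_cast<std::uint64_t>(std::get<0>(entry.first)) *
            std::get<1>(entry.first) * std::get<2>(entry.first);
        // Round the texture count up to a whole number of arrays.
        const std::uint64_t arrays =
            (entry.second + slicesPerArray - 1) / slicesPerArray;
        result.reservedBytes += arrays * slicesPerArray * sliceBytes;
        result.usedBytes += entry.second * sliceBytes;
    }
    return result;
}
```

Under these assumptions, two 1024x1024 RGBA textures, one 512x512 RGBA texture and one 256x256 single-channel texture with 16 slices per array reserve about 81 MiB while using about 9 MiB. Grouping textures into a few common resolutions and formats is what brings the two numbers closer together.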
What does not yet work
- D3D11 RenderSystem: At all. I need to merge all the fixes added to the default branch (this is theoretically easy, since the entire D3D11 folder could just be overwritten, there’s barely any Ogre 2.1 specific code in it), then begin to port the VaoManager which is the central memory manager for AZDO rendering in Ogre 2.1.
To be honest, porting to D3D12 will be easier than D3D11 since it has explicit buffer management and persistent mapping.
The VaoManager has been written with APIs in mind that don’t do any kind of hazard tracking (e.g. Mantle, D3D12, and GL4-if-you-know-what-paths-to-use).
- GLES2: It was working 2 months ago, but got out of sync. It wouldn’t be thaaat hard to fix though. Perhaps one big issue is that the project mixes GLES 2 with 3 and we could take a lot of advantage out of GLES3 if they were separated (since GLES3 overcomes many limitations we can take advantage of, and has uniform buffers!).
- OS X support: The GL3+ RenderSystem requires key (basic!) extensions that the current OS X doesn’t expose. The best chance of getting good OS X support is GLES3, or waiting until Apple deigns to provide more up-to-date OpenGL support. There are several extensions that are useful to achieve maximum performance, but particularly the lack of GL_ARB_base_instance (we need glDrawElementsInstancedBaseVertexBaseInstance) is what is blocking the support.
- Cg, Paging and Volume components: I don’t plan on maintaining them. Someone else will have to do the job (I seriously recommend dropping Cg support though).
- RTSS: In theory it has been made irrelevant by the HLMS. But if you have already based your entire pipeline on RTSS, porting the RTSS shouldn’t be hard, as most of the required work would be to create a proxy HLMS implementation that delegates its work to the RTSS. In fact, that may even work better than in 1.x, since RTSS used to attach very hacky listeners to obtain the information it needed from the objects utilizing the materials, while an HLMS proxy implementation can provide this information by design.
- Samples: None of the samples are currently in a working state (there are new basic samples though!).
- Intel HD 3000 or below (except HD 2500): The hardware ought to be supported. Unfortunately Intel has decided not to update their OpenGL drivers for these cards, and as such the drivers don’t have the necessary features to support Ogre 2.1. Support for these cards should be restored once D3D11 support catches up though, and Linux users may hope for the Open Source Mesa implementation to also catch up eventually. Until then, these cards will not be able to run our bleeding edge code.
As the branch becomes public, we will upload the new Ogre 2.x Porting Guide which adds three new sections, detailing the HLMS, the AZDO pipeline and the Command Buffer, with a total of 31 extra pages!