Wolfmanfx wrote:
* Create a heavy scene to showcase OGRE's performance (which we could use to optimize the culling - maybe a city of boxes)
I'm afraid you will be quite disappointed. Ogre's performance is below the standards of other AAA engines (Anvil engine, CryEngine, Frostbite 2).
I'm struggling to get 1,000 render calls at 20 fps, while the Anvil engine does three times as many render calls at the same frame rate (neither case being GPU bound).
Profiling reveals the compositor wastes a lot of time traversing the scene manager multiple times (when using something other than render_quad), and there are A LOT of cache misses.
The lack of threaded culling makes this even worse. Furthermore, with DirectX 11's threading model it's possible to process a scene and batch render calls across multiple threads in a highly concurrent way:
- One thread handles shadow rendering.
- One thread handles the main scene.
- One thread handles environment mapping (i.e. reflections).
Ogre already struggles with a high number of entities when the main scene's and the shadows' rendering happen in the same thread. When I add environment mapping (one pass, not six), the number of cache misses inside the scene manager is gigantic.
I'm afraid, as someone suggested, that fixing this may require a strong redesign of the Ogre core. For instance, automatic reference counting of pointers works against a concurrent system, and singletons don't help either (it's very easy for programmers to mistakenly access a singleton when it isn't safe to do so).
* Refactor OGRE's Scenegraph to support CHC++ and other advanced techniques (tuan's scenemanager could be used as a base)
The industry is moving away from those fancy "advanced techniques" toward a more raw-power approach that rides the "buy more CPU cores" trend. And since most of those algorithms tend to increase the number of cache misses, that makes them an even uglier option.
Using a software rasterizer for occlusion queries is quite a popular one these days.