Lockless queue for threaded mode

Lf3THn4D

05-09-2009 17:00:29

Hi again :)
I've been looking at your progress on this fantastic lib and it pleases me to see it improving day by day. :D I've decided to use this lib for my game Aftershock. However, I've yet to find time to integrate it into my game engine. :( Hopefully it won't be long before I get it in.

Anyways, I noticed for the threaded mode of your library, you are doing full mutex locks on every pool update of the sound system. I'm not sure if I saw it correctly or not, but you might have forgot to lock the mutex for the queuing of sound requests. At anyrate, my proposal isn't about adding yet another mutex lock. My proposal is about removal of redundant mutex locks. If you are to use a lockless queue for all the sound requests (creation, removal, play, pause, stop), you should beable to cut down the usage of mutexes to the minimum. This will definitely improve performance by a huge scale and remove any frame locks on the main thread.

About lockless queue implementation, here's my implementation that I used for my network code:

//! LocklessQueue template
/*!
* \tparam Type Object type to queue.
* This is a lockless queue system to pass
* items from one thread to another.
* \note
* Only 1 thread can push and 1 thread can pop it.
*/
template <class Type>
class LocklessQueue
{
private:
//! buffer to keep the queue.
Type* m_buffer;

//! head of queue list.
size_t m_head;

//! tail of queue list.
size_t m_tail;

//! size of buffer.
size_t m_size;

public:
//! constructor.
inline LocklessQueue(size_t size) :
m_head(0), m_tail(0), m_size(size + 1)
{
m_buffer = new Type[m_size];
}

//! destructor.
inline ~LocklessQueue()
{
delete [] m_buffer;
}

//! push object into the queue.
inline bool push(const Type& obj)
{
size_t next_head = (m_head + 1) % m_size;
if (next_head == m_tail) return false;
m_buffer[m_head] = obj;
m_head = next_head;
return true;
}

//! pop object out from the queue.
inline bool pop(Type& obj)
{
if (m_tail == m_head) return false;
obj = m_buffer[m_tail];
m_tail = (m_tail + 1) % m_size;
return true;
}
};

The code above is a revised version I did of what was given here: http://mooproductions.blogspot.com/2006 ... ision.html

Usage is pretty simple. Main thread pushes while sound thread pops. When push() returns false, that means the queue is full and is unable to add to queue. When pop() returns false, it means there's nothing left in the queue to remove.

Hope it helps. :)

xadh00m

05-09-2009 17:52:14

There was a recent change about the mutexes last week. Did you check the latest resvision?

I don´t want to crosspost, but as a quick question. It seems that this queue only works with one dedicated push and one dedicated pop thread.
Do you by chance know a version where this limitation does not exist?

xad

Lf3THn4D

06-09-2009 01:39:10

Yes, I know about the recent mutex changes. What I've pointed out was in current trunk. It's much better now that it uses a unified lock, but it's locking for the whole of processing queue request which isn't really good. Also that the insertion of the queue request doesn't seem to be locked. This would cause random timing corruption.

From what I understand about lockless queue, it's design is only for a single thread in and single thread out. I've yet to see any suggestion or solution for a multi in and multi out. Not even single in and multi out or vice verse. But this should work well for sound since sound should be handled only in a single sub thread.

stickymango

06-09-2009 22:16:44

@Lf3THn4D

Hmm, sounds like an interesting concept, not come across this before.

I can see the limitation should still work fine with my current implementation, I'm all for improving the speed in any way possible, I'll look into the current lock issues you highlighted, its very possible theres a loophole there..

Regarding lockless implementation, I gather its a case of 'pushing' messages via soundmanager and 'popping' messages in my worker thread? When popping, is the idea to pop a single message in an iteration or clear out the list? Think I'm going to look to integrate this very soon.

Thanks for the heads up!

Lf3THn4D

07-09-2009 06:38:45

Yes :) That's the idea. One catch about this system is that it's limited to the given queue size you initialized it with. But I'm assuming it shouldn't matter much if you give a good acceptable size. After all, too many queue request in the list is bad usage.

stickymango

07-09-2009 11:28:18

Right, I'm looking into this but struggling to pick an easy route for integration. I can see that I have too many lists at present which I use to perform a particular action on a sound, so I'm thinking of implementing one main 'PendingOperations' list which will be linear, where I can push messages onto, without a lock, and pop a message off in my worker thread using the LocklessQueue method.

I'm thinking either operate on a single sound per-iteration or maybe a maximum of 5 sound operations per-iteration to hopefully balance responsiveness against lock stalling, does this sound like a good solution? Any advice or pointers for this?

Thanks.

Lf3THn4D

07-09-2009 12:12:47

I think giving it a limit of 32 or 64 would be good. Just in case stalling happens on the worker thread or when one suddenly executes multiple sounds at once. Using one single queue is a good idea to keep things in sequence. You'll just need a queue operation struct to define the commands for the queue. :) You could perhaps allow users who initialize the sound manager to pick how much queue they want while leaving a good default. Just incase some insane guy does crazy stuffs filling up the queue too fast.

stickymango

07-09-2009 12:30:43

Yeah sounds like what I was planning, I'm assuming you are talking about about queue limit size there, not objects to work on in single iteration?

I was going to use a struct to hold an action ID and a sound pointer, however some functions require some additional parameters (fadeSound() for example), so I'll need to think about that, the main concern is re-writing the handling of the actions, as currently they are executed immediately. I want to keep that option for the non-threaded version but handle the action as a 'request' for the mulit-threaded version, not sure the best way to do that yet, am I over-complicating that or did you see a simple solution to it?

Your help is much appreciated! :D

Lf3THn4D

07-09-2009 12:52:27

Yes, i was talking about the queue limit size.

About the actions handling, what I would do is have some sort of custom param data pointer in the struct for actions that requires additional parameters. It could be a void pointer or it could use a proxy base class where all custom param structs will inherit from. The void pointer method would allow tricks like treating it as a basic type like float or int through static casting for actions that only need one param. Just make sure to delete it when your queue action object that keeps it gets deleted. :)

[edit]
While at it, i'm just wondering. Did you unified the sound source and sound sample into one object? If so, doesn't that means that if I have 10 same sound in the scene, It gets loaded up 10x and takes up 10x more ram space than it actually needs? I think you need to split up sound sample(buffer) and sound source. Have the source share a common samples with other sources. This way, things won't keep getting loaded up just because we need to spawn a new sound source for an explosion or death scream. :P You could make it a smart pointer system such that the sample will get destroyed when no sources are using it. What this calls for is basically a central place where all samples are kept. The only exception to this case is probably streaming where the samples data are streamed as it's being played.
[/edit]

xadh00m

07-09-2009 12:54:51

AFAIK the Ogre worker threads, which you could also use btw., use Any´s to distribute parameters...

http://www.ogre3d.org/forums/viewtopic.php?f=4&t=50160

xad

Lf3THn4D

07-09-2009 13:20:31

Yes, you could even use the Any object. Though I would discourage from using it. Through personal experience, it's a pain to use and I end up having to do illegal force cast all the time which is totally ugly. Not to mention the requirement to overload the << operator for your objects that needs to be set in the Any object. It just complicates things more than help.

The ogre worker thread could be a nice thing to use. However it's designed to allow multiple subthreads. This poses a problem for lockless queuing which is why it doesn't use it. I would bet that the Ogre's worker thread system is actually a little too heavy weight for sound especially due to it's ticketing and channel system. It's meant more for file streaming or doing heavy processing while rendering. So you would end up sharing the same thread with all these heavy processing stuffs which would stall sound. That is a bad idea I think.

Good that you point out these possible solutions though. :) I did thought of it myself but didn't really seemed to be applicable so didn't mention it.

xadh00m

07-09-2009 13:50:13

Hmm, interesting points. AFAIK the implementation is meant to provide unique threads for your custom wishes, too. Short abstract from sinbad:

I also wasn't happy about having a separate thread for background loading and doing other things like this - I really just wanted a fixed pool of threads, and have a queue of tasks that they could pick off as and when they had time.

So, I added the WorkQueue class which allows me to do this. It's not domain-specific, instead it is a general background request/response queue, with a concept of 'channels' for categorising domains of interest (resource, paging, terrain, etc). There is one instance of this class in Root which will control all of Ogre's background processing, for any purpose, with a set of reserved channels. Users can push requests on to the queue too so long as they use their own channels.Request and response handlers can be plugged in to provide the code to implement the behaviour both in the background thread and dealing with the response in the main thread. All the data required to be passed between the background and main threads is encapsulated in an Any(), so you can pass whatever you like (for flexibility, passing by value wherever possible is best).

Users could also instantiate their own instance of WorkQueue if they wanted, although they wouldn't be able to share the same thread pool that way.

Maybe channels in this terminology would be the way to go... Just wondering where the limitations of Ogre internal thread support lies.

xad

stickymango

07-09-2009 14:04:55

Thanks for the suggestions guys, need some time to plan it out before beginning anything, looks like a fair re-write at the moment but it would definitely be a worthwhile improvement!
[edit]
While at it, i'm just wondering. Did you unified the sound source and sound sample into one object? If so, doesn't that means that if I have 10 same sound in the scene, It gets loaded up 10x and takes up 10x more ram space than it actually needs? I think you need to split up sound sample(buffer) and sound source. Have the source share a common samples with other sources. This way, things won't keep getting loaded up just because we need to spawn a new sound source for an explosion or death scream. :P You could make it a smart pointer system such that the sample will get destroyed when no sources are using it. What this calls for is basically a central place where all samples are kept. The only exception to this case is probably streaming where the samples data are streamed as it's being played.
[/edit]

I implement a sharedBuffer list which should re-use buffers whenever a static sound is created more then once, is that what your referring to? I don't use sharedPtr's yet for that but its a good suggestion.

Lf3THn4D

07-09-2009 14:53:27

Oh, now I see it. So you do have a shared buffer system. :) Looks like you're pretty much doing "smart" pointer by yourself with that reference count :P

Back on topic, I think you're doing the right thing too :) This improvement will definitely be worthwhile. :D Go for it! :)

P.S. I have another question/suggestion about API usage, but I'll do it in another thread.

oyvind

15-09-2009 21:09:03

Hi. Just thought I'd let you know of a small issue I had:
It seems OgreOggSoundManager::_requestSoundAction will just drop the action and return false if the queue is full. My problem was that I had a loop that called createSound many times, to load a bunch of files, and some of those sound files would not be loaded.

I changed the code to this, and it seems to work:

void OgreOggSoundManager::_requestSoundAction(const SoundAction& action)
{
if ( !mActionsList ) return;
int i=0;
while (!mActionsList->push(action) && !mShuttingDown && (i++<100))
boost::this_thread::sleep(boost::posix_time::millisec(10));
}

By the way, my version of the code was from September 7th.

stickymango

16-09-2009 09:19:06

:oops: Your quite right, I'd forgotten about handling dropped queue requests so shall add that asap, and probably increase the queue list size as I think its only 32 at the moment which maybe a little low, maybe I'll allow the user to tweak that??

Lf3THn4D

16-09-2009 09:32:01

Hmm.. maybe upping the buffer would be good. Or you could have a temporary queue list that keeps the request when it fails to push to the lockless queue. Then whenever update() or future request is called, just try to push them into the queue until it fails again. That way, you won't end up locking the main thread.

stickymango

16-09-2009 09:45:00

Yeah I've thought about that, might implement the delayed queue list idea.

stickymango

16-09-2009 11:44:55

@oyvind

I've added A commit which has implemented a delayed queue system to handle the situation you came across, I've also added an optional parameter to init() to allow customisation of the action list if the user desires. Can you try it and see if there are any issues you come across?

Thanks,