1. Introduction : We need various thread-safe data structures/containers in multithreaded programming. Even though many applications use MPMC (multi-producer multi-consumer) queues, sometimes we need single-producer single-consumer (SPSC) solutions.
In this article, I will show an SPSC thread-safe queue implementation, based on a ring buffer data structure, that does not use mutual exclusion/critical sections. This implementation is based on a video I watched at: https://skillsmatter.com/skillscasts/6163-high-performance-single-producer-single-consumer-in-memory-queue
And according to the video, the original implementation comes from the LMAX Disruptor project, an open-source high-performance library used in the LMAX trading system : https://lmax-exchange.github.io/disruptor/
You can find the source code for this article at :
2. What is a ring buffer ? : A ring buffer is a cyclic/circular buffer. It can easily be used as the underlying data structure for FIFO (First In First Out) queue implementations.
Below you can see its structure :
Below you can see how it is laid out in memory :
For the implementation you need two indices (or pointers) pointing to the head and the tail, and you increment them as you perform enqueue and dequeue operations. Below you can see a non-thread-safe ring buffer operating on 64-bit ints :
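Since the original listing is not reproduced here, below is a minimal sketch of what such a non-thread-safe ring buffer might look like. The class and member names (RingBuffer, push, pop, m_head, m_tail) are my own choices, not necessarily the article's:

```cpp
#include <cstdint>
#include <cstddef>

// Minimal, NON-thread-safe ring buffer holding 64-bit integers.
// m_head is the next write slot, m_tail the next read slot; both grow
// monotonically and are wrapped with the modulo operator on access.
class RingBuffer
{
public:
    explicit RingBuffer(std::size_t capacity)
        : m_buffer(new int64_t[capacity]), m_capacity(capacity),
          m_head(0), m_tail(0) {}

    ~RingBuffer() { delete[] m_buffer; }

    void push(int64_t value)
    {
        // Note: no fullness check, so old data can be overwritten.
        m_buffer[m_head % m_capacity] = value;
        ++m_head;
    }

    int64_t pop()
    {
        // Note: no emptiness check; caller must know data is there.
        int64_t value = m_buffer[m_tail % m_capacity];
        ++m_tail;
        return value;
    }

private:
    int64_t*    m_buffer;
    std::size_t m_capacity;
    std::size_t m_head; // next write position
    std::size_t m_tail; // next read position
};
```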
It is quite advantageous under memory constraints, as you don't need to allocate more memory after creation. On the other hand, one disadvantage is that you have to know your maximum queue size up front; if that size is exceeded, data may be overwritten.
3. Lock based thread safe queue implementation : For a simple lock-based thread-safe implementation, we can use mutual exclusion. For this one we will add locks to the push and pop functions :
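The listing is missing here as well; the following is a sketch, under my own naming, of the ring buffer with push and pop guarded by a std::mutex. I have also added full/empty checks (returning bool), which the article does not explicitly describe but which a lock-based queue typically needs:

```cpp
#include <cstdint>
#include <cstddef>
#include <mutex>

// Lock-based thread-safe ring buffer queue: every push/pop takes a
// mutex, so any number of threads may call it, at the cost of lock
// contention.
class LockedQueue
{
public:
    explicit LockedQueue(std::size_t capacity)
        : m_buffer(new int64_t[capacity]), m_capacity(capacity),
          m_head(0), m_tail(0) {}

    ~LockedQueue() { delete[] m_buffer; }

    bool push(int64_t value)
    {
        std::lock_guard<std::mutex> guard(m_lock);
        if (m_head - m_tail >= m_capacity)
            return false; // full: head would lap the tail
        m_buffer[m_head % m_capacity] = value;
        ++m_head;
        return true;
    }

    bool pop(int64_t& value)
    {
        std::lock_guard<std::mutex> guard(m_lock);
        if (m_head == m_tail)
            return false; // empty
        value = m_buffer[m_tail % m_capacity];
        ++m_tail;
        return true;
    }

private:
    int64_t*    m_buffer;
    std::size_t m_capacity;
    std::size_t m_head;
    std::size_t m_tail;
    std::mutex  m_lock;
};
```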
4. Lockless thread safe queue implementation : Rather than using locks, we will get help from atomic variables. To make the queue thread safe, we need to ensure that the head index never catches the tail index from behind, and vice versa. For the implementation we will introduce two more helper functions called try_push and try_pop :
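The original listing is not shown here, so below is a sketch of an SPSC queue along the lines the article describes: try_push and try_pop each do an atomic load of the index owned by the other thread before touching the buffer. The exact memory orderings and member names are my assumptions, not taken from the article:

```cpp
#include <atomic>
#include <cstdint>
#include <cstddef>

// Lock-free SPSC ring buffer queue. Safe only when exactly one thread
// calls try_push and exactly one (other) thread calls try_pop.
class LocklessQueue
{
public:
    explicit LocklessQueue(std::size_t capacity)
        : m_buffer(new int64_t[capacity]), m_capacity(capacity),
          m_head(0), m_tail(0) {}

    ~LocklessQueue() { delete[] m_buffer; }

    // Producer side: fails if the head would catch the tail from behind.
    bool try_push(int64_t value)
    {
        const std::size_t head = m_head.load(std::memory_order_relaxed);
        if (head - m_tail.load(std::memory_order_acquire) >= m_capacity)
            return false; // full
        m_buffer[head % m_capacity] = value;
        // Release: publish the written slot before advancing the head.
        m_head.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer side: fails if the tail has caught up with the head.
    bool try_pop(int64_t& value)
    {
        const std::size_t tail = m_tail.load(std::memory_order_relaxed);
        if (m_head.load(std::memory_order_acquire) == tail)
            return false; // empty
        value = m_buffer[tail % m_capacity];
        // Release: free the slot only after the value has been read.
        m_tail.store(tail + 1, std::memory_order_release);
        return true;
    }

private:
    int64_t*                 m_buffer;
    std::size_t              m_capacity;
    std::atomic<std::size_t> m_head; // written by producer only
    std::atomic<std::size_t> m_tail; // written by consumer only
};
```

Each index has exactly one writer, which is why plain atomic loads and stores suffice here and no compare-and-swap loop is needed.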
As you can see above, try_push checks whether the head is about to catch the tail from behind, and we use an atomic load for that comparison to synchronise the two threads. The same logic applies to try_pop. By doing this we introduce a virtual limit to the ring buffer: instead of silently overwriting data when full, our queue rejects the operation, so it behaves as a properly bounded queue.
5. Benchmark code :
For the benchmark we create only 1 producer and 1 consumer thread; however, we can perform as many operations as we want. For benchmarking I specified the ring buffer size as 1000, and each thread performs 1 million operations :
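The benchmark listing is also missing; here is a sketch of such a harness under the stated parameters (1000-slot buffer, 1 million operations per thread). It embeds a compact SPSC queue so it is self-contained; the article's actual benchmark code may differ:

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>

// Compact SPSC queue (same technique as in the article's section 4).
class SpscQueue
{
public:
    explicit SpscQueue(std::size_t capacity)
        : m_buffer(new int64_t[capacity]), m_capacity(capacity),
          m_head(0), m_tail(0) {}
    ~SpscQueue() { delete[] m_buffer; }

    bool try_push(int64_t v)
    {
        const std::size_t head = m_head.load(std::memory_order_relaxed);
        if (head - m_tail.load(std::memory_order_acquire) >= m_capacity)
            return false;
        m_buffer[head % m_capacity] = v;
        m_head.store(head + 1, std::memory_order_release);
        return true;
    }

    bool try_pop(int64_t& v)
    {
        const std::size_t tail = m_tail.load(std::memory_order_relaxed);
        if (m_head.load(std::memory_order_acquire) == tail)
            return false;
        v = m_buffer[tail % m_capacity];
        m_tail.store(tail + 1, std::memory_order_release);
        return true;
    }

private:
    int64_t*                 m_buffer;
    std::size_t              m_capacity;
    std::atomic<std::size_t> m_head;
    std::atomic<std::size_t> m_tail;
};

// Runs the benchmark: 1 producer, 1 consumer, 1000-slot buffer,
// 1 million operations per thread. Returns elapsed seconds and fills
// in a checksum of all consumed values for verification.
double run_benchmark(int64_t& checksum)
{
    const std::size_t kCapacity   = 1000;
    const int64_t     kIterations = 1000000;
    SpscQueue queue(kCapacity);
    int64_t sum = 0;

    const auto start = std::chrono::steady_clock::now();

    std::thread producer([&] {
        for (int64_t i = 0; i < kIterations; ++i)
            while (!queue.try_push(i)) {} // spin until there is room
    });
    std::thread consumer([&] {
        int64_t value;
        for (int64_t i = 0; i < kIterations; ++i) {
            while (!queue.try_pop(value)) {} // spin until data arrives
            sum += value;
        }
    });
    producer.join();
    consumer.join();

    checksum = sum;
    return std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();
}
```

Running this under "perf stat" (or printing the return value of run_benchmark) gives the wall-clock comparison; swapping SpscQueue for the lock-based class measures the locked variant.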
6. Benchmark result – Lockless vs Locked queue : Here is my test system :
Linux akhin-GS60-2PC-Ghost 3.16.0-40-generic #54~14.04.1-Ubuntu SMP Wed Jun 10 17:30:45 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
model name : Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz
For benchmarking, I used “perf stat” on my Ubuntu 14.04 LTS system :
Lock based queue : 0.437855869 seconds time elapsed
Lockless queue : 0.109763995 seconds time elapsed
Note : Please note that the example classes are only for demonstrating the technique; they are not production ready, as they do not even free the memory they allocate. For a more complete example, please see :