For the convolution of a 5x5 kernel with a 1000x1000 image of type float32 (time in ms):
opencv 5.43189048767
scipy.ndimage 36.602973938
A performance gap of this size is visible in the implementations of other libraries as well, e.g. leptonica and theano.
It has been a goal of scikits.image to operate without too many explicit dependencies, so bringing a fast convolution algorithm into the library itself has been stated as a highly desirable goal.
The reason opencv performs so well is its use of SSE instructions. In convolution, where we apply the same operation to multiple data items, the gains in performance are considerable.
The following intrinsic, for example,
__m128 t0 = _mm_loadu_ps(S);
loads 4 float32 values from the S pointer into the 128-bit register t0, and all operations on this register operate on these values in parallel. An addition of two such registers, for instance, adds all four pairs of values at once:
s0 = _mm_add_ps(s0, s1);
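To make the load/add pattern above concrete, here is a minimal sketch that adds two float32 arrays four elements at a time. The function name add_f32 and the HAVE_SSE macro are hypothetical and introduced only for illustration. The scalar loop handles the leftover elements and also acts as a fallback on machines without SSE.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper (not from opencv or scikits.image):
 * dst[i] = a[i] + b[i] for n float32 values. */
static void add_f32(const float *a, const float *b, float *dst, size_t n)
{
    size_t i = 0;
#if HAVE_SSE
    /* Process four floats per iteration using unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 s0 = _mm_loadu_ps(a + i);
        __m128 s1 = _mm_loadu_ps(b + i);
        s0 = _mm_add_ps(s0, s1);      /* four additions in one instruction */
        _mm_storeu_ps(dst + i, s0);
    }
#endif
    /* Scalar tail: remaining elements, or everything when SSE is absent. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

The unaligned variants (_mm_loadu_ps, _mm_storeu_ps) are used so that the sketch works on any pointer; an aligned load would be faster on some processors but requires 16-byte aligned data.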
I have implemented an SSE-based float32 convolution routine and, though a bit slower than opencv, it narrows the performance gap considerably. Each data type still needs some additional work, including support for row and column separable convolutions. With this we will get a good foundation for a fast convolution implementation.
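As a rough illustration of what the row pass of a separable float32 convolution might look like with SSE, here is a minimal sketch. It is not the routine described above: the name convolve_row_f32, the "valid" output size (n - k + 1) and the HAVE_SSE macro are all assumptions made for the example. Each iteration computes four neighbouring output values by broadcasting one kernel tap and multiplying it with four consecutive input values.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: 1D "valid" convolution of a row of n floats with a
 * k-tap kernel. Writes n - k + 1 values to dst. The kernel is flipped, so
 * this is a true convolution rather than a correlation. */
static void convolve_row_f32(const float *src, size_t n,
                             const float *kernel, size_t k, float *dst)
{
    if (k == 0 || n < k)
        return;

    size_t out_n = n - k + 1;
    size_t x = 0;
#if HAVE_SSE
    /* Four outputs per iteration; the last load ends at src[n - 1]. */
    for (; x + 4 <= out_n; x += 4) {
        __m128 acc = _mm_setzero_ps();
        for (size_t j = 0; j < k; ++j) {
            __m128 w = _mm_set1_ps(kernel[k - 1 - j]); /* broadcast one tap */
            __m128 s = _mm_loadu_ps(src + x + j);      /* four inputs */
            acc = _mm_add_ps(acc, _mm_mul_ps(s, w));
        }
        _mm_storeu_ps(dst + x, acc);
    }
#endif
    /* Scalar tail and fallback when SSE is not available. */
    for (; x < out_n; ++x) {
        float acc = 0.0f;
        for (size_t j = 0; j < k; ++j)
            acc += src[x + j] * kernel[k - 1 - j];
        dst[x] = acc;
    }
}
```

A column pass could follow the same pattern on a transposed or strided view, and a full 2D kernel such as the 5x5 case would add an outer loop over kernel rows. Border handling is left out here on purpose.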
Benchmark of current results for the test case:
scikits.image 11.029958725
opencv 5.04112243652
scipy.ndimage 43.2901382446