10% speed improvement in filterSpecklesImpl by switching insertion order of connected components (Feature #3692)

Added by Hernan Badino almost 11 years ago. Updated almost 11 years ago.

Status:	Done	Start date:	2014-05-15
Priority:	Normal	Due date:
Assignee:	Vadim Pisarevsky	% Done:	50%
Category:	core
Target version:	3.0-alpha
Difficulty:	Easy	Pull request:	https://github.com/Itseez/opencv/pull/2764

Description

Intro:
filterSpeckles is a method that performs blob detection based on disparity similarity and removes all blobs that contain fewer elements than a predefined threshold. It is used in both the BM and SGBM methods and is very useful for getting rid of spurious disparities. It is also very well implemented and runs really fast.

Feature:
I found a way to reduce the computation time by about 10% with a very simple change. I think the gain comes from fewer cache misses when iterating over the image.

Description:
The blob detection starts from a seed (a pixel with an unassigned label) and performs a wavefront propagation from that seed, checking for neighbors that are similar to it. Each of those neighbors then becomes a seed in turn, and the propagation continues until no more similar neighbors are found. The neighbors are checked in this order: right pixel, left pixel, pixel down, and pixel up. Each accepted neighbor is pushed onto a stack, so the first one to be processed is the last one inserted.
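The propagation described above can be sketched as a small self-contained function. This is a simplified illustration, not the code in stereosgbm.cpp: the name filterSpecklesSketch, the Pixel struct, the std::vector buffers, and the rule that a blob of at most maxSpeckleSize pixels counts as a speckle are all assumptions made for the example.

```cpp
#include <cstdlib>
#include <vector>

struct Pixel { int x, y; };  // hypothetical helper type for this sketch

// Simplified speckle filter (illustrative only). Pixels equal to newVal are
// treated as invalid; blobs with at most maxSpeckleSize pixels are overwritten
// with newVal.
void filterSpecklesSketch(std::vector<short>& disp, int width, int height,
                          short newVal, int maxSpeckleSize, int maxDiff)
{
    std::vector<int> labels(disp.size(), 0);  // 0 means "no label yet"
    std::vector<Pixel> stack;                 // wavefront, processed LIFO
    int curLabel = 0;

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int seed = y * width + x;
            if (disp[seed] == newVal || labels[seed] != 0)
                continue;                     // invalid or already in a blob

            labels[seed] = ++curLabel;
            std::vector<int> members;         // indices of pixels in this blob
            stack.push_back({x, y});

            while (!stack.empty()) {
                const Pixel p = stack.back(); // last inserted is processed first
                stack.pop_back();
                const int idx = p.y * width + p.x;
                members.push_back(idx);
                const short dp = disp[idx];

                // Label and push a neighbor if it is unlabeled, valid, and similar.
                auto tryPush = [&](int nx, int ny) {
                    if (nx < 0 || nx >= width || ny < 0 || ny >= height)
                        return;
                    const int n = ny * width + nx;
                    if (labels[n] == 0 && disp[n] != newVal &&
                        std::abs(dp - disp[n]) <= maxDiff) {
                        labels[n] = curLabel;
                        stack.push_back({nx, ny});
                    }
                };

                // Original order from the description: right, left, down, up.
                tryPush(p.x + 1, p.y);
                tryPush(p.x - 1, p.y);
                tryPush(p.x, p.y + 1);
                tryPush(p.x, p.y - 1);
            }

            if (static_cast<int>(members.size()) <= maxSpeckleSize)
                for (int idx : members)
                    disp[idx] = newVal;       // remove the speckle
        }
    }
}
```

Swapping the two pairs of tryPush calls gives the proposed order without changing which pixels end up in each blob; only the order in which they are visited changes.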

Proposal:
Change the order in which the neighbors are checked and inserted into the stack. The order should be pixel down, pixel up, pixel right, pixel left, so that the lateral neighbors are pushed last and processed first. Basically, this means replacing these lines in the file calib3d/src/stereosgbm.cpp:

if( p.x < width-1 && !lpp[+1] && dpp[+1] != newVal && std::abs(dp - dpp[+1]) <= maxDiff ) {
    lpp[+1] = curlabel;
    *ws++ = Point2s(p.x+1, p.y);
}
if( p.x > 0 && !lpp[-1] && dpp[-1] != newVal && std::abs(dp - dpp[-1]) <= maxDiff ) {
    lpp[-1] = curlabel;
    *ws++ = Point2s(p.x-1, p.y);
}
if( p.y < height-1 && !lpp[+width] && dpp[+dstep] != newVal && std::abs(dp - dpp[+dstep]) <= maxDiff ) {
    lpp[+width] = curlabel;
    *ws++ = Point2s(p.x, p.y+1);
}
if( p.y > 0 && !lpp[-width] && dpp[-dstep] != newVal && std::abs(dp - dpp[-dstep]) <= maxDiff ) {
    lpp[-width] = curlabel;
    *ws++ = Point2s(p.x, p.y-1);
}

with these:

if( p.y < height-1 && !lpp[+width] && dpp[+dstep] != newVal && std::abs(dp - dpp[+dstep]) <= maxDiff )
{
    lpp[+width] = curlabel;
    *ws++ = Point2s(p.x, p.y+1);
}
if( p.y > 0 && !lpp[-width] && dpp[-dstep] != newVal && std::abs(dp - dpp[-dstep]) <= maxDiff )
{
    lpp[-width] = curlabel;
    *ws++ = Point2s(p.x, p.y-1);
}
if( p.x < width-1 && !lpp[+1] && dpp[+1] != newVal && std::abs(dp - dpp[+1]) <= maxDiff )
{
    lpp[+1] = curlabel;
    *ws++ = Point2s(p.x+1, p.y);
}
if( p.x > 0 && !lpp[-1] && dpp[-1] != newVal && std::abs(dp - dpp[-1]) <= maxDiff )
{
    lpp[-1] = curlabel;
    *ws++ = Point2s(p.x-1, p.y);
}

where only the first pair of if statements was interchanged with the second pair of if statements.

Rationale:
The if statements above are inside a loop, and the last element pushed onto ws is evaluated next. In the original method, the next pixel to be evaluated is a vertical neighbor (the pixel above, or the one below if the pixel above does not qualify). Evaluating it may require loading another scanline of the disparity image into the cache, which increases the probability of a cache miss. If the lateral neighboring pixels are evaluated next instead, the required memory is most likely already in the cache.
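To make the effect of the push order concrete, the following toy model records the order in which pixels are popped on a uniform image (every neighbor qualifies) and counts how often two consecutively processed pixels lie on different rows. It is only a rough proxy for cache behaviour, not a measurement, and all names in it (Pos, Step, visitOrder, countRowSwitches, kOriginalOrder, kProposedOrder) are made up for this sketch.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Pos  { int x, y; };    // pixel position, y grows downward
struct Step { int dx, dy; };  // offset to one neighbor

// Push orders of the two code variants.
const std::array<Step, 4> kOriginalOrder = {{ {+1, 0}, {-1, 0}, {0, +1}, {0, -1} }};  // right, left, down, up
const std::array<Step, 4> kProposedOrder = {{ {0, +1}, {0, -1}, {+1, 0}, {-1, 0} }};  // down, up, right, left

// Flood-fills a uniform width x height image from the top-left pixel and
// returns the pixels in the order they are popped from the stack.
std::vector<Pos> visitOrder(int width, int height, const std::array<Step, 4>& pushOrder)
{
    std::vector<char> labeled(static_cast<std::size_t>(width) * height, 0);
    std::vector<Pos> stack, visited;
    labeled[0] = 1;
    stack.push_back({0, 0});
    while (!stack.empty()) {
        const Pos p = stack.back();  // last pushed neighbor is processed first
        stack.pop_back();
        visited.push_back(p);
        for (const Step& s : pushOrder) {
            const int nx = p.x + s.dx, ny = p.y + s.dy;
            if (nx < 0 || nx >= width || ny < 0 || ny >= height)
                continue;
            const std::size_t n = static_cast<std::size_t>(ny) * width + nx;
            if (labeled[n])
                continue;
            labeled[n] = 1;          // label on push, as in the original code
            stack.push_back({nx, ny});
        }
    }
    return visited;
}

// Counts how often consecutive visited pixels are on different rows.
int countRowSwitches(const std::vector<Pos>& visited)
{
    int switches = 0;
    for (std::size_t i = 1; i < visited.size(); ++i)
        if (visited[i].y != visited[i - 1].y)
            ++switches;
    return switches;
}
```

Comparing countRowSwitches(visitOrder(w, h, kOriginalOrder)) with the same call for kProposedOrder shows how much more often the original order jumps between scanlines on a given image size. Real disparity images are not uniform, so the actual difference depends on the data.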

Improvements:
Measured as the average of 10 trials of processing 555 stereo images of size 720x480 (i.e., average of 5550 images) with ~80% disparity coverage.
Processor: Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
Compiler: gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Platform: Ubuntu 12.04.4 LTS

Proposed version: 2.143 (+/- 0.0296816) ms
Original version: 2.367 (+/- 0.0316386) ms

Improvement of roughly 10% (2.367 ms down to 2.143 ms, about a 9.5% reduction in run time).

These improvements might be larger on devices with less cache memory than my machine.

History

#1
Updated by Vladislav Vinogradov almost 11 years ago

Hello Hernan Badino,

Thank you for your report!

Could you create a pull request with your fix (http://code.opencv.org/projects/opencv/wiki/How_to_contribute)? All help to the project is highly appreciated!

Status changed from New to Open

#2
Updated by Hernan Badino almost 11 years ago

Vladislav,

I did it. I pushed the change to master. Should I push the same change to 2.4?

Hernan

% Done changed from 0 to 50

#3
Updated by Vladislav Vinogradov almost 11 years ago

Target version set to 3.0-alpha
Pull request set to https://github.com/Itseez/opencv/pull/2764

#4
Updated by Vladislav Vinogradov almost 11 years ago

The patch was merged into master branch.

Hernan Badino, thank you for your contribution!

Status changed from Open to Done

#5
Updated by Hernan Badino almost 11 years ago

Vladislav,

Just so you know, I've created a pull request for 2.4 as well.

Happy to contribute.

Hernan
