NEON optimisation of cv::threshold() for iOS (Feature #1455)


Added by Yasuhiro Yoshimura over 13 years ago. Updated over 13 years ago.


Status: Done
Start date:
Priority: Normal
Due date:
Assignee: Vadim Pisarevsky
% Done: 0%
Category: imgproc, video
Target version: -
Difficulty:
Pull request:

Description

I implemented a NEON optimisation of cv::threshold() for iOS.
I confirmed that the patched cv::threshold() is about 10x faster
on the device (iPod touch 4th generation).

I think this patch may also be effective on Android.


opencv231_neon_thresh_8u.patch - NEON optimisation of cv::threshold() for iOS (6.3 kB) Yasuhiro Yoshimura, 2011-10-31 02:03 pm


Associated revisions

Revision 2dceb68a
Added by Alexander Smorkalov over 11 years ago

Merge pull request #1455 from ilya-lavrenov:ocl_test_output

History

Updated by Andrey Kamaev over 13 years ago

This code cannot be included in OpenCV because it can fail with SIGSEGV after attempting to write to unallocated memory.

  • Status changed from Open to Done
  • (deleted custom field) set to invalid

Updated by Yasuhiro Yoshimura over 13 years ago

Replying to [comment:1 andrey.kamaev]:

This code cannot be included in OpenCV because it can fail with SIGSEGV after attempting to write to unallocated memory.

Thank you for your comment.
I understand. I should add the following check at
the beginning of this function.

if( _src.empty() || _dst.empty() ) {
    return;
}

However, if the "src" and "dst" Mats are empty, roi.width and roi.height are initialized to 0,
so unallocated memory is not accessed.

Updated by Andrey Kamaev over 13 years ago

Replying to [comment:2 dandelion]:

Replying to [comment:1 andrey.kamaev]:

This code cannot be included in OpenCV because it can fail with SIGSEGV after attempting to write to unallocated memory.

Thank you for your comment.
I understand. I should add the following check at
the beginning of this function.

if( _src.empty() || _dst.empty() ) {
    return;
}

However, if the "src" and "dst" Mats are empty, roi.width and roi.height are initialized to 0,
so unallocated memory is not accessed.

Empty Mats are not the real problem in your code. You handle the leftovers incorrectly (all the loops that step with j+=8). Also, I should note that copying an SSE optimization into NEON intrinsics rarely results in good code, and I think you can make a noticeably faster version using more suitable instructions.
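
The leftover handling described in this comment can be illustrated with a minimal sketch. This is not the code from the attached patch: threshBinaryRow8u is a hypothetical helper, and it assumes THRESH_BINARY applied to a single row of 8-bit pixels. The vector loop runs only while a full 16-byte block still fits in the row, and a scalar loop finishes the remaining pixels, so nothing is read or written past the end of the row. The unsigned compare vcgtq_u8 is only one guess at what "more suitable instructions" might mean here (SSE2 offers only a signed byte compare). On builds without NEON the sketch falls back to the scalar loop alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper, not taken from the patch.
// Computes dst[j] = (src[j] > thresh) ? maxval : 0 for one row of 8-bit pixels.
void threshBinaryRow8u(const std::uint8_t* src, std::uint8_t* dst,
                       std::size_t width,
                       std::uint8_t thresh, std::uint8_t maxval)
{
    assert(width == 0 || (src != nullptr && dst != nullptr));

    std::size_t j = 0;

#if defined(__ARM_NEON)
    const uint8x16_t vthresh = vdupq_n_u8(thresh);
    const uint8x16_t vmaxval = vdupq_n_u8(maxval);

    // Vector loop: entered only while 16 whole pixels remain,
    // so the load and the store never touch memory beyond the row.
    for (; j + 16 <= width; j += 16) {
        const uint8x16_t v    = vld1q_u8(src + j);
        const uint8x16_t mask = vcgtq_u8(v, vthresh);  // 0xFF where src > thresh
        vst1q_u8(dst + j, vandq_u8(mask, vmaxval));    // maxval or 0
    }
#endif

    // Scalar tail: handles the last (width % 16) pixels,
    // or the whole row when NEON is not available.
    for (; j < width; ++j)
        dst[j] = (src[j] > thresh) ? maxval : 0;
}
```

For a row of 19 pixels, for example, the vector loop processes pixels 0 to 15 and the scalar loop processes pixels 16 to 18. A loop condition such as j < width with a step of 8 or 16 would instead run one extra vector iteration and write past the end of the row.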
