the problem of SSE2 optimization (Bug #1514)


Added by Yasuhiro Yoshimura about 13 years ago. Updated about 13 years ago.


Status: Done
Priority: High
Assignee: Vadim Pisarevsky
% Done: 0%
Category: core
Target version: 2.4.0

Description

OpenCV's documentation says:
"By default, the optimized code is enabled unless you disable it in CMake.
The current status can be retrieved using useOptimized."
http://opencv.itseez.com/modules/core/doc/utility_and_system_functions_and_macros.html#void%20setUseOptimized%28bool%20onoff%29

However, SSE2 optimization is disabled when ENABLE_SSE2 is set to ON and WITH_IPP is set to OFF,
because useOptimizedFlag is initialized to false in that case.

modules\core\src\system.cpp

#ifdef HAVE_IPP
volatile bool useOptimizedFlag = true;

struct IPPInitializer {
    IPPInitializer(void) { ippStaticInit(); }
};

IPPInitializer ippInitializer;
#else
volatile bool useOptimizedFlag = false;
#endif

As a result, some SSE2-optimized code paths are disabled, because USE_SSE2 is false.

template<typename T, class Op, class Op8>
void vBinOp8(const T* src1, size_t step1, const T* src2, size_t step2, T* dst, size_t step, Size sz) {
    Op8 op8;
    Op op;

    for( ; sz.height--; src1 += step1/sizeof(src1[0]),
                        src2 += step2/sizeof(src2[0]),
                        dst += step/sizeof(dst[0]) ) {
        int x = 0;

#if CV_SSE2
        if( USE_SSE2 ) {
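
To make the effect concrete, here is a minimal standalone sketch of how a flag initialized this way can switch off a code path that was compiled in. Every name prefixed with sketch or SKETCH_ is a hypothetical stand-in, and the assumption that the runtime check simply mirrors the global flag is an illustration only, not OpenCV's actual implementation.

```cpp
#include <cassert>

// Hypothetical stand-ins for the build configuration (not OpenCV's macros).
// SKETCH_HAVE_SSE2 models a build with ENABLE_SSE2=ON.
// SKETCH_HAVE_IPP is deliberately left undefined to model WITH_IPP=OFF.
#define SKETCH_HAVE_SSE2 1

#ifdef SKETCH_HAVE_IPP
static volatile bool sketchUseOptimizedFlag = true;
#else
static volatile bool sketchUseOptimizedFlag = false;  // default described in the report
#endif

// Assumption: the runtime SSE2 check follows the global flag.
static bool sketchUseSSE2() { return sketchUseOptimizedFlag; }

enum class Path { Scalar, SSE2 };

// Reports which branch a kernel guarded like vBinOp8 would take.
Path selectedPath() {
#if SKETCH_HAVE_SSE2
    if (sketchUseSSE2()) return Path::SSE2;  // compiled in, but gated at runtime
#endif
    return Path::Scalar;
}
```

With the macros as set above, selectedPath() reports the scalar branch until the flag is switched on at runtime, even though the SSE2 branch was compiled in.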

On the other hand, some SSE2 optimizations are still enabled (thresh_8u, etc.).

static void scaleAdd_32f(const float* src1, const float* src2, float* dst,
                         int len, float* _alpha) {
    float alpha = *_alpha;
    int i = 0;
#if CV_SSE2
    if( USE_SSE2 ) {
        __m128 a4 = _mm_set1_ps(alpha);
        if( (((size_t)src1|(size_t)src2|(size_t)dst) & 15) == 0 )

I think useOptimizedFlag should be true by default,
and every SSE2-optimized code path should check both CV_SSE2 and USE_SSE2.
Is my understanding correct?
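
The proposed behaviour can be sketched as follows: a runtime switch that defaults to true regardless of IPP, and a kernel that enters its SIMD branch only when both the compile-time and the runtime check pass. The names sketchUseOptimized, SKETCH_SSE2 and sketchScaleAdd are hypothetical, the compiler macros used for detection are an assumption about the toolchain, and the loop body is a simplified unaligned-load version rather than the code in OpenCV.

```cpp
#include <cassert>
#include <vector>

// Compile-time check: assume SSE2 is available when the compiler says so.
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_SSE2 1
#else
#define SKETCH_SSE2 0
#endif

// Hypothetical runtime switch, enabled by default with or without IPP.
static volatile bool sketchUseOptimized = true;

// Computes dst[i] = src1[i] * alpha + src2[i] for i in [0, len).
void sketchScaleAdd(const float* src1, const float* src2, float* dst,
                    int len, float alpha) {
    assert(len >= 0);
    int i = 0;
#if SKETCH_SSE2
    if (sketchUseOptimized) {  // runtime check, only reached if compiled in
        const __m128 a4 = _mm_set1_ps(alpha);
        for (; i <= len - 4; i += 4) {
            const __m128 x = _mm_loadu_ps(src1 + i);
            const __m128 y = _mm_loadu_ps(src2 + i);
            _mm_storeu_ps(dst + i, _mm_add_ps(_mm_mul_ps(x, a4), y));
        }
    }
#endif
    // Scalar tail, and the full fallback when either check fails.
    for (; i < len; ++i) dst[i] = src1[i] * alpha + src2[i];
}
```

Because the scalar loop continues from wherever the SIMD loop stopped, turning the flag off only changes which instructions run, not the values written to dst.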


History

Updated by Vadim Pisarevsky about 13 years ago

The problem has been fixed recently in SVN.

  • Status changed from Open to Done
  • (deleted custom field) set to fixed

Updated by Andrey Kamaev about 13 years ago

  • Target version set to 2.4.0
