openCV 2.3.1 crashes with SSE/SSE2 and -O3 (Bug #1932)
Description
Some applications crash immediately when using openCV 2.3.1 libs compiled with mingw32 and SSE/SSE2 and compiler optimization set to -O3.
For instance, applications using cvThreshold and cvFindContours crash immediately with SIGSEGV when using libs compiled as Release with SSE/SSE2 enabled and -O3.
This bug appears related to http://code.opencv.org/issues/596 and http://code.opencv.org/issues/1896. However, unlike those cases, where -O2 appears to work for some users, I could only get a working 2.3.1 build with SSE/SSE2 enabled by setting compiler optimization to -O1.
I am compiling static .a libs in Release, with all other settings as default except for Cuda support turned off and New Python Support turned off. For the record I could successfully execute applications with cvThreshold and/or cvFindContours if:
1. I use Debug, SSE/SSE2 enabled, optimize -O3
2. I use Release, SSE/SSE2 disabled, optimize -O3
3. I use Release, SSE/SSE2 enabled, optimize -O1 (suggested compromise)
This bug was discovered whilst upgrading openFrameworks core libraries to 2.3.1 - you can see more information about which combinations of Release/Debug, SSE/SSE2 enabled/disabled and -O1/-O2/-O3 I have tried, and other performance tests at https://github.com/openframeworks/openFrameworks/issues/1253
I am using Windows 7 Professional 64-bit on an E8400 @ 3.00 GHz with 4 GB RAM.
g++ --version reports TDM-2 mingw32 4.4.1; it is the version that shipped with Codeblocks 10.05.
CB About... reports "Build: May 27 2010 19:10:05 - wx2.8.10 (Windows, unicode) - 32 bit"; this version is available at http://www.codeblocks.org/downloads/26
Related issues
duplicated by Bug #1668: crash using namedWindow | Cancelled
duplicated by Bug #1494: imshow crash on Windows 7 32bits | Cancelled
duplicated by Bug #583: mingw32 SSE/SSE2 instabilities | Cancelled
duplicated by Patch #1896: imshow crash | Cancelled | 2012-05-06
Associated revisions
#1932 Fixed SSE instability on mingw32
Merge pull request #1932 from seth-planet:master
History
Updated by Andrey Kamaev almost 13 years ago
I've tried the same version of Codeblocks, but the OpenCV tests and examples work for me even with the -O3 flag.
Does the problem occur only in your applications, or do the OpenCV examples fail as well?
Can you check if adding the -mpreferred-stack-boundary=2 compiler flag to the OpenCV build resolves your problem?
Updated by Matthew Gingold almost 13 years ago
Hey Andrey
I just recompiled openCV and the examples with both -O3 (and default options) and -O3 -mpreferred-stack-boundary=2.
The -mpreferred-stack-boundary=2 compiler flag fixes my crashes with openFrameworks applications, which is great news! Do you have an explanation for why this flag makes such a difference? I found some explanation of this flag here: http://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcc/i386-and-x86_002d64-Options.html but it would be good to know more...
Also I checked the openCV examples...strangely contour example is not crashing with -O3 (although oF contour finder always crashes), however contour2 example DOES crash, but not straight away - I have to move the slider up and down for a while, but then it always crashes with -O3. This does not happen with the libs compiled with -mpreferred-stack-boundary=2 flag. Strange...
Again thanks for your help, and let me know if there is something else I can check.
Updated by Andrey Kamaev almost 13 years ago
Could you also check the -mstackrealign flag? It is supposed to be a better workaround than -mpreferred-stack-boundary=2.
You can read details about the problem and flags in this gcc bug report: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
Updated by Matthew Gingold almost 13 years ago
OK, tried the -mstackrealign flag: works fine for both openFrameworks and openCV examples...many thanks for the two workarounds and the link to the GCC bug report. Enlightening would probably overstate my understanding of the issue, but it is perversely reassuring to see that even the gcc gurus may occasionally have a really really annoying bug ;-)
So would it be correct to say that individual functions in openCV that rely on SSE/SSE2 need to make sure their stack is 16-byte aligned, so that they do not crash when called from functions that leave the stack unaligned?
That would seem to be the case if I am understanding the GCC bug report...better than waiting for GCC 4.6.x - as it looks like this bug has been around in many flavors for many years...
I am taking it that you think the -mstackrealign flag workaround is the preferred option for mingw32 libs using SSE/SSE2?
Updated by Matthew Gingold almost 13 years ago
Not sure why half of that is crossed out, nor how to edit it...if a moderator could change it, that would be ace!
Updated by Andrey Kamaev almost 13 years ago
> Not sure why half of that is crossed out, nor how to edit it...if a moderator could change it, that would be ace!
Fixed. Redmine had interpreted "-" as a formatting sign.
> So would it be correct to say that there is a need for individual functions that rely on SSE/SSE2 in openCV to make sure they are 16-byte aligned, so that non-aligned functions that call them do not crash the stack?
Seems so. The function prologue has to ensure that the incoming stack pointer is properly aligned, so that aligned instructions can be used without additional checks.
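A per-function form of that realignment can be sketched as below. This is only an illustration, not the change that was committed to OpenCV: `REALIGN_STACK`, `is_aligned16` and `scale4` are hypothetical names, and the sketch assumes a C++11 compiler plus GCC's `force_align_arg_pointer` function attribute, which is only applied here when targeting 32-bit x86. On other targets the macro expands to nothing.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: GCC-compatible compiler targeting 32-bit x86. The attribute asks
// the compiler to emit a prologue that realigns the stack on entry.
#if defined(__GNUC__) && defined(__i386__)
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

// Returns true when p lies on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical library entry point. It may be called from code that only
// keeps the stack 4-byte aligned, yet it wants a 16-byte aligned local
// buffer of the kind aligned SSE loads/stores would require.
REALIGN_STACK bool scale4(const float* in, float factor, float* out) {
    alignas(16) float tmp[4];
    for (std::size_t i = 0; i < 4; ++i) {
        tmp[i] = in[i] * factor;
    }
    const bool aligned = is_aligned16(tmp);  // report the buffer's alignment
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = tmp[i];
    }
    return aligned;
}
```

Marking only the externally callable functions this way limits the extra prologue code to entry points, whereas -mstackrealign applies the same realignment to every function in the build.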
> I am taking it that you think the -mstackrealign flag workaround is the preferred option for mingw32 libs using SSE/SSE2?
Yes. It slightly increases the code size but should not affect performance. The -mpreferred-stack-boundary=2 option forces the compiler to always use unaligned SSE loads/stores, which are usually slower.
I've committed these options to the OpenCV trunk as the default for MinGW (with preference given to -mstackrealign). This will be included in the OpenCV 2.4.1 release.
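The difference between aligned and unaligned SSE accesses mentioned above can be illustrated with intrinsics. The following is a minimal sketch, not OpenCV code: `sum4_unaligned` and `HAVE_SSE_SKETCH` are hypothetical names, and a plain scalar fallback is used when SSE is not available at compile time. `_mm_loadu_ps` accepts any address, while `_mm_store_ps` requires a 16-byte aligned destination and faults otherwise.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_SKETCH 1
#endif

// Sums four consecutive floats starting at p. p may have any alignment.
float sum4_unaligned(const float* p) {
#ifdef HAVE_SSE_SKETCH
    // Unaligned load: valid for any address, traditionally slower than the
    // aligned form.
    __m128 v = _mm_loadu_ps(p);

    // Aligned store: the destination must be 16-byte aligned, which is why
    // the local buffer is declared with alignas(16). If the stack itself were
    // misaligned on entry, this is the kind of access that could crash.
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, v);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#else
    // Scalar fallback for targets without SSE.
    return (p[0] + p[1]) + (p[2] + p[3]);
#endif
}
```

Calling it with a pointer that is deliberately offset by one float from an aligned buffer exercises the unaligned load path.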
- Status changed from Open to Done
- Target version set to 2.4.1
Updated by Matthew Gingold almost 13 years ago
Hey Andrey
Unfortunately it looks like -mstackrealign does not solve the problem entirely.
Please see this bug report for openFrameworks (https://github.com/openframeworks/openFrameworks/issues/1300).
I've tested this and indeed cvFlip crashes with libs compiled with -mstackrealign but does not crash when using libs compiled with -mpreferred-stack-boundary=2.
I think, barring any other issue that might be causing cvFlip to SIGSEGV, you may need to revert to the (possibly less efficient) -mpreferred-stack-boundary=2 for mingw32.
Let me know if I should test something else - or if there may be some other bug/issue.
Updated by Andrey Kamaev almost 13 years ago
Unfortunately, even -mpreferred-stack-boundary=2 does not solve all SSE+MinGW problems, and these problems are very difficult to debug because the crashes are not reproducible across machines. For now it would be helpful if you could provide more examples of unstable code. I'll probably try to dig into the MinGW stability problems by the end of the month.
And does cv::flip also crash, or is only cvFlip affected?