OpenCL crashes on Mali-T628 MP6 GPU (Bug #4010)
Description
In some cases, the ARM-based Mali-T628 MP6 GPU (found on e.g. the Odroid XU3) cannot support work groups of 256 work-items, so the function openCLVerifyKernel throws an exception. I triggered the bug by trying to run SURF_OCL, but from the stack trace it looks like cv::ocl::integral is responsible, so it is likely that many other parts of the OpenCL module are also affected.
Steps to reproduce:
1. Compile the attached code
2. Make sure the attached image is in the same directory as the compiled binary
3. Run the binary
Expected output:
Got 715 64-dim descriptors
Actual output:
OpenCV Error: Assertion failed (localThreads0 * localThreads1 * localThreads2 <= kernelWorkGroupSize) in openCLVerifyKernel, file ../modules/ocl/src/cl_operations.cpp, line 349
terminate called after throwing an instance of 'cv::Exception'
what(): ../modules/ocl/src/cl_operations.cpp:349: error: (-215) localThreads0 * localThreads1 * localThreads2 <= kernelWorkGroupSize in function openCLVerifyKernel
Aborted
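For readers unfamiliar with the check, the condition in the assertion message can be sketched as a standalone function. This is only an illustration of the inequality shown in the error text, not the OpenCV source; the helper name `fitsWorkGroup` and the example sizes are hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper (not part of OpenCV): mirrors the inequality printed in
// the assertion message. kernelWorkGroupSize stands for the per-kernel limit
// that the OpenCL driver reports for the device.
bool fitsWorkGroup(const std::size_t localThreads[3], std::size_t kernelWorkGroupSize)
{
    // Total work-items in one work group must not exceed the reported limit.
    return localThreads[0] * localThreads[1] * localThreads[2] <= kernelWorkGroupSize;
}
```

With an illustrative request of {256, 1, 1} and a reported limit of 64, the function returns false, which corresponds to the failed assertion; the same request against a limit of 256 returns true.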
Stacktrace:
#0 0xb6c6428c in cv::ocl::openCLVerifyKernel(cv::ocl::Context const*, cl_kernel*, unsigned int*) ()
from /usr/local/lib/libopencv_ocl.so.2.4
#1 0xb6c64ab8 in cv::ocl::openCLExecuteKernel(cv::ocl::Context*, _cl_kernel*, unsigned int*, unsigned int*, std::vector<std::pair<unsigned int, void const*>, std::allocator<std::pair<unsigned int, void const*> > >&) () from /usr/local/lib/libopencv_ocl.so.2.4
#2 0xb6c64c24 in cv::ocl::openCLExecuteKernel(cv::ocl::Context*, cv::ocl::ProgramEntry const*, std::string, unsigned int*, unsigned int*, std::vector<std::pair<unsigned int, void const*>, std::allocator<std::pair<unsigned int, void const*> > >&, int, int, char const*) ()
from /usr/local/lib/libopencv_ocl.so.2.4
#3 0xb6c64cbe in cv::ocl::openCLExecuteKernel(cv::ocl::Context*, cv::ocl::ProgramEntry const*, std::string, unsigned int*, unsigned int*, std::vector<std::pair<unsigned int, void const*>, std::allocator<std::pair<unsigned int, void const*> > >&, int, int, char const*) ()
from /usr/local/lib/libopencv_ocl.so.2.4
#4 0xb6c64d68 in cv::ocl::openCLExecuteKernel(cv::ocl::Context*, cv::ocl::ProgramEntry const*, std::string, unsigned int*, unsigned int*, std::vector<std::pair<unsigned int, void const*>, std::allocator<std::pair<unsigned int, void const*> > >&, int, int) ()
from /usr/local/lib/libopencv_ocl.so.2.4
#5 0xb6cd786c in cv::ocl::integral(cv::ocl::oclMat const&, cv::ocl::oclMat&) () from /usr/local/lib/libopencv_ocl.so.2.4
#6 0xb6da1c94 in SURF_OCL_Invoker::SURF_OCL_Invoker(cv::ocl::SURF_OCL&, cv::ocl::oclMat const&, cv::ocl::oclMat const&) ()
from /usr/local/lib/libopencv_nonfree.so.2.4
#7 0x0009e808 in ?? ()
Associated revisions
Merge pull request #4010 from cr333:triangulation_fix_master
History
Updated by Nicu Stiurca over 10 years ago
- File image_1409187896.603828323.bmp added
- File ocl-mali-bug.cpp added
Updated by Nicu Stiurca over 10 years ago
Here is an updated stack trace with line numbers:
#0 cv::ocl::openCLVerifyKernel (ctx=ctx@entry=0x28938, kernel=kernel@entry=0x300e00, localThreads=localThreads@entry=0xbeffe634)
at ../modules/ocl/src/cl_operations.cpp:346
#1 0xb6cab2b0 in cv::ocl::openCLExecuteKernel (ctx=ctx@entry=0x28938, kernel=kernel@entry=0x300e00,
globalThreads=globalThreads@entry=0xbeffe628, localThreads=localThreads@entry=0xbeffe634, args=...)
at ../modules/ocl/src/cl_operations.cpp:406
#2 0xb6cab41c in cv::ocl::openCLExecuteKernel_ (ctx=ctx@entry=0x28938, source=source@entry=0xb6dcc84c <cv::ocl::imgproc_integral_sum>,
kernelName=..., globalThreads=globalThreads@entry=0xbeffe628, localThreads=localThreads@entry=0xbeffe634, args=...,
channels=channels@entry=-1, depth=depth@entry=4, build_options=build_options@entry=0x0) at ../modules/ocl/src/cl_operations.cpp:451
#3 0xb6cab4b6 in cv::ocl::openCLExecuteKernel (ctx=ctx@entry=0x28938, source=source@entry=0xb6dcc84c <cv::ocl::imgproc_integral_sum>,
kernelName=..., globalThreads=globalThreads@entry=0xbeffe628, localThreads=localThreads@entry=0xbeffe634, args=...,
channels=channels@entry=-1, depth=depth@entry=4, build_options=build_options@entry=0x0) at ../modules/ocl/src/cl_operations.cpp:468
#4 0xb6cab560 in cv::ocl::openCLExecuteKernel (ctx=ctx@entry=0x28938, source=source@entry=0xb6dcc84c <cv::ocl::imgproc_integral_sum>,
kernelName=..., globalThreads=globalThreads@entry=0xbeffe628, localThreads=localThreads@entry=0xbeffe634, args=...,
channels=channels@entry=-1, depth=depth@entry=4) at ../modules/ocl/src/cl_operations.cpp:459
#5 0xb6d111da in cv::ocl::integral (src=..., sum=...) at ../modules/ocl/src/imgproc.cpp:985
#6 0xb6dda9ec in SURF_OCL_Invoker::SURF_OCL_Invoker (this=0xbeffe90c, surf=..., img=..., mask=...)
at ../modules/nonfree/src/surf_ocl.cpp:150
#7 0xb6dde526 in cv::ocl::SURF_OCL::operator() (this=this@entry=0xbeffeb64, img=..., mask=..., keypoints=..., descriptors=...,
useProvidedKeypoints=useProvidedKeypoints@entry=false) at ../modules/nonfree/src/surf_ocl.cpp:410
#8 0xb6dde878 in cv::ocl::SURF_OCL::operator() (this=0xbeffeb64, img=..., mask=..., keypoints=..., descriptors=...,
useProvidedKeypoints=false) at ../modules/nonfree/src/surf_ocl.cpp:440
#9 0x00009e7e in main ()
Updated by Ilya Lavrenov over 10 years ago
Hi Nicu,
We know about this problem, and indeed we already have fixes for it on Android. You can see multiple `#ifdef ANDROID` blocks that fix the work group size there.
So, you can do the same for your GPU and send us an appropriate pull request. Your help will be appreciated.
Updated by Ilya Lavrenov over 10 years ago
- Status changed from New to Open
- Assignee set to Nicu Stiurca
- Category set to ocl
- Target version set to 2.4.11
Updated by Nicu Stiurca over 10 years ago
Hi Ilya,
Thank you for your input. How do you suggest I detect my platform, since I am running Ubuntu, not Android, on the Mali? The platform is capable of running Android by the way, and I suspect some Android phones and/or tablets may use this chip. The point is that these fixes are tied to the hardware rather than the OS, so maybe add an 'EMBEDDED_LINUX' platform that CMake can detect, and do something like '#if defined(ANDROID) || defined(EMBEDDED_LINUX)' where appropriate? Does this make sense?
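Since the limit depends on the hardware rather than the OS, one alternative to a compile-time platform macro would be to shrink the requested local size at run time against whatever limit the driver reports. The following is a minimal sketch under that assumption; `clampLocalThreads` is a hypothetical helper, not an OpenCV API, and halving the largest dimension is just one possible policy.

```cpp
#include <cstddef>

// Hypothetical helper (not an OpenCV API): shrinks a requested local size
// until its total work-item count fits the limit reported at run time.
// Policy (an assumption of this sketch): repeatedly halve the largest dimension.
void clampLocalThreads(std::size_t localThreads[3], std::size_t kernelWorkGroupSize)
{
    while (localThreads[0] * localThreads[1] * localThreads[2] > kernelWorkGroupSize)
    {
        // Find the largest dimension; on ties the first one wins.
        std::size_t* largest = &localThreads[0];
        for (int i = 1; i < 3; ++i)
            if (localThreads[i] > *largest)
                largest = &localThreads[i];

        if (*largest <= 1)
            break; // every dimension is already 1 (or 0); cannot shrink further

        *largest /= 2;
    }
}
```

For example, a request of {256, 1, 1} against a limit of 64 becomes {64, 1, 1}, and {16, 16, 1} becomes {8, 8, 1}. A kernel whose source is written around a fixed group size may still need changes of its own, so this would not be a complete fix by itself.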
Updated by Andreas Flåten about 10 years ago
Any news on this? I've recently purchased an ODROID XU3 and tried to run some OpenCL tests. I checked out the master branch on GitHub (04/02/2015, OpenCV 3.0.0-dev) and compiled it with OpenCL support. 1176 of 1539 tests fail when running .../opencv/build/bin/opencv_perf_imgproc -gtest_filter=*OCL*
Updated by Jeong-pyo Kong almost 10 years ago
Hi, Ilya and Nicu.
How is it going?
Also, Ilya.
Could you give anything to refer to this issue?
I'd like to apply it for my mali T628 board(Odroid XU3)
Thanks in advance.
Regards,
JP
Updated by Seunghwa Song almost 10 years ago
Hi, I am using the same board as Nicu, the Odroid XU3 made by Hardkernel.
I tested the ocl-example-facedetection example with a camera device and got an error.
In my case, kernelWorkGroupSize, as returned by clGetKernelWorkGroupInfo(), was 256 on each of the seven times it was called.
However, my error occurred when the integral_cols kernel was executed in the integral() function
(modules/ocl/src/imgproc.cpp).
Its kernelWorkGroupSize is 64, and this is why the CV_Assert fired.
I disabled this assertion as a test and got the error code CL_OUT_OF_RESOURCES.
As far as I know, we should adjust the thread sizes as Ilya mentioned above.
(Ilya said this bug was fixed for the Android platform with multiple `#ifdef ANDROID` blocks.)
BTW, the ocl-example-clahe example works well.
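Adjusting the local size has a knock-on effect: in OpenCL 1.x, when a local size is passed explicitly, each global dimension must be a multiple of the corresponding local dimension. A minimal sketch of the usual rounding step follows; `roundUpToMultiple` is a hypothetical name, and the kernel is assumed to ignore work-items that fall outside the image.

```cpp
#include <cstddef>

// Hypothetical helper: rounds a global size up to the next multiple of the
// local size, so the enqueue call sees evenly divisible dimensions.
// Assumption: the kernel itself skips work-items beyond the real data range.
std::size_t roundUpToMultiple(std::size_t globalThreads, std::size_t localThreads)
{
    if (localThreads == 0)
        return globalThreads; // nothing to align against
    return (globalThreads + localThreads - 1) / localThreads * localThreads;
}
```

For instance, a global size of 650 with a local size of 64 rounds up to 704, while 640 stays at 640.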
- Assignee deleted (Nicu Stiurca)
- File IMG_20150407_230815.jpg added
Updated by Jeong-pyo Kong almost 10 years ago
Jeong-pyo Kong wrote:
Hi, Ilya and Nicu.
How is it going?
Also, Ilya.
Could you give anything to refer to this issue?
I'd like to apply it for my mali T628 board(Odroid XU3)
Thanks in advance.
Regards,
JP
Here is the test result.
------------------------------------------------
- test
odroid@odroid:~/workspace/opencv/opencv-2.4.10/build/bin$ ./opencv_perf_ocl --gtest_filter=OCL_ErodeFixture_Erode.Erode/1*
...
[==========] 11 tests from 1 test case ran. (3313 ms total)
[ PASSED ] 6 tests.
[ FAILED ] 5 tests, listed below:
[ FAILED ] OCL_ErodeFixture_Erode.Erode/1, where GetParam() = (640x480, 8UC1, 5)
[ FAILED ] OCL_ErodeFixture_Erode.Erode/14, where GetParam() = (1280x720, 32FC4, 3)
[ FAILED ] OCL_ErodeFixture_Erode.Erode/15, where GetParam() = (1280x720, 32FC4, 5)
[ FAILED ] OCL_ErodeFixture_Erode.Erode/16, where GetParam() = (1920x1080, 8UC1, 3)
[ FAILED ] OCL_ErodeFixture_Erode.Erode/17, where GetParam() = (1920x1080, 8UC1, 5)
- full list of Erode from --gtest_list_tests
Erode/1 # GetParam() = (640x480, 8UC1, 5) <----- FAILED
... not tested
Erode/10 # GetParam() = (1280x720, 32FC1, 3)
Erode/11 # GetParam() = (1280x720, 32FC1, 5)
Erode/12 # GetParam() = (1280x720, 8UC4, 3)
Erode/13 # GetParam() = (1280x720, 8UC4, 5)
Erode/14 # GetParam() = (1280x720, 32FC4, 3) <----- FAILED
Erode/15 # GetParam() = (1280x720, 32FC4, 5) <----- FAILED
Erode/16 # GetParam() = (1920x1080, 8UC1, 3) <----- FAILED
Erode/17 # GetParam() = (1920x1080, 8UC1, 5) <----- FAILED
Erode/18 # GetParam() = (1920x1080, 32FC1, 3)
Erode/19 # GetParam() = (1920x1080, 32FC1, 5)
... not tested
------------------------------------------------
I wonder why some tests pass while others fail.
It seems that the ocl module needs to be corrected for the ARM Mali-T628.
Judging by the test result, 8UC1 and 32FC4 may be what triggers the issue.
I'll check whether there are issues with the number of color channels and the kernel size.
Updated by Vadim Pisarevsky almost 10 years ago
OpenCL code in 2.4.x is considered obsolete. Try the new 3.0-dev, where the problem may have been fixed
- Status changed from Open to Cancelled
Updated by Jeong-pyo Kong almost 10 years ago
Vadim Pisarevsky wrote:
OpenCL code in 2.4.x is considered obsolete. Try the new 3.0-dev, where the problem may have been fixed
Oh. Thank you for the information.
One more question.
Also, could we use 3.0 RC1 for this?
https://github.com/Itseez/opencv/archive/3.0.0-rc1.zip