Assertion Fail on CascadeClassifier on GPU (Bug #1640)
Description
The packaged demo cascadeclassifier.cpp throws an assertion failure:
OpenCV Error: Gpu Api call ( NCV CUDA Assertion Failed: cudaError_t=4, file=pipeline_bin/packages/packages_64bit/opencv-2.3.1_64bit/opencv/modules/gpu/src/nvidia/NCVHaarObjectDetection.cu, line=1271 ) in NCVDebugOutputHandler, file pipeline_bin/packages/packages_64bit/opencv-2.3.1_64bit/opencv/modules/gpu/src/cascadeclassifier.cpp, line 135
terminate called after throwing an instance of 'cv::Exception'
what(): pipeline_bin/packages/packages_64bit/opencv-2.3.1_64bit/opencv/modules/gpu/src/cascadeclassifier.cpp:135: error: (-217) NCV CUDA Assertion Failed: cudaError_t=4, file=pipeline_bin/packages/packages_64bit/opencv-2.3.1_64bit/opencv/modules/gpu/src/nvidia/NCVHaarObjectDetection.cu, line=1271 in function NCVDebugOutputHandler
This also happens when I try to run cascadeclassifier_nvidia_api.cpp.
My card is a GeForce GTX 470, arch=2.0
Let me know if you need any more details.
Associated revisions
temporary disabled optimized version of CascadeClassifier (bug #1640)
fixed HaarCascadeLoader test (incorrect behavior due to macros usage)
fixed bug #1640
Merge pull request #1640 from alalek:ocl_fix_exp_test
History
Updated by Anatoly Baksheev about 13 years ago
It seems to be a kernel crash (cudaError_t=4 means cudaErrorLaunchFailure in this CUDA runtime, i.e. an exception occurred on the device while executing a kernel).
Anton, do you have any ideas?
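For reference, the numeric cudaError_t values printed in the NCV assertions can be decoded by hand. The sketch below is only an illustration: the helper name legacyCudaErrorName is hypothetical, and the table assumes the numbering used by the CUDA 4.x-era runtime headers (later toolkits renumbered several of these values, so check driver_types.h for the toolkit actually in use). In a program linked against the CUDA runtime, cudaGetErrorString() is the more reliable way to get a readable message.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of OpenCV or CUDA.
// Assumption: values follow the CUDA 4.x-era cudaError_t numbering.
inline std::string legacyCudaErrorName(int code)
{
    assert(code >= 0); // cudaError_t values are non-negative
    switch (code) {
    case 0: return "cudaSuccess";
    case 1: return "cudaErrorMissingConfiguration";
    case 2: return "cudaErrorMemoryAllocation";
    case 3: return "cudaErrorInitializationError";
    case 4: return "cudaErrorLaunchFailure";      // the code seen in this report
    case 5: return "cudaErrorPriorLaunchFailure";
    case 6: return "cudaErrorLaunchTimeout";
    case 7: return "cudaErrorLaunchOutOfResources";
    default: return "unknown (see driver_types.h for your toolkit)";
    }
}
```

Under that assumption, legacyCudaErrorName(4) names a launch failure, which fits a kernel crashing on the device rather than a host-side API misuse.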
Updated by Anton Obukhov about 13 years ago
Please update to the most recent NVIDIA driver, and if the issue persists, tell us the OS, bitness, and driver version.
Updated by Mark Galea about 13 years ago
Hi Anton,
I updated the drivers to the latest using this
http://us.download.nvidia.com/XFree86/Linux-x86_64/295.20/NVIDIA-Linux-x86_64-295.20.run
Unfortunately the problem still persists.
I am currently using Debian GNU/Linux 6.0.4 (squeeze) (64 bit).
Anton, I also tried the cascadeclassifier_nvidia_api_gpu and it is throwing this exception too.
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/NCVHaarObjectDetection.cu, line=1204
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/NCVHaarObjectDetection.cu, line=1487
NCV Assertion Failed: retcode=2, file=/opencv-trunk/modules/gpu/src/nvidia/NCVHaarObjectDetection.cu, line=1724
NCV Assertion Failed: NcvStat=2, file=/opencv-trunk/samples/gpu/cascadeclassifier_nvidia_api.cpp, line=127
NCV Assertion Failed: Error in memory counting pass, file=/opencv-trunk/samples/gpu/cascadeclassifier_nvidia_api.cpp, line=316
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=335
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=332
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=486
NCV Assertion Failed: NCVVectorAlloc dtor:: dealloc failed, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.hpp, line=650
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=486
NCV Assertion Failed: NCVVectorAlloc dtor:: dealloc failed, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.hpp, line=650
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=486
NCV Assertion Failed: NCVVectorAlloc dtor:: dealloc failed, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.hpp, line=650
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=489
NCV Assertion Failed: NCVVectorAlloc dtor:: dealloc failed, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.hpp, line=650
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=489
NCV Assertion Failed: NCVVectorAlloc dtor:: dealloc failed, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.hpp, line=650
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=489
NCV Assertion Failed: NCVVectorAlloc dtor:: dealloc failed, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.hpp, line=650
Thanks,
Mark
Updated by Anton Obukhov about 13 years ago
Hi Mark, do most of the compute-intensive samples from the CUDA C SDK (particles, nbody) run normally on your setup?
If they run normally, then I would advise reducing the app to a minimal reprocase project and submitting it to the NVIDIA bug tracker.
Updated by Mark Galea about 13 years ago
Hi Anton,
I have tried the optical flow stuff on GPU and it works.
Specifically the /opencv-trunk/samples/gpu/pyrlk_optical_flow.cpp and /opencv-trunk/samples/gpu/opticalflow_nvidia_api.cpp examples.
I have attached a bare example project that is throwing the assertion.
To build:
- Extract example.zip.
- Export the OpenCV_DIR environment variable or update line 6 in CMakeLists.txt to your opencv-trunk path.
- Run cmake .
- ./example --cascade haarcascade_frontalface_alt.xml face_example.png
Let me know if you require more information.
Thanks
Mark
- File example.zip added
Updated by Anton Obukhov about 13 years ago
Hi Mark,
Just one more question: is this issue reproducible on any other OS/bitness/hardware combination? Is this a bug which revealed itself after a major driver revision update?
I no longer work at NVIDIA, so please file the bug as a registered CUDA developer from your account. (I don't have access to the internal bug tracker, nor can I reproduce your issue on a machine.)
As for the reprocase, it should be self-contained (no dependencies) and be possible to compile with a traditional Makefile. It should be rather simple to take NCV out of OpenCV framework as it has almost no dependencies.
Thanks, and please keep this tracker posted about the progress,
Anton
Updated by Mark Galea about 13 years ago
Hi Anton,
I tried the same example on a MacBookPro (64-bit) with the following device:
"GeForce 320M" 253Mb, sm_12 (not Fermi), 48 cores, Driver/Runtime ver.4.10/4.10
and it works. Seems to be a problem specific to the hardware combination described.
As for the test example, I am kind of lost. I do not know what you mean by 'take NCV out of OpenCV framework'. I am not experienced in this area and would appreciate your help.
Thanks,
Mark
Updated by Anton Obukhov about 13 years ago
Mark,
What I meant is to report a bug with a minimal standalone app, compiled from a minimal amount of source code. For that, one should take the buggy code and strip it until stripping further is hard or impossible. Please note that all interaction with the user should be eliminated, and all unnecessary I/O removed too. The result should be a no-parameters console app built from a Makefile for Linux, which reproduces the failure. The resulting app should then be verified to reproduce the issue (sometimes stripping makes the bug go away, so verify after every step!), and then it can be submitted to the bug tracker using your CUDA registered developer account. In case you don't have a registered account, it is probably a good time to request one here:
http://developer.nvidia.com/nvidia-registered-developer-program
Unfortunately, this process is rather time- and resource-consuming (on the user side during the stripping phase, on the NVIDIA side during bug triage), but it is necessary to resolve the issue and to help improve the driver, compiler, or whichever other component is causing the issue. While NVIDIA is working on the bug, you may also try to work around the issue by modifying or disabling the kernels whose invocations precede the observed failure.
Please feel free to ask any more questions if you have them,
Anton
Updated by Mark Galea about 13 years ago
Hi Anton,
I made some progress on this issue and managed to get the code running by changing line 1047 in NCVHaarObjectDetection.cu to the following:
NcvBool bDoAtomicCompaction = false; //devProp.major >= 2 || (devProp.major == 1 && devProp.minor >= 3);
This was motivated by comparing two executions: one that runs successfully on my Mac and one that fails on this machine. When the bDoAtomicCompaction flag is hard-coded to false, both cascadeclassifier_nvidia_api and cascadeclassifier in the gpu samples work. Could there be a missing condition there? Also, could anyone shed some light on what the bDoAtomicCompaction flag does?
I have included the deviceInfo dump for my card in case it helps.
Device 0: "GeForce GTX 470"
CUDA Driver Version / Runtime Version 4.2 / 4.1
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 1279 MBytes (1341325312 bytes)
(14) Multiprocessors x (32) CUDA Cores/MP: 448 CUDA Cores
GPU Clock Speed: 1.22 GHz
Memory Clock rate: 1674.00 Mhz
Memory Bus Width: 320-bit
L2 Cache Size: 655360 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Thanks,
Mark
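The capability check from the edited line can be read in isolation. The following is a minimal host-only sketch, not the NCV code: ComputeCapability and useAtomicCompaction are hypothetical names standing in for the devProp.major / devProp.minor fields and the bDoAtomicCompaction expression, and the forceDisable parameter is an assumed convenience for expressing the workaround without editing the condition itself.

```cpp
#include <cassert>

// Hypothetical stand-in for the two device-property fields the check reads.
struct ComputeCapability {
    int ccMajor;
    int ccMinor;
};

// Mirrors the condition quoted in the post: enabled for compute capability
// 1.3 and above. forceDisable is an assumption added for illustration; it
// reproduces the effect of hard-coding the flag to false.
inline bool useAtomicCompaction(ComputeCapability cc, bool forceDisable = false)
{
    assert(cc.ccMajor >= 1 && cc.ccMinor >= 0);
    if (forceDisable) {
        return false;
    }
    return cc.ccMajor >= 2 || (cc.ccMajor == 1 && cc.ccMinor >= 3);
}
```

With this condition a 2.0 device such as the GTX 470 takes the atomic path, while an sm_12 device such as the GeForce 320M does not, which is consistent with the same sample running on the MacBook and failing on the Fermi card.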
Updated by Anton Obukhov about 13 years ago
This is just a switch between a more efficient and a less efficient algorithm. The switch is turned on for the newer GPUs. So if you are experiencing problems and don't want to do the whole reprocase thing, just leave it set to false.
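The thread does not spell out what the two algorithms are. Judging only by the flag's name, a plausible reading is stream compaction done either with an atomic counter or with a separate prefix-sum pass. The sketch below illustrates that general distinction on the CPU, under that assumption; the function names are hypothetical and this is not the NCV implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Strategy A: every kept element reserves its output slot by incrementing a
// shared atomic counter. On a GPU this happens inside the producing kernel.
std::vector<int> compactWithAtomicCounter(const std::vector<int>& in,
                                          const std::vector<bool>& keep)
{
    assert(in.size() == keep.size());
    std::vector<int> out(in.size());
    std::atomic<std::size_t> next(0);
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (keep[i]) {
            out[next.fetch_add(1)] = in[i];
        }
    }
    out.resize(next.load());
    return out;
}

// Strategy B: an exclusive prefix sum over the keep flags gives each kept
// element its output index, followed by a scatter. This needs extra passes
// but no atomics.
std::vector<int> compactWithPrefixSum(const std::vector<int>& in,
                                      const std::vector<bool>& keep)
{
    assert(in.size() == keep.size());
    std::vector<std::size_t> offset(in.size());
    std::size_t running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        offset[i] = running;
        running += keep[i] ? 1 : 0;
    }
    std::vector<int> out(running);
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (keep[i]) {
            out[offset[i]] = in[i];
        }
    }
    return out;
}
```

Run serially as here, both functions return the kept elements in input order. On real parallel hardware the atomic variant generally does not preserve order, which is one reason the two paths can behave differently.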
Updated by Mark Galea about 13 years ago
Hi Anton,
I will try to put together a minimal example and submit it to the NVIDIA team.
Thanks for your help.
Mark
Updated by Anatoly Baksheev about 13 years ago
I am able to reproduce it. It seems we should look into your kernels. Anton, can't it be a FD bug? BTW, NVidia tests fail.
Updated by Vladislav Vinogradov about 13 years ago
- Category set to gpu (cuda)
Updated by Vladislav Vinogradov about 13 years ago
- Status deleted (Open)
- Assignee set to Vladislav Vinogradov
Updated by Alexander Shishkov almost 13 years ago
- Status set to Open
Updated by Vladislav Vinogradov almost 13 years ago
- Status changed from Open to Done
- Target version deleted ()
Updated by Alexander Shishkov almost 13 years ago
- Target version set to 2.4.0
Updated by T Abdullah almost 12 years ago
Hi All,
I am trying to run the GPU cascadeclassifier sample from the GPU module in OpenCV 2.4.3 on Windows 7.
I get the following error when running it:
Device 0: "GeForce 310" 512Mb, sm_12 (not Fermi), 16 cores, Driver/Runtime ver.5.0/4.20
OpenCV Error: Gpu API call (NCV Assertion Failed: NcvStat=25, file=C:/slave/WinInstallerMegaPack/src/opencv/modules/gpu/src/nvidia/NCVHaarObjectDetection.cu, line=2197
) in unknown function, file C:\slave\WinInstallerMegaPack\src\opencv\modules\gpu\src\cascadeclassifier.cpp, line 172
I have already updated my GeForce driver, and I am running the project with CUDA 4.2. Any help is appreciated.