Assertion Fail on CascadeClassifier on GPU (Bug #1640)


Added by Mark Galea about 13 years ago. Updated almost 12 years ago.


Status: Done
Start date: 2012-02-29
Priority: Normal
Due date:
Assignee: Vladislav Vinogradov
% Done: 0%
Category: gpu (cuda)
Target version: 2.4.0
Affected version:
Operating System:
Difficulty:
HW Platform:
Pull request:

Description

The packaged demo cascadeclassifier.cpp throws an assertion failure:

OpenCV Error: Gpu Api call (
NCV CUDA Assertion Failed: cudaError_t=4, file=pipeline_bin/packages/packages_64bit/opencv-2.3.1_64bit/opencv/modules/gpu/src/nvidia/NCVHaarObjectDetection.cu, line=1271
) in NCVDebugOutputHandler, file pipeline_bin/packages/packages_64bit/opencv-2.3.1_64bit/opencv/modules/gpu/src/cascadeclassifier.cpp, line 135
terminate called after throwing an instance of 'cv::Exception'
  what():  pipeline_bin/packages/packages_64bit/opencv-2.3.1_64bit/opencv/modules/gpu/src/cascadeclassifier.cpp:135: error: (-217) 
NCV CUDA Assertion Failed: cudaError_t=4, file=pipeline_bin/packages/packages_64bit/opencv-2.3.1_64bit/opencv/modules/gpu/src/nvidia/NCVHaarObjectDetection.cu, line=1271
 in function NCVDebugOutputHandler

This also happens when I try to run cascadeclassifier_nvidia_api.cpp.

My card is a GeForce GTX 470, arch=2.0

Let me know if you need any more details.


example.zip (284.8 kB) Mark Galea, 2012-03-01 01:34 am


Associated revisions

Revision 63b5cf6d
Added by Vladislav Vinogradov about 13 years ago

temporary disabled optimized version of CascadeClassifier (bug #1640)
fixed HaarCascadeLoader test (incorrect behavior due to macros usage)

Revision 5aae21c0
Added by Vladislav Vinogradov almost 13 years ago

fixed bug #1640

Revision 4cbf0cb3
Added by Andrey Pavlenko over 11 years ago

Merge pull request #1640 from alalek:ocl_fix_exp_test

History

Updated by Anatoly Baksheev about 13 years ago

It seems to be a kernel crash (cudaError_t=4 is cudaErrorLaunchFailure in this CUDA runtime, i.e. an unspecified failure while a kernel was executing on the device).

Anton, do you have any ideas?

Updated by Anton Obukhov about 13 years ago

Please update to the most recent NVIDIA driver, and if the issue persists, tell us the OS, bitness, and driver version.

Updated by Mark Galea about 13 years ago

Hi Anton,

I updated the drivers to the latest using this

http://us.download.nvidia.com/XFree86/Linux-x86_64/295.20/NVIDIA-Linux-x86_64-295.20.run

Unfortunately the problem still persists.

I am currently using Debian GNU/Linux 6.0.4 (squeeze) (64 bit).

Anton, I also tried the cascadeclassifier_nvidia_api_gpu and it is throwing this exception too.

NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/NCVHaarObjectDetection.cu, line=1204
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/NCVHaarObjectDetection.cu, line=1487
NCV Assertion Failed: retcode=2, file=/opencv-trunk/modules/gpu/src/nvidia/NCVHaarObjectDetection.cu, line=1724
NCV Assertion Failed: NcvStat=2, file=/opencv-trunk/samples/gpu/cascadeclassifier_nvidia_api.cpp, line=127
NCV Assertion Failed: Error in memory counting pass, file=/opencv-trunk/samples/gpu/cascadeclassifier_nvidia_api.cpp, line=316
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=335
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=332
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=486
NCV Assertion Failed: NCVVectorAlloc dtor:: dealloc failed, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.hpp, line=650
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=486
NCV Assertion Failed: NCVVectorAlloc dtor:: dealloc failed, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.hpp, line=650
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=486
NCV Assertion Failed: NCVVectorAlloc dtor:: dealloc failed, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.hpp, line=650
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=489
NCV Assertion Failed: NCVVectorAlloc dtor:: dealloc failed, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.hpp, line=650
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=489
NCV Assertion Failed: NCVVectorAlloc dtor:: dealloc failed, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.hpp, line=650
NCV Assertion Failed: cudaError_t=4, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.cu, line=489
NCV Assertion Failed: NCVVectorAlloc dtor:: dealloc failed, file=/opencv-trunk/modules/gpu/src/nvidia/core/NCV.hpp, line=650

Thanks,

Mark

Updated by Anton Obukhov about 13 years ago

Hi Mark, do most of the compute-intensive samples from the CUDA C SDK (particles, nbody) run normally on your setup?
If they run normally, then I would advise reducing the app to a minimum-code reprocase project and submitting it to the NVIDIA bug tracker.

Updated by Mark Galea about 13 years ago

Hi Anton,

I have tried the optical flow stuff on GPU and it works.

Specifically the /opencv-trunk/samples/gpu/pyrlk_optical_flow.cpp and /opencv-trunk/samples/gpu/opticalflow_nvidia_api.cpp examples.

I have attached a bare example project that is throwing the assertion.

To build:
  • Extract example.zip
  • Export the OpenCV_DIR environment variable or update line 6 in CMakeLists.txt to your opencv-trunk path.
  • Run cmake .
To run:
  • ./example --cascade haarcascade_frontalface_alt.xml face_example.png

Let me know if you require more information.

Thanks

Mark

  • File example.zip added

Updated by Anton Obukhov about 13 years ago

Hi Mark,

Just one more question: is this issue reproducible on any other OS/bitness/hardware combination? Is this a bug which revealed itself after a major driver revision update?

I no longer work at NVIDIA, so please file the bug as a registered CUDA developer from your account. (I don't have access to the internal bug tracker, nor can I reproduce your issue on a machine.)

As for the reprocase, it should be self-contained (no dependencies) and be possible to compile with a traditional Makefile. It should be rather simple to take NCV out of OpenCV framework as it has almost no dependencies.

Thanks, and please keep this tracker posted about the progress,
Anton

Updated by Mark Galea about 13 years ago

Hi Anton,

I tried the same example on a MacBookPro (64-bit) with the following device:

"GeForce 320M" 253Mb, sm_12 (not Fermi), 48 cores, Driver/Runtime ver.4.10/4.10

and it works. Seems to be a problem specific to the hardware combination described.

As for the test example, I am kind of lost. I do not know what you mean by 'take NCV out of OpenCV framework'. I am not experienced in this area and would appreciate your help.

Thanks,

Mark

Updated by Anton Obukhov about 13 years ago

Mark,

What I meant is to report the bug with a minimal standalone app, compiled from a minimum amount of source code. For that, one should take the buggy code and strip it until stripping further is hard or impossible. Please note that all interaction with the user should be eliminated, and all unnecessary I/O removed too. The result should be a no-parameters console app built from a Makefile for Linux, which reproduces the failure. The resulting app should then be verified to reproduce the issue (sometimes stripping makes the bug go away, so verify after every step!), and then it can be submitted to the bug tracker using your CUDA registered developer account. In case you don't have a registered account, it is probably a good time to request one here:
http://developer.nvidia.com/nvidia-registered-developer-program

Unfortunately, this process consumes considerable time and other resources (on the user side during the stripping phase, on the NVIDIA side during bug triage), but it is necessary to resolve the issue and to help improve the driver, compiler, or whichever component is causing it. While NVIDIA is working on the bug, you may also try to work around the issue by modifying or disabling the kernels whose invocations precede the observed failure.

Please feel free to ask any more questions if you have them,
Anton
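The stripping procedure described in the post above can be organised around a tiny fixed-stage runner. This is only a sketch under assumptions: the `Stage` type and `firstFailingStage` function are hypothetical names invented for illustration, and the stages would wrap the real allocation, upload, and kernel-launch calls taken from the failing code. It shows one way to keep the app parameter-free while still reporting which step fails, so that the failure can be re-verified after every stripping pass.

```cpp
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// One stage of the stripped-down reproduction (hypothetical type).
// run() returns 0 on success and any other value on failure,
// mirroring a status-code style API.
struct Stage {
    std::string name;
    std::function<int()> run;
};

// Runs the stages in order and stops at the first failure.
// Returns the index of the first failing stage, or -1 if all succeeded.
int firstFailingStage(const std::vector<Stage>& stages) {
    for (std::size_t i = 0; i < stages.size(); ++i) {
        const int status = stages[i].run();
        if (status != 0) {
            std::fprintf(stderr, "stage '%s' failed with status %d\n",
                         stages[i].name.c_str(), status);
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

A `main` without arguments would build the stage list, call `firstFailingStage`, and return a non-zero exit code when the result is not -1. Removing a stage and rerunning then shows immediately whether the failure still reproduces.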

Updated by Mark Galea about 13 years ago

Hi Anton,

I made some progress on this issue and managed to get the code running by changing line 1047 in NCVHaarObjectDetection.cu to the following:

NcvBool bDoAtomicCompaction = false; //devProp.major >= 2 || (devProp.major == 1 && devProp.minor >= 3);

This was motivated by comparing two executions: one that runs successfully on my Mac and one that fails on this machine. When the bDoAtomicCompaction flag is hard-coded to false, both the cascadeclassifier_nvidia_api and the cascadeclassifier GPU samples work. Could there be a missing condition there? Also, could anyone shed some light on what the bDoAtomicCompaction flag does?
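For readers following along, the commented-out condition quoted above can be restated as a small standalone function. This is a sketch, not the NCV source: `ComputeCapability`, `useAtomicCompaction`, and the `forceDisable` parameter are hypothetical names, and `forceDisable` merely stands in for hard-coding the flag to false.

```cpp
// Compute capability of a device (hypothetical type for illustration;
// in CUDA code these values come from the device properties).
struct ComputeCapability {
    int ccMajor;
    int ccMinor;
};

// Restates the quoted condition: the atomic path is selected for
// compute capability 2.x and above, or 1.3 and above within 1.x.
// forceDisable is a hypothetical override equivalent to the workaround
// of hard-coding the flag to false.
bool useAtomicCompaction(const ComputeCapability& cc, bool forceDisable = false) {
    if (forceDisable) {
        return false;
    }
    return cc.ccMajor >= 2 || (cc.ccMajor == 1 && cc.ccMinor >= 3);
}
```

Under this condition a compute capability 2.0 card such as the GTX 470 takes the atomic path, while an sm_12 device such as the GeForce 320M does not, which matches the difference observed between the two machines.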

I have included the deviceInfo dump for my card; maybe it helps in some way.

Device 0: "GeForce GTX 470" 
  CUDA Driver Version / Runtime Version          4.2 / 4.1
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 1279 MBytes (1341325312 bytes)
  (14) Multiprocessors x (32) CUDA Cores/MP:     448 CUDA Cores
  GPU Clock Speed:                               1.22 GHz
  Memory Clock rate:                             1674.00 Mhz
  Memory Bus Width:                              320-bit
  L2 Cache Size:                                 655360 bytes
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and execution:                 Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Thanks,

Mark

Updated by Anton Obukhov about 13 years ago

This is just a switch between a more efficient and a less efficient algorithm. The switch is turned on for the newer GPUs. So if you are experiencing problems and don't want to do the whole reprocase thing, just leave it set to false.
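To give a rough idea of what such a switch might select between, here is a CPU-side sketch of two common ways to compact an array (keep only the flagged elements). It assumes, from the flag name alone, that the choice is between scan-based and atomic-counter-based compaction; that assumption is not confirmed in this thread, and none of the code below is taken from NCV. The function names are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Scan-based compaction: compute, for each element, how many kept elements
// precede it (an exclusive prefix sum), then scatter the kept elements to
// those offsets. Output order matches input order.
// Precondition: keep.size() == in.size().
std::vector<int> compactWithScan(const std::vector<int>& in,
                                 const std::vector<bool>& keep) {
    std::vector<std::size_t> offset(in.size(), 0);
    std::size_t running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        offset[i] = running;
        if (keep[i]) ++running;
    }
    std::vector<int> out(running);
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (keep[i]) out[offset[i]] = in[i];
    }
    return out;
}

// Atomic-counter compaction: each kept element claims the next free output
// slot by atomically incrementing a shared counter. With many concurrent
// GPU threads the output order would not be guaranteed; this loop is
// sequential, so the order happens to be preserved here.
// Precondition: keep.size() == in.size().
std::vector<int> compactWithAtomicCounter(const std::vector<int>& in,
                                          const std::vector<bool>& keep) {
    std::vector<int> out(in.size());
    std::atomic<std::size_t> next(0);
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (keep[i]) {
            const std::size_t slot = next.fetch_add(1);
            out[slot] = in[i];
        }
    }
    out.resize(next.load());
    return out;
}
```

The atomic variant needs no separate prefix-sum pass, which is one plausible reason to prefer it on hardware with fast atomic operations.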

Updated by Mark Galea about 13 years ago

Hi Anton,

Will try to get a minimal example and submit it to the NVidia Team.

Thanks for your help.

Mark

Updated by Anatoly Baksheev about 13 years ago

I am able to reproduce it. It seems we should look into your kernels. Anton, can't it be a FD bug? BTW, NVidia tests fail.

Updated by Vladislav Vinogradov about 13 years ago

  • Category set to gpu (cuda)

Updated by Vladislav Vinogradov about 13 years ago

  • Status deleted (Open)
  • Assignee set to Vladislav Vinogradov

Updated by Alexander Shishkov almost 13 years ago

  • Status set to Open

Updated by Vladislav Vinogradov almost 13 years ago

  • Status changed from Open to Done
  • Target version deleted ()

Updated by Alexander Shishkov almost 13 years ago

  • Target version set to 2.4.0

Updated by T Abdullah almost 12 years ago

Hi All,

I am trying to run the cascadeclassifier sample from the GPU module in OpenCV 2.4.3 on Windows 7.
I am getting the following error while running it:

Device 0: "GeForce 310" 512Mb, sm_12 (not Fermi), 16 cores, Driver/Runtime ver.5.0/4.20
OpenCV Error: Gpu API call (NCV Assertion Failed: NcvStat=25, file=C:/slave/WinInstallerMegaPack/src/opencv/modules/gpu/src/nvidia/NCVHaarObjectDetection.cu, line=2197
) in unknown function, file C:\slave\WinInstallerMegaPack\src\opencv\modules\gpu\src\cascadeclassifier.cpp, line 172

I have already updated my GeForce driver, and I am running the project with CUDA 4.2. Any help is appreciated.

