Matchtemplate on GPU fails with large images (Bug #1713)
Description
- When matching large images (with multiple channels), negative match values are obtained, even when using the SQDIFF method, which is impossible.
- When matching large images (single channel), an error is reported when the template and the scene are both large.
For example, this happens (see attachment) on a 3508x4960 scene with a 100x100 template, but not on a 3508x4960 scene with a 60x30 template. This is the error:
OpenCV Error: Unknown error code -219 (CUFFT_ALLOC_FAILED [Code = 2]) in convolve, file /home/tzaman/OpenCV-2.3.1/modules/gpu/src/imgproc.cpp, line 1486
terminate called after throwing an instance of 'cv::Exception'
what(): /home/tzaman/OpenCV-2.3.1/modules/gpu/src/imgproc.cpp:1486: error: (-219) CUFFT_ALLOC_FAILED [Code = 2] in function convolve
Aborted
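For reference, here is a minimal Python sketch of why the first bullet calls negative SQDIFF values impossible: the score is a sum of squares. The function names are hypothetical and this is not OpenCV's implementation; the thread does not say what caused the negative values. The expanded form is included only as an assumption about how an FFT-based matcher might assemble the score. It is algebraically identical, but if its three terms are computed separately in floating point, rounding could push a near-zero result slightly below zero.

```python
def sqdiff_direct(patch, templ):
    """Sum of squared differences; every term is >= 0, so the sum is too."""
    return sum((p - t) ** 2 for p, t in zip(patch, templ))


def sqdiff_expanded(patch, templ):
    """Same quantity written as sum(p^2) - 2*sum(p*t) + sum(t^2).

    Assumption: an FFT-based matcher may build the score from these
    three separately computed terms (cross term via convolution).
    """
    sum_p2 = sum(p * p for p in patch)
    sum_t2 = sum(t * t for t in templ)
    cross = sum(p * t for p, t in zip(patch, templ))
    return sum_p2 - 2 * cross + sum_t2


# Hypothetical pixel values, chosen only for illustration.
patch = [10, 20, 30]
templ = [12, 18, 33]
print(sqdiff_direct(patch, templ), sqdiff_expanded(patch, templ))
```

With exact integer arithmetic both forms agree; the point is only that a correct SQDIFF result can never be below zero.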
Associated revisions
#1713 Added the possibility of setting user_block_size manually for gpu::matchTemplate function (and gpu::convolve). Added a buffer param into these functions. Removed using of 2^n block sizes when it's not necessary.
Merge pull request #1713 from pengx17:patch-2
History
Updated by Tim Zaman almost 13 years ago
Edit: I have tested this on two separate systems running 64-bit Ubuntu, with GTX 520, GTX 560, and Quadro 600 cards.
Updated by Anatoly Baksheev almost 13 years ago
- Assignee changed from Anatoly Baksheev to Alexey Spizhevoy
Updated by Alexey Spizhevoy almost 13 years ago
Hi Tim,
Thanks for the repro case!
I fixed the negative match values issue for the SQDIFF and SQDIFF_NORMED flags in r7628.
I also reproduced GpuMat "out of memory" errors using your code, rather than CUFFT failures. Allocation can fail due to memory fragmentation even when the API reports enough free memory (see http://stackoverflow.com/questions/8684770/how-is-cuda-memory-managed).
- Status changed from Open to Done
- Target version deleted ()
Updated by Tim Zaman almost 13 years ago
A+, Alexey, thanks. Can I ask how I can work around the memory issue? What GPU are you using? 1 GB of memory? Would you advise more? Also, would the new GTX 680 work fine with the OpenCV GPU module?
Updated by Alexey Spizhevoy almost 13 years ago
Tim,
I get an out-of-memory error using your data on a GPU with 1280 MB of memory.
I'd suggest decreasing the resolution of the source and template images. E.g. the pattern you attached is found successfully in the source image after resizing both by a factor of 0.2 (in each dimension). If that's not an option, a GPU with more memory can help.
According to http://developer.nvidia.com/cuda-gpus GTX 680 has 2.1 compute capability, which means the OpenCV GPU module compiled with default settings should work with it.
Updated by Tim Zaman almost 13 years ago
Thanks. I have investigated the out-of-memory error using 3 cards, and the problem at least seems to scale with the amount of free memory. But I am certain there is still a bug:
This is the maximum height with 1000 MB free and a width of 1000 px:
1000x4992 scene (792x408 template)
But this is the maximum width with 1000 MB free and a height of 1000 px:
46463x1000 scene (792x408 template)
That is a giant difference. I noticed it when I saw that the maximum scene size possible without running out of memory scaled linearly with the width and with the height, but not with the area (width*height) as one would expect. Surely I can cope with larger images (resizing is not an option) by using a 3 GB card, but this seems like a big inconsistency?
Updated by Alexey Spizhevoy almost 13 years ago
- Status changed from Done to Open
Updated by Alexey Spizhevoy almost 13 years ago
What version of OpenCV do you use?
Updated by Tim Zaman almost 13 years ago
CUDA Driver version: 4000
CUDA Runtime version: 4000
OpenCV 2.3.1
Updated by Alexey Spizhevoy almost 13 years ago
Try the latest version from the repository. I can't reproduce this behaviour using http://pastebin.com/5YseqYCX. Please post your output using trunk OpenCV.
Updated by Tim Zaman almost 13 years ago
Confirmed, updating to CUDA 4.1 fixed the problem, and I now get a nice out-of-memory error. But can you explain why it fails at, let's say, 22000x1000?
22e6 pixels * 4 bytes (single-precision result) = 88 MB?
Updated by Alexey Spizhevoy almost 13 years ago
That's because matchTemplate, convolve (used internally) and CUFFT (also used internally) together allocate a lot of auxiliary buffers.
For width=22000 and height=1000:
- matchTemplate (e.g. in the SQDIFF* case) allocates an image_area*sizeof(long long) buffer for the integral image of squared image intensities = 1000*22000*8 = 176 MB
- convolve allocates real and complex blocks (about 8192x1024 in size) for the image, template, and result (these blocks are used in the forward and inverse FFT) = 8192*1024*3*(4/*real*/+8/*complex*/) = ~302 MB
- If we call cudaMemGetInfo before and after cufftPlan2d, we see that the CUFFT plans consume about 135 MB
- Source image 22 MB (uchar) + template ~1 MB + result ~88 MB = ~111 MB
So we have about 724 MB to allocate, and that does not account for the extra space that pitched 2D allocation (cudaMallocPitch) adds after each row to keep rows aligned. On my 1280 MB device cudaMemGetInfo reports only ~820 MB of free memory (and I get an out-of-memory error on a 23000x1000 image, when a 16384x1024 block size is used).
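The budget above can be re-added with a short Python sketch. Every figure comes from this comment: the ~135 MB for CUFFT plans and ~1 MB for the template are the quoted estimates, not values measured here. The function name, the parameter names, and the use of decimal megabytes are assumptions made for illustration.

```python
MB = 10**6  # assumption: the thread's figures are decimal megabytes


def estimate_budget(width, height,
                    block_w=8192, block_h=1024,
                    cufft_plan_bytes=135 * MB,   # quoted estimate
                    template_bytes=1 * MB):      # quoted estimate
    """Rough per-component memory budget, following the list above."""
    parts = {
        # integral image of squared intensities: 8 bytes per pixel
        "sqsum_integral": width * height * 8,
        # real (4 B) + complex (8 B) blocks for image, template, result
        "convolve_blocks": block_w * block_h * 3 * (4 + 8),
        "cufft_plans": cufft_plan_bytes,
        # 8-bit source image: 1 byte per pixel
        "source": width * height,
        "template": template_bytes,
        # float result, bounded by the image size as in the comment
        "result": width * height * 4,
    }
    return parts, sum(parts.values())


parts, total = estimate_budget(22000, 1000)
for name, size in parts.items():
    print(f"{name:16s} {size / MB:8.1f} MB")
print(f"{'total':16s} {total / MB:8.1f} MB")
```

For 22000x1000 the total comes to about 724 MB, before any row-alignment padding is counted.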
We use power-of-two block sizes because CUFFT works faster with such blocks. It's possible to reduce memory usage by varying the block size, but that may affect speed. We are going to add optional parameters to the matchTemplate and convolve functions for that.
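To illustrate why power-of-two block sizes cost memory, here is a small Python sketch that rounds a dimension up to the next 2^n and reports the resulting padding. The helper names and the sample size 1407 (a 1000-pixel dimension plus a 408-pixel template dimension, minus one) are hypothetical. This is not how OpenCV actually selects its block sizes.

```python
def next_pow2(n):
    """Smallest power of two that is >= n (n must be a positive integer)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1 << (n - 1).bit_length()


def padding_ratio(n):
    """How much larger the padded dimension is than the requested one."""
    return next_pow2(n) / n


# Hypothetical dimension: 1000 + 408 - 1
size = 1407
print(size, next_pow2(size), round(padding_ratio(size), 2))
```

A dimension that is already a power of two needs no padding, while one just above a power of two is nearly doubled, which is the memory-versus-speed trade-off described above.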
Updated by Alexey Spizhevoy almost 13 years ago
The use of 2^n block sizes where it isn't necessary was removed. This allows slightly bigger matrices (up to 26000x1000 on my device).
- Status changed from Open to Done
Updated by Alexander Shishkov almost 13 years ago
- Target version set to 2.4.0