output of GPU detectMultiScale returns multiple of detector size (Feature #1525)
I've started using the GPU detectMultiScale on 2.3.1, but the rectangles it returns are only multiples of the detector size. E.g. for the face _alt version, which is 20x20, my reported faces are always one of 20x20, 40x40, 60x60, etc.
The actual detection seems to be in the right place and reflects the scaling passed in. I checked with scalings from 1.1 through to 2.0. A scaling of 1.1 gives good detection at all face sizes, while running up to 2.0 gives holes in the detection, as expected. The processing time also drops with increased scaling, as expected.
So my question is: why does the GPU detectMultiScale only return integer multiples of the detector size when it is clearly using the scaling as given? Is this a bug or a feature of the implementation?
My workaround is to run the CPU detectMultiScale on a small ROI around the detected location of each face, over the scale range bracketing the reported GPU size (i.e. if the GPU reports 40x40, then detect from 20x20 to 60x60 on a 60x60 pixel ROI). This is relatively low on CPU overhead as the ROI is smallish. Still, it all adds up as I'm looking at crowd scenes.
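For anyone wanting to try the same workaround, here is a minimal sketch of the bookkeeping only: given one GPU rectangle, it works out the ROI and the size limits for the CPU pass. The function name `refinement_window`, the default 800x600 frame and the 20-pixel detector size are assumptions for illustration. The returned limits would then be handed to the CPU detector's minimum and maximum size arguments, assuming the version in use exposes both.

```python
def refinement_window(rect, base=20, image_size=(800, 600)):
    """Return (roi, min_size, max_size) for re-detecting one GPU face on the CPU.

    rect is (x, y, w, h) as reported by the GPU detector; base is the
    detector window side (20 for the 20x20 frontal face model).
    """
    x, y, w, h = rect
    # Bracket the reported size by one detector step on either side,
    # never going below the detector window itself.
    min_side = max(base, w - base)
    max_side = w + base

    # Centre a max_side x max_side ROI on the reported face and clip
    # it to the image bounds.
    cx, cy = x + w // 2, y + h // 2
    img_w, img_h = image_size
    x0 = max(0, cx - max_side // 2)
    y0 = max(0, cy - max_side // 2)
    x1 = min(img_w, x0 + max_side)
    y1 = min(img_h, y0 + max_side)

    roi = (x0, y0, x1 - x0, y1 - y0)
    return roi, (min_side, min_side), (max_side, max_side)
```

For the 40x40 example above, a face reported at (100, 100) gives a 60x60 ROI at (90, 90) with size limits 20x20 and 60x60. Rectangles found inside the ROI need the ROI offset added back to map them into full-image coordinates.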
The graphics card is an older integrated G96M GPU. Querying the number of multiprocessors (using the device info) returns 1. I think this means one processor cluster rather than one actual core, but maybe it really is only 1 core? I get 3 frames per second on 800x600, so not too bad for a 3 year old laptop GPU.
Related issues: duplicated by Bug #2067: CascadeClassifier GPU / CPU Detection Mismatches (Cancelled, 2012-06-19)
PS: I have characterised the bug a bit better. The quantisation of the scale only applies between 20x20 and 40x40 for the 20x20 frontal face model. Above 40x40 the face can be any size (e.g. 41x41), but between 20 and 40 the detector reports only either 20 or 40.
The CPU version reports any size. The workaround still stands: redetect on the CPU any face reported at 40 pixels or below.
This is on purpose. The integral image is built only once for the input image, and instead of downsampling and recalculating the statistics for each scale, a decimation approach is used. As a result, the float scale parameter is rounded down to the nearest integer and that integer scale is processed.
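To illustrate the effect of that rounding, here is a toy model (not the actual GPU code): it steps the float scale by the scale factor and takes the floor at each step. The function name `reported_sizes` and its parameters are made up for this illustration, and the model only covers the rounding described here; it does not try to reproduce the report that sizes above 40x40 were not quantised.

```python
import math


def reported_sizes(base=20, scale_factor=1.1, max_side=100):
    """List the distinct face sides a floor-rounded scale sweep can report."""
    sizes = []
    scale = 1.0
    while base * scale <= max_side:
        # The float scale is rounded down to an integer, so the
        # reported side is always an integer multiple of the base.
        side = base * math.floor(scale)
        if side not in sizes:
            sizes.append(side)
        scale *= scale_factor
    return sizes
```

With the defaults, every float scale between 1.0 and just under 2.0 collapses to 20x20, which matches the 20x20, 40x40, 60x60 pattern in the original report.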
The code was written in the days of the Tesla compute architecture (SM 1.0-1.3 capability), but with modern GPUs one could easily modify the code so that it is more scale-friendly:
- downsample image from previous level
- calculate statistics for the level
- use it in the classifier.
I had planned to introduce such a patch, but it is not a priority for me now, so I can't provide an estimate.
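The three steps listed above could look roughly like the following CPU-side sketch in plain Python. It is only an illustration of the idea, not the proposed patch: the helper names are hypothetical, nearest-neighbour resizing stands in for whatever downsampling the GPU code would use, and a plain integral image stands in for the per-level statistics. The classifier step itself is left out.

```python
def downsample(image, factor):
    """Nearest-neighbour resize of a 2-D list of pixels by 1/factor."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, int(src_h / factor))
    dst_w = max(1, int(src_w / factor))
    return [
        [image[min(src_h - 1, int(r * factor))][min(src_w - 1, int(c * factor))]
         for c in range(dst_w)]
        for r in range(dst_h)
    ]


def integral_image(image):
    """Summed-area table with an extra zero row and column."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += image[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def pyramid_levels(image, window=20, scale_factor=1.1):
    """Yield (effective face side, integral image) for each pyramid level."""
    level = image
    while len(level) >= window and len(level[0]) >= window:
        # Effective scale of this level relative to the input image.
        scale = len(image[0]) / len(level[0])
        yield round(window * scale), integral_image(level)
        # Step 1: downsample from the previous level, not from the original.
        level = downsample(level, scale_factor)
```

Because the statistics are recomputed on each resized level, the effective face side follows the float scale instead of being locked to integer multiples of the window. For example, a 50x50 input with a scale factor of 1.25 gives sides of 20, 25, 31, 40 and 50.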