cv::ml::StatModel::calcError not working for responses of type CV_32S (Bug #4258)


Added by Jochen Goertler almost 10 years ago. Updated over 9 years ago.


Status: New
Start date: 2015-03-30
Priority: Normal
Due date:
Assignee: Maria Dimashova
% Done: 0%
Category: ml
Target version: 3.0
Affected version: branch 'master' (3.0-dev)
Operating System: Any
Difficulty:
HW Platform: Any
Pull request:

Description

I posted this on the Q&A forum but did not receive any answers:
http://answers.opencv.org/question/57171/cvmlstatmodelcalcerror-not-working-for-responses-of-type-cv_32s/

I have a feature vector consisting of several ordered variables. My responses, on the other hand, are categorical and of type CV_32S. I now want to train an RTrees model for this problem. The documentation of TrainData::create() states that responses may be of type CV_32S:


responses – matrix of responses. If the responses are scalar, they should be stored as a single row or as a single column. The matrix should have type CV_32F or CV_32S (in the former case the responses are considered as ordered by default; in the latter case - as categorical)

In the documentation of RTrees I can't find a reason for this to be illegal.

However, I get unexpected results when I train my RTrees as described in the following gist:
https://gist.github.com/grtlr/b6195466421c9606b8f9

In Case 1 and Case 2, the output is something like the following:

calc error: 46
pred error: 0.17

Only in Case 3 is the error computed correctly:

calc error: 24
pred error: 0.24

Is this behavior intended? If so, maybe it should be clarified in the documentation of StatModel, RTrees, or TrainData.

The problem seems to be in this part of the StatModel::calcError() method:

...
float val = predict(sample);
float val0 = responses.at<float>(si);

if( isclassifier )
    err += fabs(val - val0) > FLT_EPSILON;
...

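If responses is of type CV_32S, at<float> would presumably read the stored integer's bytes as a float instead of converting the value, so val0 would differ from the expected label.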
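To illustrate the suspected effect without depending on OpenCV, here is a minimal sketch. It assumes that the float accessor performs no conversion and simply reads the 4 bytes of the stored integer; readIntBitsAsFloat and countedAsError are hypothetical helpers, not OpenCV functions.

```cpp
#include <cfloat>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the bytes of a 32-bit integer label as a
// float. This mimics reading a CV_32S element through a float accessor,
// assuming the accessor does no value conversion.
inline float readIntBitsAsFloat(std::int32_t label)
{
    static_assert(sizeof(float) == sizeof(std::int32_t),
                  "sketch expects a 32-bit float");
    float out;
    std::memcpy(&out, &label, sizeof(out));
    return out;
}

// Same comparison as in the calcError() excerpt quoted above.
inline bool countedAsError(float predicted, float expected)
{
    return std::fabs(predicted - expected) > FLT_EPSILON;
}
```

Under that assumption, label 0 still reads back as 0.0f, but any small positive label such as 1 or 2 reads back as a value very close to zero. A correct prediction of 1.0f for class 1 would then be counted as an error, which might explain why the reported error is too high in Case 1 and Case 2.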

I think this could be fixed by checking the type of responses and switching between at<float> and at<int> accordingly.
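A rough sketch of that idea, written without OpenCV so that it stands alone: ResponseType and responseAsFloat are hypothetical stand-ins for the matrix type check and the element access, and the buffer is assumed to hold tightly packed 4-byte elements. Inside calcError() the equivalent would presumably be a check of responses.type() against CV_32S before choosing at<int> or at<float>.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the element type of the responses matrix
// (CV_32F vs CV_32S in OpenCV terms).
enum class ResponseType { Float32, Int32 };

// Read response i as a float. Integer storage is converted by value,
// so the label 2 becomes 2.0f instead of a reinterpreted bit pattern.
inline float responseAsFloat(const void* data, ResponseType type, std::size_t i)
{
    const unsigned char* bytes =
        static_cast<const unsigned char*>(data) + i * 4;  // 4-byte elements

    if (type == ResponseType::Int32) {
        std::int32_t v;
        std::memcpy(&v, bytes, sizeof(v));
        return static_cast<float>(v);  // value conversion
    }

    float f;
    std::memcpy(&f, bytes, sizeof(f));
    return f;                          // already a float, read as-is
}
```

With this kind of switch, the comparison against the predicted value would see the same number regardless of whether the responses were stored as integers or floats.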

I was also quite confused that calcError returns a value in the range 0 <= x <= 100, although the return type is float. In my opinion, a return value in the range 0 <= x <= 1 would be more appropriate. What do you think?
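For reference, the two scales differ only by a factor of 100, which is why 24 and 0.24 in Case 3 describe the same error rate. The sketch below shows the relation; misclassificationFraction and toPercent are hypothetical helpers written for illustration and are not taken from OpenCV or from the gist.

```cpp
#include <cstddef>
#include <vector>

// Fraction of samples whose predicted label differs from the expected one,
// in the range [0, 1]. Returns 0 for empty or mismatched inputs.
inline double misclassificationFraction(const std::vector<int>& predicted,
                                        const std::vector<int>& expected)
{
    if (predicted.empty() || predicted.size() != expected.size())
        return 0.0;

    std::size_t wrong = 0;
    for (std::size_t i = 0; i < predicted.size(); ++i)
        if (predicted[i] != expected[i])
            ++wrong;

    return static_cast<double>(wrong) / static_cast<double>(predicted.size());
}

// The same rate expressed on the 0..100 scale.
inline double toPercent(double fraction)
{
    return fraction * 100.0;
}
```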


History

Updated by Jochen Goertler almost 10 years ago

If there are changes to be made, I could try to provide a pull request with the necessary fixes.

Updated by Maksim Shabunin almost 10 years ago

  • Target version set to 3.0

Updated by Maksim Shabunin over 9 years ago

Issue has been transferred to GitHub: https://github.com/Itseez/opencv/issues/4958
