Bug in CvEM causes crashes under certain circumstances. Origin of the bug is a wrong initialization of kmeans. (Bug #1407)


Added by Richard Bormann over 13 years ago. Updated almost 13 years ago.


Status: Done
Priority: Normal
Assignee: Maria Dimashova
% Done: 0%
Category: ml
Target version: 2.4.0

Description

The CvEM training procedure crashes on some input data because the initial cluster estimation with kmeans may return empty clusters. Consequently, not all clusters are used by EM. In the best case, we just end up with some clusters at the origin and no crash; sometimes, however, the whole program dies.
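To see whether a given kmeans result is affected, the returned labels can be inspected for clusters that received no samples. The following is only a minimal sketch in plain C++: findEmptyClusters is a hypothetical helper, not an OpenCV function, and it assumes the labels have already been copied into a std::vector<int> with one entry per sample.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of OpenCV): returns the indices of all
// clusters that received no samples, given one cluster label per sample.
std::vector<int> findEmptyClusters(const std::vector<int>& labels, int nclusters)
{
    assert(nclusters > 0);

    // Count how many samples were assigned to each cluster.
    std::vector<std::size_t> counts(static_cast<std::size_t>(nclusters), 0);
    for (int label : labels) {
        // Labels outside [0, nclusters) are skipped; they would indicate
        // a mismatch between the labels and nclusters.
        if (label >= 0 && label < nclusters)
            ++counts[static_cast<std::size_t>(label)];
    }

    // Collect the clusters whose count stayed at zero.
    std::vector<int> empty;
    for (int k = 0; k < nclusters; ++k) {
        if (counts[static_cast<std::size_t>(k)] == 0)
            empty.push_back(k);
    }
    return empty;
}
```

If the returned vector is non-empty, the initialization handed to EM contains unused clusters, which is the situation described above.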

To remedy this bug, you simply have to change line 870 in file modules/ml/src/em.cpp from:

cvKMeans2(temp_mat, nclusters, labels, termcrit, 10);

to:

cvKMeans2(temp_mat, nclusters, labels, termcrit, 10, 0, cv::KMEANS_PP_CENTERS);

That's all - then it should work fine.

Have a nice day,
Richard


Associated revisions

Revision ff1eb0d5
Added by Roman Donchenko over 11 years ago

Merge pull request #1407 from ilya-lavrenov:ocl_test_mog

History

Updated by Alexander Alekseychuk over 13 years ago

Hello, are you sure it helps in the general case (for an arbitrary dataset)? What happens in the pathological case where nclusters is greater than the number of samples, or where the inherent number of clusters is too low, e.g. the samples take only K (K < nclusters) distinct values?

Updated by Richard Bormann over 13 years ago

Hi, you are right: in those cases the better cluster initialization with cv::KMEANS_PP_CENTERS does not fix the problem. You can insert a loop that counts the number of distinct vectors in your training set and aborts if there are fewer distinct vectors than nclusters.
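Such a pre-check could look roughly like the sketch below. It is plain C++ and does not touch OpenCV: the Sample typedef and both function names are made up for illustration, and the samples are assumed to be available as vectors of floats. Vectors are compared exactly, so nearly identical samples still count as distinct, and NaN values are assumed not to occur.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical sample type: one feature vector per training sample.
typedef std::vector<float> Sample;

// Counts the distinct vectors in the training set using exact comparison.
// std::set keeps one copy of each vector (lexicographic ordering).
std::size_t countDistinctSamples(const std::vector<Sample>& samples)
{
    std::set<Sample> unique(samples.begin(), samples.end());
    return unique.size();
}

// Returns true if the training set has at least nclusters distinct vectors,
// i.e. if it is at all possible to fill every cluster.
bool hasEnoughDistinctSamples(const std::vector<Sample>& samples, int nclusters)
{
    assert(nclusters > 0);
    return countDistinctSamples(samples) >= static_cast<std::size_t>(nclusters);
}
```

A caller would run hasEnoughDistinctSamples before training and abort, or reduce nclusters, when it returns false.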

However, the problem I wanted to address occurs far more often, even with good data in which you could easily find more than nclusters clusters. I wanted to point out how to fix that problem. Of course, if you cannot ensure that your data contains at least nclusters clusters, you need to add more checks before calling the algorithm.

Updated by Alexander Shishkov almost 13 years ago

  • Priority changed from High to Normal
  • Target version deleted ()
  • Description changed from The [[CvEM]] training procedure crashes for some data queries because the ini... to The CvEM training procedure crashes for some data queries because the initial...

Updated by Alexander Shishkov almost 13 years ago

  • Assignee deleted (Maria Dimashova)

Updated by Maria Dimashova almost 13 years ago

  • Assignee set to Maria Dimashova

Updated by Alexander Shishkov almost 13 years ago

  • Target version deleted ()

Updated by Maria Dimashova almost 13 years ago

Thanks for the report.
We made a C++ implementation of the EM algorithm: class cv::EM (>= r7987). The old class CvEM was moved to the legacy module (so you should include opencv2/legacy/legacy.hpp). The implementation of the CvEM methods was replaced by calls to the corresponding cv::EM ones. Note that CvEM now saves the trained model in a new file format, because the old write method had a bug (all matrices were written twice).
cv::EM now uses the cv::kmeans function. The bugs with empty clusters in cv::kmeans were fixed earlier (a new test that checks this was added to the OpenCV tests). So we hope that the EM problem on your data is also solved. Please check it with >= r7987. If you get a crash again, please reopen the ticket.

  • Status changed from Open to Done

Updated by Andrey Kamaev almost 13 years ago

  • Target version set to 2.4.0
