performance dissipation due to avoidable FPU-divides (Feature #2488)


Added by Alexander Kleinsorge over 4 years ago. Updated over 1 year ago.


Status:Open Start date:2012-10-30
Priority:Normal Due date:
Assignee:Alexander Kleinsorge % Done:

50%

Category:build/install
Target version:Next Hackathon
Difficulty: Pull request:

Description

any divide by constant "b = (float)a/3;" should be replaced by multiplication "b = (float)a * (1.0f/3);".
especially in embedded systems there is a performance drop (often factor = 10 between FMUL and FDIV) due to avoiable divisions.

this optimization is not done by O3, because of possible violation of ISO/IEEE-rules:
gcc: -ffast-math ==> This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.

find expression: ".[/][ ]*[0-9]" 
finds *.cpp , e.g.: modules\calib3d\

for sure: "[^/][/][ ]*[\-]*[0-9]+[\.]" 
often in:  "[a-zA_Z]+.*[/][ ]*[\-]*[0-9]+" // sometimes pure integer, which is a different problem

History

Updated by Vadim Pisarevsky over 4 years ago

In the performance-critical code we try to avoid division. If you use some functionality and it's not sufficiently fast, we suggest you to add the corresponding performance tests (if they are missing, run "opencv_perf_calib3d --gtest_list_tests" and check), prepare the patch that does the substitution with the noticeable effect on the performance and submit it here or as a pull request at github. We constantly hit regressions in quality when we do inaccurate optimizations, so we would prefer not to change anything without a strong reason.

  • Tracker changed from Bug to Feature
  • Target version changed from 2.4.3 to 3.0

Updated by Alexander Kleinsorge over 4 years ago

THIS change will NOT introduce any inaccurate optimizations!

Replacing '/a' by '*(1.0f/b)' will give results as close to the right value than the old way '/a'.
It is in the nature of floating-point-numbers to have limited accuracy because of rounding!
Sometimes the rounding gives better results for the one, sometimes for the other method (50:50).
Often the results are identical up to the last bit.

'(a+b)+c' is not always equal to 'a+(b+c)'
[[http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems]]

The advantage is a win in speed, the only disadvantage the work to change the code (which I try to do this year - incl. perf. analysis).

Time for FMUL or FDIV: http://www.agner.org/optimize/instruction_tables.pdf, http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html
FMUL = 5(3) ticks, FDIV = 19 ticks (while an x86 can do only one FDIV per core simultaneously, in contrast to FMUL).
In embedded systems (smart-phones etc) this gap is much higher while it melts down on newest Intel-CPUs.
"division on the other hand is still significantly slower then multiplication because of its recursive nature"

Using (GP)GPU or Intel SSE is inducing more numerical noice than exchanging FDIV CONST1 by FMUL CONST2.
GPU (fast 32bit, slow 64bit), Intel-SSE (32bit, 64bit), x86-FPU (no SIMD) = 80 bit.
80bit = [[http://en.wikipedia.org/wiki/Extended_precision#IEEE_754_extended_precision_formats]]
All of them have rounding noise in the last bit! (but the negative effect is smallest in 80bit)

If inaccurate optimizations in numerical problems matter (seldom in image processing, but in science), you should avoid using GPU or SSE/SIMD!

regards, Alexander

Updated by Alexander Kleinsorge over 3 years ago

I did some optimization (same result in tests), so it can go into 2.4.
1. added const qualifier for parameters and variables. (MISRA/QAC)
2. reduced calloc() in epilines.cpp to avoid memory fragmentation
3. lot of faster math: x==sqrt(x)*sqrt(x), log(sqrt(x))==log(x)*0.5, sqrt(a)<sqrt(b)==a<b, a/b==a*(1/b)

  • Target version changed from 3.0 to 2.4.7
  • Assignee set to Alexander Kleinsorge

Updated by Alexander Kleinsorge over 3 years ago

some files patched in calib3d and features2d. waiting for review

  • % Done changed from 0 to 100

Updated by Alexander Kleinsorge over 3 years ago

  • % Done changed from 100 to 10

Updated by Alexander Kleinsorge over 3 years ago

  • % Done changed from 10 to 50

Updated by Anna Kogan over 3 years ago

  • Description changed from any divide by constant "b = (float)a/3;" should be replaced by mult... to any divide by constant "b = (float)a/3;" should be replaced by mult... More

Updated by Alexander Smorkalov about 3 years ago

  • Target version changed from 2.4.7 to Next Hackathon

Updated by Maksim Shabunin over 1 year ago

Issue has been transferred to GitHub: https://github.com/Itseez/opencv/issues/4444

Also available in: Atom PDF