# performance dissipation due to avoidable FPU-divides (Feature #2488)

**Description**

any divide by constant "b = (float)a/3;" should be replaced by multiplication "b = (float)a * (1.0f/3);".

especially in embedded systems there is a performance drop (often factor = 10 between FMUL and FDIV) due to avoiable divisions.

this optimization is not done by O3, because of possible violation of ISO/IEEE-rules:

gcc: -ffast-math ==> This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.

find expression: ".[/][ ]*[0-9]" finds *.cpp , e.g.: modules\calib3d\ for sure: "[^/][/][ ]*[\-]*[0-9]+[\.]" often in: "[a-zA_Z]+.*[/][ ]*[\-]*[0-9]+" // sometimes pure integer, which is a different problem

### History

#### Updated by Vadim Pisarevsky over 4 years ago

In the performance-critical code we try to avoid division. If you use some functionality and it's not sufficiently fast, we suggest you to add the corresponding performance tests (if they are missing, run "opencv_perf_calib3d --gtest_list_tests" and check), prepare the patch that does the substitution with the noticeable effect on the performance and submit it here or as a pull request at github. We constantly hit regressions in quality when we do inaccurate optimizations, so we would prefer not to change anything without a strong reason.

**Tracker**changed from*Bug*to*Feature***Target version**changed from*2.4.3*to*3.0*

#### Updated by Alexander Kleinsorge over 4 years ago

THIS change will NOT introduce any inaccurate optimizations!

Replacing '/a' by '*(1.0f/b)' will give results as close to the right value than the old way '/a'.

It is in the nature of floating-point-numbers to have limited accuracy because of rounding!

Sometimes the rounding gives better results for the one, sometimes for the other method (50:50).

Often the results are identical up to the last bit.

'(a+b)+c' is not always equal to 'a+(b+c)'

[[http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems]]

The advantage is a win in speed, the only disadvantage the work to change the code (which I try to do this year - incl. perf. analysis).

Time for FMUL or FDIV: http://www.agner.org/optimize/instruction_tables.pdf, http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html

FMUL = 5(3) ticks, FDIV = 19 ticks (while an x86 can do only one FDIV per core simultaneously, in contrast to FMUL).

In embedded systems (smart-phones etc) this gap is much higher while it melts down on newest Intel-CPUs.

"division on the other hand is still significantly slower then multiplication because of its recursive nature"

Using (GP)GPU or Intel SSE is inducing more numerical noice than exchanging FDIV CONST1 by FMUL CONST2.

GPU (fast 32bit, slow 64bit), Intel-SSE (32bit, 64bit), x86-FPU (no SIMD) = 80 bit.

80bit = [[http://en.wikipedia.org/wiki/Extended_precision#IEEE_754_extended_precision_formats]]

All of them have rounding noise in the last bit! (but the negative effect is smallest in 80bit)

If inaccurate optimizations in numerical problems matter (seldom in image processing, but in science), you should avoid using GPU or SSE/SIMD!

regards, Alexander

#### Updated by Alexander Kleinsorge over 3 years ago

I did some optimization (same result in tests), so it can go into 2.4.

1. added const qualifier for parameters and variables. (MISRA/QAC)

2. reduced calloc() in epilines.cpp to avoid memory fragmentation

3. lot of faster math: x==sqrt(x)*sqrt(x), log(sqrt(x))==log(x)*0.5, sqrt(a)<sqrt(b)==a<b, a/b==a*(1/b)

**Target version**changed from*3.0*to*2.4.7***Assignee**set to*Alexander Kleinsorge*

#### Updated by Alexander Kleinsorge over 3 years ago

some files patched in calib3d and features2d. waiting for review

**% Done**changed from*0*to*100*

#### Updated by Alexander Kleinsorge over 3 years ago

**% Done**changed from*100*to*10*

#### Updated by Alexander Kleinsorge over 3 years ago

**% Done**changed from*10*to*50*

#### Updated by Anna Kogan over 3 years ago

**Description**changed from*any divide by constant "b = (float)a/3;" should be replaced by mult...*to*any divide by constant "b = (float)a/3;" should be replaced by mult...*More

#### Updated by Alexander Smorkalov over 3 years ago

**Target version**changed from*2.4.7*to*Next Hackathon*

#### Updated by Maksim Shabunin almost 2 years ago

Issue has been transferred to GitHub: https://github.com/Itseez/opencv/issues/4444