Re-benchmarking PenguinV

In a previous post, I profiled and benchmarked the blob_detection function for penguinV. Looking back on my method and results, I realized I didn’t do the best job. My sample size was small, so if I reran the benchmark after making changes, it would be difficult to tell whether any difference was significant. Therefore, I will benchmark performance with the performance_tests included in the project instead.

The bundled performance test measures the runtime of every function in the project and, by default, builds each file with the following options:

-std=c++11 -Wall -Wextra -Wstrict-aliasing -Wpedantic -Wconversion -O2 -march=native

Seeing this, I figured it would be easy to change the optimization level and see its effect on performance. Out of curiosity, I first ran the test as-is on an AArch64 machine. It produced the following stats:

Full Performance Test: O2
[1/260] blob_detection_SolidImage::_2048... 644.1+/-4.882 ms
[2/260] blob_detection_SolidImage::_1024... 117.8+/-0.2956 ms
[3/260] blob_detection_SolidImage::_512... 27.54+/-0.2564 ms
[4/260] blob_detection_SolidImage::_256... 6.704+/-0.09339 ms
[5/260] edge_detection_SolidImage::_2048... 1091+/-3.844 ms
[6/260] edge_detection_SolidImage::_1024... 232.2+/-1.265 ms
[7/260] edge_detection_SolidImage::_512... 58.14+/-0.5147 ms
[8/260] edge_detection_SolidImage::_256... 14.67+/-0.1085 ms
[9/260] filtering_SobelFilter::_2048... 96.92+/-0.4424 ms
[10/260] filtering_SobelFilter::_1024... 23.42+/-0.421 ms
[11/260] filtering_SobelFilter::_512... 5.759+/-0.1216 ms
[12/260] filtering_SobelFilter::_256... 1.418+/-0.0152 ms
[13/260] filtering_PrewittFilter::_2048... 99.43+/-0.2582 ms
[14/260] filtering_PrewittFilter::_1024... 23.94+/-0.1781 ms
[15/260] filtering_PrewittFilter::_512... 5.945+/-0.09214 ms
[16/260] filtering_PrewittFilter::_256... 1.471+/-0.01392 ms
[17/260] filtering_MedianFilter3x3::_2048... 1017+/-3.596 ms
[18/260] filtering_MedianFilter3x3::_1024... 250+/-3.204 ms
[19/260] filtering_MedianFilter3x3::_512... 61.91+/-0.9687 ms
[20/260] filtering_MedianFilter3x3::_256... 15.35+/-0.2274 ms
[21/260] image_function::AbsoluteDifference (256x256)... 0.7536+/-0.02952 ms
[22/260] image_function::AbsoluteDifference (512x512)... 3.08+/-0.108 ms
[23/260] image_function::AbsoluteDifference (1024x1024)... 14.38+/-0.3488 ms
[24/260] image_function::AbsoluteDifference (2048x2048)... 65.76+/-0.7173 ms
[25/260] function_pool::AbsoluteDifference (256x256)... 0.8266+/-0.2606 ms
[26/260] function_pool::AbsoluteDifference (512x512)... 2.31+/-0.4604 ms
[27/260] function_pool::AbsoluteDifference (1024x1024)... 12.91+/-1.061 ms
[28/260] function_pool::AbsoluteDifference (2048x2048)... 54.91+/-0.4894 ms
[29/260] image_function_neon::AbsoluteDifference (256x256)... 0.257+/-0.0183 ms
[30/260] image_function_neon::AbsoluteDifference (512x512)... 1.092+/-0.07149 ms
[31/260] image_function_neon::AbsoluteDifference (1024x1024)... 6.195+/-0.1585 ms
[32/260] image_function_neon::AbsoluteDifference (2048x2048)... 33.22+/-0.2824 ms
[33/260] image_function_neon::BitwiseAnd (2048x2048)... 32.99+/-0.3328 ms
[34/260] image_function_neon::BitwiseAnd (1024x1024)... 6.115+/-0.1509 ms
[35/260] image_function_neon::BitwiseAnd (512x512)... 1.049+/-0.06144 ms
[36/260] image_function_neon::BitwiseAnd (256x256)... 0.2454+/-0.02392 ms
[37/260] function_pool::BitwiseAnd (2048x2048)... 54.84+/-0.4876 ms
[38/260] function_pool::BitwiseAnd (1024x1024)... 12.42+/-0.5715 ms
[39/260] function_pool::BitwiseAnd (512x512)... 2.189+/-0.4914 ms
[40/260] function_pool::BitwiseAnd (256x256)... 0.778+/-0.2385 ms
[41/260] image_function::BitwiseAnd (2048x2048)... 57.94+/-0.3558 ms
[42/260] image_function::BitwiseAnd (1024x1024)... 12.45+/-0.468 ms
[43/260] image_function::BitwiseAnd (512x512)... 2.59+/-0.111 ms
[44/260] image_function::BitwiseAnd (256x256)... 0.6221+/-0.03655 ms
[45/260] image_function_neon::BitwiseOr (2048x2048)... 33.01+/-0.4019 ms
[46/260] image_function_neon::BitwiseOr (1024x1024)... 6.19+/-0.2321 ms
[47/260] image_function_neon::BitwiseOr (512x512)... 1.032+/-0.05734 ms
[48/260] image_function_neon::BitwiseOr (256x256)... 0.2428+/-0.02204 ms
[49/260] function_pool::BitwiseOr (2048x2048)... 55.57+/-2.935 ms
[50/260] function_pool::BitwiseOr (1024x1024)... 12.79+/-0.8624 ms
[51/260] function_pool::BitwiseOr (512x512)... 2.233+/-0.5364 ms
[52/260] function_pool::BitwiseOr (256x256)... 0.8+/-0.2369 ms
[53/260] image_function::BitwiseOr (2048x2048)... 59.52+/-2.381 ms
[54/260] image_function::BitwiseOr (1024x1024)... 12.51+/-0.8488 ms
[55/260] image_function::BitwiseOr (512x512)... 2.592+/-0.1702 ms
[56/260] image_function::BitwiseOr (256x256)... 0.6198+/-0.03203 ms
[57/260] image_function_neon::BitwiseXor (2048x2048)... 33.58+/-1.725 ms
[58/260] image_function_neon::BitwiseXor (1024x1024)... 6.392+/-0.8578 ms
[59/260] image_function_neon::BitwiseXor (512x512)... 1.079+/-0.1124 ms
[60/260] image_function_neon::BitwiseXor (256x256)... 0.2399+/-0.0168 ms
[61/260] function_pool::BitwiseXor (2048x2048)... 59.29+/-4.122 ms
[62/260] function_pool::BitwiseXor (1024x1024)... 12.98+/-1.06 ms
[63/260] function_pool::BitwiseXor (512x512)... 2.268+/-0.5645 ms
[64/260] function_pool::BitwiseXor (256x256)... 0.822+/-0.24 ms
[65/260] image_function::BitwiseXor (2048x2048)... 58.87+/-1.657 ms
[66/260] image_function::BitwiseXor (1024x1024)... 12.47+/-0.9454 ms
[67/260] image_function::BitwiseXor (512x512)... 2.563+/-0.1807 ms
[68/260] image_function::BitwiseXor (256x256)... 0.6177+/-0.03174 ms
[69/260] image_function_neon::Maximum (2048x2048)... 33.57+/-1.528 ms
[70/260] image_function_neon::Maximum (1024x1024)... 6.247+/-0.721 ms
[71/260] image_function_neon::Maximum (512x512)... 1.034+/-0.08706 ms
[72/260] image_function_neon::Maximum (256x256)... 0.2423+/-0.02615 ms
[73/260] function_pool::Maximum (2048x2048)... 59.24+/-4.597 ms
[74/260] function_pool::Maximum (1024x1024)... 12.98+/-0.8755 ms
[75/260] function_pool::Maximum (512x512)... 2.245+/-0.4465 ms
[76/260] function_pool::Maximum (256x256)... 0.7986+/-0.2407 ms
[77/260] image_function::Maximum (2048x2048)... 62.84+/-1.615 ms
[78/260] image_function::Maximum (1024x1024)... 13.4+/-1.272 ms
[79/260] image_function::Maximum (512x512)... 2.838+/-0.2363 ms
[80/260] image_function::Maximum (256x256)... 0.6828+/-0.03177 ms
[81/260] image_function_neon::Minimum (2048x2048)... 33.53+/-1.2 ms
[82/260] image_function_neon::Minimum (1024x1024)... 6.283+/-0.6494 ms
[83/260] image_function_neon::Minimum (512x512)... 1.087+/-0.1142 ms
[84/260] image_function_neon::Minimum (256x256)... 0.2436+/-0.02343 ms
[85/260] function_pool::Minimum (2048x2048)... 59.32+/-3.579 ms
[86/260] function_pool::Minimum (1024x1024)... 13.15+/-1.382 ms
[87/260] function_pool::Minimum (512x512)... 2.242+/-0.5249 ms
[88/260] function_pool::Minimum (256x256)... 0.8011+/-0.2037 ms
[89/260] image_function::Minimum (2048x2048)... 62.28+/-1.296 ms
[90/260] image_function::Minimum (1024x1024)... 13.46+/-0.9213 ms
[91/260] image_function::Minimum (512x512)... 2.863+/-0.1996 ms
[92/260] image_function::Minimum (256x256)... 0.681+/-0.0305 ms
[93/260] image_function_neon::Subtract (2048x2048)... 33.32+/-1.299 ms
[94/260] image_function_neon::Subtract (1024x1024)... 6.286+/-0.5606 ms
[95/260] image_function_neon::Subtract (512x512)... 1.108+/-0.1577 ms
[96/260] image_function_neon::Subtract (256x256)... 0.2495+/-0.02794 ms
[97/260] function_pool::Subtract (2048x2048)... 57.56+/-3.527 ms
[98/260] function_pool::Subtract (1024x1024)... 12.67+/-1.054 ms
[99/260] function_pool::Subtract (512x512)... 2.301+/-0.5693 ms
[100/260] function_pool::Subtract (256x256)... 0.8028+/-0.2292 ms
[101/260] image_function::Subtract (2048x2048)... 66.7+/-1.264 ms
[102/260] image_function::Subtract (1024x1024)... 14.74+/-1.194 ms
[103/260] image_function::Subtract (512x512)... 3.159+/-0.2722 ms
[104/260] image_function::Subtract (256x256)... 0.7526+/-0.02795 ms
[105/260] image_function::Flip (256x256)... 0.237+/-0.003931 ms
[106/260] image_function::Flip (512x512)... 0.9722+/-0.03968 ms
[107/260] image_function::Flip (1024x1024)... 4.095+/-0.373 ms
[108/260] image_function::Flip (2048x2048)... 18.27+/-0.7444 ms
[109/260] image_function_neon::Flip (256x256)... 0.08221+/-0.004138 ms
[110/260] image_function_neon::Flip (512x512)... 0.3385+/-0.02833 ms
[111/260] image_function_neon::Flip (1024x1024)... 1.477+/-0.1693 ms
[112/260] image_function_neon::Flip (2048x2048)... 8.173+/-0.6429 ms
[113/260] image_function::GammaCorrection (256x256)... 0.6447+/-0.02131 ms
[114/260] image_function::GammaCorrection (512x512)... 2.331+/-0.08534 ms
[115/260] image_function::GammaCorrection (1024x1024)... 9.381+/-0.4684 ms
[116/260] image_function::GammaCorrection (2048x2048)... 41.52+/-0.8448 ms
[117/260] function_pool::GammaCorrection (256x256)... 0.5948+/-0.1727 ms
[118/260] function_pool::GammaCorrection (512x512)... 1.346+/-0.318 ms
[119/260] function_pool::GammaCorrection (1024x1024)... 5.98+/-0.6899 ms
[120/260] function_pool::GammaCorrection (2048x2048)... 28.38+/-1.462 ms
[121/260] image_function::Invert (256x256)... 0.3844+/-0.008902 ms
[122/260] image_function::Invert (512x512)... 1.592+/-0.08024 ms
[123/260] image_function::Invert (1024x1024)... 6.7+/-0.3903 ms
[124/260] image_function::Invert (2048x2048)... 30.24+/-0.7244 ms
[125/260] function_pool::Invert (256x256)... 0.2956+/-0.1471 ms
[126/260] function_pool::Invert (512x512)... 0.594+/-0.1977 ms
[127/260] function_pool::Invert (1024x1024)... 3.411+/-0.5153 ms
[128/260] function_pool::Invert (2048x2048)... 17.2+/-1.156 ms
[129/260] image_function_neon::Invert (256x256)... 0.07568+/-0.006214 ms
[130/260] image_function_neon::Invert (512x512)... 0.3419+/-0.05994 ms
[131/260] image_function_neon::Invert (1024x1024)... 1.644+/-0.3138 ms
[132/260] image_function_neon::Invert (2048x2048)... 10.82+/-0.9105 ms
[133/260] image_function::Transpose (2048x2048)... 228.6+/-6.665 ms
[134/260] image_function::Transpose (1024x1024)... 33.33+/-1.076 ms
[135/260] image_function::Transpose (512x512)... 3.86+/-0.2014 ms
[136/260] image_function::Transpose (256x256)... 0.5783+/-0.009246 ms
[137/260] image_function::RgbToBgr (256x256)... 0.7123+/-0.04254 ms
[138/260] image_function::RgbToBgr (512x512)... 3.084+/-0.2881 ms
[139/260] image_function::RgbToBgr (1024x1024)... 15.8+/-0.98 ms
[140/260] image_function::RgbToBgr (2048x2048)... 69.02+/-1.114 ms
[141/260] function_pool::RgbToBgr (256x256)... 0.6392+/-0.278 ms
[142/260] function_pool::RgbToBgr (512x512)... 2.833+/-0.4471 ms
[143/260] function_pool::RgbToBgr (1024x1024)... 16.78+/-1.126 ms
[144/260] function_pool::RgbToBgr (2048x2048)... 74.39+/-3.535 ms
[145/260] image_function_neon::RgbToBgr (256x256)... 0.5463+/-0.04665 ms
[146/260] image_function_neon::RgbToBgr (512x512)... 2.4+/-0.2369 ms
[147/260] image_function_neon::RgbToBgr (1024x1024)... 14.15+/-1.281 ms
[148/260] image_function_neon::RgbToBgr (2048x2048)... 64.75+/-1.331 ms
[149/260] image_function::Threshold (256x256)... 0.4687+/-0.02165 ms
[150/260] image_function::Threshold (512x512)... 1.929+/-0.09718 ms
[151/260] image_function::Threshold (1024x1024)... 8.089+/-0.2793 ms
[152/260] image_function::Threshold (2048x2048)... 37.08+/-1.105 ms
[153/260] function_pool::Threshold (256x256)... 0.3702+/-0.1546 ms
[154/260] function_pool::Threshold (512x512)... 0.7408+/-0.2816 ms
[155/260] function_pool::Threshold (1024x1024)... 4.195+/-1.164 ms
[156/260] function_pool::Threshold (2048x2048)... 21.71+/-3.187 ms
[157/260] image_function_neon::Threshold (256x256)... 0.09694+/-0.006778 ms
[158/260] image_function_neon::Threshold (512x512)... 0.4259+/-0.06024 ms
[159/260] image_function_neon::Threshold (1024x1024)... 2.069+/-0.6858 ms
[160/260] image_function_neon::Threshold (2048x2048)... 13.48+/-1.06 ms
[161/260] image_function::ThresholdDouble (256x256)... 0.4686+/-0.018 ms
[162/260] image_function::ThresholdDouble (512x512)... 1.934+/-0.0931 ms
[163/260] image_function::ThresholdDouble (1024x1024)... 8.091+/-0.3839 ms
[164/260] image_function::ThresholdDouble (2048x2048)... 37.12+/-0.9274 ms
[165/260] function_pool::ThresholdDouble (256x256)... 0.3805+/-0.1708 ms
[166/260] function_pool::ThresholdDouble (512x512)... 0.7337+/-0.2455 ms
[167/260] function_pool::ThresholdDouble (1024x1024)... 4.211+/-0.6908 ms
[168/260] function_pool::ThresholdDouble (2048x2048)... 21.77+/-1.834 ms
[169/260] image_function_neon::ThresholdDouble (256x256)... 0.1061+/-0.006657 ms
[170/260] image_function_neon::ThresholdDouble (512x512)... 0.4629+/-0.05991 ms
[171/260] image_function_neon::ThresholdDouble (1024x1024)... 2.19+/-0.2844 ms
[172/260] image_function_neon::ThresholdDouble (2048x2048)... 13.63+/-1.187 ms
[173/260] image_function::LookupTable (256x256)... 0.5199+/-0.02682 ms
[174/260] image_function::LookupTable (512x512)... 2.144+/-0.09391 ms
[175/260] image_function::LookupTable (1024x1024)... 8.89+/-0.478 ms
[176/260] image_function::LookupTable (2048x2048)... 38.91+/-0.6672 ms
[177/260] function_pool::LookupTable (256x256)... 0.4409+/-0.1507 ms
[178/260] function_pool::LookupTable (512x512)... 1.083+/-0.2534 ms
[179/260] function_pool::LookupTable (1024x1024)... 5.02+/-0.4968 ms
[180/260] function_pool::LookupTable (2048x2048)... 23.89+/-1.18 ms
[181/260] image_function::Accumulate (256x256)... 0.4064+/-0.02508 ms
[182/260] image_function::Accumulate (512x512)... 1.668+/-0.09847 ms
[183/260] image_function::Accumulate (1024x1024)... 7.04+/-0.3407 ms
[184/260] image_function::Accumulate (2048x2048)... 29.22+/-0.2608 ms
[185/260] image_function_neon::Accumulate (256x256)... 0.1131+/-0.002631 ms
[186/260] image_function_neon::Accumulate (512x512)... 0.4831+/-0.03134 ms
[187/260] image_function_neon::Accumulate (1024x1024)... 2.595+/-0.6939 ms
[188/260] image_function_neon::Accumulate (2048x2048)... 14.21+/-0.6195 ms
[189/260] image_function::ConvertToGrayScale (256x256)... 0.8186+/-0.03388 ms
[190/260] image_function::ConvertToGrayScale (512x512)... 3.321+/-0.1023 ms
[191/260] image_function::ConvertToGrayScale (1024x1024)... 13.59+/-0.3472 ms
[192/260] image_function::ConvertToGrayScale (2048x2048)... 55.49+/-0.4218 ms
[193/260] function_pool::ConvertToGrayScale (256x256)... 0.4016+/-0.1393 ms
[194/260] function_pool::ConvertToGrayScale (512x512)... 1.162+/-0.2029 ms
[195/260] function_pool::ConvertToGrayScale (1024x1024)... 4.679+/-0.4304 ms
[196/260] function_pool::ConvertToGrayScale (2048x2048)... 19.88+/-0.5445 ms
[197/260] image_function::ConvertToRgb (256x256)... 0.3967+/-0.01908 ms
[198/260] image_function::ConvertToRgb (512x512)... 1.656+/-0.09458 ms
[199/260] image_function::ConvertToRgb (1024x1024)... 7.26+/-0.2994 ms
[200/260] image_function::ConvertToRgb (2048x2048)... 30.85+/-0.9787 ms
[201/260] function_pool::ConvertToRgb (256x256)... 0.3118+/-0.1274 ms
[202/260] function_pool::ConvertToRgb (512x512)... 0.9781+/-0.3978 ms
[203/260] function_pool::ConvertToRgb (1024x1024)... 5.435+/-0.8357 ms
[204/260] function_pool::ConvertToRgb (2048x2048)... 23.36+/-2.089 ms
[205/260] image_function_neon::ConvertToRgb (256x256)... 0.1736+/-0.01261 ms
[206/260] image_function_neon::ConvertToRgb (512x512)... 0.7432+/-0.1374 ms
[207/260] image_function_neon::ConvertToRgb (1024x1024)... 3.825+/-0.5173 ms
[208/260] image_function_neon::ConvertToRgb (2048x2048)... 16.85+/-0.3992 ms
[209/260] image_function::Fill (256x256)... 0.007951+/-0.00105 ms
[210/260] image_function::Fill (512x512)... 0.03333+/-0.001273 ms
[211/260] image_function::Fill (1024x1024)... 0.1347+/-0.001907 ms
[212/260] image_function::Fill (2048x2048)... 0.5469+/-0.01802 ms
[213/260] image_function::Histogram (256x256)... 0.4613+/-0.01079 ms
[214/260] image_function::Histogram (512x512)... 1.85+/-0.06158 ms
[215/260] image_function::Histogram (1024x1024)... 7.438+/-0.1847 ms
[216/260] image_function::Histogram (2048x2048)... 29.88+/-0.5901 ms
[217/260] function_pool::Histogram (256x256)... 0.2735+/-0.09155 ms
[218/260] function_pool::Histogram (512x512)... 0.6152+/-0.1443 ms
[219/260] function_pool::Histogram (1024x1024)... 2.04+/-0.1945 ms
[220/260] function_pool::Histogram (2048x2048)... 7.736+/-0.3303 ms
[221/260] image_function::ProjectionProfile (256x256)... 0.268+/-0.001333 ms
[222/260] image_function::ProjectionProfile (512x512)... 1.068+/-0.03922 ms
[223/260] image_function::ProjectionProfile (1024x1024)... 4.279+/-0.1636 ms
[224/260] image_function::ProjectionProfile (2048x2048)... 17.21+/-0.4607 ms
[225/260] function_pool::ProjectionProfile (256x256)... 0.1439+/-0.08363 ms
[226/260] function_pool::ProjectionProfile (512x512)... 0.2179+/-0.04444 ms
[227/260] function_pool::ProjectionProfile (1024x1024)... 0.4726+/-0.1119 ms
[228/260] function_pool::ProjectionProfile (2048x2048)... 1.484+/-0.36 ms
[229/260] image_function_neon::ProjectionProfile (256x256)... 0.08139+/-0.0014 ms
[230/260] image_function_neon::ProjectionProfile (512x512)... 0.3136+/-0.02433 ms
[231/260] image_function_neon::ProjectionProfile (1024x1024)... 1.243+/-0.08512 ms
[232/260] image_function_neon::ProjectionProfile (2048x2048)... 4.997+/-0.1592 ms
[233/260] image_function::ResizeDown (256x256)... 0.1259+/-0.001517 ms
[234/260] image_function::ResizeDown (512x512)... 0.5145+/-0.03417 ms
[235/260] image_function::ResizeDown (1024x1024)... 2.158+/-0.1465 ms
[236/260] image_function::ResizeDown (2048x2048)... 8.934+/-0.4549 ms
[237/260] function_pool::ResizeDown (256x256)... 0.1714+/-0.08584 ms
[238/260] function_pool::ResizeDown (512x512)... 0.3107+/-0.1401 ms
[239/260] function_pool::ResizeDown (1024x1024)... 0.8567+/-0.2059 ms
[240/260] function_pool::ResizeDown (2048x2048)... 3.101+/-0.3542 ms
[241/260] image_function::ResizeUp (256x256)... 1.957+/-0.08039 ms
[242/260] image_function::ResizeUp (512x512)... 7.927+/-0.2471 ms
[243/260] image_function::ResizeUp (1024x1024)... 33.47+/-0.4568 ms
[244/260] image_function::ResizeUp (2048x2048)... 136+/-0.5939 ms
[245/260] function_pool::ResizeUp (256x256)... 0.7161+/-0.1772 ms
[246/260] function_pool::ResizeUp (512x512)... 2.763+/-0.3112 ms
[247/260] function_pool::ResizeUp (1024x1024)... 11.71+/-0.3116 ms
[248/260] function_pool::ResizeUp (2048x2048)... 46.16+/-0.6046 ms
[249/260] image_function::Sum (256x256)... 0.2556+/-0.07038 ms
[250/260] image_function::Sum (512x512)... 0.9867+/-0.3225 ms
[251/260] image_function::Sum (1024x1024)... 3.722+/-1.396 ms
[252/260] image_function::Sum (2048x2048)... 12.82+/-0.5736 ms
[253/260] function_pool::Sum (256x256)... 0.1151+/-0.08029 ms
[254/260] function_pool::Sum (512x512)... 0.1962+/-0.08735 ms
[255/260] function_pool::Sum (1024x1024)... 0.3645+/-0.07605 ms
[256/260] function_pool::Sum (2048x2048)... 1.119+/-0.1327 ms
[257/260] image_function_neon::Sum (256x256)... 0.06127+/-3.294e-05 ms
[258/260] image_function_neon::Sum (512x512)... 0.2441+/-0.0002788 ms
[259/260] image_function_neon::Sum (1024x1024)... 0.9773+/-0.009326 ms
[260/260] image_function_neon::Sum (2048x2048)... 3.916+/-0.06995 ms

It’s a long list, but a useful one. Next, I changed the compiler option from -O2 to -O3 and got the following output:

Full Performance Test: O3
[1/260] blob_detection_SolidImage::_2048... 659.2+/-8.414 ms
[2/260] blob_detection_SolidImage::_1024... 122.3+/-1.137 ms
[3/260] blob_detection_SolidImage::_512... 28.94+/-0.9154 ms
[4/260] blob_detection_SolidImage::_256... 6.977+/-0.2504 ms
[5/260] edge_detection_SolidImage::_2048... 1021+/-18.73 ms
[6/260] edge_detection_SolidImage::_1024... 214.3+/-1.528 ms
[7/260] edge_detection_SolidImage::_512... 53.9+/-0.8937 ms
[8/260] edge_detection_SolidImage::_256... 13.58+/-0.1193 ms
[9/260] filtering_SobelFilter::_256... 1.419+/-0.02259 ms
[10/260] filtering_SobelFilter::_512... 5.776+/-0.1041 ms
[11/260] filtering_SobelFilter::_1024... 23.34+/-0.2983 ms
[12/260] filtering_SobelFilter::_2048... 96.36+/-0.3666 ms
[13/260] filtering_MedianFilter3x3::_256... 17.05+/-0.1499 ms
[14/260] filtering_MedianFilter3x3::_512... 68.82+/-0.1801 ms
[15/260] filtering_MedianFilter3x3::_1024... 277.2+/-0.9412 ms
[16/260] filtering_MedianFilter3x3::_2048... 1129+/-1.424 ms
[17/260] filtering_PrewittFilter::_256... 1.471+/-0.01698 ms
[18/260] filtering_PrewittFilter::_512... 5.945+/-0.08082 ms
[19/260] filtering_PrewittFilter::_1024... 23.98+/-0.2115 ms
[20/260] filtering_PrewittFilter::_2048... 99.34+/-0.6689 ms
[21/260] image_function::AbsoluteDifference (256x256)... 0.2573+/-0.01359 ms
[22/260] image_function::AbsoluteDifference (512x512)... 1.119+/-0.085 ms
[23/260] image_function::AbsoluteDifference (1024x1024)... 6.548+/-0.2059 ms
[24/260] image_function::AbsoluteDifference (2048x2048)... 35.21+/-0.2765 ms
[25/260] function_pool::AbsoluteDifference (256x256)... 0.8006+/-0.2557 ms
[26/260] function_pool::AbsoluteDifference (512x512)... 2.257+/-0.4178 ms
[27/260] function_pool::AbsoluteDifference (1024x1024)... 16.2+/-3.114 ms
[28/260] function_pool::AbsoluteDifference (2048x2048)... 86.39+/-4.395 ms
[29/260] image_function_neon::AbsoluteDifference (256x256)... 0.2564+/-0.01983 ms
[30/260] image_function_neon::AbsoluteDifference (512x512)... 1.106+/-0.08391 ms
[31/260] image_function_neon::AbsoluteDifference (1024x1024)... 6.455+/-0.2998 ms
[32/260] image_function_neon::AbsoluteDifference (2048x2048)... 35.07+/-0.3718 ms
[33/260] image_function_neon::BitwiseAnd (2048x2048)... 34.85+/-0.2811 ms
[34/260] image_function_neon::BitwiseAnd (1024x1024)... 6.363+/-0.1702 ms
[35/260] image_function_neon::BitwiseAnd (512x512)... 1.067+/-0.0952 ms
[36/260] image_function_neon::BitwiseAnd (256x256)... 0.2459+/-0.02243 ms
[37/260] function_pool::BitwiseAnd (2048x2048)... 86.64+/-5.798 ms
[38/260] function_pool::BitwiseAnd (1024x1024)... 16.05+/-2.712 ms
[39/260] function_pool::BitwiseAnd (512x512)... 2.207+/-0.3405 ms
[40/260] function_pool::BitwiseAnd (256x256)... 0.7602+/-0.2183 ms
[41/260] image_function::BitwiseAnd (2048x2048)... 34.75+/-0.2531 ms
[42/260] image_function::BitwiseAnd (1024x1024)... 6.451+/-0.1742 ms
[43/260] image_function::BitwiseAnd (512x512)... 1.075+/-0.07546 ms
[44/260] image_function::BitwiseAnd (256x256)... 0.2475+/-0.0199 ms
[45/260] image_function_neon::BitwiseOr (2048x2048)... 34.72+/-0.2911 ms
[46/260] image_function_neon::BitwiseOr (1024x1024)... 6.423+/-0.1714 ms
[47/260] image_function_neon::BitwiseOr (512x512)... 1.069+/-0.1073 ms
[48/260] image_function_neon::BitwiseOr (256x256)... 0.2428+/-0.02553 ms
[49/260] function_pool::BitwiseOr (2048x2048)... 89.96+/-4.571 ms
[50/260] function_pool::BitwiseOr (1024x1024)... 15.33+/-0.7198 ms
[51/260] function_pool::BitwiseOr (512x512)... 2.174+/-0.3996 ms
[52/260] function_pool::BitwiseOr (256x256)... 0.7846+/-0.2347 ms
[53/260] image_function::BitwiseOr (2048x2048)... 34.33+/-0.4701 ms
[54/260] image_function::BitwiseOr (1024x1024)... 6.45+/-0.1625 ms
[55/260] image_function::BitwiseOr (512x512)... 1.08+/-0.1068 ms
[56/260] image_function::BitwiseOr (256x256)... 0.2484+/-0.02308 ms
[57/260] image_function_neon::BitwiseXor (2048x2048)... 34.31+/-0.4114 ms
[58/260] image_function_neon::BitwiseXor (1024x1024)... 6.437+/-0.3281 ms
[59/260] image_function_neon::BitwiseXor (512x512)... 1.064+/-0.09985 ms
[60/260] image_function_neon::BitwiseXor (256x256)... 0.2423+/-0.02624 ms
[61/260] function_pool::BitwiseXor (2048x2048)... 94.74+/-1.477 ms
[62/260] function_pool::BitwiseXor (1024x1024)... 15.59+/-1.671 ms
[63/260] function_pool::BitwiseXor (512x512)... 2.317+/-0.7291 ms
[64/260] function_pool::BitwiseXor (256x256)... 0.776+/-0.2322 ms
[65/260] image_function::BitwiseXor (2048x2048)... 34.88+/-0.8486 ms
[66/260] image_function::BitwiseXor (1024x1024)... 6.428+/-0.1557 ms
[67/260] image_function::BitwiseXor (512x512)... 1.071+/-0.09344 ms
[68/260] image_function::BitwiseXor (256x256)... 0.2472+/-0.01476 ms
[69/260] image_function_neon::Maximum (2048x2048)... 34.77+/-0.3337 ms
[70/260] image_function_neon::Maximum (1024x1024)... 6.417+/-0.1529 ms
[71/260] image_function_neon::Maximum (512x512)... 1.073+/-0.09486 ms
[72/260] image_function_neon::Maximum (256x256)... 0.2484+/-0.02032 ms
[73/260] function_pool::Maximum (2048x2048)... 84.63+/-2.365 ms
[74/260] function_pool::Maximum (1024x1024)... 16.15+/-1.066 ms
[75/260] function_pool::Maximum (512x512)... 2.424+/-0.6867 ms
[76/260] function_pool::Maximum (256x256)... 0.7402+/-0.2282 ms
[77/260] image_function::Maximum (2048x2048)... 34.92+/-0.4263 ms
[78/260] image_function::Maximum (1024x1024)... 6.507+/-0.186 ms
[79/260] image_function::Maximum (512x512)... 1.091+/-0.1074 ms
[80/260] image_function::Maximum (256x256)... 0.2546+/-0.02404 ms
[81/260] image_function_neon::Minimum (2048x2048)... 34.83+/-0.3438 ms
[82/260] image_function_neon::Minimum (1024x1024)... 6.485+/-0.2056 ms
[83/260] image_function_neon::Minimum (512x512)... 1.079+/-0.09349 ms
[84/260] image_function_neon::Minimum (256x256)... 0.2524+/-0.02601 ms
[85/260] function_pool::Minimum (2048x2048)... 84.68+/-2.176 ms
[86/260] function_pool::Minimum (1024x1024)... 16.46+/-2.329 ms
[87/260] function_pool::Minimum (512x512)... 2.193+/-0.4238 ms
[88/260] function_pool::Minimum (256x256)... 0.7748+/-0.2149 ms
[89/260] image_function::Minimum (2048x2048)... 34.94+/-0.3222 ms
[90/260] image_function::Minimum (1024x1024)... 6.471+/-0.1734 ms
[91/260] image_function::Minimum (512x512)... 1.1+/-0.09436 ms
[92/260] image_function::Minimum (256x256)... 0.257+/-0.02698 ms
[93/260] image_function_neon::Subtract (2048x2048)... 35.05+/-0.3241 ms
[94/260] image_function_neon::Subtract (1024x1024)... 6.693+/-0.6711 ms
[95/260] image_function_neon::Subtract (512x512)... 1.119+/-0.0918 ms
[96/260] image_function_neon::Subtract (256x256)... 0.262+/-0.02518 ms
[97/260] function_pool::Subtract (2048x2048)... 86.39+/-3.799 ms
[98/260] function_pool::Subtract (1024x1024)... 16.06+/-1.957 ms
[99/260] function_pool::Subtract (512x512)... 2.257+/-0.4887 ms
[100/260] function_pool::Subtract (256x256)... 0.7776+/-0.2513 ms
[101/260] image_function::Subtract (2048x2048)... 34.7+/-0.2787 ms
[102/260] image_function::Subtract (1024x1024)... 6.489+/-0.1655 ms
[103/260] image_function::Subtract (512x512)... 1.096+/-0.0755 ms
[104/260] image_function::Subtract (256x256)... 0.262+/-0.02406 ms
[105/260] image_function::Flip (256x256)... 0.06764+/-0.004992 ms
[106/260] image_function::Flip (512x512)... 0.2752+/-0.0363 ms
[107/260] image_function::Flip (1024x1024)... 1.175+/-0.08554 ms
[108/260] image_function::Flip (2048x2048)... 8.104+/-0.2728 ms
[109/260] image_function_neon::Flip (256x256)... 0.08263+/-0.004746 ms
[110/260] image_function_neon::Flip (512x512)... 0.3398+/-0.02819 ms
[111/260] image_function_neon::Flip (1024x1024)... 1.428+/-0.07633 ms
[112/260] image_function_neon::Flip (2048x2048)... 8.281+/-0.2228 ms
[113/260] image_function::GammaCorrection (256x256)... 0.6453+/-0.02275 ms
[114/260] image_function::GammaCorrection (512x512)... 2.331+/-0.08623 ms
[115/260] image_function::GammaCorrection (1024x1024)... 9.294+/-0.3946 ms
[116/260] image_function::GammaCorrection (2048x2048)... 41.54+/-0.5124 ms
[117/260] function_pool::GammaCorrection (256x256)... 0.5872+/-0.1739 ms
[118/260] function_pool::GammaCorrection (512x512)... 1.297+/-0.2463 ms
[119/260] function_pool::GammaCorrection (1024x1024)... 6.926+/-0.4417 ms
[120/260] function_pool::GammaCorrection (2048x2048)... 35.29+/-1.289 ms
[121/260] image_function::Invert (256x256)... 0.07971+/-0.007002 ms
[122/260] image_function::Invert (512x512)... 0.3575+/-0.05601 ms
[123/260] image_function::Invert (1024x1024)... 1.58+/-0.1001 ms
[124/260] image_function::Invert (2048x2048)... 11.03+/-0.1368 ms
[125/260] function_pool::Invert (256x256)... 0.326+/-0.1583 ms
[126/260] function_pool::Invert (512x512)... 0.6067+/-0.2185 ms
[127/260] function_pool::Invert (1024x1024)... 4.335+/-0.4591 ms
[128/260] function_pool::Invert (2048x2048)... 22.81+/-0.6107 ms
[129/260] image_function_neon::Invert (256x256)... 0.07806+/-0.008335 ms
[130/260] image_function_neon::Invert (512x512)... 0.3462+/-0.05722 ms
[131/260] image_function_neon::Invert (1024x1024)... 1.552+/-0.1238 ms
[132/260] image_function_neon::Invert (2048x2048)... 11.02+/-0.1689 ms
[133/260] image_function::Transpose (2048x2048)... 178.7+/-4.889 ms
[134/260] image_function::Transpose (1024x1024)... 25.01+/-0.54 ms
[135/260] image_function::Transpose (512x512)... 3.237+/-0.07739 ms
[136/260] image_function::Transpose (256x256)... 0.6578+/-0.01062 ms
[137/260] image_function::RgbToBgr (256x256)... 0.3357+/-0.03925 ms
[138/260] image_function::RgbToBgr (512x512)... 1.468+/-0.11 ms
[139/260] image_function::RgbToBgr (1024x1024)... 9.47+/-0.2005 ms
[140/260] image_function::RgbToBgr (2048x2048)... 43.82+/-0.397 ms
[141/260] function_pool::RgbToBgr (256x256)... 0.6314+/-0.2791 ms
[142/260] function_pool::RgbToBgr (512x512)... 3.208+/-0.3878 ms
[143/260] function_pool::RgbToBgr (1024x1024)... 21.68+/-1.054 ms
[144/260] function_pool::RgbToBgr (2048x2048)... 93.05+/-5.086 ms
[145/260] image_function_neon::RgbToBgr (256x256)... 0.5476+/-0.04025 ms
[146/260] image_function_neon::RgbToBgr (512x512)... 2.343+/-0.1327 ms
[147/260] image_function_neon::RgbToBgr (1024x1024)... 13.93+/-0.1758 ms
[148/260] image_function_neon::RgbToBgr (2048x2048)... 64.03+/-0.3407 ms
[149/260] image_function::Threshold (256x256)... 0.1175+/-0.006856 ms
[150/260] image_function::Threshold (512x512)... 0.4933+/-0.05341 ms
[151/260] image_function::Threshold (1024x1024)... 2.125+/-0.124 ms
[152/260] image_function::Threshold (2048x2048)... 13.77+/-0.6267 ms
[153/260] function_pool::Threshold (256x256)... 0.3876+/-0.1738 ms
[154/260] function_pool::Threshold (512x512)... 0.6913+/-0.2765 ms
[155/260] function_pool::Threshold (1024x1024)... 5.167+/-0.3892 ms
[156/260] function_pool::Threshold (2048x2048)... 28.9+/-1.034 ms
[157/260] image_function_neon::Threshold (256x256)... 0.1007+/-0.008197 ms
[158/260] image_function_neon::Threshold (512x512)... 0.4291+/-0.05646 ms
[159/260] image_function_neon::Threshold (1024x1024)... 1.917+/-0.1269 ms
[160/260] image_function_neon::Threshold (2048x2048)... 13.57+/-0.1885 ms
[161/260] image_function::ThresholdDouble (256x256)... 0.1273+/-0.008935 ms
[162/260] image_function::ThresholdDouble (512x512)... 0.5273+/-0.05674 ms
[163/260] image_function::ThresholdDouble (1024x1024)... 2.279+/-0.1325 ms
[164/260] image_function::ThresholdDouble (2048x2048)... 14.21+/-0.7554 ms
[165/260] function_pool::ThresholdDouble (256x256)... 0.388+/-0.1945 ms
[166/260] function_pool::ThresholdDouble (512x512)... 0.7134+/-0.2996 ms
[167/260] function_pool::ThresholdDouble (1024x1024)... 5.171+/-0.4106 ms
[168/260] function_pool::ThresholdDouble (2048x2048)... 28.94+/-0.8835 ms
[169/260] image_function_neon::ThresholdDouble (256x256)... 0.109+/-0.007184 ms
[170/260] image_function_neon::ThresholdDouble (512x512)... 0.4618+/-0.05223 ms
[171/260] image_function_neon::ThresholdDouble (1024x1024)... 2.031+/-0.1229 ms
[172/260] image_function_neon::ThresholdDouble (2048x2048)... 13.6+/-0.1625 ms
[173/260] image_function::LookupTable (256x256)... 0.5225+/-0.01566 ms
[174/260] image_function::LookupTable (512x512)... 2.121+/-0.0743 ms
[175/260] image_function::LookupTable (1024x1024)... 8.711+/-0.2462 ms
[176/260] image_function::LookupTable (2048x2048)... 38.86+/-0.6056 ms
[177/260] function_pool::LookupTable (256x256)... 0.4449+/-0.1668 ms
[178/260] function_pool::LookupTable (512x512)... 1.064+/-0.2321 ms
[179/260] function_pool::LookupTable (1024x1024)... 5.793+/-0.4065 ms
[180/260] function_pool::LookupTable (2048x2048)... 29.01+/-0.655 ms
[181/260] image_function::Accumulate (256x256)... 0.1164+/-0.003436 ms
[182/260] image_function::Accumulate (512x512)... 0.4904+/-0.0186 ms
[183/260] image_function::Accumulate (1024x1024)... 2.459+/-0.1153 ms
[184/260] image_function::Accumulate (2048x2048)... 14.25+/-0.1516 ms
[185/260] image_function_neon::Accumulate (256x256)... 0.1134+/-0.003251 ms
[186/260] image_function_neon::Accumulate (512x512)... 0.4761+/-0.006319 ms
[187/260] image_function_neon::Accumulate (1024x1024)... 2.417+/-0.09717 ms
[188/260] image_function_neon::Accumulate (2048x2048)... 14.21+/-0.1533 ms
[189/260] image_function::ConvertToGrayScale (256x256)... 0.2096+/-0.002201 ms
[190/260] image_function::ConvertToGrayScale (512x512)... 0.8587+/-0.03032 ms
[191/260] image_function::ConvertToGrayScale (1024x1024)... 3.521+/-0.1312 ms
[192/260] image_function::ConvertToGrayScale (2048x2048)... 15.6+/-2.737 ms
[193/260] function_pool::ConvertToGrayScale (256x256)... 0.2699+/-0.1129 ms
[194/260] function_pool::ConvertToGrayScale (512x512)... 0.5503+/-0.1665 ms
[195/260] function_pool::ConvertToGrayScale (1024x1024)... 2.045+/-0.3488 ms
[196/260] function_pool::ConvertToGrayScale (2048x2048)... 11.39+/-1.21 ms
[197/260] image_function::ConvertToRgb (256x256)... 0.3945+/-0.005277 ms
[198/260] image_function::ConvertToRgb (512x512)... 1.633+/-0.04227 ms
[199/260] image_function::ConvertToRgb (1024x1024)... 7.182+/-0.1851 ms
[200/260] image_function::ConvertToRgb (2048x2048)... 30.83+/-0.7008 ms
[201/260] function_pool::ConvertToRgb (256x256)... 0.3142+/-0.1605 ms
[202/260] function_pool::ConvertToRgb (512x512)... 1.016+/-0.3147 ms
[203/260] function_pool::ConvertToRgb (1024x1024)... 6.363+/-0.6309 ms
[204/260] function_pool::ConvertToRgb (2048x2048)... 26.53+/-1.991 ms
[205/260] image_function_neon::ConvertToRgb (256x256)... 0.1708+/-0.0105 ms
[206/260] image_function_neon::ConvertToRgb (512x512)... 0.7194+/-0.04992 ms
[207/260] image_function_neon::ConvertToRgb (1024x1024)... 3.759+/-0.1304 ms
[208/260] image_function_neon::ConvertToRgb (2048x2048)... 16.95+/-0.1775 ms
[209/260] image_function::Fill (256x256)... 0.007928+/-0.001133 ms
[210/260] image_function::Fill (512x512)... 0.03338+/-0.001469 ms
[211/260] image_function::Fill (1024x1024)... 0.135+/-0.002448 ms
[212/260] image_function::Fill (2048x2048)... 0.5476+/-0.01086 ms
[213/260] image_function::Histogram (256x256)... 0.4609+/-0.006528 ms
[214/260] image_function::Histogram (512x512)... 1.844+/-0.02853 ms
[215/260] image_function::Histogram (1024x1024)... 7.399+/-0.1199 ms
[216/260] image_function::Histogram (2048x2048)... 29.65+/-0.3645 ms
[217/260] function_pool::Histogram (256x256)... 0.2676+/-0.1251 ms
[218/260] function_pool::Histogram (512x512)... 0.6122+/-0.1219 ms
[219/260] function_pool::Histogram (1024x1024)... 2.025+/-0.1826 ms
[220/260] function_pool::Histogram (2048x2048)... 7.711+/-0.2735 ms
[221/260] image_function::ProjectionProfile (256x256)... 0.268+/-0.001927 ms
[222/260] image_function::ProjectionProfile (512x512)... 1.064+/-0.02345 ms
[223/260] image_function::ProjectionProfile (1024x1024)... 4.264+/-0.1138 ms
[224/260] image_function::ProjectionProfile (2048x2048)... 17.05+/-0.2892 ms
[225/260] function_pool::ProjectionProfile (256x256)... 0.1447+/-0.09231 ms
[226/260] function_pool::ProjectionProfile (512x512)... 0.221+/-0.07707 ms
[227/260] function_pool::ProjectionProfile (1024x1024)... 0.4685+/-0.1112 ms
[228/260] function_pool::ProjectionProfile (2048x2048)... 1.449+/-0.1808 ms
[229/260] image_function_neon::ProjectionProfile (256x256)... 0.08125+/-0.001629 ms
[230/260] image_function_neon::ProjectionProfile (512x512)... 0.3116+/-0.01195 ms
[231/260] image_function_neon::ProjectionProfile (1024x1024)... 1.231+/-0.05248 ms
[232/260] image_function_neon::ProjectionProfile (2048x2048)... 4.921+/-0.1217 ms
[233/260] image_function::ResizeDown (256x256)... 0.1258+/-0.00173 ms
[234/260] image_function::ResizeDown (512x512)... 0.5145+/-0.01864 ms
[235/260] image_function::ResizeDown (1024x1024)... 2.084+/-0.04247 ms
[236/260] image_function::ResizeDown (2048x2048)... 8.713+/-0.1194 ms
[237/260] function_pool::ResizeDown (256x256)... 0.1762+/-0.1059 ms
[238/260] function_pool::ResizeDown (512x512)... 0.3267+/-0.142 ms
[239/260] function_pool::ResizeDown (1024x1024)... 0.8051+/-0.1826 ms
[240/260] function_pool::ResizeDown (2048x2048)... 3.077+/-0.3038 ms
[241/260] image_function::ResizeUp (256x256)... 1.946+/-0.02992 ms
[242/260] image_function::ResizeUp (512x512)... 7.823+/-0.108 ms
[243/260] image_function::ResizeUp (1024x1024)... 33.14+/-0.2219 ms
[244/260] image_function::ResizeUp (2048x2048)... 135.8+/-0.372 ms
[245/260] function_pool::ResizeUp (256x256)... 0.7444+/-0.2005 ms
[246/260] function_pool::ResizeUp (512x512)... 2.861+/-0.2893 ms
[247/260] function_pool::ResizeUp (1024x1024)... 12.72+/-0.3165 ms
[248/260] function_pool::ResizeUp (2048x2048)... 51.94+/-0.5059 ms
[249/260] image_function::Sum (256x256)... 0.06704+/-0.001588 ms
[250/260] image_function::Sum (512x512)... 0.2664+/-0.004164 ms
[251/260] image_function::Sum (1024x1024)... 1.074+/-0.02653 ms
[252/260] image_function::Sum (2048x2048)... 4.358+/-0.1281 ms
[253/260] function_pool::Sum (256x256)... 0.1299+/-0.1002 ms
[254/260] function_pool::Sum (512x512)... 0.2085+/-0.08905 ms
[255/260] function_pool::Sum (1024x1024)... 0.3873+/-0.09999 ms
[256/260] function_pool::Sum (2048x2048)... 1.125+/-0.1451 ms
[257/260] image_function_neon::Sum (256x256)... 0.06125+/-2.437e-05 ms
[258/260] image_function_neon::Sum (512x512)... 0.2441+/-0.0001093 ms
[259/260] image_function_neon::Sum (1024x1024)... 0.9777+/-0.01123 ms
[260/260] image_function_neon::Sum (2048x2048)... 3.917+/-0.07282 ms
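Each line above reports a mean time and a +/- spread over repeated runs. As a rough sketch of how a line like that could be produced, here is a minimal C++ timing helper. It assumes the +/- is a sample standard deviation and that the numbers are printed to four significant digits. I have not checked either assumption against penguinV's performance_tests source, and the helper names (summarize, timeRuns, formatResult) are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Mean and sample standard deviation of a set of timings (in ms).
struct TimingStats {
    double mean;
    double stdDev;
};

TimingStats summarize(const std::vector<double>& samplesMs) {
    assert(!samplesMs.empty());
    double sum = 0.0;
    for (double x : samplesMs) sum += x;
    const double mean = sum / static_cast<double>(samplesMs.size());

    double squared = 0.0;
    for (double x : samplesMs) squared += (x - mean) * (x - mean);
    // Sample (n - 1) standard deviation; a single sample has no spread.
    const double stdDev = samplesMs.size() > 1
        ? std::sqrt(squared / static_cast<double>(samplesMs.size() - 1))
        : 0.0;
    return {mean, stdDev};
}

// Run `fn` `runs` times with a monotonic clock and return per-run times in ms.
std::vector<double> timeRuns(const std::function<void()>& fn, int runs) {
    std::vector<double> samplesMs;
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        samplesMs.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return samplesMs;
}

// Format as "name... mean+/-stdDev ms" with four significant digits.
std::string formatResult(const std::string& name, const TimingStats& s) {
    char buffer[128];
    std::snprintf(buffer, sizeof(buffer), "%s... %.4g+/-%.4g ms",
                  name.c_str(), s.mean, s.stdDev);
    return buffer;
}
```

The more runs go into each line, the smaller the spread of the mean tends to get, which is exactly what my earlier hand-run benchmark was missing.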

Reading the two outputs side by side, the option change clearly affects performance. However, the function I originally set out to optimize, blob_detection, does not improve under -O3; on 2048×2048 images it is actually slightly slower (644.1+/-4.882 ms with -O2 vs 659.2+/-8.414 ms with -O3), which is not a good sign for the original plan. Scrolling further down the list, though, other functions improve a lot with the change, the first one I noticed being AbsoluteDifference. So I modified the test to benchmark only AbsoluteDifference across its three implementations (image_function, function_pool and image_function_neon).
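To put a number on "slightly slower", here is a rough C++ check. The helper names are hypothetical, and treating each +/- as an independent standard deviation is an assumption on my part. Without the number of runs behind each line this is only a rule of thumb, not a proper significance test such as Welch's t-test.

```cpp
#include <cassert>
#include <cmath>

// Percentage change from a baseline mean to a new mean (positive = slower).
double relativeChangePercent(double baselineMs, double newMs) {
    assert(baselineMs > 0.0);
    return (newMs - baselineMs) / baselineMs * 100.0;
}

// How many combined "+/-" spreads separate two means. Assumes each +/- is an
// independent standard deviation; a rule of thumb, not a formal test.
double spreadsApart(double meanA, double spreadA, double meanB, double spreadB) {
    const double combined = std::sqrt(spreadA * spreadA + spreadB * spreadB);
    assert(combined > 0.0);
    return std::fabs(meanB - meanA) / combined;
}
```

With the blob_detection numbers (644.1+/-4.882 ms vs 659.2+/-8.414 ms), relativeChangePercent gives about 2.3% and spreadsApart about 1.55, small enough that it could plausibly be noise. The 2048×2048 image_function::AbsoluteDifference result (66.46+/-0.3282 ms vs 34.7+/-0.2659 ms) comes out roughly 75 spreads apart.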

Absolute Difference: O2
[1/12] image_function::AbsoluteDifference (256x256)... 0.7432+/-0.02591 ms
[2/12] image_function::AbsoluteDifference (512x512)... 3.043+/-0.09214 ms
[3/12] image_function::AbsoluteDifference (1024x1024)... 14.29+/-0.1658 ms
[4/12] image_function::AbsoluteDifference (2048x2048)... 66.46+/-0.3282 ms
[5/12] function_pool::AbsoluteDifference (256x256)... 0.7936+/-0.282 ms
[6/12] function_pool::AbsoluteDifference (512x512)... 2.288+/-0.4593 ms
[7/12] function_pool::AbsoluteDifference (1024x1024)... 14.9+/-0.8164 ms
[8/12] function_pool::AbsoluteDifference (2048x2048)... 131.5+/-7.993 ms
[9/12] image_function_neon::AbsoluteDifference (256x256)... 0.2566+/-0.0231 ms
[10/12] image_function_neon::AbsoluteDifference (512x512)... 1.083+/-0.0618 ms
[11/12] image_function_neon::AbsoluteDifference (1024x1024)... 6.341+/-0.147 ms
[12/12] image_function_neon::AbsoluteDifference (2048x2048)... 79.96+/-2.067 ms

Absolute Difference: O3
[1/12] image_function::AbsoluteDifference (256x256)... 0.2604+/-0.02014 ms
[2/12] image_function::AbsoluteDifference (512x512)... 1.106+/-0.08591 ms
[3/12] image_function::AbsoluteDifference (1024x1024)... 6.369+/-0.147 ms
[4/12] image_function::AbsoluteDifference (2048x2048)... 34.7+/-0.2659 ms
[5/12] function_pool::AbsoluteDifference (256x256)... 0.7684+/-0.2098 ms
[6/12] function_pool::AbsoluteDifference (512x512)... 2.2+/-0.3969 ms
[7/12] function_pool::AbsoluteDifference (1024x1024)... 14.57+/-0.6044 ms
[8/12] function_pool::AbsoluteDifference (2048x2048)... 96.27+/-7.903 ms
[9/12] image_function_neon::AbsoluteDifference (256x256)... 0.2551+/-0.02185 ms
[10/12] image_function_neon::AbsoluteDifference (512x512)... 1.084+/-0.08416 ms
[11/12] image_function_neon::AbsoluteDifference (1024x1024)... 6.311+/-0.1501 ms
[12/12] image_function_neon::AbsoluteDifference (2048x2048)... 49.26+/-0.3417 ms
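For context on why the optimization level could matter so much for a function like AbsoluteDifference, here is a minimal sketch of a plain per-pixel absolute difference over 8-bit buffers. It is a hypothetical stand-in, not penguinV's actual image_function::AbsoluteDifference, which I have not read yet. A simple loop like this over contiguous bytes is the kind of code GCC's loop vectorizer targets; older GCC releases only enable that vectorizer at -O3, while newer ones (GCC 12 onward, as far as I know) also enable a more conservative form of it at -O2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar absolute difference: out[i] = |a[i] - b[i]|.
void absoluteDifference(const std::vector<std::uint8_t>& a,
                        const std::vector<std::uint8_t>& b,
                        std::vector<std::uint8_t>& out) {
    assert(a.size() == b.size());
    out.resize(a.size());
    const std::uint8_t* pa = a.data();
    const std::uint8_t* pb = b.data();
    std::uint8_t* po = out.data();
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Subtract the smaller value from the larger so the result stays in 0..255.
        po[i] = pa[i] > pb[i] ? static_cast<std::uint8_t>(pa[i] - pb[i])
                              : static_cast<std::uint8_t>(pb[i] - pa[i]);
    }
}
```

Compiling a loop like this with g++ -O3 -fopt-info-vec, and again with -O2, should report whether the compiler vectorized it at each level.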

The difference in performance is significant in most cases. image_function::AbsoluteDifference is faster at every image size under -O3, while the function_pool and image_function_neon implementations only improve at 2048×2048 (131.5 to 96.27 ms and 79.96 to 49.26 ms respectively). It should also be noted that, under -O3, image_function and image_function_neon perform almost identically at 256×256, 512×512 and 1024×1024, which may be a hint about how the two are implemented (for example, the compiler turning the plain C++ loop into vector code close to the hand-written NEON version). Either way, image_function::AbsoluteDifference clearly benefits from the -O3 setting. The following table compares O2 and O3 for all of the image_function functions:

Image Function Results: O2 vs O3
image_function                  O2                      O3
AbsoluteDifference (256×256)    0.7547+/-0.02822 ms     0.2628+/-0.02898 ms
AbsoluteDifference (512×512)    3.073+/-0.09532 ms      1.105+/-0.09343 ms
AbsoluteDifference (1024×1024)  14.36+/-0.2335 ms       6.447+/-0.1902 ms
AbsoluteDifference (2048×2048)  66.74+/-0.4498 ms       34.91+/-0.3634 ms
BitwiseAnd (256×256)            0.6288+/-0.02895 ms     0.248+/-0.02753 ms
BitwiseAnd (512×512)            2.555+/-0.09601 ms      1.05+/-0.101 ms
BitwiseAnd (1024×1024)          12.29+/-0.2585 ms       6.3+/-0.1823 ms
BitwiseAnd (2048×2048)          58.39+/-0.2769 ms       34.35+/-0.287 ms
BitwiseOr (256×256)             0.6287+/-0.02762 ms     0.248+/-0.02866 ms
BitwiseOr (512×512)             2.558+/-0.1022 ms       1.05+/-0.08623 ms
BitwiseOr (1024×1024)           12.27+/-0.1418 ms       6.316+/-0.2002 ms
BitwiseOr (2048×2048)           58.41+/-0.4194 ms       34.35+/-0.2473 ms
BitwiseXor (256×256)            0.6294+/-0.03089 ms     0.2478+/-0.0269 ms
BitwiseXor (512×512)            2.554+/-0.08971 ms      1.052+/-0.09256 ms
BitwiseXor (1024×1024)          12.28+/-0.1501 ms       6.322+/-0.231 ms
BitwiseXor (2048×2048)          58.41+/-0.3581 ms       34.57+/-0.9866 ms
Maximum (256×256)               0.6894+/-0.02952 ms     0.2508+/-0.02655 ms
Maximum (512×512)               2.808+/-0.08771 ms      1.063+/-0.09892 ms
Maximum (1024×1024)             13.3+/-0.167 ms         6.316+/-0.2094 ms
Maximum (2048×2048)             62.53+/-0.3113 ms       34.43+/-0.3139 ms
Minimum (256×256)               0.6898+/-0.02663 ms     0.2518+/-0.02722 ms
Minimum (512×512)               2.811+/-0.09551 ms      1.061+/-0.1047 ms
Minimum (1024×1024)             13.3+/-0.1638 ms        6.328+/-0.2024 ms
Minimum (2048×2048)             62.69+/-0.8282 ms       34.42+/-0.2789 ms
Subtract (256×256)              0.7564+/-0.02617 ms     0.2555+/-0.02402 ms
Subtract (512×512)              3.071+/-0.1031 ms       1.077+/-0.08148 ms
Subtract (1024×1024)            14.45+/-0.1648 ms       6.425+/-0.1894 ms
Subtract (2048×2048)            66.7+/-0.2856 ms        34.53+/-0.236 ms
Flip (256×256)                  0.2403+/-0.003232 ms    0.0693+/-0.004127 ms
Flip (512×512)                  0.9657+/-0.02498 ms     0.2759+/-0.03079 ms
Flip (1024×1024)                3.946+/-0.08853 ms      1.149+/-0.09605 ms
Flip (2048×2048)                18.32+/-0.2539 ms       8.061+/-0.2251 ms
GammaCorrection (256×256)       0.6516+/-0.01782 ms     0.6492+/-0.01845 ms
GammaCorrection (512×512)       2.311+/-0.05578 ms      2.318+/-0.07228 ms
GammaCorrection (1024×1024)     9.101+/-0.1138 ms       9.12+/-0.1745 ms
GammaCorrection (2048×2048)     41.47+/-0.2421 ms       41.45+/-0.2185 ms
Invert (256×256)                0.3891+/-0.01155 ms     0.08033+/-0.005569 ms
Invert (512×512)                1.581+/-0.05507 ms      0.3509+/-0.04334 ms
Invert (1024×1024)              6.492+/-0.1138 ms       1.546+/-0.08287 ms
Invert (2048×2048)              30.17+/-0.3121 ms       10.97+/-0.1356 ms
Transpose (256×256)             0.5746+/-0.01322 ms     0.6557+/-0.009664 ms
Transpose (512×512)             4.439+/-0.09429 ms      3.604+/-0.0901 ms
Transpose (1024×1024)           32.15+/-0.6566 ms       25.94+/-0.4277 ms
Transpose (2048×2048)           232.1+/-8.996 ms        192+/-0.7423 ms
RgbToBgr (256×256)              0.7087+/-0.03707 ms     0.3282+/-0.02997 ms
RgbToBgr (512×512)              2.976+/-0.1064 ms       1.439+/-0.08078 ms
RgbToBgr (1024×1024)            15.71+/-0.3244 ms       9.438+/-0.2287 ms
RgbToBgr (2048×2048)            114.9+/-0.5135 ms       89.28+/-1.065 ms
Threshold (256×256)             0.4726+/-0.0115 ms      0.1158+/-0.006655 ms
Threshold (512×512)             1.918+/-0.06233 ms      0.4895+/-0.04314 ms
Threshold (1024×1024)           7.885+/-0.1455 ms       2.135+/-0.1368 ms
Threshold (2048×2048)           36.97+/-0.2089 ms       13.68+/-0.1948 ms
ThresholdDouble (256×256)       0.4728+/-0.01445 ms     0.1245+/-0.006697 ms
ThresholdDouble (512×512)       1.918+/-0.05377 ms      0.5228+/-0.04818 ms
ThresholdDouble (1024×1024)     7.886+/-0.1346 ms       2.258+/-0.1376 ms
ThresholdDouble (2048×2048)     36.96+/-0.2169 ms       14.22+/-1.116 ms
LookupTable (256×256)           0.522+/-0.01216 ms      0.5192+/-0.01289 ms
LookupTable (512×512)           2.116+/-0.06579 ms      2.117+/-0.0619 ms
LookupTable (1024×1024)         8.63+/-0.1194 ms        8.627+/-0.1335 ms
LookupTable (2048×2048)         38.76+/-0.2569 ms       38.73+/-0.23 ms
Accumulate (256×256)            0.4052+/-0.002275 ms    0.1165+/-0.003581 ms
Accumulate (512×512)            1.647+/-0.03184 ms      0.493+/-0.02187 ms
Accumulate (1024×1024)          6.893+/-0.1776 ms       2.464+/-0.111 ms
Accumulate (2048×2048)          29.15+/-0.205 ms        14.46+/-0.1871 ms
ConvertToGrayScale (256×256)    0.8163+/-0.01895 ms     0.2101+/-0.003119 ms
ConvertToGrayScale (512×512)    3.291+/-0.06637 ms      0.8535+/-0.024 ms
ConvertToGrayScale (1024×1024)  13.39+/-0.1378 ms       3.522+/-0.1276 ms
ConvertToGrayScale (2048×2048)  55.5+/-0.2327 ms        15.29+/-0.2111 ms
ConvertToRgb (256×256)          0.3944+/-0.01489 ms     0.3942+/-0.002858 ms
ConvertToRgb (512×512)          1.628+/-0.03552 ms      1.629+/-0.03845 ms
ConvertToRgb (1024×1024)        7.148+/-0.1837 ms       7.151+/-0.2058 ms
ConvertToRgb (2048×2048)        30.82+/-0.7943 ms       30.72+/-0.8727 ms
Fill (256×256)                  0.008156+/-0.001075 ms  0.008331+/-0.001006 ms
Fill (512×512)                  0.03429+/-0.001251 ms   0.03431+/-0.0009215 ms
Fill (1024×1024)                0.1393+/-0.001923 ms    0.1389+/-0.001516 ms
Fill (2048×2048)                0.5644+/-0.01286 ms     0.561+/-0.01111 ms
Histogram (256×256)             0.4611+/-0.006191 ms    0.46+/-0.001054 ms
Histogram (512×512)             1.844+/-0.02686 ms      1.843+/-0.02369 ms
Histogram (1024×1024)           7.397+/-0.1191 ms       7.395+/-0.1231 ms
Histogram (2048×2048)           29.66+/-0.3616 ms       29.64+/-0.3412 ms
ProjectionProfile (256×256)     0.2682+/-0.001328 ms    0.2677+/-0.001331 ms
ProjectionProfile (512×512)     1.065+/-0.02151 ms      1.064+/-0.01968 ms
ProjectionProfile (1024×1024)   4.258+/-0.1016 ms       4.258+/-0.1087 ms
ProjectionProfile (2048×2048)   17.03+/-0.2428 ms       17.06+/-0.3503 ms
ResizeDown (256×256)            0.126+/-0.001329 ms     0.1257+/-0.001296 ms
ResizeDown (512×512)            0.5105+/-0.01992 ms     0.5115+/-0.01934 ms
ResizeDown (1024×1024)          2.073+/-0.06725 ms      2.071+/-0.06386 ms
ResizeDown (2048×2048)          8.708+/-0.1344 ms       8.705+/-0.1343 ms
ResizeUp (256×256)              1.946+/-0.03315 ms      1.945+/-0.02984 ms
ResizeUp (512×512)              7.807+/-0.08913 ms      7.806+/-0.09767 ms
ResizeUp (1024×1024)            32.97+/-0.214 ms        33.02+/-0.1861 ms
ResizeUp (2048×2048)            135.7+/-0.2605 ms       135.9+/-0.337 ms
Sum (256×256)                   0.2561+/-0.05608 ms     0.06664+/-0.001315 ms
Sum (512×512)                   1.017+/-0.2418 ms       0.2662+/-0.002786 ms
Sum (1024×1024)                 3.972+/-1.098 ms        1.074+/-0.03069 ms
Sum (2048×2048)                 13.96+/-2.55 ms         4.362+/-0.1423 ms
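To quantify the table, the ratio of the O2 mean to the O3 mean gives a speedup per row. A small C++ sketch follows; the Row struct and function names are just for illustration, and the +/- spreads are ignored.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// One row of the O2 vs O3 table (mean times only).
struct Row {
    std::string name;
    double o2Ms;
    double o3Ms;
};

// Speedup of O3 over O2: above 1 means O3 is faster, below 1 means slower.
double speedup(const Row& r) {
    assert(r.o3Ms > 0.0);
    return r.o2Ms / r.o3Ms;
}

// Print each row's speedup with two decimal places.
void printSpeedups(const std::vector<Row>& rows) {
    for (const Row& r : rows) {
        std::printf("%s: %.2fx\n", r.name.c_str(), speedup(r));
    }
}
```

For example, Invert (256×256) works out to about 4.84x, Transpose (256×256) to about 0.88x (slower under O3), and GammaCorrection (2048×2048) to about 1.00x.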

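Looks like O3 benefits the image_function module overall, though not uniformly. AbsoluteDifference, BitwiseAnd, BitwiseOr, BitwiseXor, Maximum, Minimum, Subtract, Flip, Invert, Threshold, ThresholdDouble, Accumulate, ConvertToGrayScale and Sum all run roughly 1.7x to 4.8x faster. RgbToBgr's gain shrinks to about 1.3x at 2048×2048, and Transpose gains only about 1.2x and is actually slower at 256×256. GammaCorrection, LookupTable, ConvertToRgb, Fill, Histogram, ProjectionProfile, ResizeDown and ResizeUp are essentially unchanged. Next time I will look at the implementation to see why this might be the case and whether this finding has been documented anywhere.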
