
Visual Studio Code: Everyone’s (Biased) Favourite Code Editor

About Visual Studio Code

Visual Studio Code (VS Code) is a source-code editor developed by Microsoft for Windows, Linux and macOS. It includes support for debugging, embedded Git control and GitHub, syntax highlighting, intelligent code completion, snippets, and code refactoring. It is highly customizable, allowing users to change the theme, keyboard shortcuts, and preferences, and to install extensions that add functionality. The source code is free and open source and released under the permissive MIT License.

My Interest in This Software

I chose to contribute to VS Code because it is my editor of choice. I want a better experience with my editor, and I can think of no better way to get one than contributing to the open-source software itself. By contributing, I also hope to gain a better understanding of my favourite code editor.

Tools and Libraries

  • Python 2.7 or higher (but not Python 3)
  • C/C++ compiler toolchain
  • Node.js (> 10.16.0, < 11.0.0)
  • TSLint
  • Git
  • Yarn


PenguinV: Wrapping Up

So I tried optimizing the median function one last time by playing with its four-level nested loop. From what I understood in class, reading memory along a row and then moving to the next row is more efficient than reading down a column and then moving to the next column, because the image is stored row by row in memory. So, just to be sure, I swapped the third loop with the fourth (innermost) loop to see how that affects performance. It didn’t really change anything; the tests still showed numbers within the variance of the base case. To be honest, I’m pretty sure it was already reading by row anyway. With no other ideas, I have decided not to push my changes upstream.
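The loop-order idea can be made concrete with a minimal sketch on a plain row-major byte buffer. CopyWindowRowMajor, CopyWindowColumnMajor and WindowMedian are hypothetical helpers, not penguinV code. Both copy functions fill a buffer from the same kernelSize × kernelSize window, using the two loop orders I compared. In the Median source listed further down, the third loop already steps between rows (inYRead += rowSizeIn) and the fourth walks along a row (++inXRead). So the original already uses the row-major order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Row-major order: outer loop over the window's rows, inner loop along a row.
// Consecutive reads are neighbouring bytes, matching the image's memory layout.
void CopyWindowRowMajor( const uint8_t * window, uint32_t rowSize, uint32_t kernelSize,
                         std::vector<uint8_t> & buffer )
{
    buffer.clear();
    for( uint32_t y = 0; y < kernelSize; ++y )
        for( uint32_t x = 0; x < kernelSize; ++x )
            buffer.push_back( window[y * rowSize + x] );
}

// Same window with the two loops swapped: outer loop over columns, inner loop
// down a column. Consecutive reads are rowSize bytes apart.
void CopyWindowColumnMajor( const uint8_t * window, uint32_t rowSize, uint32_t kernelSize,
                            std::vector<uint8_t> & buffer )
{
    buffer.clear();
    for( uint32_t x = 0; x < kernelSize; ++x )
        for( uint32_t y = 0; y < kernelSize; ++y )
            buffer.push_back( window[y * rowSize + x] );
}

// Median of the copied values. The copy order does not change the result,
// because nth_element only depends on which values are present.
uint8_t WindowMedian( std::vector<uint8_t> values )
{
    const auto middle = values.begin() + values.size() / 2;
    std::nth_element( values.begin(), middle, values.end() );
    return *middle;
}
```

Both buffers hold the same values, so the median is identical. Only the memory access pattern changes, and for a 3×3 window that spans just three rows, it plausibly stays in cache either way.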

Although my results indicated that changing the compiler optimization option to O3 would improve the overall performance of the functions, I would only be improving the times of penguinV’s performance tests. As stated before, the library is not meant to be precompiled and linked, but included as a header file in others’ projects. This means that the compiler optimizations used depend on the recipient’s project settings. The most that switching the performance tests to O3 can do is show that performance is better at O3. However, the absolute decrease in run time is in milliseconds, and these functions are not meant to be called continuously. It is also safer for developers to see the performance at O2, a more stable optimization setting that their own programs probably use. Therefore, since my changes do not touch penguinV’s core library functionality and O3 is a less common use case than O2, I will not be making a PR with the changes.

PenguinV: Trying to Optimize Median Filtering

Continuing from my last post: after thinking more about the change from -O2 to -O3, I realized that just changing the options on the performance tests seems dumb. The library is meant to be included in other people’s projects as source files and invoked from their own software. Those people can just set their options to -O3 when compiling their software; there’s no need for them to follow the performance tests’ parameters. I will still create a pull request for the change, as I have run the performance tests on an x86_64 machine and confirmed that performance does not drop on that architecture. However, I want to try to improve the performance of the only function that has a significant drop in performance on an AArch64 system with -O3: median filtering.

For those who are unfamiliar with median filtering (like me), here’s the Wikipedia page for the technique. The following is the source code for the function:

void Median( const Image & in, uint32_t startXIn, uint32_t startYIn, Image & out, uint32_t startXOut, uint32_t startYOut,
                 uint32_t width, uint32_t height, uint32_t kernelSize )
    {
        ParameterValidation( in, startXIn, startYIn, out, startXOut, startYOut, width, height );
        VerifyGrayScaleImage( in, out );

        if( kernelSize < 3 || kernelSize % 2 == 0 || kernelSize >= width || kernelSize >= height )
            throw imageException( "Kernel size for filter is not correct" );

        // Border's problem is well-known problem which can be solved in different ways
        // We just copy parts of original image without applying filtering
        Copy( in, startXIn, startYIn, out, startXOut,
              startYOut, width, kernelSize / 2 );
        Copy( in, startXIn, startYIn + height - kernelSize / 2, out, startXOut,
              startYOut + height - kernelSize / 2, width, kernelSize / 2 );
        Copy( in, startXIn, startYIn + kernelSize / 2, out, startXOut,
              startYOut + kernelSize / 2, kernelSize / 2, height - (kernelSize - 1) );
        Copy( in, startXIn + width - kernelSize / 2, startYIn + kernelSize / 2, out, startXOut + width - kernelSize / 2,
              startYOut + kernelSize / 2, kernelSize / 2, height - (kernelSize - 1) );

        std::vector < uint8_t > data( kernelSize * kernelSize );
        uint8_t * dataFirstValue = data.data();
        uint8_t * medianValue    = dataFirstValue + data.size() / 2;
        uint8_t * dataLastValue  = dataFirstValue + data.size();
        
        const uint32_t rowSizeIn  = in.rowSize();
        const uint32_t rowSizeOut = out.rowSize();

        const uint8_t * inY  = in.data()  + startYIn                     * rowSizeIn  + startXIn;
        uint8_t       * outY = out.data() + (startYOut + kernelSize / 2) * rowSizeOut + startXOut + kernelSize / 2;

        width  = width  - (kernelSize - 1);
        height = height - (kernelSize - 1);

        const uint8_t * outYEnd = outY + height * rowSizeOut;

        for( ; outY != outYEnd; outY += rowSizeOut, inY += rowSizeIn ) {
            const uint8_t * inX = inY;
            uint8_t       * outX = outY;

            const uint8_t * outXEnd = outX + width;

            for( ; outX != outXEnd; ++outX, ++inX ) {
                uint8_t * value = data.data();

                const uint8_t * inYRead    = inX;
                const uint8_t * inYReadEnd = inYRead + kernelSize * rowSizeIn;

                for( ; inYRead != inYReadEnd; inYRead += rowSizeIn ) {
                    const uint8_t * inXRead    = inYRead;
                    const uint8_t * inXReadEnd = inXRead + kernelSize;

                    for( ; inXRead != inXReadEnd; ++inXRead, ++value )
                        *value = *inXRead;
                }

                std::nth_element( dataFirstValue, medianValue, dataLastValue );
                (*outX) = *medianValue;
            }
        }
    }

So what am I looking at here? References to the input and output images are passed into the function, along with the coordinates of the starting pixel in each. The width and height of the region and the kernel (window) size for the filter are also passed. The function performs some parameter validation. It then copies the border pixels unfiltered to sidestep the border problem, and runs the nested loops that do the actual filtering. At a glance, the code looks solid.
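For a self-contained illustration of the same algorithm, here is a simplified sketch. GrayImage and MedianFilter are hypothetical names, not penguinV’s API. The image is assumed to be tightly packed, with no row padding. Like the penguinV version, the sketch does three things:

  • rejects even or too-large kernels
  • leaves the border pixels as copies of the input
  • uses std::nth_element to pick the median of each window

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical minimal grayscale image: width * height bytes stored row by row
// with no padding (penguinV's Image has a separate rowSize; this sketch does not).
struct GrayImage
{
    uint32_t width;
    uint32_t height;
    std::vector<uint8_t> pixels;

    uint8_t & at( uint32_t x, uint32_t y ) { return pixels[y * width + x]; }
    uint8_t at( uint32_t x, uint32_t y ) const { return pixels[y * width + x]; }
};

GrayImage MedianFilter( const GrayImage & in, uint32_t kernelSize )
{
    // Same validation rule as the penguinV function above
    if( kernelSize < 3 || kernelSize % 2 == 0 || kernelSize >= in.width || kernelSize >= in.height )
        throw std::invalid_argument( "Kernel size for filter is not correct" );

    GrayImage out = in; // border pixels stay unfiltered copies of the input
    const uint32_t half = kernelSize / 2;

    std::vector<uint8_t> window( kernelSize * kernelSize );
    const auto median = window.begin() + window.size() / 2;

    for( uint32_t y = half; y < in.height - half; ++y ) {
        for( uint32_t x = half; x < in.width - half; ++x ) {
            // Gather the window centred on (x, y), row by row
            auto value = window.begin();
            for( uint32_t ky = 0; ky < kernelSize; ++ky )
                for( uint32_t kx = 0; kx < kernelSize; ++kx )
                    *value++ = in.at( x - half + kx, y - half + ky );

            // Partially sort so the middle element is the median
            std::nth_element( window.begin(), median, window.end() );
            out.at( x, y ) = *median;
        }
    }
    return out;
}
```

For example, a single bright pixel in an otherwise uniform image disappears after a 3×3 pass. A bright pixel on the border survives, because border pixels are only copied.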

Now let’s see if there are any hotspots in the function. With perf, the annotation shows that the statement in the innermost loop is the most time-consuming.

       │      _ZN14Image_Function6MedianERKN14PenguinV_Image13ImageTemplateIhEEjjRS2_jjjjj():                                      
       │                              *value = *inXRead;                                                                           
  5.10 │        ldrb   w9, [x1, x0]                                                                                                 
  2.49 │        strb   w9, [x2, x0]   

This makes sense, since the line is nested inside four loops, and there’s not much I can do about that. Every other line was green, so honestly, this function doesn’t have any noticeable hotspots. I tried doing an objdump of the binary, but it’s honestly a mess. I did find the portion containing the “hotspot” noted earlier:

  <_ZN14Image_Function6MedianERKN14PenguinV_Image13ImageTemplateIhEEjjRS2_jjjjj+0x6e8>  // b.none
    1e00:	3940080b 	ldrb	w11, [x0, #2]
    1e04:	91000c0a 	add	x10, x0, #0x3
    1e08:	3900092b 	strb	w11, [x9, #2]
    1e0c:	eb0a007f 	cmp	x3, x10
    1e10:	54000740 	b.eq	1ef8 <_ZN14Image_Function6MedianERKN14PenguinV_Image13ImageTemplateIhEEjjRS2_jjjjj+0x6e8>  // b.none
    1e14:	39400c0b 	ldrb	w11, [x0, #3]
    1e18:	9100100a 	add	x10, x0, #0x4
    1e1c:	39000d2b 	strb	w11, [x9, #3]
    1e20:	eb0a007f 	cmp	x3, x10
    1e24:	540006a0 	b.eq	1ef8 <_ZN14Image_Function6MedianERKN14PenguinV_Image13ImageTemplateIhEEjjRS2_jjjjj+0x6e8>  // b.none
    1e28:	3940100b 	ldrb	w11, [x0, #4]
    1e2c:	9100140a 	add	x10, x0, #0x5
    1e30:	3900112b 	strb	w11, [x9, #4]
    1e34:	eb0a007f 	cmp	x3, x10
    1e38:	54000600 	b.eq	1ef8 <_ZN14Image_Function6MedianERKN14PenguinV_Image13ImageTemplateIhEEjjRS2_jjjjj+0x6e8>  // b.none
    1e3c:	3940140b 	ldrb	w11, [x0, #5]
    1e40:	9100180a 	add	x10, x0, #0x6
    1e44:	3900152b 	strb	w11, [x9, #5]
    1e48:	eb0a007f 	cmp	x3, x10

Looks like the innermost loop was unrolled: the lines repeat about 20 times before continuing on to something else. I guess there’s not much to gain from focusing on this hotspot.
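Roughly, the unrolled assembly behaves like the hand-written C++ below. This is my approximation for illustration, not the compiler’s actual output. CopyRowUnrolled is a hypothetical name, and the unroll factor of 4 is arbitrary. Each element still costs a load, a store and a comparison against the end of the row (the ldrb/strb/cmp/b.eq pattern above), because kernelSize is only known at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Approximation of the unrolled byte copy: several elements per iteration,
// but an end-of-row check after every element because the row length
// (kernelSize) is a run-time value. Unroll factor 4 is arbitrary here.
void CopyRowUnrolled( const uint8_t * src, const uint8_t * srcEnd, uint8_t * dst )
{
    while( src != srcEnd ) {
        dst[0] = src[0];                // one load + one store per element
        if( src + 1 == srcEnd ) return; // compare with the end, exit if done
        dst[1] = src[1];
        if( src + 2 == srcEnd ) return;
        dst[2] = src[2];
        if( src + 3 == srcEnd ) return;
        dst[3] = src[3];
        src += 4;
        dst += 4;
    }
}
```

Unrolling removes some loop overhead, but the per-element load, store and compare remain. That fits the observation that this spot has little left to gain.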

Next time, I will try comparing this with the objdump of the -O2 build to see what causes the performance difference.

Adding More Tests

My school’s telescope project is coming along quite nicely. Although I didn’t contribute much to the project, it’s still nice to see a working iteration now. Last time I worked on the summarizer functionality and its tests. After looking over the available issues, I decided to implement more test cases for existing functionality; specifically, I worked on the tests for the wiki-feed-parser.

Implementing the test cases for someone else’s work was pretty simple. I looked at the implementation and then tried to cover as many of its use cases as possible. This was made even easier now that Jest provides a code coverage table for the tests, so I have a rough idea of whether I hit all the edge cases in terms of line coverage. However, I really think the person who implements a function should also write its test cases, or just follow test-driven development.

Something that made me hesitate was writing test cases for error handling. The project does not seem to have a standard for errors and their messages, so I was not sure what error the tests should expect. When I tested a missing <pre> tag, the error I got was simply whatever was thrown further up the stack; it was not handled explicitly. I’m not sure that’s best practice, so I left a note on the PR and made the test pass with the error that is currently thrown.

Test development is pretty fun; I learn by reading other people’s code and then try my best to break it. It’s like releasing the urge to smash snowmen as a kid during winter. The semester is coming to an end, but before this week ends I hope to push the test coverage to 100% line coverage in Jest. Hopefully I will have enough time after my assignments to do so.

Scrapping My Old PR Implementation For a Better Design

In a previous post, I talked about my experience working on a new modal for the pagermon project. Although the participants liked the implementation, they all seemed eager to have an accordion view instead of a popup modal. So, in response, I opened a new issue on the project to do just that and took it on myself.

Honestly, I thought this would be an easy task; the ui.bootstrap library already includes an accordion directive, which I had noticed while working on the modal. So I looked over their example code and tried to plug it directly into the message table. Unfortunately, it was not that easy. I should assume things will never be that easy, because if they were, someone else would have done them already. The directive required me to wrap the accordion cell and its corresponding drop-down in the same div. However, I wanted the accordion on table rows, and a div cannot wrap rows inside a table, so that wasn’t possible with the directive. So I had to build the accordion from scratch.

First, I added a controller to track the index of the clicked table row and the accordion state.

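    .controller('AccordionCtrl', function ($scope) {
       // Called when a message cell is clicked: open that row's details,
       // or close them if the same row is clicked again.
       $scope.toggleHiddenRow = function(id, message){
         if (id != $scope.activeRow){
           // Remember which row is open and which message to show in it
           $scope.activeRow = id;
           $scope.targetMsg = message;
         }
         else {
           // Clicking the open row again collapses it
           $scope.activeRow = null;
         }
       };
     })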

Next, I added click functionality to the table cell.

 <td class="expand"><div highlight="message.message" replacement="init.replaceText" class="pagerMessage" ng-click="toggleHiddenRow($index, message)"></div></td>

And finally, I included an extra table row that only appears when activeRow equals that row’s $index, that is, when its message cell has been clicked. It displays all of the clicked message’s details right underneath its row.

          <tr ng-repeat-end ng-show="activeRow==$index">
               <td colspan="7">
                   <div>Date: {{ targetMsg.date }}</div>
                   <div>Time: {{ targetMsg.timestamp }}</div>
                   <% if (!hidesource || login) { %>
                   <div>Source: {{ targetMsg.source }}</div>
                   <% } %>
                   <% if (!hidecapcode || login) { %>
                   <div>Address: {{ targetMsg.address }}</div>
                   <% } %>
                   <div>Agency: {{ targetMsg.agency }}</div>
                   <div>Alias: {{ targetMsg.alias }}</div>
                   <div>Message: {{ targetMsg.message }}</div>
               </td>
           </tr>

With these additions, I got a working accordion view without breaking the existing table structure. The implementation would have been much faster if I had not been so stubborn about using the Angular directives. It’s best to know when to give up, especially when there are alternatives.

Analysis for PenguinV

Previously I benchmarked the functions in penguinV under the O2 and O3 optimization levels, but I focused only on the image_function functions. I went back to the results to see whether setting the performance tests to O3 would improve, or at least not hurt, every function across the board. Here’s the spreadsheet with the findings.

Function (image dimensions) | O2 ms | O2 +/- ms | O3 ms | O3 +/- ms | O3 < O2 (5% leeway) | O3 < O2 (1% leeway)
blob_detection_SolidImage::_2048 | 638.3 | 9.60 | 648.7 | 6.59 | TRUE | FALSE
blob_detection_SolidImage::_1024 | 117.9 | 0.35 | 120.8 | 0.28 | TRUE | FALSE
blob_detection_SolidImage::_512 | 27.63 | 0.22 | 28.37 | 0.17 | TRUE | FALSE
blob_detection_SolidImage::_256 | 6.695 | 0.10 | 6.909 | 0.08 | TRUE | FALSE
edge_detection_SolidImage::_2048 | 1090 | 2.61 | 1018 | 6.19 | TRUE | TRUE
edge_detection_SolidImage::_1024 | 231 | 1.06 | 214.6 | 1.09 | TRUE | TRUE
edge_detection_SolidImage::_512 | 57.98 | 0.79 | 53.48 | 0.67 | TRUE | TRUE
edge_detection_SolidImage::_256 | 14.68 | 0.14 | 13.59 | 0.12 | TRUE | TRUE
filtering_SobelFilter::_2048 | 97.08 | 0.60 | 97.8 | 0.50 | TRUE | TRUE
filtering_SobelFilter::_1024 | 23.28 | 0.25 | 23.27 | 0.20 | TRUE | TRUE
filtering_SobelFilter::_512 | 5.766 | 0.12 | 5.75 | 0.10 | TRUE | TRUE
filtering_SobelFilter::_256 | 1.419 | 0.02 | 1.418 | 0.02 | TRUE | TRUE
filtering_PrewittFilter::_2048 | 99.32 | 0.29 | 99.5 | 0.58 | TRUE | TRUE
filtering_PrewittFilter::_1024 | 23.94 | 0.14 | 23.94 | 0.77 | TRUE | TRUE
filtering_PrewittFilter::_512 | 5.947 | 0.07 | 5.944 | 0.16 | TRUE | TRUE
filtering_PrewittFilter::_256 | 1.471 | 0.02 | 1.471 | 0.08 | TRUE | TRUE
filtering_MedianFilter3x3::_2048 | 1017 | 3.41 | 1128 | 0.33 | FALSE | FALSE
filtering_MedianFilter3x3::_1024 | 250 | 2.73 | 277 | 0.17 | FALSE | FALSE
filtering_MedianFilter3x3::_512 | 61.94 | 0.74 | 68.81 | 0.07 | FALSE | FALSE
filtering_MedianFilter3x3::_256 | 15.35 | 0.28 | 17.04 | 0.02 | FALSE | FALSE
image_function::AbsoluteDifference(256×256) | 0.7529 | 0.03 | 0.2623 | 0.02 | TRUE | TRUE
image_function::AbsoluteDifference(512×512) | 3.087 | 0.11 | 1.127 | 0.14 | TRUE | TRUE
image_function::AbsoluteDifference(1024×1024) | 14.36 | 0.20 | 6.543 | 0.24 | TRUE | TRUE
image_function::AbsoluteDifference(2048×2048) | 66.03 | 0.32 | 34.4 | 0.77 | TRUE | TRUE
function_pool::AbsoluteDifference(256×256) | 0.7785 | 0.25 | 0.7698 | 0.22 | TRUE | TRUE
function_pool::AbsoluteDifference(512×512) | 2.203 | 0.50 | 2.218 | 0.57 | TRUE | TRUE
function_pool::AbsoluteDifference(1024×1024) | 13.5 | 0.60 | 13.58 | 0.42 | TRUE | TRUE
function_pool::AbsoluteDifference(2048×2048) | 63.91 | 6.14 | 58.65 | 0.56 | TRUE | TRUE
image_function_neon::AbsoluteDifference(256×256) | 0.2517 | 0.01 | 0.2523 | 0.02 | TRUE | TRUE
image_function_neon::AbsoluteDifference(512×512) | 1.088 | 0.07 | 1.095 | 0.07 | TRUE | TRUE
image_function_neon::AbsoluteDifference(1024×1024) | 6.326 | 0.20 | 6.304 | 0.15 | TRUE | TRUE
image_function_neon::AbsoluteDifference(2048×2048) | 33.96 | 0.31 | 33.82 | 0.26 | TRUE | TRUE
image_function_neon::BitwiseAnd(2048×2048) | 33.74 | 0.30 | 33.58 | 0.26 | TRUE | TRUE
image_function_neon::BitwiseAnd(1024×1024) | 6.27 | 0.23 | 6.228 | 0.16 | TRUE | TRUE
image_function_neon::BitwiseAnd(512×512) | 1.05 | 0.08 | 1.053 | 0.07 | TRUE | TRUE
image_function_neon::BitwiseAnd(256×256) | 0.2473 | 0.02 | 0.2426 | 0.02 | TRUE | TRUE
function_pool::BitwiseAnd(2048×2048) | 63.93 | 5.58 | 58.71 | 0.85 | TRUE | TRUE
function_pool::BitwiseAnd(1024×1024) | 13.74 | 1.32 | 13.61 | 0.50 | TRUE | TRUE
function_pool::BitwiseAnd(512×512) | 2.209 | 0.44 | 2.152 | 0.40 | TRUE | TRUE
function_pool::BitwiseAnd(256×256) | 0.7986 | 0.27 | 0.7679 | 0.25 | TRUE | TRUE
image_function::BitwiseAnd(2048×2048) | 57.66 | 0.47 | 33.66 | 0.26 | TRUE | TRUE
image_function::BitwiseAnd(1024×1024) | 12.44 | 0.29 | 6.273 | 0.15 | TRUE | TRUE
image_function::BitwiseAnd(512×512) | 2.591 | 0.09 | 1.067 | 0.08 | TRUE | TRUE
image_function::BitwiseAnd(256×256) | 0.6245 | 0.03 | 0.2463 | 0.02 | TRUE | TRUE
image_function_neon::BitwiseOr(2048×2048) | 33.63 | 0.28 | 33.6 | 0.27 | TRUE | TRUE
image_function_neon::BitwiseOr(1024×1024) | 6.254 | 0.19 | 6.247 | 0.15 | TRUE | TRUE
image_function_neon::BitwiseOr(512×512) | 1.059 | 0.06 | 1.061 | 0.09 | TRUE | TRUE
image_function_neon::BitwiseOr(256×256) | 0.2456 | 0.03 | 0.2438 | 0.02 | TRUE | TRUE
function_pool::BitwiseOr(2048×2048) | 64.26 | 3.23 | 58.59 | 0.68 | TRUE | TRUE
function_pool::BitwiseOr(1024×1024) | 13.45 | 0.45 | 13.6 | 0.45 | TRUE | FALSE
function_pool::BitwiseOr(512×512) | 2.152 | 0.39 | 2.161 | 0.42 | TRUE | TRUE
function_pool::BitwiseOr(256×256) | 0.7932 | 0.25 | 0.7648 | 0.23 | TRUE | TRUE
image_function::BitwiseOr(2048×2048) | 58.26 | 0.39 | 33.67 | 0.60 | TRUE | TRUE
image_function::BitwiseOr(1024×1024) | 12.19 | 0.20 | 6.472 | 0.36 | TRUE | TRUE
image_function::BitwiseOr(512×512) | 2.605 | 0.11 | 1.087 | 0.09 | TRUE | TRUE
image_function::BitwiseOr(256×256) | 0.6254 | 0.02 | 0.2461 | 0.02 | TRUE | TRUE
image_function_neon::BitwiseXor(2048×2048) | 33.85 | 0.36 | 33.69 | 0.79 | TRUE | TRUE
image_function_neon::BitwiseXor(1024×1024) | 6.261 | 0.17 | 6.239 | 0.16 | TRUE | TRUE
image_function_neon::BitwiseXor(512×512) | 1.066 | 0.07 | 1.056 | 0.08 | TRUE | TRUE
image_function_neon::BitwiseXor(256×256) | 0.2464 | 0.02 | 0.2439 | 0.02 | TRUE | TRUE
function_pool::BitwiseXor(2048×2048) | 63.6 | 5.15 | 58.69 | 0.75 | TRUE | TRUE
function_pool::BitwiseXor(1024×1024) | 13.55 | 0.50 | 13.63 | 0.64 | TRUE | TRUE
function_pool::BitwiseXor(512×512) | 2.221 | 0.49 | 2.155 | 0.36 | TRUE | TRUE
function_pool::BitwiseXor(256×256) | 0.7951 | 0.29 | 0.7527 | 0.21 | TRUE | TRUE
image_function::BitwiseXor(2048×2048) | 57.9 | 0.48 | 33.62 | 0.25 | TRUE | TRUE
image_function::BitwiseXor(1024×1024) | 12.69 | 0.51 | 6.271 | 0.16 | TRUE | TRUE
image_function::BitwiseXor(512×512) | 2.566 | 0.10 | 1.069 | 0.08 | TRUE | TRUE
image_function::BitwiseXor(256×256) | 0.6171 | 0.03 | 0.2465 | 0.02 | TRUE | TRUE
image_function_neon::Maximum(2048×2048) | 33.46 | 0.29 | 33.64 | 0.29 | TRUE | TRUE
image_function_neon::Maximum(1024×1024) | 6.207 | 0.17 | 6.263 | 0.17 | TRUE | TRUE
image_function_neon::Maximum(512×512) | 1.052 | 0.09 | 1.066 | 0.07 | TRUE | FALSE
image_function_neon::Maximum(256×256) | 0.2447 | 0.02 | 0.2457 | 0.02 | TRUE | TRUE
function_pool::Maximum(2048×2048) | 62.18 | 5.01 | 58.67 | 0.71 | TRUE | TRUE
function_pool::Maximum(1024×1024) | 13.58 | 0.98 | 13.61 | 0.55 | TRUE | TRUE
function_pool::Maximum(512×512) | 2.165 | 0.40 | 2.166 | 0.44 | TRUE | TRUE
function_pool::Maximum(256×256) | 0.7839 | 0.26 | 0.7296 | 0.22 | TRUE | TRUE
image_function::Maximum(2048×2048) | 61.7 | 0.39 | 33.74 | 0.21 | TRUE | TRUE
image_function::Maximum(1024×1024) | 14.01 | 2.34 | 6.289 | 0.17 | TRUE | TRUE
image_function::Maximum(512×512) | 2.773 | 0.09 | 1.098 | 0.09 | TRUE | TRUE
image_function::Maximum(256×256) | 0.6787 | 0.03 | 0.2535 | 0.02 | TRUE | TRUE
image_function_neon::Minimum(2048×2048) | 33.37 | 0.28 | 33.67 | 0.27 | TRUE | TRUE
image_function_neon::Minimum(1024×1024) | 6.191 | 0.19 | 6.261 | 0.15 | TRUE | FALSE
image_function_neon::Minimum(512×512) | 1.032 | 0.08 | 1.057 | 0.08 | TRUE | FALSE
image_function_neon::Minimum(256×256) | 0.2446 | 0.02 | 0.25 | 0.02 | TRUE | FALSE
function_pool::Minimum(2048×2048) | 67.09 | 7.06 | 58.81 | 0.81 | TRUE | TRUE
function_pool::Minimum(1024×1024) | 14.39 | 1.15 | 13.57 | 0.39 | TRUE | TRUE
function_pool::Minimum(512×512) | 2.55 | 0.54 | 2.183 | 0.44 | TRUE | TRUE
function_pool::Minimum(256×256) | 0.8313 | 0.25 | 0.7541 | 0.23 | TRUE | TRUE
image_function::Minimum(2048×2048) | 62.14 | 0.43 | 33.88 | 0.89 | TRUE | TRUE
image_function::Minimum(1024×1024) | 13.26 | 0.20 | 6.461 | 0.20 | TRUE | TRUE
image_function::Minimum(512×512) | 2.859 | 0.13 | 1.101 | 0.08 | TRUE | TRUE
image_function::Minimum(256×256) | 0.6845 | 0.03 | 0.2525 | 0.02 | TRUE | TRUE
image_function_neon::Subtract(2048×2048) | 33.95 | 0.44 | 33.9 | 0.31 | TRUE | TRUE
image_function_neon::Subtract(1024×1024) | 6.259 | 0.18 | 6.322 | 0.14 | TRUE | TRUE
image_function_neon::Subtract(512×512) | 1.103 | 0.10 | 1.096 | 0.08 | TRUE | TRUE
image_function_neon::Subtract(256×256) | 0.2549 | 0.03 | 0.2582 | 0.02 | TRUE | FALSE
function_pool::Subtract(2048×2048) | 62.92 | 5.61 | 58.69 | 0.62 | TRUE | TRUE
function_pool::Subtract(1024×1024) | 13.47 | 0.55 | 13.6 | 0.39 | TRUE | TRUE
function_pool::Subtract(512×512) | 2.176 | 0.46 | 2.172 | 0.41 | TRUE | TRUE
function_pool::Subtract(256×256) | 0.7873 | 0.24 | 0.7749 | 0.22 | TRUE | TRUE
image_function::Subtract(2048×2048) | 66.39 | 0.71 | 33.87 | 0.24 | TRUE | TRUE
image_function::Subtract(1024×1024) | 14.35 | 0.20 | 6.459 | 0.28 | TRUE | TRUE
image_function::Subtract(512×512) | 3.096 | 0.11 | 1.109 | 0.09 | TRUE | TRUE
image_function::Subtract(256×256) | 0.7521 | 0.03 | 0.256 | 0.02 | TRUE | TRUE
image_function::Flip(256×256) | 0.2374 | 0.00 | 0.06742 | 0.00 | TRUE | TRUE
image_function::Flip(512×512) | 0.9658 | 0.03 | 0.2759 | 0.03 | TRUE | TRUE
image_function::Flip(1024×1024) | 3.946 | 0.11 | 1.167 | 0.10 | TRUE | TRUE
image_function::Flip(2048×2048) | 18.26 | 0.25 | 7.896 | 0.31 | TRUE | TRUE
image_function_neon::Flip(256×256) | 0.08214 | 0.00 | 0.08246 | 0.00 | TRUE | TRUE
image_function_neon::Flip(512×512) | 0.3358 | 0.02 | 0.3365 | 0.02 | TRUE | TRUE
image_function_neon::Flip(1024×1024) | 1.408 | 0.10 | 1.425 | 0.09 | TRUE | FALSE
image_function_neon::Flip(2048×2048) | 8.191 | 0.18 | 8.193 | 0.58 | TRUE | TRUE
image_function::GammaCorrection(256×256) | 0.6453 | 0.02 | 0.6444 | 0.02 | TRUE | TRUE
image_function::GammaCorrection(512×512) | 2.321 | 0.08 | 2.321 | 0.05 | TRUE | TRUE
image_function::GammaCorrection(1024×1024) | 9.121 | 0.14 | 9.175 | 0.23 | TRUE | TRUE
image_function::GammaCorrection(2048×2048) | 41.41 | 0.35 | 41.38 | 0.30 | TRUE | TRUE
function_pool::GammaCorrection(256×256) | 0.5787 | 0.16 | 0.5655 | 0.15 | TRUE | TRUE
function_pool::GammaCorrection(512×512) | 1.339 | 0.28 | 1.296 | 0.25 | TRUE | TRUE
function_pool::GammaCorrection(1024×1024) | 6.102 | 0.39 | 6.049 | 0.40 | TRUE | TRUE
function_pool::GammaCorrection(2048×2048) | 29.37 | 0.56 | 29.34 | 0.40 | TRUE | TRUE
image_function::Invert(256×256) | 0.3849 | 0.01 | 0.08023 | 0.01 | TRUE | TRUE
image_function::Invert(512×512) | 1.585 | 0.05 | 0.3591 | 0.05 | TRUE | TRUE
image_function::Invert(1024×1024) | 6.482 | 0.19 | 1.576 | 0.12 | TRUE | TRUE
image_function::Invert(2048×2048) | 30.11 | 0.25 | 10.76 | 0.19 | TRUE | TRUE
function_pool::Invert(256×256) | 0.3196 | 0.14 | 0.3214 | 0.15 | TRUE | TRUE
function_pool::Invert(512×512) | 0.6054 | 0.22 | 0.6151 | 0.24 | TRUE | FALSE
function_pool::Invert(1024×1024) | 3.562 | 0.33 | 3.581 | 0.34 | TRUE | TRUE
function_pool::Invert(2048×2048) | 18.15 | 0.82 | 18 | 0.40 | TRUE | TRUE
image_function_neon::Invert(256×256) | 0.07592 | 0.01 | 0.07621 | 0.01 | TRUE | TRUE
image_function_neon::Invert(512×512) | 0.3413 | 0.06 | 0.3476 | 0.06 | TRUE | FALSE
image_function_neon::Invert(1024×1024) | 1.489 | 0.10 | 1.538 | 0.12 | TRUE | FALSE
image_function_neon::Invert(2048×2048) | 10.81 | 0.21 | 10.73 | 0.14 | TRUE | TRUE
image_function::Transpose(2048×2048) | 223.1 | 5.93 | 196 | 4.36 | TRUE | TRUE
image_function::Transpose(1024×1024) | 31.95 | 0.32 | 26.07 | 0.23 | TRUE | TRUE
image_function::Transpose(512×512) | 4.459 | 0.09 | 3.808 | 0.08 | TRUE | TRUE
image_function::Transpose(256×256) | 0.5804 | 0.01 | 0.6597 | 0.02 | FALSE | FALSE
image_function::RgbToBgr(256×256) | 0.7097 | 0.04 | 0.3381 | 0.04 | TRUE | TRUE
image_function::RgbToBgr(512×512) | 2.954 | 0.11 | 1.468 | 0.09 | TRUE | TRUE
image_function::RgbToBgr(1024×1024) | 15.66 | 0.19 | 9.276 | 0.16 | TRUE | TRUE
image_function::RgbToBgr(2048×2048) | 68.05 | 1.36 | 43.53 | 0.65 | TRUE | TRUE
function_pool::RgbToBgr(256×256) | 0.5981 | 0.20 | 0.6641 | 0.28 | FALSE | FALSE
function_pool::RgbToBgr(512×512) | 2.884 | 0.36 | 2.866 | 0.33 | TRUE | TRUE
function_pool::RgbToBgr(1024×1024) | 17.94 | 1.13 | 17.83 | 0.53 | TRUE | TRUE
function_pool::RgbToBgr(2048×2048) | 77.89 | 4.48 | 73.8 | 0.94 | TRUE | TRUE
image_function_neon::RgbToBgr(256×256) | 0.5452 | 0.04 | 0.5485 | 0.04 | TRUE | TRUE
image_function_neon::RgbToBgr(512×512) | 2.282 | 0.11 | 2.329 | 0.10 | TRUE | FALSE
image_function_neon::RgbToBgr(1024×1024) | 13.85 | 0.32 | 13.87 | 0.27 | TRUE | TRUE
image_function_neon::RgbToBgr(2048×2048) | 62.77 | 0.59 | 63.2 | 0.58 | TRUE | TRUE
image_function::Threshold(256×256) | 0.4689 | 0.01 | 0.1137 | 0.01 | TRUE | TRUE
image_function::Threshold(512×512) | 1.92 | 0.08 | 0.4933 | 0.05 | TRUE | TRUE
image_function::Threshold(1024×1024) | 7.835 | 0.12 | 2.16 | 0.16 | TRUE | TRUE
image_function::Threshold(2048×2048) | 36.84 | 0.34 | 13.51 | 0.63 | TRUE | TRUE
function_pool::Threshold(256×256) | 0.3722 | 0.21 | 0.3566 | 0.15 | TRUE | TRUE
function_pool::Threshold(512×512) | 0.7194 | 0.27 | 0.7021 | 0.27 | TRUE | TRUE
function_pool::Threshold(1024×1024) | 4.56 | 0.78 | 4.338 | 0.41 | TRUE | TRUE
function_pool::Threshold(2048×2048) | 22.95 | 1.28 | 22.65 | 0.41 | TRUE | TRUE
image_function_neon::Threshold(256×256) | 0.09757 | 0.01 | 0.09765 | 0.01 | TRUE | TRUE
image_function_neon::Threshold(512×512) | 0.4224 | 0.05 | 0.4306 | 0.05 | TRUE | FALSE
image_function_neon::Threshold(1024×1024) | 1.864 | 0.14 | 1.921 | 0.11 | TRUE | FALSE
image_function_neon::Threshold(2048×2048) | 13.25 | 0.24 | 13.23 | 0.34 | TRUE | TRUE
image_function::ThresholdDouble(256×256) | 0.4681 | 0.01 | 0.1222 | 0.01 | TRUE | TRUE
image_function::ThresholdDouble(512×512) | 1.919 | 0.08 | 0.526 | 0.05 | TRUE | TRUE
image_function::ThresholdDouble(1024×1024) | 7.858 | 0.15 | 2.279 | 0.10 | TRUE | TRUE
image_function::ThresholdDouble(2048×2048) | 36.87 | 0.33 | 13.91 | 0.56 | TRUE | TRUE
function_pool::ThresholdDouble(256×256) | 0.3806 | 0.18 | 0.3703 | 0.16 | TRUE | TRUE
function_pool::ThresholdDouble(512×512) | 0.7096 | 0.28 | 0.706 | 0.25 | TRUE | TRUE
function_pool::ThresholdDouble(1024×1024) | 4.357 | 0.44 | 4.343 | 0.33 | TRUE | TRUE
function_pool::ThresholdDouble(2048×2048) | 22.65 | 0.62 | 22.65 | 0.44 | TRUE | TRUE
image_function_neon::ThresholdDouble(256×256) | 0.1068 | 0.01 | 0.1053 | 0.01 | TRUE | TRUE
image_function_neon::ThresholdDouble(512×512) | 0.4573 | 0.05 | 0.4642 | 0.05 | TRUE | FALSE
image_function_neon::ThresholdDouble(1024×1024) | 2.005 | 0.16 | 2.036 | 0.12 | TRUE | FALSE
image_function_neon::ThresholdDouble(2048×2048) | 13.27 | 0.27 | 13.24 | 0.22 | TRUE | TRUE
image_function::LookupTable(256×256) | 0.5193 | 0.02 | 0.518 | 0.01 | TRUE | TRUE
image_function::LookupTable(512×512) | 2.119 | 0.05 | 2.121 | 0.04 | TRUE | TRUE
image_function::LookupTable(1024×1024) | 8.587 | 0.17 | 8.655 | 0.12 | TRUE | TRUE
image_function::LookupTable(2048×2048) | 38.71 | 0.30 | 38.77 | 0.52 | TRUE | TRUE
function_pool::LookupTable(256×256) | 0.4259 | 0.16 | 0.437 | 0.14 | TRUE | FALSE
function_pool::LookupTable(512×512) | 1.071 | 0.27 | 1.057 | 0.23 | TRUE | TRUE
function_pool::LookupTable(1024×1024) | 5.151 | 0.40 | 5.127 | 0.34 | TRUE | TRUE
function_pool::LookupTable(2048×2048) | 24.55 | 0.34 | 24.5 | 0.39 | TRUE | TRUE
image_function::Accumulate(256×256) | 0.4047 | 0.00 | 0.1168 | 0.00 | TRUE | TRUE
image_function::Accumulate(512×512) | 1.642 | 0.04 | 0.4918 | 0.02 | TRUE | TRUE
image_function::Accumulate(1024×1024) | 6.842 | 0.13 | 2.452 | 0.14 | TRUE | TRUE
image_function::Accumulate(2048×2048) | 29.01 | 0.21 | 13.99 | 0.15 | TRUE | TRUE
image_function_neon::Accumulate(256×256) | 0.1135 | 0.00 | 0.1142 | 0.00 | TRUE | TRUE
image_function_neon::Accumulate(512×512) | 0.4754 | 0.02 | 0.4788 | 0.02 | TRUE | TRUE
image_function_neon::Accumulate(1024×1024) | 2.396 | 0.15 | 2.401 | 0.10 | TRUE | TRUE
image_function_neon::Accumulate(2048×2048) | 13.9 | 0.20 | 13.92 | 0.17 | TRUE | TRUE
image_function::ConvertToGrayScale(256×256) | 0.8165 | 0.02 | 0.2105 | 0.00 | TRUE | TRUE
image_function::ConvertToGrayScale(512×512) | 3.295 | 0.08 | 0.8584 | 0.03 | TRUE | TRUE
image_function::ConvertToGrayScale(1024×1024) | 13.39 | 0.14 | 3.533 | 0.13 | TRUE | TRUE
image_function::ConvertToGrayScale(2048×2048) | 55.19 | 0.24 | 15.27 | 0.24 | TRUE | TRUE
function_pool::ConvertToGrayScale(256×256) | 0.4086 | 0.13 | 0.2634 | 0.11 | TRUE | TRUE
function_pool::ConvertToGrayScale(512×512) | 1.194 | 0.19 | 0.5507 | 0.16 | TRUE | TRUE
function_pool::ConvertToGrayScale(1024×1024) | 4.534 | 0.35 | 1.979 | 0.34 | TRUE | TRUE
function_pool::ConvertToGrayScale(2048×2048) | 20.56 | 0.70 | 10.7 | 1.22 | TRUE | TRUE
image_function::ConvertToRgb(256×256) | 0.3943 | 0.00 | 0.3982 | 0.01 | TRUE | TRUE
image_function::ConvertToRgb(512×512) | 1.632 | 0.04 | 1.633 | 0.04 | TRUE | TRUE
image_function::ConvertToRgb(1024×1024) | 7.149 | 0.20 | 7.133 | 0.21 | TRUE | TRUE
image_function::ConvertToRgb(2048×2048) | 30.54 | 0.67 | 30.61 | 0.79 | TRUE | TRUE
function_pool::ConvertToRgb(256×256) | 0.3281 | 0.17 | 0.3276 | 0.14 | TRUE | TRUE
function_pool::ConvertToRgb(512×512) | 0.9064 | 0.20 | 0.8948 | 0.19 | TRUE | TRUE
function_pool::ConvertToRgb(1024×1024) | 5.955 | 0.51 | 5.895 | 0.39 | TRUE | TRUE
function_pool::ConvertToRgb(2048×2048) | 24.33 | 0.69 | 24.21 | 0.43 | TRUE | TRUE
image_function_neon::ConvertToRgb(256×256) | 0.1735 | 0.02 | 0.1773 | 0.01 | TRUE | FALSE
image_function_neon::ConvertToRgb(512×512) | 0.7158 | 0.03 | 0.7201 | 0.04 | TRUE | TRUE
image_function_neon::ConvertToRgb(1024×1024) | 3.689 | 0.11 | 3.876 | 0.70 | TRUE | FALSE
image_function_neon::ConvertToRgb(2048×2048) | 16.64 | 0.23 | 16.92 | 0.68 | TRUE | FALSE
image_function::Fill(256×256) | 0.008152 | 0.00 | 0.008039 | 0.00 | TRUE | TRUE
image_function::Fill(512×512) | 0.03358 | 0.00 | 0.03439 | 0.00 | TRUE | FALSE
image_function::Fill(1024×1024) | 0.1357 | 0.00 | 0.1403 | 0.00 | TRUE | FALSE
image_function::Fill(2048×2048) | 0.549 | 0.01 | 0.567 | 0.01 | TRUE | FALSE
image_function::Histogram(256×256) | 0.4605 | 0.00 | 0.4602 | 0.00 | TRUE | TRUE
image_function::Histogram(512×512) | 1.844 | 0.02 | 1.844 | 0.03 | TRUE | TRUE
image_function::Histogram(1024×1024) | 7.412 | 0.14 | 7.392 | 0.10 | TRUE | TRUE
image_function::Histogram(2048×2048) | 29.72 | 0.59 | 29.63 | 0.30 | TRUE | TRUE
function_pool::Histogram(256×256) | 0.256 | 0.04 | 0.2636 | 0.11 | TRUE | FALSE
function_pool::Histogram(512×512) | 0.6036 | 0.13 | 0.6141 | 0.11 | TRUE | FALSE
function_pool::Histogram(1024×1024) | 2.042 | 0.23 | 2.011 | 0.15 | TRUE | TRUE
function_pool::Histogram(2048×2048) | 7.669 | 0.23 | 7.663 | 0.21 | TRUE | TRUE
image_function::ProjectionProfile(256×256) | 0.268 | 0.00 | 0.2678 | 0.00 | TRUE | TRUE
image_function::ProjectionProfile(512×512) | 1.065 | 0.02 | 1.065 | 0.02 | TRUE | TRUE
image_function::ProjectionProfile(1024×1024) | 4.258 | 0.11 | 4.261 | 0.13 | TRUE | TRUE
image_function::ProjectionProfile(2048×2048) | 17.06 | 0.32 | 17.06 | 0.29 | TRUE | TRUE
function_pool::ProjectionProfile(256×256) | 0.1527 | 0.05 | 0.1525 | 0.10 | TRUE | TRUE
function_pool::ProjectionProfile(512×512) | 0.211 | 0.04 | 0.2289 | 0.06 | FALSE | FALSE
function_pool::ProjectionProfile(1024×1024) | 0.4647 | 0.12 | 0.4574 | 0.11 | TRUE | TRUE
function_pool::ProjectionProfile(2048×2048) | 1.436 | 0.20 | 1.436 | 0.17 | TRUE | TRUE
image_function_neon::ProjectionProfile(256×256) | 0.08146 | 0.00 | 0.08118 | 0.00 | TRUE | TRUE
image_function_neon::ProjectionProfile(512×512) | 0.3122 | 0.01 | 0.3119 | 0.01 | TRUE | TRUE
image_function_neon::ProjectionProfile(1024×1024) | 1.232 | 0.05 | 1.231 | 0.06 | TRUE | TRUE
image_function_neon::ProjectionProfile(2048×2048) | 4.945 | 0.16 | 4.964 | 0.18 | TRUE | TRUE
image_function::ResizeDown(256×256) | 0.1261 | 0.00 | 0.1257 | 0.00 | TRUE | TRUE
image_function::ResizeDown(512×512) | 0.5142 | 0.02 | 0.5173 | 0.03 | TRUE | TRUE
image_function::ResizeDown(1024×1024) | 2.083 | 0.05 | 2.219 | 0.09 | FALSE | FALSE
image_function::ResizeDown(2048×2048) | 8.707 | 0.12 | 9.033 | 0.57 | TRUE | FALSE
function_pool::ResizeDown(256×256) | 0.1844 | 0.13 | 0.188 | 0.11 | TRUE | FALSE
function_pool::ResizeDown(512×512) | 0.3225 | 0.14 | 0.3053 | 0.13 | TRUE | TRUE
function_pool::ResizeDown(1024×1024) | 0.8096 | 0.18 | 0.8104 | 0.14 | TRUE | TRUE
function_pool::ResizeDown(2048×2048) | 3.051 | 0.30 | 3.031 | 0.27 | TRUE | TRUE
image_function::ResizeUp(256×256) | 1.944 | 0.03 | 1.948 | 0.03 | TRUE | TRUE
image_function::ResizeUp(512×512) | 7.818 | 0.11 | 7.827 | 0.11 | TRUE | TRUE
image_function::ResizeUp(1024×1024) | 32.96 | 0.24 | 33.02 | 0.21 | TRUE | TRUE
image_function::ResizeUp(2048×2048) | 135.1 | 0.26 | 136.6 | 0.66 | TRUE | FALSE
function_pool::ResizeUp(256×256) | 0.7437 | 0.19 | 0.7463 | 0.14 | TRUE | TRUE
function_pool::ResizeUp(512×512) | 2.789 | 0.24 | 2.758 | 0.24 | TRUE | TRUE
function_pool::ResizeUp(1024×1024) | 12.44 | 0.33 | 12.36 | 0.32 | TRUE | TRUE
function_pool::ResizeUp(2048×2048) | 48.1 | 0.54 | 48.89 | 0.41 | TRUE | FALSE
image_function::Sum(256×256) | 0.2544 | 0.08 | 0.06678 | 0.00 | TRUE | TRUE
image_function::Sum(512×512) | 0.9877 | 0.31 | 0.2667 | 0.00 | TRUE | TRUE
image_function::Sum(1024×1024) | 3.708 | 1.40 | 1.076 | 0.03 | TRUE | TRUE
image_function::Sum(2048×2048) | 12.84 | 0.50 | 4.367 | 0.12 | TRUE | TRUE
function_pool::Sum(256×256) | 0.1399 | 0.07 | 0.1303 | 0.07 | TRUE | TRUE
function_pool::Sum(512×512) | 0.1949 | 0.07 | 0.191 | 0.07 | TRUE | TRUE
function_pool::Sum(1024×1024) | 0.3768 | 0.09 | 0.3754 | 0.09 | TRUE | TRUE
function_pool::Sum(2048×2048) | 1.117 | 0.14 | 1.115 | 0.10 | TRUE | TRUE
image_function_neon::Sum(256×256) | 0.06125 | 0.00 | 0.06125 | 0.00 | TRUE | TRUE
image_function_neon::Sum(512×512) | 0.2441 | 0.00 | 0.2441 | 0.00 | TRUE | TRUE
image_function_neon::Sum(1024×1024) | 0.9775 | 0.01 | 0.9774 | 0.01 | TRUE | TRUE
image_function_neon::Sum(2048×2048) | 3.916 | 0.08 | 3.914 | 0.06 | TRUE | TRUE

This is a new run, so the results will differ from those in my previous post, but this table is more readable. It looks much better in my Excel file, but this will have to do for WordPress. The comparison uses only the base times, without factoring in the variance. Each function is run 240 times per performance test. As the table shows, with 5% leeway almost every function either improves or stays the same under O3. Even with 1% leeway, the slight slowdowns can mostly be explained by the variance. The only function that is consistently slower with O3 at every image size is filtering_MedianFilter3x3, with roughly a 10 to 11% time increase (for example, 1017 ms vs 1128 ms at 2048). The few other rows that fail the 5% check are isolated image sizes. So I might just enable O3 optimization for all functions except filtering_MedianFilter3x3.
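Here is how I read the two pass/fail columns, written as a small C++ sketch. O3WithinLeeway and RelativeChange are names I made up for illustration. A row passes when the O3 time is below the O2 time increased by the leeway. This reading matches, for example, the blob_detection_SolidImage::_2048 row, which is TRUE at 5% and FALSE at 1%.

```cpp
#include <cassert>

// Assumed meaning of the "O3 < O2 (x% leeway)" columns: O3 passes if it is
// faster than O2, or slower by less than the given fraction (0.05 = 5%).
bool O3WithinLeeway( double o2Ms, double o3Ms, double leeway )
{
    return o3Ms < o2Ms * ( 1.0 + leeway );
}

// Relative change in run time when moving from O2 to O3
// (positive means O3 is slower, negative means O3 is faster).
double RelativeChange( double o2Ms, double o3Ms )
{
    return ( o3Ms - o2Ms ) / o2Ms;
}
```

With the filtering_MedianFilter3x3::_2048 numbers, RelativeChange(1017, 1128) is about 0.109, so O3 is roughly 11% slower there.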

Profiling penguinV

Last time, I benchmarked the penguinV functions. This time I will profile them with perf to find which functions are the hotspots. With the original compiler options on an AArch64 machine, and with the performance_test restricted to only the Image_Function functions, I get the following report from perf:

As shown above, Transpose and Resize take the most time. That makes sense, since both functions have to write every pixel of the output image. Let's focus on Transpose for now. Here's the source code for the function:

void Transpose( const Image & in, uint32_t startXIn, uint32_t startYIn, Image & out, uint32_t startXOut, uint32_t startYOut,
                uint32_t width, uint32_t height )
{
    ParameterValidation( in, startXIn, startYIn, width, height );
    ParameterValidation( out, startXOut, startYOut, height, width );
    VerifyGrayScaleImage( in, out );

    const uint32_t rowSizeIn  = in.rowSize();
    const uint32_t rowSizeOut = out.rowSize();

    const uint8_t * inX  = in.data()  + startYIn  * rowSizeIn  + startXIn;
    uint8_t       * outY = out.data() + startYOut * rowSizeOut + startXOut;

    const uint8_t * outYEnd = outY + width * rowSizeOut;

    for( ; outY != outYEnd; outY += rowSizeOut, ++inX ) {
        const uint8_t * inY  = inX;
        uint8_t       * outX = outY;

        const uint8_t * outXEnd = outX + height;

        for( ; outX != outXEnd; ++outX, inY += rowSizeIn )
            (*outX) = *(inY);
    }
}

And here's perf's annotation for Transpose:

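As expected, the line that stores each value into the transposed image is taking a while. That line also performs the read from the input, and because the inner loop walks down a column (inY advances by rowSizeIn on every iteration), each read lands in a different row of the input image, which is hard on the cache. Honestly, I'm not sure how else I would write this function to avoid this. I suspect Resize has a similar problem. Let's have a look at its source code.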

void Resize( const Image & in, uint32_t startXIn, uint32_t startYIn, uint32_t widthIn, uint32_t heightIn,
             Image & out, uint32_t startXOut, uint32_t startYOut, uint32_t widthOut, uint32_t heightOut )
{
    ParameterValidation( in, startXIn, startYIn, widthIn, heightIn );
    ParameterValidation( out, startXOut, startYOut, widthOut, heightOut );
    VerifyGrayScaleImage( in, out );

    const uint32_t rowSizeIn  = in.rowSize();
    const uint32_t rowSizeOut = out.rowSize();

    const uint8_t * inY  = in.data()  + startYIn  * rowSizeIn  + startXIn;
    uint8_t       * outY = out.data() + startYOut * rowSizeOut + startXOut;

    const uint8_t * outYEnd = outY + heightOut * rowSizeOut;

    uint32_t idY = 0;

    // Precalculation of X position
    std::vector < uint32_t > positionX( widthOut );
    for( uint32_t x = 0; x < widthOut; ++x )
        positionX[x] = x * widthIn / widthOut;

    for( ; outY != outYEnd; outY += rowSizeOut, ++idY ) {
        const uint8_t * inX  = inY + (idY * heightIn / heightOut) * rowSizeIn;
        uint8_t       * outX = outY;

        const uint8_t * outXEnd = outX + widthOut;

        const uint32_t * idX = positionX.data();

        for( ; outX != outXEnd; ++outX, ++idX )
            (*outX) = *(inX + (*idX));
    }
}
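The positionX table is what makes Resize a nearest-neighbour scale: output column x reads input column x * widthIn / widthOut, rounded down. For example, upscaling from widthIn = 4 to widthOut = 8 gives positions 0, 0, 1, 1, 2, 2, 3, 3, so each input pixel is read twice. The standalone sketch below reproduces that mapping on a tightly packed buffer (row size equal to width, unlike penguinV's Image, which has its own rowSize()). `precalculatePositions` and `resizeNearest` are hypothetical names, and the 64-bit multiplication is a defensive choice of this sketch, not something in the original code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same precalculation as in Resize: output index x samples input index
// x * sizeIn / sizeOut (integer division, so it rounds down).
// The product is done in 64 bits to rule out overflow.
std::vector<uint32_t> precalculatePositions(uint32_t sizeIn, uint32_t sizeOut) {
    std::vector<uint32_t> position(sizeOut);
    for (uint32_t x = 0; x < sizeOut; ++x)
        position[x] = static_cast<uint32_t>(static_cast<uint64_t>(x) * sizeIn / sizeOut);
    return position;
}

// Nearest-neighbour resize of a tightly packed grayscale buffer using the
// same row and column mapping as Resize. `in` must hold widthIn * heightIn bytes.
std::vector<uint8_t> resizeNearest(const std::vector<uint8_t>& in, uint32_t widthIn, uint32_t heightIn,
                                   uint32_t widthOut, uint32_t heightOut) {
    const std::vector<uint32_t> positionX = precalculatePositions(widthIn, widthOut);
    std::vector<uint8_t> out(static_cast<std::size_t>(widthOut) * heightOut);
    for (uint32_t y = 0; y < heightOut; ++y) {
        // Row mapping, computed per output row as in the original loop.
        const uint32_t srcY = static_cast<uint32_t>(static_cast<uint64_t>(y) * heightIn / heightOut);
        const uint8_t* srcRow = in.data() + static_cast<std::size_t>(srcY) * widthIn;
        uint8_t* dstRow = out.data() + static_cast<std::size_t>(y) * widthOut;
        for (uint32_t x = 0; x < widthOut; ++x)
            dstRow[x] = srcRow[positionX[x]];  // indirect read through the table
    }
    return out;
}
```

Scaling a 2×2 image {1, 2, 3, 4} up to 4×4 with this sketch turns each pixel into a 2×2 block.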

And now the profile.

The line that's creating the hotspot is very similar to the one in Transpose. So once again, I'm stumped as to what I can do. I will look into possible solutions in my next post.
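One commonly used technique for the column-wise access in Transpose, not something measured in this post, is loop tiling (cache blocking). Instead of copying a whole column of the input at a time, the image is transposed in small square blocks, so the handful of input rows a block touches stay in cache while the block is written out. The sketch below works on raw buffers with the same layout as Transpose (the output has width rows of height pixels); `transposeNaive`, `transposeTiled` and the default tile size of 32 are hypothetical choices for illustration, not penguinV code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Reference version with the same loop order as Transpose:
// out[x][y] = in[y][x]; `in` has `height` rows, `out` has `width` rows.
void transposeNaive(const uint8_t* in, std::size_t rowSizeIn, uint8_t* out, std::size_t rowSizeOut,
                    uint32_t width, uint32_t height) {
    for (uint32_t x = 0; x < width; ++x)
        for (uint32_t y = 0; y < height; ++y)
            out[x * rowSizeOut + y] = in[y * rowSizeIn + x];
}

// Tiled version: processes the image in tile x tile blocks so the input rows
// touched by one block are reused while still cached. `tile` must be > 0.
void transposeTiled(const uint8_t* in, std::size_t rowSizeIn, uint8_t* out, std::size_t rowSizeOut,
                    uint32_t width, uint32_t height, uint32_t tile = 32) {
    for (uint32_t y0 = 0; y0 < height; y0 += tile) {
        const uint32_t yEnd = std::min<uint32_t>(y0 + tile, height);
        for (uint32_t x0 = 0; x0 < width; x0 += tile) {
            const uint32_t xEnd = std::min<uint32_t>(x0 + tile, width);
            for (uint32_t x = x0; x < xEnd; ++x)
                for (uint32_t y = y0; y < yEnd; ++y)
                    out[x * rowSizeOut + y] = in[y * rowSizeIn + x];
        }
    }
}
```

Both functions produce identical output. Whether the tiled one is actually faster on the AArch64 machine would need the same performance test, and the best tile size depends on the cache sizes.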

Re-benchmarking PenguinV

In a previous post, I profiled and benchmarked the blob_detection function in penguinV. Looking back at my method and results, I realized I didn't do the best job. My sample size was small, so if I reran the benchmarks after making changes, it would be hard to tell whether any difference was significant. Instead, I will benchmark performance with the performance tests included in the project.
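To put "sample size" in numbers: the uncertainty of an average shrinks with the square root of the number of runs, so a handful of runs leaves a wide margin around each mean. Below is a minimal sketch that times a function repeatedly with std::chrono::steady_clock and reports the mean, sample standard deviation and standard error. It is a generic illustration, not penguinV's performance_test code, and it does not establish whether the `+/-` in penguinV's output is a standard deviation or another spread measure; `Summary`, `summarize` and `timeRuns` are hypothetical names.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct Summary {
    double mean;           // average runtime in ms
    double stdDev;         // sample standard deviation in ms (n - 1 denominator)
    double standardError;  // stdDev / sqrt(n): uncertainty of the mean itself
};

// Summarizes a set of timings. Needs at least two samples.
Summary summarize(const std::vector<double>& samplesMs) {
    const double n = static_cast<double>(samplesMs.size());
    double sum = 0.0;
    for (double s : samplesMs) sum += s;
    const double mean = sum / n;
    double squares = 0.0;
    for (double s : samplesMs) squares += (s - mean) * (s - mean);
    const double stdDev = std::sqrt(squares / (n - 1.0));
    return Summary{mean, stdDev, stdDev / std::sqrt(n)};
}

// Runs `work` `runs` times and records each runtime in milliseconds.
std::vector<double> timeRuns(const std::function<void()>& work, std::size_t runs) {
    std::vector<double> samplesMs;
    samplesMs.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samplesMs.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return samplesMs;
}
```

Quadrupling the number of runs halves the standard error. This is why a bundled test that repeats each function many times gives a firmer baseline than a few manual runs.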

The bundled performance test measures the runtime of every function in the project, and by default it builds each file with the following options:

-std=c++11 -Wall -Wextra -Wstrict-aliasing -Wpedantic -Wconversion -O2 -march=native

Seeing this, I figured it would be easy to change the optimization level and see its effect on performance. Out of curiosity, I first ran the test as-is on an AArch64 machine. It produced the following output:

Full Performance Test: O2
[1/260] blob_detection_SolidImage::_2048... 644.1+/-4.882 ms
[2/260] blob_detection_SolidImage::_1024... 117.8+/-0.2956 ms
[3/260] blob_detection_SolidImage::_512... 27.54+/-0.2564 ms
[4/260] blob_detection_SolidImage::_256... 6.704+/-0.09339 ms
[5/260] edge_detection_SolidImage::_2048... 1091+/-3.844 ms
[6/260] edge_detection_SolidImage::_1024... 232.2+/-1.265 ms
[7/260] edge_detection_SolidImage::_512... 58.14+/-0.5147 ms
[8/260] edge_detection_SolidImage::_256... 14.67+/-0.1085 ms
[9/260] filtering_SobelFilter::_2048... 96.92+/-0.4424 ms
[10/260] filtering_SobelFilter::_1024... 23.42+/-0.421 ms
[11/260] filtering_SobelFilter::_512... 5.759+/-0.1216 ms
[12/260] filtering_SobelFilter::_256... 1.418+/-0.0152 ms
[13/260] filtering_PrewittFilter::_2048... 99.43+/-0.2582 ms
[14/260] filtering_PrewittFilter::_1024... 23.94+/-0.1781 ms
[15/260] filtering_PrewittFilter::_512... 5.945+/-0.09214 ms
[16/260] filtering_PrewittFilter::_256... 1.471+/-0.01392 ms
[17/260] filtering_MedianFilter3x3::_2048... 1017+/-3.596 ms
[18/260] filtering_MedianFilter3x3::_1024... 250+/-3.204 ms
[19/260] filtering_MedianFilter3x3::_512... 61.91+/-0.9687 ms
[20/260] filtering_MedianFilter3x3::_256... 15.35+/-0.2274 ms
[21/260] image_function::AbsoluteDifference (256x256)... 0.7536+/-0.02952 ms
[22/260] image_function::AbsoluteDifference (512x512)... 3.08+/-0.108 ms
[23/260] image_function::AbsoluteDifference (1024x1024)... 14.38+/-0.3488 ms
[24/260] image_function::AbsoluteDifference (2048x2048)... 65.76+/-0.7173 ms
[25/260] function_pool::AbsoluteDifference (256x256)... 0.8266+/-0.2606 ms
[26/260] function_pool::AbsoluteDifference (512x512)... 2.31+/-0.4604 ms
[27/260] function_pool::AbsoluteDifference (1024x1024)... 12.91+/-1.061 ms
[28/260] function_pool::AbsoluteDifference (2048x2048)... 54.91+/-0.4894 ms
[29/260] image_function_neon::AbsoluteDifference (256x256)... 0.257+/-0.0183 ms
[30/260] image_function_neon::AbsoluteDifference (512x512)... 1.092+/-0.07149 ms
[31/260] image_function_neon::AbsoluteDifference (1024x1024)... 6.195+/-0.1585 ms
[32/260] image_function_neon::AbsoluteDifference (2048x2048)... 33.22+/-0.2824 ms
[33/260] image_function_neon::BitwiseAnd (2048x2048)... 32.99+/-0.3328 ms
[34/260] image_function_neon::BitwiseAnd (1024x1024)... 6.115+/-0.1509 ms
[35/260] image_function_neon::BitwiseAnd (512x512)... 1.049+/-0.06144 ms
[36/260] image_function_neon::BitwiseAnd (256x256)... 0.2454+/-0.02392 ms
[37/260] function_pool::BitwiseAnd (2048x2048)... 54.84+/-0.4876 ms
[38/260] function_pool::BitwiseAnd (1024x1024)... 12.42+/-0.5715 ms
[39/260] function_pool::BitwiseAnd (512x512)... 2.189+/-0.4914 ms
[40/260] function_pool::BitwiseAnd (256x256)... 0.778+/-0.2385 ms
[41/260] image_function::BitwiseAnd (2048x2048)... 57.94+/-0.3558 ms
[42/260] image_function::BitwiseAnd (1024x1024)... 12.45+/-0.468 ms
[43/260] image_function::BitwiseAnd (512x512)... 2.59+/-0.111 ms
[44/260] image_function::BitwiseAnd (256x256)... 0.6221+/-0.03655 ms
[45/260] image_function_neon::BitwiseOr (2048x2048)... 33.01+/-0.4019 ms
[46/260] image_function_neon::BitwiseOr (1024x1024)... 6.19+/-0.2321 ms
[47/260] image_function_neon::BitwiseOr (512x512)... 1.032+/-0.05734 ms
[48/260] image_function_neon::BitwiseOr (256x256)... 0.2428+/-0.02204 ms
[49/260] function_pool::BitwiseOr (2048x2048)... 55.57+/-2.935 ms
[50/260] function_pool::BitwiseOr (1024x1024)... 12.79+/-0.8624 ms
[51/260] function_pool::BitwiseOr (512x512)... 2.233+/-0.5364 ms
[52/260] function_pool::BitwiseOr (256x256)... 0.8+/-0.2369 ms
[53/260] image_function::BitwiseOr (2048x2048)... 59.52+/-2.381 ms
[54/260] image_function::BitwiseOr (1024x1024)... 12.51+/-0.8488 ms
[55/260] image_function::BitwiseOr (512x512)... 2.592+/-0.1702 ms
[56/260] image_function::BitwiseOr (256x256)... 0.6198+/-0.03203 ms
[57/260] image_function_neon::BitwiseXor (2048x2048)... 33.58+/-1.725 ms
[58/260] image_function_neon::BitwiseXor (1024x1024)... 6.392+/-0.8578 ms
[59/260] image_function_neon::BitwiseXor (512x512)... 1.079+/-0.1124 ms
[60/260] image_function_neon::BitwiseXor (256x256)... 0.2399+/-0.0168 ms
[61/260] function_pool::BitwiseXor (2048x2048)... 59.29+/-4.122 ms
[62/260] function_pool::BitwiseXor (1024x1024)... 12.98+/-1.06 ms
[63/260] function_pool::BitwiseXor (512x512)... 2.268+/-0.5645 ms
[64/260] function_pool::BitwiseXor (256x256)... 0.822+/-0.24 ms
[65/260] image_function::BitwiseXor (2048x2048)... 58.87+/-1.657 ms
[66/260] image_function::BitwiseXor (1024x1024)... 12.47+/-0.9454 ms
[67/260] image_function::BitwiseXor (512x512)... 2.563+/-0.1807 ms
[68/260] image_function::BitwiseXor (256x256)... 0.6177+/-0.03174 ms
[69/260] image_function_neon::Maximum (2048x2048)... 33.57+/-1.528 ms
[70/260] image_function_neon::Maximum (1024x1024)... 6.247+/-0.721 ms
[71/260] image_function_neon::Maximum (512x512)... 1.034+/-0.08706 ms
[72/260] image_function_neon::Maximum (256x256)... 0.2423+/-0.02615 ms
[73/260] function_pool::Maximum (2048x2048)... 59.24+/-4.597 ms
[74/260] function_pool::Maximum (1024x1024)... 12.98+/-0.8755 ms
[75/260] function_pool::Maximum (512x512)... 2.245+/-0.4465 ms
[76/260] function_pool::Maximum (256x256)... 0.7986+/-0.2407 ms
[77/260] image_function::Maximum (2048x2048)... 62.84+/-1.615 ms
[78/260] image_function::Maximum (1024x1024)... 13.4+/-1.272 ms
[79/260] image_function::Maximum (512x512)... 2.838+/-0.2363 ms
[80/260] image_function::Maximum (256x256)... 0.6828+/-0.03177 ms
[81/260] image_function_neon::Minimum (2048x2048)... 33.53+/-1.2 ms
[82/260] image_function_neon::Minimum (1024x1024)... 6.283+/-0.6494 ms
[83/260] image_function_neon::Minimum (512x512)... 1.087+/-0.1142 ms
[84/260] image_function_neon::Minimum (256x256)... 0.2436+/-0.02343 ms
[85/260] function_pool::Minimum (2048x2048)... 59.32+/-3.579 ms
[86/260] function_pool::Minimum (1024x1024)... 13.15+/-1.382 ms
[87/260] function_pool::Minimum (512x512)... 2.242+/-0.5249 ms
[88/260] function_pool::Minimum (256x256)... 0.8011+/-0.2037 ms
[89/260] image_function::Minimum (2048x2048)... 62.28+/-1.296 ms
[90/260] image_function::Minimum (1024x1024)... 13.46+/-0.9213 ms
[91/260] image_function::Minimum (512x512)... 2.863+/-0.1996 ms
[92/260] image_function::Minimum (256x256)... 0.681+/-0.0305 ms
[93/260] image_function_neon::Subtract (2048x2048)... 33.32+/-1.299 ms
[94/260] image_function_neon::Subtract (1024x1024)... 6.286+/-0.5606 ms
[95/260] image_function_neon::Subtract (512x512)... 1.108+/-0.1577 ms
[96/260] image_function_neon::Subtract (256x256)... 0.2495+/-0.02794 ms
[97/260] function_pool::Subtract (2048x2048)... 57.56+/-3.527 ms
[98/260] function_pool::Subtract (1024x1024)... 12.67+/-1.054 ms
[99/260] function_pool::Subtract (512x512)... 2.301+/-0.5693 ms
[100/260] function_pool::Subtract (256x256)... 0.8028+/-0.2292 ms
[101/260] image_function::Subtract (2048x2048)... 66.7+/-1.264 ms
[102/260] image_function::Subtract (1024x1024)... 14.74+/-1.194 ms
[103/260] image_function::Subtract (512x512)... 3.159+/-0.2722 ms
[104/260] image_function::Subtract (256x256)... 0.7526+/-0.02795 ms
[105/260] image_function::Flip (256x256)... 0.237+/-0.003931 ms
[106/260] image_function::Flip (512x512)... 0.9722+/-0.03968 ms
[107/260] image_function::Flip (1024x1024)... 4.095+/-0.373 ms
[108/260] image_function::Flip (2048x2048)... 18.27+/-0.7444 ms
[109/260] image_function_neon::Flip (256x256)... 0.08221+/-0.004138 ms
[110/260] image_function_neon::Flip (512x512)... 0.3385+/-0.02833 ms
[111/260] image_function_neon::Flip (1024x1024)... 1.477+/-0.1693 ms
[112/260] image_function_neon::Flip (2048x2048)... 8.173+/-0.6429 ms
[113/260] image_function::GammaCorrection (256x256)... 0.6447+/-0.02131 ms
[114/260] image_function::GammaCorrection (512x512)... 2.331+/-0.08534 ms
[115/260] image_function::GammaCorrection (1024x1024)... 9.381+/-0.4684 ms
[116/260] image_function::GammaCorrection (2048x2048)... 41.52+/-0.8448 ms
[117/260] function_pool::GammaCorrection (256x256)... 0.5948+/-0.1727 ms
[118/260] function_pool::GammaCorrection (512x512)... 1.346+/-0.318 ms
[119/260] function_pool::GammaCorrection (1024x1024)... 5.98+/-0.6899 ms
[120/260] function_pool::GammaCorrection (2048x2048)... 28.38+/-1.462 ms
[121/260] image_function::Invert (256x256)... 0.3844+/-0.008902 ms
[122/260] image_function::Invert (512x512)... 1.592+/-0.08024 ms
[123/260] image_function::Invert (1024x1024)... 6.7+/-0.3903 ms
[124/260] image_function::Invert (2048x2048)... 30.24+/-0.7244 ms
[125/260] function_pool::Invert (256x256)... 0.2956+/-0.1471 ms
[126/260] function_pool::Invert (512x512)... 0.594+/-0.1977 ms
[127/260] function_pool::Invert (1024x1024)... 3.411+/-0.5153 ms
[128/260] function_pool::Invert (2048x2048)... 17.2+/-1.156 ms
[129/260] image_function_neon::Invert (256x256)... 0.07568+/-0.006214 ms
[130/260] image_function_neon::Invert (512x512)... 0.3419+/-0.05994 ms
[131/260] image_function_neon::Invert (1024x1024)... 1.644+/-0.3138 ms
[132/260] image_function_neon::Invert (2048x2048)... 10.82+/-0.9105 ms
[133/260] image_function::Transpose (2048x2048)... 228.6+/-6.665 ms
[134/260] image_function::Transpose (1024x1024)... 33.33+/-1.076 ms
[135/260] image_function::Transpose (512x512)... 3.86+/-0.2014 ms
[136/260] image_function::Transpose (256x256)... 0.5783+/-0.009246 ms
[137/260] image_function::RgbToBgr (256x256)... 0.7123+/-0.04254 ms
[138/260] image_function::RgbToBgr (512x512)... 3.084+/-0.2881 ms
[139/260] image_function::RgbToBgr (1024x1024)... 15.8+/-0.98 ms
[140/260] image_function::RgbToBgr (2048x2048)... 69.02+/-1.114 ms
[141/260] function_pool::RgbToBgr (256x256)... 0.6392+/-0.278 ms
[142/260] function_pool::RgbToBgr (512x512)... 2.833+/-0.4471 ms
[143/260] function_pool::RgbToBgr (1024x1024)... 16.78+/-1.126 ms
[144/260] function_pool::RgbToBgr (2048x2048)... 74.39+/-3.535 ms
[145/260] image_function_neon::RgbToBgr (256x256)... 0.5463+/-0.04665 ms
[146/260] image_function_neon::RgbToBgr (512x512)... 2.4+/-0.2369 ms
[147/260] image_function_neon::RgbToBgr (1024x1024)... 14.15+/-1.281 ms
[148/260] image_function_neon::RgbToBgr (2048x2048)... 64.75+/-1.331 ms
[149/260] image_function::Threshold (256x256)... 0.4687+/-0.02165 ms
[150/260] image_function::Threshold (512x512)... 1.929+/-0.09718 ms
[151/260] image_function::Threshold (1024x1024)... 8.089+/-0.2793 ms
[152/260] image_function::Threshold (2048x2048)... 37.08+/-1.105 ms
[153/260] function_pool::Threshold (256x256)... 0.3702+/-0.1546 ms
[154/260] function_pool::Threshold (512x512)... 0.7408+/-0.2816 ms
[155/260] function_pool::Threshold (1024x1024)... 4.195+/-1.164 ms
[156/260] function_pool::Threshold (2048x2048)... 21.71+/-3.187 ms
[157/260] image_function_neon::Threshold (256x256)... 0.09694+/-0.006778 ms
[158/260] image_function_neon::Threshold (512x512)... 0.4259+/-0.06024 ms
[159/260] image_function_neon::Threshold (1024x1024)... 2.069+/-0.6858 ms
[160/260] image_function_neon::Threshold (2048x2048)... 13.48+/-1.06 ms
[161/260] image_function::ThresholdDouble (256x256)... 0.4686+/-0.018 ms
[162/260] image_function::ThresholdDouble (512x512)... 1.934+/-0.0931 ms
[163/260] image_function::ThresholdDouble (1024x1024)... 8.091+/-0.3839 ms
[164/260] image_function::ThresholdDouble (2048x2048)... 37.12+/-0.9274 ms
[165/260] function_pool::ThresholdDouble (256x256)... 0.3805+/-0.1708 ms
[166/260] function_pool::ThresholdDouble (512x512)... 0.7337+/-0.2455 ms
[167/260] function_pool::ThresholdDouble (1024x1024)... 4.211+/-0.6908 ms
[168/260] function_pool::ThresholdDouble (2048x2048)... 21.77+/-1.834 ms
[169/260] image_function_neon::ThresholdDouble (256x256)... 0.1061+/-0.006657 ms
[170/260] image_function_neon::ThresholdDouble (512x512)... 0.4629+/-0.05991 ms
[171/260] image_function_neon::ThresholdDouble (1024x1024)... 2.19+/-0.2844 ms
[172/260] image_function_neon::ThresholdDouble (2048x2048)... 13.63+/-1.187 ms
[173/260] image_function::LookupTable (256x256)... 0.5199+/-0.02682 ms
[174/260] image_function::LookupTable (512x512)... 2.144+/-0.09391 ms
[175/260] image_function::LookupTable (1024x1024)... 8.89+/-0.478 ms
[176/260] image_function::LookupTable (2048x2048)... 38.91+/-0.6672 ms
[177/260] function_pool::LookupTable (256x256)... 0.4409+/-0.1507 ms
[178/260] function_pool::LookupTable (512x512)... 1.083+/-0.2534 ms
[179/260] function_pool::LookupTable (1024x1024)... 5.02+/-0.4968 ms
[180/260] function_pool::LookupTable (2048x2048)... 23.89+/-1.18 ms
[181/260] image_function::Accumulate (256x256)... 0.4064+/-0.02508 ms
[182/260] image_function::Accumulate (512x512)... 1.668+/-0.09847 ms
[183/260] image_function::Accumulate (1024x1024)... 7.04+/-0.3407 ms
[184/260] image_function::Accumulate (2048x2048)... 29.22+/-0.2608 ms
[185/260] image_function_neon::Accumulate (256x256)... 0.1131+/-0.002631 ms
[186/260] image_function_neon::Accumulate (512x512)... 0.4831+/-0.03134 ms
[187/260] image_function_neon::Accumulate (1024x1024)... 2.595+/-0.6939 ms
[188/260] image_function_neon::Accumulate (2048x2048)... 14.21+/-0.6195 ms
[189/260] image_function::ConvertToGrayScale (256x256)... 0.8186+/-0.03388 ms
[190/260] image_function::ConvertToGrayScale (512x512)... 3.321+/-0.1023 ms
[191/260] image_function::ConvertToGrayScale (1024x1024)... 13.59+/-0.3472 ms
[192/260] image_function::ConvertToGrayScale (2048x2048)... 55.49+/-0.4218 ms
[193/260] function_pool::ConvertToGrayScale (256x256)... 0.4016+/-0.1393 ms
[194/260] function_pool::ConvertToGrayScale (512x512)... 1.162+/-0.2029 ms
[195/260] function_pool::ConvertToGrayScale (1024x1024)... 4.679+/-0.4304 ms
[196/260] function_pool::ConvertToGrayScale (2048x2048)... 19.88+/-0.5445 ms
[197/260] image_function::ConvertToRgb (256x256)... 0.3967+/-0.01908 ms
[198/260] image_function::ConvertToRgb (512x512)... 1.656+/-0.09458 ms
[199/260] image_function::ConvertToRgb (1024x1024)... 7.26+/-0.2994 ms
[200/260] image_function::ConvertToRgb (2048x2048)... 30.85+/-0.9787 ms
[201/260] function_pool::ConvertToRgb (256x256)... 0.3118+/-0.1274 ms
[202/260] function_pool::ConvertToRgb (512x512)... 0.9781+/-0.3978 ms
[203/260] function_pool::ConvertToRgb (1024x1024)... 5.435+/-0.8357 ms
[204/260] function_pool::ConvertToRgb (2048x2048)... 23.36+/-2.089 ms
[205/260] image_function_neon::ConvertToRgb (256x256)... 0.1736+/-0.01261 ms
[206/260] image_function_neon::ConvertToRgb (512x512)... 0.7432+/-0.1374 ms
[207/260] image_function_neon::ConvertToRgb (1024x1024)... 3.825+/-0.5173 ms
[208/260] image_function_neon::ConvertToRgb (2048x2048)... 16.85+/-0.3992 ms
[209/260] image_function::Fill (256x256)... 0.007951+/-0.00105 ms
[210/260] image_function::Fill (512x512)... 0.03333+/-0.001273 ms
[211/260] image_function::Fill (1024x1024)... 0.1347+/-0.001907 ms
[212/260] image_function::Fill (2048x2048)... 0.5469+/-0.01802 ms
[213/260] image_function::Histogram (256x256)... 0.4613+/-0.01079 ms
[214/260] image_function::Histogram (512x512)... 1.85+/-0.06158 ms
[215/260] image_function::Histogram (1024x1024)... 7.438+/-0.1847 ms
[216/260] image_function::Histogram (2048x2048)... 29.88+/-0.5901 ms
[217/260] function_pool::Histogram (256x256)... 0.2735+/-0.09155 ms
[218/260] function_pool::Histogram (512x512)... 0.6152+/-0.1443 ms
[219/260] function_pool::Histogram (1024x1024)... 2.04+/-0.1945 ms
[220/260] function_pool::Histogram (2048x2048)... 7.736+/-0.3303 ms
[221/260] image_function::ProjectionProfile (256x256)... 0.268+/-0.001333 ms
[222/260] image_function::ProjectionProfile (512x512)... 1.068+/-0.03922 ms
[223/260] image_function::ProjectionProfile (1024x1024)... 4.279+/-0.1636 ms
[224/260] image_function::ProjectionProfile (2048x2048)... 17.21+/-0.4607 ms
[225/260] function_pool::ProjectionProfile (256x256)... 0.1439+/-0.08363 ms
[226/260] function_pool::ProjectionProfile (512x512)... 0.2179+/-0.04444 ms
[227/260] function_pool::ProjectionProfile (1024x1024)... 0.4726+/-0.1119 ms
[228/260] function_pool::ProjectionProfile (2048x2048)... 1.484+/-0.36 ms
[229/260] image_function_neon::ProjectionProfile (256x256)... 0.08139+/-0.0014 ms
[230/260] image_function_neon::ProjectionProfile (512x512)... 0.3136+/-0.02433 ms
[231/260] image_function_neon::ProjectionProfile (1024x1024)... 1.243+/-0.08512 ms
[232/260] image_function_neon::ProjectionProfile (2048x2048)... 4.997+/-0.1592 ms
[233/260] image_function::ResizeDown (256x256)... 0.1259+/-0.001517 ms
[234/260] image_function::ResizeDown (512x512)... 0.5145+/-0.03417 ms
[235/260] image_function::ResizeDown (1024x1024)... 2.158+/-0.1465 ms
[236/260] image_function::ResizeDown (2048x2048)... 8.934+/-0.4549 ms
[237/260] function_pool::ResizeDown (256x256)... 0.1714+/-0.08584 ms
[238/260] function_pool::ResizeDown (512x512)... 0.3107+/-0.1401 ms
[239/260] function_pool::ResizeDown (1024x1024)... 0.8567+/-0.2059 ms
[240/260] function_pool::ResizeDown (2048x2048)... 3.101+/-0.3542 ms
[241/260] image_function::ResizeUp (256x256)... 1.957+/-0.08039 ms
[242/260] image_function::ResizeUp (512x512)... 7.927+/-0.2471 ms
[243/260] image_function::ResizeUp (1024x1024)... 33.47+/-0.4568 ms
[244/260] image_function::ResizeUp (2048x2048)... 136+/-0.5939 ms
[245/260] function_pool::ResizeUp (256x256)... 0.7161+/-0.1772 ms
[246/260] function_pool::ResizeUp (512x512)... 2.763+/-0.3112 ms
[247/260] function_pool::ResizeUp (1024x1024)... 11.71+/-0.3116 ms
[248/260] function_pool::ResizeUp (2048x2048)... 46.16+/-0.6046 ms
[249/260] image_function::Sum (256x256)... 0.2556+/-0.07038 ms
[250/260] image_function::Sum (512x512)... 0.9867+/-0.3225 ms
[251/260] image_function::Sum (1024x1024)... 3.722+/-1.396 ms
[252/260] image_function::Sum (2048x2048)... 12.82+/-0.5736 ms
[253/260] function_pool::Sum (256x256)... 0.1151+/-0.08029 ms
[254/260] function_pool::Sum (512x512)... 0.1962+/-0.08735 ms
[255/260] function_pool::Sum (1024x1024)... 0.3645+/-0.07605 ms
[256/260] function_pool::Sum (2048x2048)... 1.119+/-0.1327 ms
[257/260] image_function_neon::Sum (256x256)... 0.06127+/-3.294e-05 ms
[258/260] image_function_neon::Sum (512x512)... 0.2441+/-0.0002788 ms
[259/260] image_function_neon::Sum (1024x1024)... 0.9773+/-0.009326 ms
[260/260] image_function_neon::Sum (2048x2048)... 3.916+/-0.06995 ms
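Comparing two runs of this length by eye is tedious. As a sketch, each line can be split into the test name, the value before `+/-` and the value after it, so an O2 run and an O3 run can be matched up by name. `PerfResult` and `parsePerfLine` are hypothetical helpers; the parser only assumes the line shape shown above, including spreads written in scientific notation such as `3.294e-05`. It sticks to C++11 to match the project's `-std=c++11`.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

struct PerfResult {
    std::string name;  // e.g. "image_function::Sum (256x256)"
    double value;      // number before "+/-", in ms
    double spread;     // number after "+/-", in ms
};

// Parses a line shaped like
//   "[257/260] image_function_neon::Sum (256x256)... 0.06127+/-3.294e-05 ms"
// Returns false if the line does not have that shape.
bool parsePerfLine(const std::string& line, PerfResult& result) {
    const std::size_t nameStart = line.find("] ");
    const std::size_t nameEnd = line.rfind("... ");
    if (nameStart == std::string::npos || nameEnd == std::string::npos || nameEnd < nameStart + 2)
        return false;

    result.name = line.substr(nameStart + 2, nameEnd - (nameStart + 2));

    const char* cursor = line.c_str() + nameEnd + 4;  // first character of the value
    char* end = nullptr;
    result.value = std::strtod(cursor, &end);
    if (end == cursor || std::strncmp(end, "+/-", 3) != 0)
        return false;

    cursor = end + 3;  // skip "+/-"
    result.spread = std::strtod(cursor, &end);
    return end != cursor;
}
```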

It's a long list, but a useful one. Next, I changed the compiler option from -O2 to -O3 and got the following output:

Full Performance Test: O3
[1/260] blob_detection_SolidImage::_2048... 659.2+/-8.414 ms
[2/260] blob_detection_SolidImage::_1024... 122.3+/-1.137 ms
[3/260] blob_detection_SolidImage::_512... 28.94+/-0.9154 ms
[4/260] blob_detection_SolidImage::_256... 6.977+/-0.2504 ms
[5/260] edge_detection_SolidImage::_2048... 1021+/-18.73 ms
[6/260] edge_detection_SolidImage::_1024... 214.3+/-1.528 ms
[7/260] edge_detection_SolidImage::_512... 53.9+/-0.8937 ms
[8/260] edge_detection_SolidImage::_256... 13.58+/-0.1193 ms
[9/260] filtering_SobelFilter::_256... 1.419+/-0.02259 ms
[10/260] filtering_SobelFilter::_512... 5.776+/-0.1041 ms
[11/260] filtering_SobelFilter::_1024... 23.34+/-0.2983 ms
[12/260] filtering_SobelFilter::_2048... 96.36+/-0.3666 ms
[13/260] filtering_MedianFilter3x3::_256... 17.05+/-0.1499 ms
[14/260] filtering_MedianFilter3x3::_512... 68.82+/-0.1801 ms
[15/260] filtering_MedianFilter3x3::_1024... 277.2+/-0.9412 ms
[16/260] filtering_MedianFilter3x3::_2048... 1129+/-1.424 ms
[17/260] filtering_PrewittFilter::_256... 1.471+/-0.01698 ms
[18/260] filtering_PrewittFilter::_512... 5.945+/-0.08082 ms
[19/260] filtering_PrewittFilter::_1024... 23.98+/-0.2115 ms
[20/260] filtering_PrewittFilter::_2048... 99.34+/-0.6689 ms
[21/260] image_function::AbsoluteDifference (256x256)... 0.2573+/-0.01359 ms
[22/260] image_function::AbsoluteDifference (512x512)... 1.119+/-0.085 ms
[23/260] image_function::AbsoluteDifference (1024x1024)... 6.548+/-0.2059 ms
[24/260] image_function::AbsoluteDifference (2048x2048)... 35.21+/-0.2765 ms
[25/260] function_pool::AbsoluteDifference (256x256)... 0.8006+/-0.2557 ms
[26/260] function_pool::AbsoluteDifference (512x512)... 2.257+/-0.4178 ms
[27/260] function_pool::AbsoluteDifference (1024x1024)... 16.2+/-3.114 ms
[28/260] function_pool::AbsoluteDifference (2048x2048)... 86.39+/-4.395 ms
[29/260] image_function_neon::AbsoluteDifference (256x256)... 0.2564+/-0.01983 ms
[30/260] image_function_neon::AbsoluteDifference (512x512)... 1.106+/-0.08391 ms
[31/260] image_function_neon::AbsoluteDifference (1024x1024)... 6.455+/-0.2998 ms
[32/260] image_function_neon::AbsoluteDifference (2048x2048)... 35.07+/-0.3718 ms
[33/260] image_function_neon::BitwiseAnd (2048x2048)... 34.85+/-0.2811 ms
[34/260] image_function_neon::BitwiseAnd (1024x1024)... 6.363+/-0.1702 ms
[35/260] image_function_neon::BitwiseAnd (512x512)... 1.067+/-0.0952 ms
[36/260] image_function_neon::BitwiseAnd (256x256)... 0.2459+/-0.02243 ms
[37/260] function_pool::BitwiseAnd (2048x2048)... 86.64+/-5.798 ms
[38/260] function_pool::BitwiseAnd (1024x1024)... 16.05+/-2.712 ms
[39/260] function_pool::BitwiseAnd (512x512)... 2.207+/-0.3405 ms
[40/260] function_pool::BitwiseAnd (256x256)... 0.7602+/-0.2183 ms
[41/260] image_function::BitwiseAnd (2048x2048)... 34.75+/-0.2531 ms
[42/260] image_function::BitwiseAnd (1024x1024)... 6.451+/-0.1742 ms
[43/260] image_function::BitwiseAnd (512x512)... 1.075+/-0.07546 ms
[44/260] image_function::BitwiseAnd (256x256)... 0.2475+/-0.0199 ms
[45/260] image_function_neon::BitwiseOr (2048x2048)... 34.72+/-0.2911 ms
[46/260] image_function_neon::BitwiseOr (1024x1024)... 6.423+/-0.1714 ms
[47/260] image_function_neon::BitwiseOr (512x512)... 1.069+/-0.1073 ms
[48/260] image_function_neon::BitwiseOr (256x256)... 0.2428+/-0.02553 ms
[49/260] function_pool::BitwiseOr (2048x2048)... 89.96+/-4.571 ms
[50/260] function_pool::BitwiseOr (1024x1024)... 15.33+/-0.7198 ms
[51/260] function_pool::BitwiseOr (512x512)... 2.174+/-0.3996 ms
[52/260] function_pool::BitwiseOr (256x256)... 0.7846+/-0.2347 ms
[53/260] image_function::BitwiseOr (2048x2048)... 34.33+/-0.4701 ms
[54/260] image_function::BitwiseOr (1024x1024)... 6.45+/-0.1625 ms
[55/260] image_function::BitwiseOr (512x512)... 1.08+/-0.1068 ms
[56/260] image_function::BitwiseOr (256x256)... 0.2484+/-0.02308 ms
[57/260] image_function_neon::BitwiseXor (2048x2048)... 34.31+/-0.4114 ms
[58/260] image_function_neon::BitwiseXor (1024x1024)... 6.437+/-0.3281 ms
[59/260] image_function_neon::BitwiseXor (512x512)... 1.064+/-0.09985 ms
[60/260] image_function_neon::BitwiseXor (256x256)... 0.2423+/-0.02624 ms
[61/260] function_pool::BitwiseXor (2048x2048)... 94.74+/-1.477 ms
[62/260] function_pool::BitwiseXor (1024x1024)... 15.59+/-1.671 ms
[63/260] function_pool::BitwiseXor (512x512)... 2.317+/-0.7291 ms
[64/260] function_pool::BitwiseXor (256x256)... 0.776+/-0.2322 ms
[65/260] image_function::BitwiseXor (2048x2048)... 34.88+/-0.8486 ms
[66/260] image_function::BitwiseXor (1024x1024)... 6.428+/-0.1557 ms
[67/260] image_function::BitwiseXor (512x512)... 1.071+/-0.09344 ms
[68/260] image_function::BitwiseXor (256x256)... 0.2472+/-0.01476 ms
[69/260] image_function_neon::Maximum (2048x2048)... 34.77+/-0.3337 ms
[70/260] image_function_neon::Maximum (1024x1024)... 6.417+/-0.1529 ms
[71/260] image_function_neon::Maximum (512x512)... 1.073+/-0.09486 ms
[72/260] image_function_neon::Maximum (256x256)... 0.2484+/-0.02032 ms
[73/260] function_pool::Maximum (2048x2048)... 84.63+/-2.365 ms
[74/260] function_pool::Maximum (1024x1024)... 16.15+/-1.066 ms
[75/260] function_pool::Maximum (512x512)... 2.424+/-0.6867 ms
[76/260] function_pool::Maximum (256x256)... 0.7402+/-0.2282 ms
[77/260] image_function::Maximum (2048x2048)... 34.92+/-0.4263 ms
[78/260] image_function::Maximum (1024x1024)... 6.507+/-0.186 ms
[79/260] image_function::Maximum (512x512)... 1.091+/-0.1074 ms
[80/260] image_function::Maximum (256x256)... 0.2546+/-0.02404 ms
[81/260] image_function_neon::Minimum (2048x2048)... 34.83+/-0.3438 ms
[82/260] image_function_neon::Minimum (1024x1024)... 6.485+/-0.2056 ms
[83/260] image_function_neon::Minimum (512x512)... 1.079+/-0.09349 ms
[84/260] image_function_neon::Minimum (256x256)... 0.2524+/-0.02601 ms
[85/260] function_pool::Minimum (2048x2048)... 84.68+/-2.176 ms
[86/260] function_pool::Minimum (1024x1024)... 16.46+/-2.329 ms
[87/260] function_pool::Minimum (512x512)... 2.193+/-0.4238 ms
[88/260] function_pool::Minimum (256x256)... 0.7748+/-0.2149 ms
[89/260] image_function::Minimum (2048x2048)... 34.94+/-0.3222 ms
[90/260] image_function::Minimum (1024x1024)... 6.471+/-0.1734 ms
[91/260] image_function::Minimum (512x512)... 1.1+/-0.09436 ms
[92/260] image_function::Minimum (256x256)... 0.257+/-0.02698 ms
[93/260] image_function_neon::Subtract (2048x2048)... 35.05+/-0.3241 ms
[94/260] image_function_neon::Subtract (1024x1024)... 6.693+/-0.6711 ms
[95/260] image_function_neon::Subtract (512x512)... 1.119+/-0.0918 ms
[96/260] image_function_neon::Subtract (256x256)... 0.262+/-0.02518 ms
[97/260] function_pool::Subtract (2048x2048)... 86.39+/-3.799 ms
[98/260] function_pool::Subtract (1024x1024)... 16.06+/-1.957 ms
[99/260] function_pool::Subtract (512x512)... 2.257+/-0.4887 ms
[100/260] function_pool::Subtract (256x256)... 0.7776+/-0.2513 ms
[101/260] image_function::Subtract (2048x2048)... 34.7+/-0.2787 ms
[102/260] image_function::Subtract (1024x1024)... 6.489+/-0.1655 ms
[103/260] image_function::Subtract (512x512)... 1.096+/-0.0755 ms
[104/260] image_function::Subtract (256x256)... 0.262+/-0.02406 ms
[105/260] image_function::Flip (256x256)... 0.06764+/-0.004992 ms
[106/260] image_function::Flip (512x512)... 0.2752+/-0.0363 ms
[107/260] image_function::Flip (1024x1024)... 1.175+/-0.08554 ms
[108/260] image_function::Flip (2048x2048)... 8.104+/-0.2728 ms
[109/260] image_function_neon::Flip (256x256)... 0.08263+/-0.004746 ms
[110/260] image_function_neon::Flip (512x512)... 0.3398+/-0.02819 ms
[111/260] image_function_neon::Flip (1024x1024)... 1.428+/-0.07633 ms
[112/260] image_function_neon::Flip (2048x2048)... 8.281+/-0.2228 ms
[113/260] image_function::GammaCorrection (256x256)... 0.6453+/-0.02275 ms
[114/260] image_function::GammaCorrection (512x512)... 2.331+/-0.08623 ms
[115/260] image_function::GammaCorrection (1024x1024)... 9.294+/-0.3946 ms
[116/260] image_function::GammaCorrection (2048x2048)... 41.54+/-0.5124 ms
[117/260] function_pool::GammaCorrection (256x256)... 0.5872+/-0.1739 ms
[118/260] function_pool::GammaCorrection (512x512)... 1.297+/-0.2463 ms
[119/260] function_pool::GammaCorrection (1024x1024)... 6.926+/-0.4417 ms
[120/260] function_pool::GammaCorrection (2048x2048)... 35.29+/-1.289 ms
[121/260] image_function::Invert (256x256)... 0.07971+/-0.007002 ms
[122/260] image_function::Invert (512x512)... 0.3575+/-0.05601 ms
[123/260] image_function::Invert (1024x1024)... 1.58+/-0.1001 ms
[124/260] image_function::Invert (2048x2048)... 11.03+/-0.1368 ms
[125/260] function_pool::Invert (256x256)... 0.326+/-0.1583 ms
[126/260] function_pool::Invert (512x512)... 0.6067+/-0.2185 ms
[127/260] function_pool::Invert (1024x1024)... 4.335+/-0.4591 ms
[128/260] function_pool::Invert (2048x2048)... 22.81+/-0.6107 ms
[129/260] image_function_neon::Invert (256x256)... 0.07806+/-0.008335 ms
[130/260] image_function_neon::Invert (512x512)... 0.3462+/-0.05722 ms
[131/260] image_function_neon::Invert (1024x1024)... 1.552+/-0.1238 ms
[132/260] image_function_neon::Invert (2048x2048)... 11.02+/-0.1689 ms
[133/260] image_function::Transpose (2048x2048)... 178.7+/-4.889 ms
[134/260] image_function::Transpose (1024x1024)... 25.01+/-0.54 ms
[135/260] image_function::Transpose (512x512)... 3.237+/-0.07739 ms
[136/260] image_function::Transpose (256x256)... 0.6578+/-0.01062 ms
[137/260] image_function::RgbToBgr (256x256)... 0.3357+/-0.03925 ms
[138/260] image_function::RgbToBgr (512x512)... 1.468+/-0.11 ms
[139/260] image_function::RgbToBgr (1024x1024)... 9.47+/-0.2005 ms
[140/260] image_function::RgbToBgr (2048x2048)... 43.82+/-0.397 ms
[141/260] function_pool::RgbToBgr (256x256)... 0.6314+/-0.2791 ms
[142/260] function_pool::RgbToBgr (512x512)... 3.208+/-0.3878 ms
[143/260] function_pool::RgbToBgr (1024x1024)... 21.68+/-1.054 ms
[144/260] function_pool::RgbToBgr (2048x2048)... 93.05+/-5.086 ms
[145/260] image_function_neon::RgbToBgr (256x256)... 0.5476+/-0.04025 ms
[146/260] image_function_neon::RgbToBgr (512x512)... 2.343+/-0.1327 ms
[147/260] image_function_neon::RgbToBgr (1024x1024)... 13.93+/-0.1758 ms
[148/260] image_function_neon::RgbToBgr (2048x2048)... 64.03+/-0.3407 ms
[149/260] image_function::Threshold (256x256)... 0.1175+/-0.006856 ms
[150/260] image_function::Threshold (512x512)... 0.4933+/-0.05341 ms
[151/260] image_function::Threshold (1024x1024)... 2.125+/-0.124 ms
[152/260] image_function::Threshold (2048x2048)... 13.77+/-0.6267 ms
[153/260] function_pool::Threshold (256x256)... 0.3876+/-0.1738 ms
[154/260] function_pool::Threshold (512x512)... 0.6913+/-0.2765 ms
[155/260] function_pool::Threshold (1024x1024)... 5.167+/-0.3892 ms
[156/260] function_pool::Threshold (2048x2048)... 28.9+/-1.034 ms
[157/260] image_function_neon::Threshold (256x256)... 0.1007+/-0.008197 ms
[158/260] image_function_neon::Threshold (512x512)... 0.4291+/-0.05646 ms
[159/260] image_function_neon::Threshold (1024x1024)... 1.917+/-0.1269 ms
[160/260] image_function_neon::Threshold (2048x2048)... 13.57+/-0.1885 ms
[161/260] image_function::ThresholdDouble (256x256)... 0.1273+/-0.008935 ms
[162/260] image_function::ThresholdDouble (512x512)... 0.5273+/-0.05674 ms
[163/260] image_function::ThresholdDouble (1024x1024)... 2.279+/-0.1325 ms
[164/260] image_function::ThresholdDouble (2048x2048)... 14.21+/-0.7554 ms
[165/260] function_pool::ThresholdDouble (256x256)... 0.388+/-0.1945 ms
[166/260] function_pool::ThresholdDouble (512x512)... 0.7134+/-0.2996 ms
[167/260] function_pool::ThresholdDouble (1024x1024)... 5.171+/-0.4106 ms
[168/260] function_pool::ThresholdDouble (2048x2048)... 28.94+/-0.8835 ms
[169/260] image_function_neon::ThresholdDouble (256x256)... 0.109+/-0.007184 ms
[170/260] image_function_neon::ThresholdDouble (512x512)... 0.4618+/-0.05223 ms
[171/260] image_function_neon::ThresholdDouble (1024x1024)... 2.031+/-0.1229 ms
[172/260] image_function_neon::ThresholdDouble (2048x2048)... 13.6+/-0.1625 ms
[173/260] image_function::LookupTable (256x256)... 0.5225+/-0.01566 ms
[174/260] image_function::LookupTable (512x512)... 2.121+/-0.0743 ms
[175/260] image_function::LookupTable (1024x1024)... 8.711+/-0.2462 ms
[176/260] image_function::LookupTable (2048x2048)... 38.86+/-0.6056 ms
[177/260] function_pool::LookupTable (256x256)... 0.4449+/-0.1668 ms
[178/260] function_pool::LookupTable (512x512)... 1.064+/-0.2321 ms
[179/260] function_pool::LookupTable (1024x1024)... 5.793+/-0.4065 ms
[180/260] function_pool::LookupTable (2048x2048)... 29.01+/-0.655 ms
[181/260] image_function::Accumulate (256x256)... 0.1164+/-0.003436 ms
[182/260] image_function::Accumulate (512x512)... 0.4904+/-0.0186 ms
[183/260] image_function::Accumulate (1024x1024)... 2.459+/-0.1153 ms
[184/260] image_function::Accumulate (2048x2048)... 14.25+/-0.1516 ms
[185/260] image_function_neon::Accumulate (256x256)... 0.1134+/-0.003251 ms
[186/260] image_function_neon::Accumulate (512x512)... 0.4761+/-0.006319 ms
[187/260] image_function_neon::Accumulate (1024x1024)... 2.417+/-0.09717 ms
[188/260] image_function_neon::Accumulate (2048x2048)... 14.21+/-0.1533 ms
[189/260] image_function::ConvertToGrayScale (256x256)... 0.2096+/-0.002201 ms
[190/260] image_function::ConvertToGrayScale (512x512)... 0.8587+/-0.03032 ms
[191/260] image_function::ConvertToGrayScale (1024x1024)... 3.521+/-0.1312 ms
[192/260] image_function::ConvertToGrayScale (2048x2048)... 15.6+/-2.737 ms
[193/260] function_pool::ConvertToGrayScale (256x256)... 0.2699+/-0.1129 ms
[194/260] function_pool::ConvertToGrayScale (512x512)... 0.5503+/-0.1665 ms
[195/260] function_pool::ConvertToGrayScale (1024x1024)... 2.045+/-0.3488 ms
[196/260] function_pool::ConvertToGrayScale (2048x2048)... 11.39+/-1.21 ms
[197/260] image_function::ConvertToRgb (256x256)... 0.3945+/-0.005277 ms
[198/260] image_function::ConvertToRgb (512x512)... 1.633+/-0.04227 ms
[199/260] image_function::ConvertToRgb (1024x1024)... 7.182+/-0.1851 ms
[200/260] image_function::ConvertToRgb (2048x2048)... 30.83+/-0.7008 ms
[201/260] function_pool::ConvertToRgb (256x256)... 0.3142+/-0.1605 ms
[202/260] function_pool::ConvertToRgb (512x512)... 1.016+/-0.3147 ms
[203/260] function_pool::ConvertToRgb (1024x1024)... 6.363+/-0.6309 ms
[204/260] function_pool::ConvertToRgb (2048x2048)... 26.53+/-1.991 ms
[205/260] image_function_neon::ConvertToRgb (256x256)... 0.1708+/-0.0105 ms
[206/260] image_function_neon::ConvertToRgb (512x512)... 0.7194+/-0.04992 ms
[207/260] image_function_neon::ConvertToRgb (1024x1024)... 3.759+/-0.1304 ms
[208/260] image_function_neon::ConvertToRgb (2048x2048)... 16.95+/-0.1775 ms
[209/260] image_function::Fill (256x256)... 0.007928+/-0.001133 ms
[210/260] image_function::Fill (512x512)... 0.03338+/-0.001469 ms
[211/260] image_function::Fill (1024x1024)... 0.135+/-0.002448 ms
[212/260] image_function::Fill (2048x2048)... 0.5476+/-0.01086 ms
[213/260] image_function::Histogram (256x256)... 0.4609+/-0.006528 ms
[214/260] image_function::Histogram (512x512)... 1.844+/-0.02853 ms
[215/260] image_function::Histogram (1024x1024)... 7.399+/-0.1199 ms
[216/260] image_function::Histogram (2048x2048)... 29.65+/-0.3645 ms
[217/260] function_pool::Histogram (256x256)... 0.2676+/-0.1251 ms
[218/260] function_pool::Histogram (512x512)... 0.6122+/-0.1219 ms
[219/260] function_pool::Histogram (1024x1024)... 2.025+/-0.1826 ms
[220/260] function_pool::Histogram (2048x2048)... 7.711+/-0.2735 ms
[221/260] image_function::ProjectionProfile (256x256)... 0.268+/-0.001927 ms
[222/260] image_function::ProjectionProfile (512x512)... 1.064+/-0.02345 ms
[223/260] image_function::ProjectionProfile (1024x1024)... 4.264+/-0.1138 ms
[224/260] image_function::ProjectionProfile (2048x2048)... 17.05+/-0.2892 ms
[225/260] function_pool::ProjectionProfile (256x256)... 0.1447+/-0.09231 ms
[226/260] function_pool::ProjectionProfile (512x512)... 0.221+/-0.07707 ms
[227/260] function_pool::ProjectionProfile (1024x1024)... 0.4685+/-0.1112 ms
[228/260] function_pool::ProjectionProfile (2048x2048)... 1.449+/-0.1808 ms
[229/260] image_function_neon::ProjectionProfile (256x256)... 0.08125+/-0.001629 ms
[230/260] image_function_neon::ProjectionProfile (512x512)... 0.3116+/-0.01195 ms
[231/260] image_function_neon::ProjectionProfile (1024x1024)... 1.231+/-0.05248 ms
[232/260] image_function_neon::ProjectionProfile (2048x2048)... 4.921+/-0.1217 ms
[233/260] image_function::ResizeDown (256x256)... 0.1258+/-0.00173 ms
[234/260] image_function::ResizeDown (512x512)... 0.5145+/-0.01864 ms
[235/260] image_function::ResizeDown (1024x1024)... 2.084+/-0.04247 ms
[236/260] image_function::ResizeDown (2048x2048)... 8.713+/-0.1194 ms
[237/260] function_pool::ResizeDown (256x256)... 0.1762+/-0.1059 ms
[238/260] function_pool::ResizeDown (512x512)... 0.3267+/-0.142 ms
[239/260] function_pool::ResizeDown (1024x1024)... 0.8051+/-0.1826 ms
[240/260] function_pool::ResizeDown (2048x2048)... 3.077+/-0.3038 ms
[241/260] image_function::ResizeUp (256x256)... 1.946+/-0.02992 ms
[242/260] image_function::ResizeUp (512x512)... 7.823+/-0.108 ms
[243/260] image_function::ResizeUp (1024x1024)... 33.14+/-0.2219 ms
[244/260] image_function::ResizeUp (2048x2048)... 135.8+/-0.372 ms
[245/260] function_pool::ResizeUp (256x256)... 0.7444+/-0.2005 ms
[246/260] function_pool::ResizeUp (512x512)... 2.861+/-0.2893 ms
[247/260] function_pool::ResizeUp (1024x1024)... 12.72+/-0.3165 ms
[248/260] function_pool::ResizeUp (2048x2048)... 51.94+/-0.5059 ms
[249/260] image_function::Sum (256x256)... 0.06704+/-0.001588 ms
[250/260] image_function::Sum (512x512)... 0.2664+/-0.004164 ms
[251/260] image_function::Sum (1024x1024)... 1.074+/-0.02653 ms
[252/260] image_function::Sum (2048x2048)... 4.358+/-0.1281 ms
[253/260] function_pool::Sum (256x256)... 0.1299+/-0.1002 ms
[254/260] function_pool::Sum (512x512)... 0.2085+/-0.08905 ms
[255/260] function_pool::Sum (1024x1024)... 0.3873+/-0.09999 ms
[256/260] function_pool::Sum (2048x2048)... 1.125+/-0.1451 ms
[257/260] image_function_neon::Sum (256x256)... 0.06125+/-2.437e-05 ms
[258/260] image_function_neon::Sum (512x512)... 0.2441+/-0.0001093 ms
[259/260] image_function_neon::Sum (1024x1024)... 0.9777+/-0.01123 ms
[260/260] image_function_neon::Sum (2048x2048)... 3.917+/-0.07282 ms

Reading the outputs side by side, the option change clearly affects performance. However, the original function of interest, blob_detection, does not improve with the -O3 setting. On 2048×2048 images it actually gets slightly slower (644.1+/-4.882 ms to 659.2+/-8.414 ms), which is not a good sign for the original plan. Scrolling down the list, though, other functions seemed to improve a lot with the change, and the first one I noticed was AbsoluteDifference. So I modified the test to benchmark only the AbsoluteDifference function across its multiple implementations.

Absolute Difference: O2
[1/12] image_function::AbsoluteDifference (256x256)... 0.7432+/-0.02591 ms
[2/12] image_function::AbsoluteDifference (512x512)... 3.043+/-0.09214 ms
[3/12] image_function::AbsoluteDifference (1024x1024)... 14.29+/-0.1658 ms
[4/12] image_function::AbsoluteDifference (2048x2048)... 66.46+/-0.3282 ms
[5/12] function_pool::AbsoluteDifference (256x256)... 0.7936+/-0.282 ms
[6/12] function_pool::AbsoluteDifference (512x512)... 2.288+/-0.4593 ms
[7/12] function_pool::AbsoluteDifference (1024x1024)... 14.9+/-0.8164 ms
[8/12] function_pool::AbsoluteDifference (2048x2048)... 131.5+/-7.993 ms
[9/12] image_function_neon::AbsoluteDifference (256x256)... 0.2566+/-0.0231 ms
[10/12] image_function_neon::AbsoluteDifference (512x512)... 1.083+/-0.0618 ms
[11/12] image_function_neon::AbsoluteDifference (1024x1024)... 6.341+/-0.147 ms
[12/12] image_function_neon::AbsoluteDifference (2048x2048)... 79.96+/-2.067 ms
Absolute Difference: O3
[1/12] image_function::AbsoluteDifference (256x256)... 0.2604+/-0.02014 ms
[2/12] image_function::AbsoluteDifference (512x512)... 1.106+/-0.08591 ms
[3/12] image_function::AbsoluteDifference (1024x1024)... 6.369+/-0.147 ms
[4/12] image_function::AbsoluteDifference (2048x2048)... 34.7+/-0.2659 ms
[5/12] function_pool::AbsoluteDifference (256x256)... 0.7684+/-0.2098 ms
[6/12] function_pool::AbsoluteDifference (512x512)... 2.2+/-0.3969 ms
[7/12] function_pool::AbsoluteDifference (1024x1024)... 14.57+/-0.6044 ms
[8/12] function_pool::AbsoluteDifference (2048x2048)... 96.27+/-7.903 ms
[9/12] image_function_neon::AbsoluteDifference (256x256)... 0.2551+/-0.02185 ms
[10/12] image_function_neon::AbsoluteDifference (512x512)... 1.084+/-0.08416 ms
[11/12] image_function_neon::AbsoluteDifference (1024x1024)... 6.311+/-0.1501 ms
[12/12] image_function_neon::AbsoluteDifference (2048x2048)... 49.26+/-0.3417 ms

The difference in performance is significant in most cases. image_function::AbsoluteDifference performs better at every image size under -O3, while the function_pool and image_function_neon implementations only improve on 2048×2048 images. It should also be noted that image_function and image_function_neon perform almost identically on 256×256, 512×512, and 1024×1024 images under -O3, which may be a hint about how they are implemented. Either way, image_function::AbsoluteDifference clearly benefits from the -O3 optimization setting. The following table shows the benchmarks for all the image_function functions:

Image Function Results: O2 vs O3
image_function | O2 | O3
AbsoluteDifference (256×256) | 0.7547+/-0.02822 ms | 0.2628+/-0.02898 ms
AbsoluteDifference (512×512) | 3.073+/-0.09532 ms | 1.105+/-0.09343 ms
AbsoluteDifference (1024×1024) | 14.36+/-0.2335 ms | 6.447+/-0.1902 ms
AbsoluteDifference (2048×2048) | 66.74+/-0.4498 ms | 34.91+/-0.3634 ms
BitwiseAnd (2048×2048) | 58.39+/-0.2769 ms | 34.35+/-0.287 ms
BitwiseAnd (1024×1024) | 12.29+/-0.2585 ms | 6.3+/-0.1823 ms
BitwiseAnd (512×512) | 2.555+/-0.09601 ms | 1.05+/-0.101 ms
BitwiseAnd (256×256) | 0.6288+/-0.02895 ms | 0.248+/-0.02753 ms
BitwiseOr (2048×2048) | 58.41+/-0.4194 ms | 34.35+/-0.2473 ms
BitwiseOr (1024×1024) | 12.27+/-0.1418 ms | 6.316+/-0.2002 ms
BitwiseOr (512×512) | 2.558+/-0.1022 ms | 1.05+/-0.08623 ms
BitwiseOr (256×256) | 0.6287+/-0.02762 ms | 0.248+/-0.02866 ms
BitwiseXor (2048×2048) | 58.41+/-0.3581 ms | 34.57+/-0.9866 ms
BitwiseXor (1024×1024) | 12.28+/-0.1501 ms | 6.322+/-0.231 ms
BitwiseXor (512×512) | 2.554+/-0.08971 ms | 1.052+/-0.09256 ms
BitwiseXor (256×256) | 0.6294+/-0.03089 ms | 0.2478+/-0.0269 ms
Maximum (2048×2048) | 62.53+/-0.3113 ms | 34.43+/-0.3139 ms
Maximum (1024×1024) | 13.3+/-0.167 ms | 6.316+/-0.2094 ms
Maximum (512×512) | 2.808+/-0.08771 ms | 1.063+/-0.09892 ms
Maximum (256×256) | 0.6894+/-0.02952 ms | 0.2508+/-0.02655 ms
Minimum (2048×2048) | 62.69+/-0.8282 ms | 34.42+/-0.2789 ms
Minimum (1024×1024) | 13.3+/-0.1638 ms | 6.328+/-0.2024 ms
Minimum (512×512) | 2.811+/-0.09551 ms | 1.061+/-0.1047 ms
Minimum (256×256) | 0.6898+/-0.02663 ms | 0.2518+/-0.02722 ms
Subtract (2048×2048) | 66.7+/-0.2856 ms | 34.53+/-0.236 ms
Subtract (1024×1024) | 14.45+/-0.1648 ms | 6.425+/-0.1894 ms
Subtract (512×512) | 3.071+/-0.1031 ms | 1.077+/-0.08148 ms
Subtract (256×256) | 0.7564+/-0.02617 ms | 0.2555+/-0.02402 ms
Flip (256×256) | 0.2403+/-0.003232 ms | 0.0693+/-0.004127 ms
Flip (512×512) | 0.9657+/-0.02498 ms | 0.2759+/-0.03079 ms
Flip (1024×1024) | 3.946+/-0.08853 ms | 1.149+/-0.09605 ms
Flip (2048×2048) | 18.32+/-0.2539 ms | 8.061+/-0.2251 ms
GammaCorrection (256×256) | 0.6516+/-0.01782 ms | 0.6492+/-0.01845 ms
GammaCorrection (512×512) | 2.311+/-0.05578 ms | 2.318+/-0.07228 ms
GammaCorrection (1024×1024) | 9.101+/-0.1138 ms | 9.12+/-0.1745 ms
GammaCorrection (2048×2048) | 41.47+/-0.2421 ms | 41.45+/-0.2185 ms
Invert (256×256) | 0.3891+/-0.01155 ms | 0.08033+/-0.005569 ms
Invert (512×512) | 1.581+/-0.05507 ms | 0.3509+/-0.04334 ms
Invert (1024×1024) | 6.492+/-0.1138 ms | 1.546+/-0.08287 ms
Invert (2048×2048) | 30.17+/-0.3121 ms | 10.97+/-0.1356 ms
Transpose (2048×2048) | 232.1+/-8.996 ms | 192+/-0.7423 ms
Transpose (1024×1024) | 32.15+/-0.6566 ms | 25.94+/-0.4277 ms
Transpose (512×512) | 4.439+/-0.09429 ms | 3.604+/-0.0901 ms
Transpose (256×256) | 0.5746+/-0.01322 ms | 0.6557+/-0.009664 ms
RgbToBgr (256×256) | 0.7087+/-0.03707 ms | 0.3282+/-0.02997 ms
RgbToBgr (512×512) | 2.976+/-0.1064 ms | 1.439+/-0.08078 ms
RgbToBgr (1024×1024) | 15.71+/-0.3244 ms | 9.438+/-0.2287 ms
RgbToBgr (2048×2048) | 114.9+/-0.5135 ms | 89.28+/-1.065 ms
Threshold (256×256) | 0.4726+/-0.0115 ms | 0.1158+/-0.006655 ms
Threshold (512×512) | 1.918+/-0.06233 ms | 0.4895+/-0.04314 ms
Threshold (1024×1024) | 7.885+/-0.1455 ms | 2.135+/-0.1368 ms
Threshold (2048×2048) | 36.97+/-0.2089 ms | 13.68+/-0.1948 ms
ThresholdDouble (256×256) | 0.4728+/-0.01445 ms | 0.1245+/-0.006697 ms
ThresholdDouble (512×512) | 1.918+/-0.05377 ms | 0.5228+/-0.04818 ms
ThresholdDouble (1024×1024) | 7.886+/-0.1346 ms | 2.258+/-0.1376 ms
ThresholdDouble (2048×2048) | 36.96+/-0.2169 ms | 14.22+/-1.116 ms
LookupTable (256×256) | 0.522+/-0.01216 ms | 0.5192+/-0.01289 ms
LookupTable (512×512) | 2.116+/-0.06579 ms | 2.117+/-0.0619 ms
LookupTable (1024×1024) | 8.63+/-0.1194 ms | 8.627+/-0.1335 ms
LookupTable (2048×2048) | 38.76+/-0.2569 ms | 38.73+/-0.23 ms
Accumulate (256×256) | 0.4052+/-0.002275 ms | 0.1165+/-0.003581 ms
Accumulate (512×512) | 1.647+/-0.03184 ms | 0.493+/-0.02187 ms
Accumulate (1024×1024) | 6.893+/-0.1776 ms | 2.464+/-0.111 ms
Accumulate (2048×2048) | 29.15+/-0.205 ms | 14.46+/-0.1871 ms
ConvertToGrayScale (256×256) | 0.8163+/-0.01895 ms | 0.2101+/-0.003119 ms
ConvertToGrayScale (512×512) | 3.291+/-0.06637 ms | 0.8535+/-0.024 ms
ConvertToGrayScale (1024×1024) | 13.39+/-0.1378 ms | 3.522+/-0.1276 ms
ConvertToGrayScale (2048×2048) | 55.5+/-0.2327 ms | 15.29+/-0.2111 ms
ConvertToRgb (256×256) | 0.3944+/-0.01489 ms | 0.3942+/-0.002858 ms
ConvertToRgb (512×512) | 1.628+/-0.03552 ms | 1.629+/-0.03845 ms
ConvertToRgb (1024×1024) | 7.148+/-0.1837 ms | 7.151+/-0.2058 ms
ConvertToRgb (2048×2048) | 30.82+/-0.7943 ms | 30.72+/-0.8727 ms
Fill (256×256) | 0.008156+/-0.001075 ms | 0.008331+/-0.001006 ms
Fill (512×512) | 0.03429+/-0.001251 ms | 0.03431+/-0.0009215 ms
Fill (1024×1024) | 0.1393+/-0.001923 ms | 0.1389+/-0.001516 ms
Fill (2048×2048) | 0.5644+/-0.01286 ms | 0.561+/-0.01111 ms
Histogram (256×256) | 0.4611+/-0.006191 ms | 0.46+/-0.001054 ms
Histogram (512×512) | 1.844+/-0.02686 ms | 1.843+/-0.02369 ms
Histogram (1024×1024) | 7.397+/-0.1191 ms | 7.395+/-0.1231 ms
Histogram (2048×2048) | 29.66+/-0.3616 ms | 29.64+/-0.3412 ms
ProjectionProfile (256×256) | 0.2682+/-0.001328 ms | 0.2677+/-0.001331 ms
ProjectionProfile (512×512) | 1.065+/-0.02151 ms | 1.064+/-0.01968 ms
ProjectionProfile (1024×1024) | 4.258+/-0.1016 ms | 4.258+/-0.1087 ms
ProjectionProfile (2048×2048) | 17.03+/-0.2428 ms | 17.06+/-0.3503 ms
ResizeDown (256×256) | 0.126+/-0.001329 ms | 0.1257+/-0.001296 ms
ResizeDown (512×512) | 0.5105+/-0.01992 ms | 0.5115+/-0.01934 ms
ResizeDown (1024×1024) | 2.073+/-0.06725 ms | 2.071+/-0.06386 ms
ResizeDown (2048×2048) | 8.708+/-0.1344 ms | 8.705+/-0.1343 ms
ResizeUp (256×256) | 1.946+/-0.03315 ms | 1.945+/-0.02984 ms
ResizeUp (512×512) | 7.807+/-0.08913 ms | 7.806+/-0.09767 ms
ResizeUp (1024×1024) | 32.97+/-0.214 ms | 33.02+/-0.1861 ms
ResizeUp (2048×2048) | 135.7+/-0.2605 ms | 135.9+/-0.337 ms
Sum (256×256) | 0.2561+/-0.05608 ms | 0.06664+/-0.001315 ms
Sum (512×512) | 1.017+/-0.2418 ms | 0.2662+/-0.002786 ms
Sum (1024×1024) | 3.972+/-1.098 ms | 1.074+/-0.03069 ms
Sum (2048×2048) | 13.96+/-2.55 ms | 4.362+/-0.1423 ms

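Looks like -O3 straight up benefits much of the image_function module. AbsoluteDifference, the bitwise functions, Maximum, Minimum, Subtract, Flip, Invert, RgbToBgr, Threshold, ThresholdDouble, Accumulate, ConvertToGrayScale, and Sum all run noticeably faster, and Transpose improves on every size except 256×256, where it gets slightly slower. GammaCorrection, LookupTable, ConvertToRgb, Fill, Histogram, ProjectionProfile, ResizeDown, and ResizeUp are essentially unchanged. Next time I will look at the implementation to see why this might be the case and whether this finding has been documented anywhere.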

Note to Self: Don’t Read Foreign Code When Unwell

So after two weeks of going in and out of the hospital, I can finally say I'm sick of sitting in waiting rooms and trying to work on my laptop without a table. My newfound appreciation for my home workstation has given me renewed energy to continue working on my school assignment.

In a previous post, I indicated my interest in working with Google's HighwayHash repo and finding a section of the code to optimize. Although I said in the subsequent post that I would work with penguinV, and even isolated the blob_detection function as my function of interest, I went back to looking into the HighwayHash source code while at the hospital. It really was an interesting project, and I wished to contribute to it.

I hoped to find a simple block of code to work on in HighwayHash, but my god, the source code was difficult to follow; my knowledge of C++ is shallow at best and the algorithms they used took time to understand. Now, I don't want to make excuses here, but I would like to blame my multiple 5+ hour waits in the hospital in the middle of the night. Honestly, hospitals are terrible working environments. Over the course of the week, I was able to figure out how to compile the program on an AArch64 machine (I was missing a declaration for the makefile). However, I wasn't able to determine what to work on, and spending a week on the search seemed like a reasonable sign to give up and stick to the game plan. Therefore, I will continue looking into penguinV, which should have been my course of action all along. At least now I have no regrets about abandoning HighwayHash, because I tried and failed.

In my next post, I will post my progress with penguinV.

Note to Self: AngularJS Isn’t Angular

So I continued working in the open source community and decided to make further contributions to a repo I previously worked on, pagermon. Last time, I added settings to the webpage that allow each column in the pager table to be toggled, and my PR also added the ability to change the column names. The owners liked my previous PR but will not merge the feature until they figure out how to fix the stylesheet so that it is consistent with my changes. This time, I took on an issue that required me to make a pop-up modal for the messages table so that messages are easier to read on phones.

The feature is just the implementation of a modal in AngularJS. Note the keywords in the previous sentence: modal and AngularJS. It's been a while since I have touched Angular, and I completely forgot that AngularJS and Angular are two different things. So for the first iteration of the feature I wrote the code in Angular and was honestly stumped when I couldn't compile it properly. It took me about an hour to realize the project was written in AngularJS. Honestly, I could have saved a lot of time if I had just read the project's documentation, which states that it uses AngularJS. The source code should also have been a big red flag, since the syntax of Angular and AngularJS is different. But in my defence, I was lacking sleep and food, as I am just a poor student.

Once I figured out I was using the wrong framework, implementing the feature became a lot easier. Adding new components in AngularJS is quite simple, especially since pagermon utilizes Angular directives for Bootstrap, whose documentation provides base code that I can easily copy and modify. The feature turned out nicely and the owners liked it, which made me quite satisfied.

penguinV (S1): Benchmark and Plan

Last time I looked into Google's HighwayHash as my target repo but failed to get the benchmark working. This time, moving away from hashes, I looked for image processing libraries, specifically a small to medium sized project that would be simple to follow. Many GitHub searches later, I found a project that fits my criteria. PenguinV is "a simple and fast C++ image processing library with focus on heterogeneous systems". The project is easy to follow and looks like it has areas I can improve.

The function I am hoping to improve is BlobDetection::find, which searches an image for all possible blobs. The package includes a built-in benchmark that exercises the BlobDetection functionality, so I will profile that program first. Compiling the code takes a while, but execution takes well under a second. The input and output are as follows:

Looking into the C++ file, I tried modifying the code to use a larger bitmap image to increase the run time, but there was no noticeable increase in time.

Time

Using the time command, I get the following average times over 20 runs:

real    0m0.084s
user    0m0.062s
sys     0m0.015s
Perf / Gprof

Using perf, I get the following report:

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 293  of event 'cycles:uppp'
# Event count (approx.): 60337088
#
# Overhead  Command          Shared Object           Symbol
# ........  ...............  ......................  ....................................................................................................
#
    27.52%  example_blob_de  libc-2.27.so            [.] _mcount@@GLIBC_2.18
    24.42%  example_blob_de  example_blob_detection  [.] Blob_Detection::BlobDetection::find
    14.80%  example_blob_de  example_blob_detection  [.] std::vector<unsigned int, std::allocator<unsigned int> >::emplace_back<unsigned int>
     6.25%  example_blob_de  libc-2.27.so            [.] __memcpy_generic
     5.19%  example_blob_de  example_blob_detection  [.] Image_Function::ConvertToGrayScale
     3.33%  example_blob_de  example_blob_detection  [.] Image_Function::Histogram
     2.87%  example_blob_de  example_blob_detection  [.] Image_Function::Threshold
     2.26%  example_blob_de  libc-2.27.so            [.] _int_malloc
     1.76%  example_blob_de  ld-2.27.so              [.] _dl_lookup_symbol_x
     1.32%  example_blob_de  libc-2.27.so            [.] cfree@GLIBC_2.17
     1.25%  example_blob_de  libc-2.27.so            [.] _int_free
     1.19%  example_blob_de  libc-2.27.so            [.] malloc
     1.16%  example_blob_de  ld-2.27.so              [.] do_lookup_x
     0.93%  example_blob_de  ld-2.27.so              [.] _dl_relocate_object
     0.74%  example_blob_de  libc-2.27.so            [.] __memmove_generic
     0.56%  example_blob_de  libc-2.27.so            [.] malloc_consolidate
     0.40%  example_blob_de  libstdc++.so.6.0.25     [.] operator new
     0.40%  example_blob_de  example_blob_detection  [.] _mcount@plt
     0.40%  example_blob_de  example_blob_detection  [.] std::vector<unsigned int, std::allocator<unsigned int> >::_M_realloc_insert<unsigned int const&>
     0.39%  example_blob_de  libstdc++.so.6.0.25     [.] std::use_facet<std::codecvt<char, char, __mbstate_t> >@plt
     0.38%  example_blob_de  example_blob_detection  [.] Image_Function::SetPixel
     0.37%  example_blob_de  example_blob_detection  [.] Blob_Detection::BlobInfo::contourY
     0.36%  example_blob_de  libstdc++.so.6.0.25     [.] 0x000000000008d6e8
     0.36%  example_blob_de  ld-2.27.so              [.] strcmp
     0.34%  example_blob_de  example_blob_detection  [.] main
     0.33%  example_blob_de  example_blob_detection  [.] Blob_Detection::BlobInfo::contourX
     0.28%  example_blob_de  ld-2.27.so              [.] _dl_fixup
     0.22%  example_blob_de  ld-2.27.so              [.] check_match
     0.09%  example_blob_de  [unknown]               [k] 0xffff000010096654
     0.05%  example_blob_de  ld-2.27.so              [.] _dl_load_cache_lookup

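A lot of time is taken up by Blob_Detection::BlobDetection::find (24.42%), and std::vector::emplace_back adds another 14.80%, so more than a third of the time spent in find and its vector calls goes to emplace_back. The top entry, _mcount (27.52%), is the profiling hook inserted when the program is compiled with -pg for gprof, so it is instrumentation overhead rather than penguinV code. Annotating the Blob_Detection::BlobDetection::find line, I found that no single part of the function takes up a majority of its time. Profiling with gprof shows similar results: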

                0.01    0.01       1/1           Blob_Detection::BlobDetection::find(PenguinV_Image::ImageTemplate<unsigned char> const&, Blob_Detection::BlobParameters, unsigned char) [2]
[1]    100.0    0.01    0.01       1         Blob_Detection::BlobDetection::find(PenguinV_Image::ImageTemplate<unsigned char> const&, unsigned int, unsigned int, unsigned int, unsigned int, Blob_Detection::BlobParameters, unsigned char) [1]
                0.01    0.00  328018/328018      void std::vector<unsigned int, std::allocator<unsigned int> >::emplace_back<unsigned int>(unsigned int&&) [3]
                0.00    0.00    3022/3022        void std::vector<unsigned int, std::allocator<unsigned int> >::_M_realloc_insert<unsigned int>(__gnu_cxx::__normal_iterator<unsigned 

std::vector::emplace_back is called 328018 times from find, and _M_realloc_insert (the reallocation path) is called 3022 times, which is not exactly what I was hoping to see. Outside of these vector operations, there isn't an obvious hotspot for me to fix.

Plan of Approach

I will have to do a deeper inspection of the function to see if there are any optimizations I can incorporate. However, one thing I would like to try is experimenting with the g++ optimization flags. The following are the default flags:

-std=c++11 -Wall -Wextra -Wstrict-aliasing -Wpedantic -Wconversion -O2 -march=native

Since the build uses -O2, a higher optimization level such as -O3, or some additional individual flags, might further increase performance.
