Profiling Vpenguin

Last time, I benchmarked the Vpenguin functions. This time I will profile them with perf to find the hotspots. With the original compiler options on an AArch64 machine, and with performance_test restricted to only the Image_Function functions, I get the following report from perf:

As shown above, Transpose and Resize take the most time. That makes sense, since both functions have to write every pixel of the output image. Let’s focus on Transpose for now. Here’s the source code for the function.

void Transpose( const Image & in, uint32_t startXIn, uint32_t startYIn, Image & out, uint32_t startXOut, uint32_t startYOut,
                uint32_t width, uint32_t height )
{
    ParameterValidation( in, startXIn, startYIn, width, height );
    ParameterValidation( out, startXOut, startYOut, height, width );
    VerifyGrayScaleImage( in, out );

    const uint32_t rowSizeIn  = in.rowSize();
    const uint32_t rowSizeOut = out.rowSize();

    const uint8_t * inX  = in.data()  + startYIn  * rowSizeIn  + startXIn;
    uint8_t       * outY = out.data() + startYOut * rowSizeOut + startXOut;

    const uint8_t * outYEnd = outY + width * rowSizeOut;

    for( ; outY != outYEnd; outY += rowSizeOut, ++inX ) {
        const uint8_t * inY  = inX;
        uint8_t       * outX = outY;

        const uint8_t * outXEnd = outX + height;

        for( ; outX != outXEnd; ++outX, inY += rowSizeIn )
            (*outX) = *(inY);
    }
}
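To make the access pattern concrete, here is a minimal standalone sketch of the same loop structure working on raw buffers instead of the library's Image class. The name transposeFlat and its parameter list are hypothetical, chosen for illustration only, and the validation calls are left out. The point to notice is that the inner loop writes consecutive output bytes while it steps through the input a full row (rowSizeIn bytes) at a time.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical standalone version of the Transpose loops (not library code).
// 'in'  holds a width x height region with a row stride of rowSizeIn bytes.
// 'out' receives the height x width result with a row stride of rowSizeOut bytes.
void transposeFlat( const uint8_t * in, uint32_t rowSizeIn, uint8_t * out, uint32_t rowSizeOut,
                    uint32_t width, uint32_t height )
{
    const uint8_t * inX  = in;
    uint8_t       * outY = out;

    const uint8_t * outYEnd = outY + width * rowSizeOut;

    // One output row per input column.
    for( ; outY != outYEnd; outY += rowSizeOut, ++inX ) {
        const uint8_t * inY  = inX;
        uint8_t       * outX = outY;

        const uint8_t * outXEnd = outX + height;

        // Writes are contiguous (++outX); reads jump by rowSizeIn each step.
        for( ; outX != outXEnd; ++outX, inY += rowSizeIn )
            (*outX) = *(inY);
    }
}
```

For a 3x2 input stored with a row stride of 4, the first output row is built from input offsets 0 and 4, the second from offsets 1 and 5, and so on. On a large image each of those reads can land on a different cache line, which may be part of what perf is attributing to this loop.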

And here’s perf’s annotation for Transpose:

As expected, storing the new value for the transposed image is taking a while. Honestly, I’m not sure how else I would write this function to avoid this. I have a feeling a similar problem lies in Resize. Let’s have a look at its source code.

void Resize( const Image & in, uint32_t startXIn, uint32_t startYIn, uint32_t widthIn, uint32_t heightIn, Image & out, uint32_t startXOut, uint32_t startYOut, uint32_t widthOut, uint32_t heightOut )
{
    ParameterValidation( in, startXIn, startYIn, widthIn, heightIn );
    ParameterValidation( out, startXOut, startYOut, widthOut, heightOut );
    VerifyGrayScaleImage( in, out );

    const uint32_t rowSizeIn  = in.rowSize();
    const uint32_t rowSizeOut = out.rowSize();

    const uint8_t * inY  = in.data()  + startYIn  * rowSizeIn  + startXIn;
    uint8_t       * outY = out.data() + startYOut * rowSizeOut + startXOut;

    const uint8_t * outYEnd = outY + heightOut * rowSizeOut;

    uint32_t idY = 0;

    // Precalculation of X position
    std::vector < uint32_t > positionX( widthOut );
    for( uint32_t x = 0; x < widthOut; ++x )
        positionX[x] = x * widthIn / widthOut;

    for( ; outY != outYEnd; outY += rowSizeOut, ++idY ) {
        const uint8_t * inX  = inY + (idY * heightIn / heightOut) * rowSizeIn;
        uint8_t       * outX = outY;

        const uint8_t * outXEnd = outX + widthOut;

        const uint32_t * idX = positionX.data();

        for( ; outX != outXEnd; ++outX, ++idX )
            (*outX) = *(inX + (*idX));
    }
}
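The precalculated positionX table is what makes this a nearest-neighbour style resize: each output column simply picks one input column by integer division. A minimal sketch of that mapping on its own, with the hypothetical helper name sourceColumns (it is not part of the library):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not library code): for every output column x, compute
// the input column it copies from, using the same expression as Resize.
std::vector<uint32_t> sourceColumns( uint32_t widthIn, uint32_t widthOut )
{
    std::vector<uint32_t> positionX( widthOut );
    for( uint32_t x = 0; x < widthOut; ++x )
        positionX[x] = x * widthIn / widthOut; // integer division rounds down
    return positionX;
}
```

Upscaling from 4 to 8 columns gives 0, 0, 1, 1, 2, 2, 3, 3 (every input column is used twice), while downscaling from 8 to 4 gives 0, 2, 4, 6 (every other input column is skipped). The rows are mapped the same way through idY * heightIn / heightOut, so the inner loop in Resize ends up reading the input through an index table rather than sequentially.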

And now the profile.

The line that’s creating the hotspot is very similar to the one found in Transpose. So once again, I’m stumped as to what I can do. I will look into possible solutions in my next post.
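As a possible starting point for that follow-up, one commonly used technique for strided copies like Transpose is loop tiling (cache blocking): process the image in small square blocks so that the rows being read stay in cache while they are reused. The sketch below is an assumption on my part, not something measured here or taken from the library; the name transposeTiled, the raw-buffer interface, and the tile size of 32 are all hypothetical, and whether it helps on this machine would need to be checked with perf.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical tiled transpose (not library code, not benchmarked).
// 'in'  holds a width x height region with a row stride of rowSizeIn bytes.
// 'out' receives the height x width result with a row stride of rowSizeOut bytes.
void transposeTiled( const uint8_t * in, uint32_t rowSizeIn, uint8_t * out, uint32_t rowSizeOut,
                     uint32_t width, uint32_t height )
{
    const uint32_t tile = 32; // assumed block size; would need tuning

    for( uint32_t x0 = 0; x0 < width; x0 += tile ) {
        const uint32_t x1 = std::min( x0 + tile, width );

        for( uint32_t y0 = 0; y0 < height; y0 += tile ) {
            const uint32_t y1 = std::min( y0 + tile, height );

            // Transpose one block: input columns [x0, x1), input rows [y0, y1).
            for( uint32_t x = x0; x < x1; ++x ) {
                const uint8_t * src = in  + static_cast<size_t>( y0 ) * rowSizeIn  + x;
                uint8_t       * dst = out + static_cast<size_t>( x )  * rowSizeOut + y0;

                for( uint32_t y = y0; y < y1; ++y, src += rowSizeIn, ++dst )
                    (*dst) = *(src);
            }
        }
    }
}
```

The result is the same as the original loops; only the order in which pixels are visited changes, so each block touches at most 32 input rows before moving on.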
