
Mali OpenCL SDK v1.1.0 Tutorial Sample 4: Image Objects

Author:  Published: 2014-03-04 21:23:42
  • Introduction

      Textures (images) are a large part of modern graphics applications, and graphics hardware has evolved to access and manipulate them with high performance. To take full advantage of this hardware, OpenCL includes an optional image data type. These "image objects" are supported on all Mali-T600 series GPUs. Images represent large grids of data that can be processed in parallel, which makes image data and image operations well suited to acceleration in OpenCL. There are two ways OpenCL can store and manipulate image data: buffer objects and image objects.

    Memory buffers

      Memory buffers are simply plain arrays of data. Because they accommodate all kinds of data (for example images, meshes, linear arrays and so on), various image operations become awkward:

        > To access data at a given coordinate, you must calculate the correct offset into the data yourself;

        > You must access your data with exact coordinates, or implement your own addressing scheme for normalized (or other) coordinates;

        > You must also handle the case where a coordinate falls outside the image region;

        > Any algorithm or optimization is usually tied to the image format in use, for example RGB888 (if you need to change the image format, the algorithm/optimization must change too);

        > Image filtering (such as bilinear filtering) must be implemented manually.

    Image objects

      Image objects are a special memory type that makes working with image data much easier. Image objects:

        > Support direct access by coordinate;

        > Support normalized coordinates;

        > Handle out-of-range coordinates (you can choose from several handling schemes);

        > Provide an abstraction over the image format (accessing an RGB888 image is the same as accessing an RGB565 image);

        > Support bilinear filtering (hardware accelerated).

    Recommendations

      Whether to use image objects depends on the application. You must consider the following factors:

        > Using image objects for image data simplifies the code needed to access and manipulate it;

        > When using image objects, only one pixel can be processed per clock cycle. When using buffers, if your image format uses fewer than 32 bits per channel, you can process multiple pixels per clock cycle.

        For example, if your image format is RGBA8888 (32 bits per pixel), then with a buffer you can vectorize your algorithm to operate on 4 pixels at a time (32-bit * 4 = 128-bit, the recommended vector width on Mali-T600 series GPUs), whereas with an image object the rate is fixed at one pixel per clock cycle.

        If the format uses 32 bits or more per channel, the buffer advantage disappears, because both approaches process one pixel per clock cycle. For example, if the format is RGBA32 (128 bits per pixel), only one pixel can be processed per clock cycle either way, because a single pixel already fills the recommended vector width.

        > In more complex cases, the best performance comes from balancing the load across the whole system. On Mali-T600 series GPUs, image objects use the texture pipeline, which is separate from the load/store and arithmetic pipelines. It can therefore be beneficial to use image objects and buffers at the same time, to make the fullest use of the system.

        For example, use an image object to load the input image, and load the data used to transform it (for example, a convolution filter) from a memory buffer.

    Image scaling

      How to resize an image using image objects.

    Bilinear filtering

      One particular benefit of OpenCL image objects is their built-in bilinear filtering. When you read from an OpenCL image object, instead of selecting the single pixel closest to the given coordinate, you can get a weighted average of the four closest pixels. This is hardware accelerated in the texture pipeline on Mali-T600 series GPUs, which means scaled images can be produced with higher performance and lower power consumption. We will use this example as a walkthrough of how to use OpenCL image objects.


    Figure 1: An example of nearest-pixel (left) and bilinear filtering (right)

    Differences between image objects and memory buffers

      Using OpenCL image objects is almost the same as using OpenCL buffers:

        > They both have the type cl_mem;

        > For allocation, clCreateBuffer becomes clCreateImage2D (or clCreateImage3D);

        > For mapping, clEnqueueMapBuffer becomes clEnqueueMapImage;

        > For unmapping, clEnqueueUnmapMemObject works for both memory types.

      When using image objects, the biggest differences are:

        > Image objects require a "sampler" in order to be read from;

        > A kernel cannot both read and write the same image (in the kernel definition, image parameters must be marked __read_only or __write_only);

        > Images have a defined data format.

    Samplers

      As mentioned earlier, to read data from an image object you must have a sampler. A sampler defines:

        > Whether the coordinates you use are normalized:

          >> Normalized (in the range [0, 1]);

          >> Unnormalized.

        > The policy to use when a coordinate falls outside the image:

          >> None (you guarantee the coordinates are in range);

          >> Clamp to edge (returns the colour of the nearest valid pixel);

          >> Clamp (returns the border colour defined by the image format);

          >> Repeat (behaves as if infinite copies of the image were tiled next to one another);

          >> Mirrored repeat (the same as repeat, except that the coordinates flip at every edge).

        > The filtering policy to use:

          >> Nearest;

          >> Bilinear.

      Some combinations of these options are restricted.

      A sampler can be defined on the host with clCreateSampler() and passed to the kernel as an argument, or defined directly in the kernel. Passing the sampler as a kernel argument gives you the flexibility to run the same kernel with different sampling options.

    Resizing an image with bilinear filtering

      Unless stated otherwise, all code snippets are from "image_scaling.cpp".

      In the sample code we use OpenCL to resize an input image. The image is scaled up by a factor of 8 (adjustable in the code) with bilinear filtering enabled.

    1. Allocate memory for your images


      Allocating an image object is almost the same as allocating a buffer. The main difference is that you must specify the data format of the image being used. You can list the image formats available on your platform with the printSupported2DImageFormats method.

        /*
         * Specify the format of the image.
         * The bitmap image we are using is RGB888, which is not a supported OpenCL image format.
         * We will use RGBA8888 and add an empty alpha channel.
         */
        cl_image_format format;
        format.image_channel_data_type = CL_UNORM_INT8;
        format.image_channel_order = CL_RGBA;
        /* Allocate memory for the input image that can be accessed by the CPU and GPU. */
        bool createMemoryObjectsSuccess = true;
        memoryObjects[0] = clCreateImage2D(context, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, &format, width, height, 0, NULL, &errorNumber);
        createMemoryObjectsSuccess &= checkSuccess(errorNumber);
        memoryObjects[1] = clCreateImage2D(context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, &format, newWidth, newHeight, 0, NULL, &errorNumber);
        createMemoryObjectsSuccess &= checkSuccess(errorNumber);
        if (!createMemoryObjectsSuccess)
        {
            cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
            cerr << "Failed creating the image. " << __FILE__ << ":"<< __LINE__ << endl;
            return 1;
        }

    2. Map the memory to a host pointer

      Again, this step is very similar to mapping a buffer.

        /*
         * Like with memory buffers, we now map the allocated memory to a host side pointer.
         * Unlike buffers, we must specify origin coordinates, width and height for the region of the image we wish to map.
         */
        size_t origin[3] = {0, 0, 0};
        size_t region[3] = {width, height, 1};
        /*
         * clEnqueueMapImage also returns the rowPitch; the width of the mapped region in bytes.
         * If the image format is not known, this is required information when accessing the image object as a normal array.
         * The number of bytes per pixel can vary with the image format being used,
         * this affects the offset into the array for a given coordinate.
         * In our case the image format is fixed as RGBA8888 so we don't need to worry about the rowPitch.
         */
        size_t rowPitch;
        unsigned char* inputImageRGBA = (unsigned char*)clEnqueueMapImage(commandQueue,  memoryObjects[0], CL_TRUE, CL_MAP_WRITE, origin, region, &rowPitch, NULL, 0, NULL, NULL, &errorNumber);
        if (!checkSuccess(errorNumber))
        {
            cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
            cerr << "Failed mapping the input image. " << __FILE__ << ":"<< __LINE__ << endl;
            return 1;
        }

    3. Initialize the memory

      Fill the image with data using the host-side pointer.

    4. Unmap

      Unmap the host-side pointer (with clEnqueueUnmapMemObject, just as for buffers) so that the data can be used by the kernel.

    5. Pass the images to the kernel

      Pass the images to the kernel as arguments, just as you would with buffers.

    6. Use the images in the kernel

      In this section, the code snippets are from "image_scaling.cl".

        a. Define a sampler

        const sampler_t sampler = CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_CLAMP | CLK_FILTER_LINEAR;
        b. Calculate the coordinates

        /*
         * There is one kernel instance per pixel in the destination image.
         * The global id of this kernel instance is therefore a coordinate in the destination image.
         */
        int2 coordinate = (int2)(get_global_id(0), get_global_id(1));
        /*
         * That coordinate is only valid for the destination image.
         * If we normalize the coordinates to the range [0.0, 1.0] (using the height and width of the destination image),
         * we can use them as coordinates in the sourceImage.
         */
        float2 normalizedCoordinate = convert_float2(coordinate) * (float2)(widthNormalizationFactor, heightNormalizationFactor);
        c. Read from the source image

        /*
         * Read colours from the source image.
         * The sampler determines how the coordinates are interpreted.
         * Because bilinear filtering is enabled, the value of colour will be the average of the 4 pixels closest to the coordinate.
         */
        float4 colour = read_imagef(sourceImage, sampler, normalizedCoordinate);
        d. Write to the destination image
        /*
         * Write the colour to the destination image.
         * No sampler is used here as all writes must specify an exact valid pixel coordinate.
         */
        write_imagef(destinationImage, coordinate, colour);

    7. Retrieve the result

      Map the image object to a host-side pointer and read back the result.


    Running the sample

      When run, an image called "output.bmp" is created on the target board, and the output is similar to:

    11 Image formats supported (channel order, channel data type):
    CL_RGBA, CL_UNORM_INT8
    CL_RGBA, CL_UNORM_INT16
    CL_RGBA, CL_SIGNED_INT8
    CL_RGBA, CL_SIGNED_INT16
    CL_RGBA, CL_SIGNED_INT32
    CL_RGBA, CL_UNSIGNED_INT8
    CL_RGBA, CL_UNSIGNED_INT16
    CL_RGBA, CL_UNSIGNED_INT32
    CL_RGBA, CL_HALF_FLOAT
    CL_RGBA, CL_FLOAT
    CL_BGRA, CL_UNORM_INT8
    Profiling information:
    Queued time:    0.092ms
    Wait time:      0.135206ms
    Run time:       31.5405ms

    Appendix 1: Kernel source

    /*
     * This confidential and proprietary software may be used only as
     * authorised by a licensing agreement from ARM Limited
     *    (C) COPYRIGHT 2013 ARM Limited
     *        ALL RIGHTS RESERVED
     * The entire notice above must be reproduced on all authorised
     * copies and copies may only be made to the extent permitted
     * by a licensing agreement from ARM Limited.
     */
    
    /* [Define a sampler] */
    const sampler_t sampler = CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_CLAMP | CLK_FILTER_LINEAR;
    /* [Define a sampler] */
    
    /**
     * \brief Image scaling kernel function.
     * \param[in] sourceImage Input image object.
     * \param[out] destinationImage Re-sized output image object.
     * \param[in] widthNormalizationFactor 1 / destinationImage width.
     * \param[in] heightNormalizationFactor 1 / destinationImage height.
     */
    __kernel void image_scaling(__read_only image2d_t sourceImage,
                                __write_only image2d_t destinationImage,
                                const float widthNormalizationFactor,
                                const float heightNormalizationFactor)
    {
        /*
         * It is possible to get the width and height of an image object (using get_image_width and get_image_height).
         * You could use this to calculate the normalization factors in the kernel.
         * In this case, because the width and height doesn't change for each kernel,
         * it is better to pass normalization factors to the kernel as parameters.
         * This way we do the calculations once on the host side instead of in every kernel.
         */
    
        /* [Calculate the coordinates] */
        /*
         * There is one kernel instance per pixel in the destination image.
         * The global id of this kernel instance is therefore a coordinate in the destination image.
         */
        int2 coordinate = (int2)(get_global_id(0), get_global_id(1));
    
        /*
         * That coordinate is only valid for the destination image.
         * If we normalize the coordinates to the range [0.0, 1.0] (using the height and width of the destination image),
         * we can use them as coordinates in the sourceImage.
         */
        float2 normalizedCoordinate = convert_float2(coordinate) * (float2)(widthNormalizationFactor, heightNormalizationFactor);
        /* [Calculate the coordinates] */
    
        /* [Read from the source image] */
        /*
         * Read colours from the source image.
         * The sampler determines how the coordinates are interpreted.
         * Because bilinear filtering is enabled, the value of colour will be the average of the 4 pixels closest to the coordinate.
         */
        float4 colour = read_imagef(sourceImage, sampler, normalizedCoordinate);
        /* [Read from the source image] */
    
        /* [Write to the destination image] */
        /*
         * Write the colour to the destination image.
         * No sampler is used here as all writes must specify an exact valid pixel coordinate.
         */
        write_imagef(destinationImage, coordinate, colour);
        /* [Write to the destination image] */
    }
    


    Appendix 2: Host-side source

    /*
     * This confidential and proprietary software may be used only as
     * authorised by a licensing agreement from ARM Limited
     *    (C) COPYRIGHT 2013 ARM Limited
     *        ALL RIGHTS RESERVED
     * The entire notice above must be reproduced on all authorised
     * copies and copies may only be made to the extent permitted
     * by a licensing agreement from ARM Limited.
     */
    
    #include "common.h"
    #include "image.h"
    
    #include <CL/cl.h>
    #include <iostream>
    
    using namespace std;
    
    /**
     * \brief OpenCL image object sample code.
     * \details Demonstration of how to use OpenCL image objects to resize an image.
     * \return The exit code of the application, non-zero if a problem occurred.
     */
    int main(void)
    {
        cl_context context = 0;
        cl_command_queue commandQueue = 0;
        cl_program program = 0;
        cl_device_id device = 0;
        cl_kernel kernel = 0;
        const int numMemoryObjects = 2;
        cl_mem memoryObjects[numMemoryObjects] = {0, 0};
        cl_int errorNumber;
    
        /* Set up OpenCL environment: create context, command queue, program and kernel. */
        if (!createContext(&context))
        {
            cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
            cerr << "Failed to create an OpenCL context. " << __FILE__ << ":"<< __LINE__ << endl;
            return 1;
        }
    
        if (!createCommandQueue(context, &commandQueue, &device))
        {
            cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
            cerr << "Failed to create the OpenCL command queue. " << __FILE__ << ":"<< __LINE__ << endl;
            return 1;
        }
    
        if (!createProgram(context, device, "assets/image_scaling.cl", &program))
        {
            cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
            cerr << "Failed to create OpenCL program." << __FILE__ << ":"<< __LINE__ << endl;
            return 1;
        }
    
        kernel = clCreateKernel(program, "image_scaling", &errorNumber);
        if (!checkSuccess(errorNumber))
        {
            cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
            cerr << "Failed to create OpenCL kernel. " << __FILE__ << ":"<< __LINE__ << endl;
            return 1;
        }
    
        /* Print the image formats that the OpenCL device supports. */
        cout << endl;
        printSupported2DImageFormats(context);
        cout << endl;
    
        /* The scaling factor to use when resizing the image. */
        const int scaleFactor = 8;
    
        /* Load the input image data. */
        unsigned char* inputImage = NULL;
        int width, height;
        loadFromBitmap("assets/input.bmp", &width, &height, &inputImage);
    
        /*
         * Calculate the width and height of the new image.
         * Used to allocate the correct amount of output memory and the number of kernels to use.
         */
        int newWidth = width * scaleFactor;
        int newHeight = height * scaleFactor;
    
        /* [Allocate image objects] */
        /*
         * Specify the format of the image.
         * The bitmap image we are using is RGB888, which is not a supported OpenCL image format.
         * We will use RGBA8888 and add an empty alpha channel.
         */
        cl_image_format format;
        format.image_channel_data_type = CL_UNORM_INT8;
        format.image_channel_order = CL_RGBA;
    
        /* Allocate memory for the input image that can be accessed by the CPU and GPU. */
        bool createMemoryObjectsSuccess = true;
    
        memoryObjects[0] = clCreateImage2D(context, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, &format, width, height, 0, NULL, &errorNumber);
        createMemoryObjectsSuccess &= checkSuccess(errorNumber);
    
        memoryObjects[1] = clCreateImage2D(context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, &format, newWidth, newHeight, 0, NULL, &errorNumber);
        createMemoryObjectsSuccess &= checkSuccess(errorNumber);
    
        if (!createMemoryObjectsSuccess)
        {
            cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
            cerr << "Failed creating the image. " << __FILE__ << ":"<< __LINE__ << endl;
            return 1;
        }
        /* [Allocate image objects] */
    
        /* [Map image objects to host pointers] */
        /*
         * Like with memory buffers, we now map the allocated memory to a host side pointer.
         * Unlike buffers, we must specify origin coordinates, width and height for the region of the image we wish to map.
         */
        size_t origin[3] = {0, 0, 0};
        size_t region[3] = {width, height, 1};
    
        /*
         * clEnqueueMapImage also returns the rowPitch; the width of the mapped region in bytes.
         * If the image format is not known, this is required information when accessing the image object as a normal array.
         * The number of bytes per pixel can vary with the image format being used,
         * this affects the offset into the array for a given coordinate.
         * In our case the image format is fixed as RGBA8888 so we don't need to worry about the rowPitch.
         */
        size_t rowPitch;
    
        unsigned char* inputImageRGBA = (unsigned char*)clEnqueueMapImage(commandQueue,  memoryObjects[0], CL_TRUE, CL_MAP_WRITE, origin, region, &rowPitch, NULL, 0, NULL, NULL, &errorNumber);
        if (!checkSuccess(errorNumber))
        {
            cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
            cerr << "Failed mapping the input image. " << __FILE__ << ":"<< __LINE__ << endl;
            return 1;
        }
        /* [Map image objects to host pointers] */
    
        /* Convert the input data from RGB to RGBA (moves it to the OpenCL allocated memory at the same time). */
        RGBToRGBA(inputImage, inputImageRGBA, width, height);
        delete[] inputImage;
    
        /* Unmap the image from the host. */
        if (!checkSuccess(clEnqueueUnmapMemObject(commandQueue, memoryObjects[0], inputImageRGBA, 0, NULL, NULL)))
        {
            cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
            cerr << "Failed unmapping the input image. " << __FILE__ << ":"<< __LINE__ << endl;
            return 1;
        }
    
        /*
         * Calculate the normalization factor for the image coordinates.
         * By using normalized coordinates we don't have to manually map the destination coordinates to the source coordinates.
         */
        cl_float widthNormalizationFactor = 1.0f / newWidth;
        cl_float heightNormalizationFactor = 1.0f / newHeight;
    
        /* Setup the kernel arguments. */
        bool setKernelArgumentsSuccess = true;
        setKernelArgumentsSuccess &= checkSuccess(clSetKernelArg(kernel, 0, sizeof(cl_mem), &memoryObjects[0]));
        setKernelArgumentsSuccess &= checkSuccess(clSetKernelArg(kernel, 1, sizeof(cl_mem), &memoryObjects[1]));
        setKernelArgumentsSuccess &= checkSuccess(clSetKernelArg(kernel, 2, sizeof(cl_float), &widthNormalizationFactor));
        setKernelArgumentsSuccess &= checkSuccess(clSetKernelArg(kernel, 3, sizeof(cl_float), &heightNormalizationFactor));
        if (!setKernelArgumentsSuccess)
        {
            cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
            cerr << "Failed setting OpenCL kernel arguments. " << __FILE__ << ":"<< __LINE__ << endl;
            return 1;
        }
    
        /*
     * Set the kernel work size. Each kernel operates on one pixel of the output image.
         * Therefore, we need newWidth * newHeight kernel instances.
         * We are using two work dimensions because it maps nicely onto the coordinates of the image.
         * With one dimension we would have to derive the y coordinate from the x coordinate in the kernel.
         */
        const int workDimensions = 2;
        size_t globalWorkSize[workDimensions] = {newWidth, newHeight};
    
        /* An event to associate with the kernel. Allows us to retrieve profiling information later. */
        cl_event event = 0;
    
        /* Enqueue the kernel. */
        if (!checkSuccess(clEnqueueNDRangeKernel(commandQueue, kernel, workDimensions, NULL, globalWorkSize, NULL, 0, NULL, &event)))
        {
            cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
            cerr << "Failed enqueuing the kernel. " << __FILE__ << ":"<< __LINE__ << endl;
            return 1;
        }
    
        /* Wait for kernel execution completion. */
        if (!checkSuccess(clFinish(commandQueue)))
        {
            cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
            cerr << "Failed waiting for kernel execution to finish. " << __FILE__ << ":"<< __LINE__ << endl;
            return 1;
        }
    
        /* Print the profiling information for the event. */
        printProfilingInfo(event);
        /* Release the event object. */
        if (!checkSuccess(clReleaseEvent(event)))
        {
            cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
            cerr << "Failed releasing the event object. " << __FILE__ << ":"<< __LINE__ << endl;
            return 1;
        }
    
        size_t newRegion[3] = {newWidth, newHeight, 1};
    
        unsigned char* outputImage = (unsigned char*)clEnqueueMapImage(commandQueue,  memoryObjects[1], CL_TRUE, CL_MAP_READ, origin, newRegion, &rowPitch, NULL, 0, NULL, NULL, &errorNumber);
        if (!checkSuccess(errorNumber))
        {
            cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
            cerr << "Failed mapping the output image. " << __FILE__ << ":"<< __LINE__ << endl;
            return 1;
        }
    
        unsigned char* outputImageRGB = new unsigned char[newWidth * newHeight * 3];
        RGBAToRGB(outputImage, outputImageRGB, newWidth, newHeight);
    
        saveToBitmap("output.bmp", newWidth, newHeight, outputImageRGB);
    
        delete[] outputImageRGB;
    
        cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
    
    
        return 0;
    }
    

