Running Renderscript efficiently with PowerVR GPUs on Android

12 June 2013
Imagination Technologies

Renderscript is a high performance compute API implemented in Android 4.2 (JELLY_BEAN_MR1), Google’s popular operating system for mobile and embedded devices. It allows developers to run a subset of C99 code across all available processor cores such as CPUs, GPUs or DSPs. Renderscript is particularly useful for applications that require parallel operations typically seen in algorithms linked to image processing, mathematical modeling, or any applications that require a great deal of mathematical computation. The Android SDK Manager provides access to a variety of Renderscript sample code.

We’ve recently run a variety of sample Renderscript and Filterscript code from Google, and found that a vast majority of their examples run perfectly well on PowerVR GPUs with no changes required. The very few that don’t either immediately fall back on the CPU or require a minor script update (optimizing the data type for a small number of variables).

What’s accelerated on PowerVR Series5XT GPUs

Camera applications such as smile and blink detection, high dynamic range (HDR), panoramic post processing and dynamic contrast enhancement are ideal candidates for acceleration using Renderscript. Voice enhancement algorithms are another key area of interest, with noise cancellation and beamforming being able to take advantage of Renderscript implementations.

A more detailed look at how Renderscript works and the software architecture it implements can be found on Android’s dedicated website. It includes useful diagrams and a short tutorial on creating a Renderscript application and calling scripts.

Helping developers access Renderscript efficiently on PowerVR GPUs

Developers can rely on Renderscript to automatically provide access to all available computational resources without having to worry about support for a certain processing architecture. What this essentially means is that the API provides a layer of abstraction for programs from the underlying instruction set of the CPU or the GPU. As Renderscript code is compiled on the device at runtime, non-parallel code will likely execute on the CPU while parallel code defined by ‘ForEach’ elements of each script will be considered for GPU acceleration.

rs_overview Renderscript system overview*

This is where things get more vendor-specific. The driver decides which computational resource to use on a per-script basis. With our advanced driver implementations, both PowerVR Series5/5XT and Series6 cores will attempt to employ GPU acceleration wherever possible and keep CPU fall-back always as an option, meaning that a majority of scripts that contain parallel code will always target the PowerVR engine.

All PowerVR cores are managed using a software firmware (MicroKernel) which controls all higher level events at the GPU level. This approach offers numerous advantages including full offloading of the main CPU host of virtually all interrupt handling while fully maintaining maximal flexibility where the GPU execution is based on a software-controlled firmware.

On PowerVR Series5/5XT cores, the microkernel firmware is executed on the USSE pipelines to ensure optimal silicon area utilization. PowerVR Series6 ‘Rogue’ cores move the microkernel execution to a dedicated C-programmable multi-threaded microcontroller which enables full debugging functionality of the GPU core.

This software-based management of the GPU core ensures that all PowerVR GPUs have the flexibility to adapt to present and future market requirements. All of Imagination’s GPUs are thus able to support not only current compute APIs like OpenCL, Renderscript and Filterscript, but will have much better support for future implementations put forward by heterogeneous processing-focused groups like the HSA Foundation.

Mobile GPU compute is about high performance and low power, not just standards

Renderscript gives developers the option to define the floating point precision required by their compute algorithms. This is particularly useful when targeting mobile platforms where battery life is the main consideration when developing applications. As few mobile CPUs or GPUs currently provide full compliance to the IEEE 754-2008 standard, this flexibility is particularly useful when developers want to both optimize their code for power efficiency and target as many platforms as possible without worrying about code compatibility issues.

Imagination’s PowerVR GPUs are designed with these low power and high efficiency considerations in mind. Therefore mapping a majority of the Renderscript intrinsics to our hardware architecture becomes a much more straightforward process in terms of GPU acceleration expectations.

In the graphs below, we see how scaling the number of GPU cores (going from a single core to a three-core PowerVR SGX544 GPU) leads to a corresponding increase in performance.

Most Android Renderscript examples running on PowerVR SGX544SC get a 3x boost when moved to a PowerVR SGX544MP3-based platform.

Furthermore, blends, blurs, color matrices or 3×3/5×5 convolve operations typically do not require FP64 precision. Therefore highly parallel scripts can now be developed and deployed across a range of Android devices from mass market smartphones to high-end tablets, with reasonable performance improvements for all use cases and a high degree of portability.

PowerVR GPUs have optimized 32-bit pipelines that are designed to support realistic mobile use-cases, with certain family members of the ‘Rogue’ architecture being able to scale to HPC tasks where FP64 precision is needed.

We await your feedback on Renderscript and Filterscript in the comments box below. To keep up to date with the latest developments for GPU Compute on PowerVR cores, follow us on Twitter (@ImaginationTech) and subscribe to our RSS feed.