Imagination’s R&D group has developed a face detection algorithm that is based on a classifier cascade and optimized to run on mobile devices with a CPU and a PowerVR GPU. The algorithm employs several optimizations to improve performance and accuracy. In particular, instead of searching each entire frame for faces, the detector limits its search to regions in which faces were previously detected plus a few randomly selected regions. Tracking previously found faces ensures they are not lost, while testing a variety of other regions ensures that new faces are found quickly.

The main steps performed are illustrated in the figure below:

Figure: Block-level implementation of face detection on CPU and GPU

Source image preprocessing

The preprocessing kernel constructs three temporary images from a single source image:

  1. A mipmap containing multiple versions of the source image at different scales.
  2. A copy of the image in chromatic colour space.
  3. A single-channel image (or probability map) that for each pixel records the probability that the corresponding pixel in the source image has skin colour, calculated by comparing the colour in the chromatic image to the colour of faces detected in previous frames.

The chromatic image and probability map are stored at quarter resolution, which is sufficient to preserve accuracy while minimising memory and bandwidth requirements.

The preprocessing kernel operates on pixels of the source image in parallel: each work-item processes a separate block of 4×4 pixels, outputting one pixel of the chromatic image and one pixel of the probability map.
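The per-block work described above can be sketched on the CPU in C++. This is a minimal illustration, not the actual kernel: it assumes that "chromatic colour space" means normalized r/g chromaticity, and it models skin colour with a hypothetical `SkinModel` (a per-channel Gaussian around the mean chromaticity of previously detected faces). The names `process_block`, `preprocess` and `QuarterResOutputs` are invented for the example, the mipmap is omitted, and the frame dimensions are assumed to be multiples of 4.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };
struct Chroma { float r, g; };  // assumed normalized chromaticity: r = R/(R+G+B), g = G/(R+G+B)

// Hypothetical skin model: mean and spread of chromaticity in previously detected faces.
struct SkinModel { float mean_r, mean_g, sigma_r, sigma_g; };

struct QuarterResOutputs {
    int width, height;             // source dimensions divided by 4
    std::vector<Chroma> chroma;    // chromatic image
    std::vector<float> skin_prob;  // probability map, values in [0, 1]
};

// Body of one "work-item": reduce the 4x4 block at block coordinates (bx, by)
// to one chromatic pixel and one probability-map pixel.
inline void process_block(const std::vector<Rgb>& src, int src_width, int bx, int by,
                          const SkinModel& model, QuarterResOutputs& out) {
    float sum_r = 0.0f, sum_g = 0.0f, sum_b = 0.0f;
    for (int y = 0; y < 4; ++y) {
        for (int x = 0; x < 4; ++x) {
            const Rgb& p = src[static_cast<std::size_t>((by * 4 + y) * src_width + bx * 4 + x)];
            sum_r += p.r; sum_g += p.g; sum_b += p.b;
        }
    }
    const float total = sum_r + sum_g + sum_b;
    // A fully black block has no defined chromaticity; fall back to neutral grey.
    const Chroma c = total > 0.0f ? Chroma{sum_r / total, sum_g / total}
                                  : Chroma{1.0f / 3.0f, 1.0f / 3.0f};
    const float dr = (c.r - model.mean_r) / model.sigma_r;
    const float dg = (c.g - model.mean_g) / model.sigma_g;
    const std::size_t idx = static_cast<std::size_t>(by * out.width + bx);
    out.chroma[idx] = c;
    out.skin_prob[idx] = std::exp(-0.5f * (dr * dr + dg * dg));  // unnormalized Gaussian score
}

// Sequential stand-in for the parallel dispatch: on a GPU each (bx, by) pair
// would be handled by a separate work-item.
QuarterResOutputs preprocess(const std::vector<Rgb>& src, int width, int height,
                             const SkinModel& model) {
    QuarterResOutputs out{width / 4, height / 4, {}, {}};
    out.chroma.resize(static_cast<std::size_t>(out.width * out.height));
    out.skin_prob.resize(static_cast<std::size_t>(out.width * out.height));
    for (int by = 0; by < out.height; ++by)
        for (int bx = 0; bx < out.width; ++bx)
            process_block(src, width, bx, by, model, out);
    return out;
}
```

Because every block writes to its own output pixel, the blocks have no dependencies on each other, which is what makes the one-work-item-per-block mapping straightforward.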

Tile generation

To facilitate parallel processing, the source image is divided into multiple tiles that can be processed independently on separate GPU clusters. These regions are described using an integral image that simplifies computation of Haar-like features.
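To show why an integral image simplifies Haar-like features, here is a minimal C++ sketch of the standard summed-area-table technique. It is a generic illustration rather than the detector's own code: the names `IntegralImage`, `build_integral` and `two_rect_feature` are invented for the example, and the feature shown is the simplest two-rectangle kind.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Summed-area table with an extra leading row and column of zeros, so that
// at(x, y) holds the sum of all pixels above and to the left of (x, y).
struct IntegralImage {
    int width, height;               // dimensions of the source tile
    std::vector<std::uint32_t> sum;  // (width + 1) * (height + 1) entries

    std::uint32_t at(int x, int y) const {
        return sum[static_cast<std::size_t>(y * (width + 1) + x)];
    }

    // Sum of the pixels in [x, x + w) x [y, y + h), using four lookups
    // regardless of the rectangle's size.
    std::uint32_t rect_sum(int x, int y, int w, int h) const {
        return at(x + w, y + h) - at(x, y + h) - at(x + w, y) + at(x, y);
    }
};

IntegralImage build_integral(const std::vector<std::uint8_t>& gray, int width, int height) {
    IntegralImage ii{width, height,
                     std::vector<std::uint32_t>(static_cast<std::size_t>((width + 1) * (height + 1)), 0)};
    for (int y = 0; y < height; ++y) {
        std::uint32_t row_sum = 0;  // running sum of the current row
        for (int x = 0; x < width; ++x) {
            row_sum += gray[static_cast<std::size_t>(y * width + x)];
            ii.sum[static_cast<std::size_t>((y + 1) * (width + 1) + (x + 1))] =
                ii.sum[static_cast<std::size_t>(y * (width + 1) + (x + 1))] + row_sum;
        }
    }
    return ii;
}

// Two-rectangle horizontal Haar-like feature: left half minus right half.
std::int64_t two_rect_feature(const IntegralImage& ii, int x, int y, int w, int h) {
    const int half = w / 2;
    return static_cast<std::int64_t>(ii.rect_sum(x, y, half, h)) -
           static_cast<std::int64_t>(ii.rect_sum(x + half, y, half, h));
}
```

Each rectangle sum costs four table reads, so a feature costs a small constant number of reads whatever the window scale.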

Cascade classification

The cascade classifier limits its search to three kinds of region: the vicinity of any faces detected in the previous frame, skin-coloured areas identified by thresholding the probability map, and regions selected by the random candidate generator.
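The way these three sources might be combined into one list of search regions can be sketched in C++ as follows. Everything here is an assumption made for illustration: the function names, the 50% margin around previous faces, the fixed square size of the random windows, and the use of `std::mt19937` as the random candidate generator. The sketch also assumes the window size does not exceed the frame dimensions.

```cpp
#include <algorithm>
#include <random>
#include <vector>

struct Rect { int x, y, w, h; };

// Grow a rectangle by `margin` (a fraction of its size) on every side,
// clamped to the frame, so a face that moved slightly is still covered.
Rect expand_and_clamp(const Rect& r, float margin, int frame_w, int frame_h) {
    const int dx = static_cast<int>(r.w * margin);
    const int dy = static_cast<int>(r.h * margin);
    const int x0 = std::max(0, r.x - dx);
    const int y0 = std::max(0, r.y - dy);
    const int x1 = std::min(frame_w, r.x + r.w + dx);
    const int y1 = std::min(frame_h, r.y + r.h + dy);
    return {x0, y0, x1 - x0, y1 - y0};
}

// Collect the regions the classifier will search in this frame:
// previous faces (expanded), skin-coloured regions, then random windows.
std::vector<Rect> gather_search_regions(const std::vector<Rect>& previous_faces,
                                        const std::vector<Rect>& skin_regions,
                                        int random_count, int window_size,
                                        int frame_w, int frame_h, std::mt19937& rng) {
    std::vector<Rect> regions;
    for (const Rect& face : previous_faces)
        regions.push_back(expand_and_clamp(face, 0.5f, frame_w, frame_h));  // 0.5 is an assumed margin
    regions.insert(regions.end(), skin_regions.begin(), skin_regions.end());

    std::uniform_int_distribution<int> dist_x(0, frame_w - window_size);
    std::uniform_int_distribution<int> dist_y(0, frame_h - window_size);
    for (int i = 0; i < random_count; ++i)
        regions.push_back({dist_x(rng), dist_y(rng), window_size, window_size});
    return regions;
}
```

The total work per frame is then bounded by the number of regions rather than by the frame size, which is the point of restricting the search.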

Unlike the sequential sliding-window approach used on a CPU, GPU work-items can evaluate multiple windows in parallel. A property of the algorithm is that some evaluations complete much sooner than others: each window requires anywhere from one to one hundred stages of computation. To maintain parallelism, a work-item that finishes evaluating one window immediately starts evaluating another.
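The benefit of having each work-item pick up a new window as soon as it finishes can be illustrated with a small C++ simulation. This is not the detector's scheduler, only a model of it: each window's cost is its number of cascade stages, `static_makespan` assigns windows to work-items in fixed contiguous chunks, and `dynamic_makespan` hands the next window to whichever work-item becomes free first. Both function names are invented for the example. On a GPU, the dynamic variant is commonly built around a shared atomic counter, though the article does not state which mechanism is used.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Time (in stages) until the last work-item finishes when the windows are
// split up front into equal contiguous chunks, one chunk per work-item.
long static_makespan(const std::vector<int>& stages, int work_items) {
    const std::size_t n = static_cast<std::size_t>(work_items);
    const std::size_t chunk = (stages.size() + n - 1) / n;
    long worst = 0;
    for (std::size_t start = 0; start < stages.size(); start += chunk) {
        const std::size_t end = std::min(stages.size(), start + chunk);
        long total = 0;
        for (std::size_t i = start; i < end; ++i) total += stages[i];
        worst = std::max(worst, total);
    }
    return worst;
}

// Time until the last work-item finishes when each work-item takes the next
// unprocessed window as soon as it completes its current one.
long dynamic_makespan(const std::vector<int>& stages, int work_items) {
    // Min-heap of the times at which each work-item becomes free.
    std::priority_queue<long, std::vector<long>, std::greater<long>> free_at;
    for (int i = 0; i < work_items; ++i) free_at.push(0);
    long finish = 0;
    for (int cost : stages) {
        const long start = free_at.top();
        free_at.pop();
        const long done = start + cost;
        free_at.push(done);
        finish = std::max(finish, done);
    }
    return finish;
}
```

With two work-items and window costs of 100, 100, 100, 100, 1, 1, 1, 1 stages, the static split finishes after 400 stages because one work-item receives all the expensive windows, while the dynamic scheme finishes after 202.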

Find regions with skin colour

The skin region detector finds areas of the probability map that have high probability, passing these coordinates to the cascade classifier.
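A minimal C++ sketch of this step, under stated assumptions: the detector is modelled as a plain threshold over the probability map, and each hit is mapped back to source-image coordinates using a scale factor (4 if the map is a quarter of the source size in each dimension). The names `Point` and `find_skin_points` and the threshold value are illustrative only; a real detector might group neighbouring hits into regions before passing them on.

```cpp
#include <cstddef>
#include <vector>

struct Point { int x, y; };

// Return the source-image coordinates (block centres) of every probability-map
// pixel whose skin probability is at least `threshold`.
std::vector<Point> find_skin_points(const std::vector<float>& prob_map,
                                    int map_w, int map_h,
                                    float threshold, int scale) {
    std::vector<Point> hits;
    for (int y = 0; y < map_h; ++y) {
        for (int x = 0; x < map_w; ++x) {
            if (prob_map[static_cast<std::size_t>(y * map_w + x)] >= threshold) {
                // Map the low-resolution pixel to the centre of its source block.
                hits.push_back({x * scale + scale / 2, y * scale + scale / 2});
            }
        }
    }
    return hits;
}
```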

Zero-copy implementation

The CPU code is implemented in C++ and the GPU kernels are implemented in OpenCL. As shown in the diagram below, an Android demonstration application is created using the PowerVR imaging framework (introduced in a previous article in this series). This framework enables the face detection algorithm to be efficiently pipelined across the ISP, GPU and CPU, making use of shared zero-copy memory and cache allocations that minimize synchronization overheads.

Figure: Creating an Android app using the PowerVR imaging framework

When integrated into an application based on the PowerVR Imaging Framework SDK, Imagination’s optimized face detection algorithm can detect up to four faces in real time at 1080p and 30 fps using a two-cluster GPU part clocked at 200 MHz. This leaves plenty of headroom to add other tasks to the software pipeline, such as image stabilization beforehand and beautification afterwards, while still achieving 1080p30 performance on many mobile and tablet products available today.

Concluding remarks

Imagination’s hardware portfolio enables silicon vendors to create devices that deliver best-in-class performance while operating under a tight power and thermal envelope. Its PowerVR GPUs provide the performance and flexibility needed to accelerate both graphics and data-parallel computations across many mobile and embedded devices in the market today.

By pairing Imagination hardware with the PowerVR Imaging Framework, designers can now harness the performance available across their target SoC, including the GPU, ISP, CPU, video codecs and hardware accelerators. Imagination’s close collaboration with strategic OEMs (and in some cases their third-party software partners) has already helped bring to market new computational photography and computer vision use cases that intelligently distribute the required computations across the available heterogeneous hardware components.
