CASE STUDY

Metal-Based Pixel Processing for Video Decoder - V-Nova

TechnoLynx helped V-Nova bring GPU acceleration to their LCEVC external video decoder framework on Apple devices. The work focused on replacing CPU-heavy decoding with Metal-based GPU processing, lowering CPU usage while maintaining high playback quality across iPhone, iPad, and Apple TV, including AVPlayer-based iOS apps.

Metal Shaders, GPU Acceleration, Video Decoding, Pixel Processing, Performance

The Challenge

V-Nova’s MPEG-5 LCEVC codec already performed well on AMD/NVIDIA GPUs, but on iOS the decoder relied on CPU-only processing, creating performance gaps under heavy loads, especially for high-resolution video and fast frame delivery. The goal was to add Metal GPU acceleration to reduce CPU load and improve scalability on Apple hardware without breaking framework compatibility.

CPU-only decoding on iOS

No Metal shader support, leading to less-than-ideal performance on Apple devices.

Scalability under load

CPU decoding struggled with higher resolutions, rapid frame changes, and real-time playback demands.

Frame handling + latency

CPU-bound image processing increased frame times and latency, and caused uneven playback under intensive workloads.

Compatibility requirements

The solution had to work with V-Nova’s external decoder framework and AVPlayer-based iOS apps across Apple devices.


Image credits: Freepik.

Project Timeline

From CPU-only iOS decoding to Metal-based GPU pixel processing across Apple devices

Platform gap assessment

Confirmed the core issue: GPU acceleration existed on non-Apple platforms, but iOS decoding remained CPU-only with no Metal path.

Metal GPU strategy

Designed a Metal-based approach to shift pixel processing and decoding workloads onto the GPU while maintaining framework compatibility across iPhone, iPad, and Apple TV.

Power-aware execution model

During testing, running the GPU in short bursts worked better than keeping it always active. Letting the GPU work at full load briefly and then rest helped manage power and heat more effectively.

Pipeline and kernel optimisation

Implemented a producer-consumer queue for frames so CPU and GPU could work in parallel, and built Metal GPU kernels that combine multiple operations into single passes to reduce memory reads and improve cache usage.

Runtime flexibility + validation

Precompiled kernel variants to avoid runtime stalls and enable future format support, refined memory layout for efficient access, and validated performance under normal and heavy-load playback conditions.

The Solution

The TechnoLynx team built a GPU-based solution using Apple’s Metal Shading Language to move heavy decoding tasks away from the CPU. The solution kept compatibility with V-Nova’s external decoder framework and supported iPhone, iPad, and Apple TV.

Metal shader GPU processing

Built Metal GPU kernels and combined multiple operations into a single pass to reduce memory reads and improve GPU cache usage.
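The idea of combining operations into one pass can be illustrated with a small CPU-side sketch in C++. It is not V-Nova's or TechnoLynx's kernel code: the two stages (adding a signed residual to a base sample, then clamping to 8 bits) and all function names are assumptions chosen only to show the pattern. The two-pass version writes and re-reads an intermediate buffer; the fused version reads each input once and writes each output once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stage 1: add a signed residual to each base sample.
// Writes a full intermediate buffer.
std::vector<int> addResidual(const std::vector<std::uint8_t>& base,
                             const std::vector<std::int16_t>& residual) {
    assert(base.size() == residual.size());
    std::vector<int> out(base.size());
    for (std::size_t i = 0; i < base.size(); ++i) {
        out[i] = static_cast<int>(base[i]) + residual[i];
    }
    return out;
}

// Hypothetical stage 2: clamp each value to the 8-bit range.
// Reads the intermediate buffer back in.
std::vector<std::uint8_t> clampToU8(const std::vector<int>& in) {
    std::vector<std::uint8_t> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(std::min(255, std::max(0, in[i])));
    }
    return out;
}

// Fused version: both stages in one loop, no intermediate buffer.
std::vector<std::uint8_t> addResidualAndClamp(
        const std::vector<std::uint8_t>& base,
        const std::vector<std::int16_t>& residual) {
    assert(base.size() == residual.size());
    std::vector<std::uint8_t> out(base.size());
    for (std::size_t i = 0; i < base.size(); ++i) {
        const int v = static_cast<int>(base[i]) + residual[i];
        out[i] = static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
    }
    return out;
}
```

Calling `addResidualAndClamp(base, residual)` gives the same result as `clampToU8(addResidual(base, residual))`. On a GPU the same restructuring would remove a round trip through device memory between the two stages.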

Power-aware execution

Focused on efficient image processing techniques that would use GPU power without draining battery too quickly, and used short GPU bursts to help manage power and heat.
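One way to picture burst execution is as a grouping of queued frames into batches that are submitted together, with the GPU left idle between batches. The C++ sketch below is an illustration only: the `Burst` type, the `planBursts` function, and the fixed burst size are hypothetical and are not taken from the project.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one burst: a contiguous run of frames
// submitted to the GPU together.
struct Burst {
    std::size_t firstFrame;
    std::size_t frameCount;
};

// Splits totalFrames into bursts of at most burstSize frames.
// The last burst may be shorter than the others.
std::vector<Burst> planBursts(std::size_t totalFrames, std::size_t burstSize) {
    assert(burstSize > 0);
    std::vector<Burst> bursts;
    for (std::size_t first = 0; first < totalFrames; first += burstSize) {
        const std::size_t count = std::min(burstSize, totalFrames - first);
        bursts.push_back({first, count});
    }
    return bursts;
}
```

For example, `planBursts(10, 4)` yields three bursts covering frames 0 to 3, 4 to 7, and 8 to 9. A larger burst size means fewer GPU wake-ups, but more frames have to be buffered before each submission, so the choice trades idle time against added latency.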

Frame pipeline + flexibility

Used a producer-consumer model with a frame queue so CPU and GPU could work in parallel, precompiled different kernel variants to choose at run-time without delays, and refined memory layout for efficient access.
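The producer-consumer frame queue can be sketched in standard C++ as a bounded, blocking queue shared by two threads. This is a minimal sketch under stated assumptions, not the project's implementation: `Frame`, `FrameQueue`, and `runPipeline` are hypothetical names, and the consumer thread only stands in for the code that would hand frames to the GPU.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

struct Frame { int index; };  // hypothetical frame handle

class FrameQueue {
public:
    explicit FrameQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Producer side: blocks while the queue is full.
    void push(Frame frame) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return frames_.size() < capacity_; });
        frames_.push_back(frame);
        notEmpty_.notify_one();
    }

    // Consumer side: blocks until a frame arrives.
    // Returns false once the queue is closed and drained.
    bool pop(Frame& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !frames_.empty() || closed_; });
        if (frames_.empty()) return false;
        out = frames_.front();
        frames_.pop_front();
        notFull_.notify_one();
        return true;
    }

    // Signals that no more frames will be pushed.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        notEmpty_.notify_all();
    }

private:
    std::size_t capacity_;
    std::deque<Frame> frames_;
    bool closed_ = false;
    std::mutex mutex_;
    std::condition_variable notFull_;
    std::condition_variable notEmpty_;
};

// Runs one producer and one consumer; returns frame indices in consumed order.
std::vector<int> runPipeline(int frameCount, std::size_t capacity) {
    FrameQueue queue(capacity);
    std::vector<int> consumed;
    std::thread consumer([&] {
        Frame frame{};
        while (queue.pop(frame)) consumed.push_back(frame.index);
    });
    for (int i = 0; i < frameCount; ++i) queue.push(Frame{i});
    queue.close();
    consumer.join();
    return consumed;
}
```

The bounded capacity is the design choice worth noting: it lets the producer run ahead of the consumer by a few frames so both sides stay busy, while capping how much memory queued frames can occupy.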

Technical Specifications

Apple devices: iPhone, iPad, and Apple TV.
Compatible with V-Nova’s external decoder framework and AVPlayer-based iOS apps.
GPU-based solution using Apple’s Metal Shading Language.
Metal GPU kernels combined multiple operations into a single pass to reduce memory reads and improve cache usage.
Producer-consumer model with a frame queue so CPU and GPU work in parallel.
Precompiled different kernel variants to choose the right kernel at run-time without delays, with flexibility for different data formats later if needed.
Refined memory layout to keep memory access simple and efficient.
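The precompiled-variant item in the list above can be sketched as a lookup table that is built ahead of time and only indexed at run-time. The C++ below is an analogy rather than Metal code: the `PixelFormat` values, the `clampSample` template, and `selectKernel` are hypothetical, and template instantiation stands in for compiling one kernel per format. In a Metal app the comparable step would be creating the compute pipeline states at startup instead of on first use.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical set of supported sample formats.
enum class PixelFormat { Gray8, Gray10 };

using ClampKernel = std::uint16_t (*)(int);

// One variant per bit depth, generated at compile time.
template <int MaxValue>
std::uint16_t clampSample(int value) {
    return static_cast<std::uint16_t>(
        value < 0 ? 0 : (value > MaxValue ? MaxValue : value));
}

// Table of ready-made variants, indexed by PixelFormat.
const std::array<ClampKernel, 2> kVariants = {{
    &clampSample<255>,   // PixelFormat::Gray8
    &clampSample<1023>,  // PixelFormat::Gray10
}};

// Run-time selection is a table lookup; nothing is compiled here.
ClampKernel selectKernel(PixelFormat format) {
    const std::size_t index = static_cast<std::size_t>(format);
    assert(index < kVariants.size());
    return kVariants[index];
}
```

With this layout, `selectKernel(PixelFormat::Gray10)(300)` returns 300 while `selectKernel(PixelFormat::Gray8)(300)` returns 255. Supporting another format later means adding one enum value and one table entry.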

The Outcome

The Metal-based GPU implementation lowered CPU usage in most conditions and outperformed the original under simulated heavy loads. Power use stayed mostly the same, and overall system heat stayed lower because the GPU finished tasks faster and rested more often. When video playback needed more processing power, the system responded better and did not freeze or lag.

Up to 25% lower CPU usage in most conditions.

Outperformed the original by about 38% under simulated heavy loads (fast frame changes or higher resolutions), with fewer dropped frames and smoother playback.

Key Achievements

Under normal conditions, video playback performance was equal to the original.

Under simulated heavy loads (fast frame changes or higher resolutions), the Metal-based solution had fewer dropped frames and smoother playback.

Power use stayed mostly the same, and overall system heat stayed lower because the GPU finished tasks faster and rested more often.

When video playback needed more processing power, the system responded better and did not freeze or lag.
