CASE STUDY

Embedded Video Coding on GPU (Under NDA)

TechnoLynx delivered a fully customised embedded video coding solution by optimising GPU execution on dedicated graphics hardware and discrete GPUs, improving compression efficiency, runtime performance, and integration within the client’s pipeline.

Tags: GPU Optimisation, Embedded Video Coding, CUDA, Compression

The Challenge

The client was a post-funding startup building an embedded video coding approach with the potential to change how video is compressed, processed, and delivered. However, off-the-shelf encoders were not customisable enough and could not integrate cleanly into their pipeline.

Deep Encoder Customisation

Existing market encoders lacked the flexibility needed to implement the client’s unique requirements and still fit the wider pipeline.

Standards Compliance Constraint

The client did not control the decoder side, so the solution had to remain compliant with established coding standards while still enabling modifications.

CPU Overload, GPU Available

The target embedded system had a powerful GPU, but the CPU was already overworked—so offloading encoding tasks to the GPU was critical.

Image: Embedded video coding on GPU. Image credits: Freepik.

Project Timeline

Constraints & Requirements

Established the need for deep encoder control while staying compliant with coding standards, and confirmed GPU offload was essential due to CPU load on the embedded target.

Modular Pipeline

Split the encoding pipeline into distinct modules and defined APIs, enabling both teams to work autonomously while ensuring smooth integration.

Core Encoder Work

Implemented and iteratively improved transform and prediction functionality, starting from a state-of-the-art baseline agreed with the client.

GPU Optimisation

Used CUDA to offload computationally intensive work to the GPU, ensuring the GPU was utilised efficiently without overwhelming the system.

Integration & Playback

Ensured that the embed code and playback worked with the client’s hosting service and video player, including refresh-rate optimisation for fluid playback under high GPU load.

The Solution

TechnoLynx worked closely with the client’s team in a hands-on collaboration model. The client focused on video coding direction, while TechnoLynx owned GPU-specific optimisation and performance-critical improvements within the encoding process.

Architecture

Adopted a modular pipeline design by defining distinct encoder modules and establishing clear APIs, allowing parallel progress while ensuring seamless system integration.
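The modular design described above can be illustrated with a minimal C++ sketch. The project's actual interfaces are under NDA and not shown in this case study, so every name here (`Frame`, `EncoderStage`, `PredictionStage`, `TransformStage`, `Pipeline`) is hypothetical, and the two stages are deliberately trivial placeholders rather than real coding tools. The point is only the shape: each module implements one narrow contract, so two teams can work on separate stages and swap implementations without touching the rest of the pipeline.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical frame container; the real project's types are not public.
struct Frame {
    int width = 0;
    int height = 0;
    std::vector<std::int16_t> samples;  // one plane, row-major
};

// Hypothetical stage contract: the only thing the pipeline knows about a module.
class EncoderStage {
public:
    virtual ~EncoderStage() = default;
    virtual const char* name() const = 0;
    virtual void process(Frame& frame) = 0;
};

// Placeholder prediction: subtracts a fixed mid-grey predictor (128).
class PredictionStage : public EncoderStage {
public:
    const char* name() const override { return "prediction"; }
    void process(Frame& frame) override {
        for (auto& s : frame.samples) s = static_cast<std::int16_t>(s - 128);
    }
};

// Placeholder transform: doubles each residual, standing in for a real transform.
class TransformStage : public EncoderStage {
public:
    const char* name() const override { return "transform"; }
    void process(Frame& frame) override {
        for (auto& s : frame.samples) s = static_cast<std::int16_t>(s * 2);
    }
};

// Runs the registered stages in order; it depends only on the interface.
class Pipeline {
public:
    void add(std::unique_ptr<EncoderStage> stage) { stages_.push_back(std::move(stage)); }
    void run(Frame& frame) {
        for (auto& stage : stages_) stage->process(frame);
    }
    std::size_t size() const { return stages_.size(); }

private:
    std::vector<std::unique_ptr<EncoderStage>> stages_;
};
```

Under this assumed layout, a GPU-backed stage would be one more class implementing `EncoderStage`, which is one plausible way a CPU reference module and an accelerated module could coexist behind the same API.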

Encoder Core

Focused on transform and prediction functionality—key drivers of compression efficiency—and iterated improvements from a state-of-the-art baseline through benchmarking and tuning cycles.
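To make "transform functionality" concrete, here is a small reference sketch of a block transform. The case study does not name the coding standard or the transform used, so this is an assumption chosen purely for illustration: the 4x4 forward integer core transform from H.264/AVC, computed as Y = Cf · X · Cfᵀ. The function name `forward_transform_4x4` and the `Block4x4` alias are hypothetical. In that standard the scaling factors are folded into quantisation, which is omitted here.

```cpp
#include <array>
#include <cstdint>

using Block4x4 = std::array<std::array<std::int32_t, 4>, 4>;

// H.264/AVC 4x4 forward core transform matrix (illustrative choice; the
// case study does not state which standard or transform was used).
constexpr std::int32_t kCf[4][4] = {
    {1,  1,  1,  1},
    {2,  1, -1, -2},
    {1, -1, -1,  1},
    {1, -2,  2, -1},
};

// Computes Y = Cf * X * Cf^T for one 4x4 residual block.
// Normalisation/quantisation is intentionally left out.
Block4x4 forward_transform_4x4(const Block4x4& x) {
    Block4x4 tmp{};
    Block4x4 y{};
    // tmp = Cf * X
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                tmp[i][j] += kCf[i][k] * x[k][j];
    // y = tmp * Cf^T, i.e. y[i][j] = sum_k tmp[i][k] * Cf[j][k]
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                y[i][j] += tmp[i][k] * kCf[j][k];
    return y;
}
```

A flat block concentrates all of its energy in the single top-left (DC) coefficient, which is the property that makes transforms useful for compression: smooth residuals produce few non-zero coefficients to code.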

CUDA Acceleration

Used CUDA to push computationally intensive tasks to the GPU so the CPU could remain available for other critical system processes.
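Why this kind of work suits a GPU can be shown with a plain C++ sketch; it is not CUDA and not the project's code. It assumes a common block-matching cost, the sum of absolute differences (SAD), which the case study does not confirm was used, and the names `block_sad`, `all_block_sads`, and `kBlock` are hypothetical. The per-block function depends only on its own block coordinates, so the host-side double loop stands in for what would be a grid launch on a GPU, with one thread or thread block per `(bx, by)` pair.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

constexpr int kBlock = 4;  // illustrative block size, not from the case study

// Work for ONE block: SAD between current and reference samples.
// It reads only its own block, so blocks can be processed independently.
std::uint32_t block_sad(const std::vector<std::uint8_t>& cur,
                        const std::vector<std::uint8_t>& ref,
                        int width, int bx, int by) {
    std::uint32_t sad = 0;
    for (int y = 0; y < kBlock; ++y) {
        for (int x = 0; x < kBlock; ++x) {
            const std::size_t row = static_cast<std::size_t>(by * kBlock + y);
            const std::size_t col = static_cast<std::size_t>(bx * kBlock + x);
            const std::size_t idx = row * static_cast<std::size_t>(width) + col;
            const int diff = static_cast<int>(cur[idx]) - static_cast<int>(ref[idx]);
            sad += static_cast<std::uint32_t>(std::abs(diff));
        }
    }
    return sad;
}

// Host-side stand-in for a grid launch: every (bx, by) iteration is
// independent, which is what a GPU would execute in parallel.
std::vector<std::uint32_t> all_block_sads(const std::vector<std::uint8_t>& cur,
                                          const std::vector<std::uint8_t>& ref,
                                          int width, int height) {
    const int blocks_x = width / kBlock;
    const int blocks_y = height / kBlock;
    std::vector<std::uint32_t> out(static_cast<std::size_t>(blocks_x) * blocks_y);
    for (int by = 0; by < blocks_y; ++by)
        for (int bx = 0; bx < blocks_x; ++bx)
            out[static_cast<std::size_t>(by) * blocks_x + bx] =
                block_sad(cur, ref, width, bx, by);
    return out;
}
```

Because no block reads another block's result, the loop nest carries no dependencies. That independence is the property that lets such work move off an overloaded CPU and onto a GPU.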

Technical Specifications

Tools: Cross-platform C++, CUDA (NVIDIA), CMake
Modules: Transform and prediction functionality (compression efficiency drivers)
Requirement: Reliable operation across different operating systems and GPU environments
Constraint: Maintain compliance with established coding standards (decoder not controlled)
Integration: Embed code and compatibility with hosting service and video player
UX: Refresh-rate optimisation for smooth, responsive playback under GPU load

Image: Embedded GPU video coding.

The Outcome

TechnoLynx delivered a fully customised encoding solution that met the client’s requirements. Offloading the most computationally intensive work to the GPU kept the CPU available for other tasks and prevented overload. Compression efficiency exceeded expectations, enabling higher-quality video at lower bitrates.

Key Achievements

Offloaded transform and prediction tasks to the GPU, keeping the CPU free for other critical system processes

Improved compression efficiency to support higher-quality video delivery at lower bitrates

Delivered consistent performance across different GPU environments and operating systems using cross-platform C++ and CMake

Optimised refresh rates so embedded playback remained fluid and responsive during high-performance GPU workloads

Delivered embed code that integrated smoothly with hosting services and video players across a wide range of setups

Established a future-proof foundation to support advanced use cases including VR, ray tracing, and 3D graphics

Ready to Optimise Your Video Pipeline?

Let’s discuss GPU optimisation for embedded video coding—improving compression efficiency, runtime performance, and playback quality.