TechnoLynx delivered a fully customised embedded video coding solution by optimising GPU execution, using dedicated (discrete) GPUs to improve compression efficiency, runtime performance, and integration within the client’s pipeline.
The client was a post-funding startup building an embedded video coding approach with the potential to change how video is compressed, processed, and delivered. However, off-the-shelf encoders were not customisable enough and could not integrate cleanly into their pipeline.
Deep Encoder Customisation
Existing market encoders lacked the flexibility needed to implement the client’s unique requirements and still fit the wider pipeline.
Standards Compliance Constraint
The client did not control the decoder side, so the solution had to remain compliant with established coding standards while still enabling modifications.
CPU Overload, GPU Available
The target embedded system had a powerful GPU, but the CPU was already overworked—so offloading encoding tasks to the GPU was critical.
Image credits: Freepik.
Established the need for deep encoder control while staying compliant with coding standards, and confirmed GPU offload was essential due to CPU load on the embedded target.
Split the encoding pipeline into distinct modules and defined APIs, enabling both teams to work autonomously while ensuring smooth integration.
Implemented and iteratively improved transform and prediction functionality, starting from a state-of-the-art baseline agreed with the client.
Used CUDA to offload computationally intensive work to the GPU, ensuring the GPU was utilised efficiently without overwhelming the system.
Ensured the embed code and playback worked with the client’s hosting service and video player, including refresh-rate optimisation for fluid playback under high GPU load.
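To make the modular split more concrete, here is a minimal C++ sketch of how encoder modules behind a shared API might look. It is an illustration only, not the interface used in the project: the names Block, EncoderStage, FlatPredictionStage, and Pipeline are hypothetical, and the flat predictor is a stand-in for real prediction logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical unit of work handed from one module to the next.
struct Block {
    std::vector<int16_t> samples;  // pixel values on input, residuals after prediction
};

// Hypothetical module contract: the only thing both teams need to agree on.
class EncoderStage {
public:
    virtual ~EncoderStage() = default;
    virtual void process(Block& block) = 0;
};

// Example module (assumption): subtracts a flat predictor from every sample
// so that later stages operate on residuals.
class FlatPredictionStage : public EncoderStage {
public:
    explicit FlatPredictionStage(int16_t predictor) : predictor_(predictor) {}

    void process(Block& block) override {
        for (auto& s : block.samples) {
            s = static_cast<int16_t>(s - predictor_);
        }
    }

private:
    int16_t predictor_;
};

// Runs the registered stages in order. A GPU-backed stage could replace a
// CPU one without changes here, as long as it honours EncoderStage.
class Pipeline {
public:
    void add(std::unique_ptr<EncoderStage> stage) {
        assert(stage != nullptr);
        stages_.push_back(std::move(stage));
    }

    void run(Block& block) {
        for (auto& stage : stages_) {
            stage->process(block);
        }
    }

    std::size_t size() const { return stages_.size(); }

private:
    std::vector<std::unique_ptr<EncoderStage>> stages_;
};
```

With a contract like this, one team can develop a stage against the interface while the other integrates the pipeline, which is the kind of parallel progress the modular design aims for.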
TechnoLynx worked closely with the client’s team in a hands-on collaboration model. The client focused on video coding direction, while TechnoLynx owned GPU-specific optimisation and performance-critical improvements within the encoding process.
Adopted a modular pipeline design by defining distinct encoder modules and establishing clear APIs, allowing parallel progress while ensuring seamless system integration.
Focused on transform and prediction functionality—key drivers of compression efficiency—and iterated improvements from a state-of-the-art baseline through benchmarking and tuning cycles.
Used CUDA to push computationally intensive tasks to the GPU so the CPU could remain available for other critical system processes.
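As an illustration of why transform work suits GPU offload, the sketch below applies a 4x4 integer transform to a set of independent residual blocks. The case study does not name the coding standard or the transform used, so the H.264-style core transform here is an assumption chosen for familiarity, and the function names are hypothetical. The code is plain C++ running on the CPU; the point is that each block is processed independently, which is the property a CUDA kernel could exploit by assigning one block to each thread or thread group.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 1D butterfly for the 4-point integer transform with rows
// [1 1 1 1], [2 1 -1 -2], [1 -1 -1 1], [1 -2 2 -1] (H.264-style, assumed).
// The stride lets the same routine walk a row (1) or a column (4).
inline void transform1d(int32_t* v, std::size_t stride) {
    const int32_t x0 = v[0];
    const int32_t x1 = v[stride];
    const int32_t x2 = v[2 * stride];
    const int32_t x3 = v[3 * stride];
    const int32_t s0 = x0 + x3, s1 = x1 + x2;
    const int32_t d0 = x0 - x3, d1 = x1 - x2;
    v[0]          = s0 + s1;
    v[stride]     = 2 * d0 + d1;
    v[2 * stride] = s0 - s1;
    v[3 * stride] = d0 - 2 * d1;
}

// Transforms one row-major 4x4 block in place: rows first, then columns.
inline void forwardTransform4x4(int32_t* block) {
    for (std::size_t r = 0; r < 4; ++r) transform1d(block + 4 * r, 1);
    for (std::size_t c = 0; c < 4; ++c) transform1d(block + c, 4);
}

// Transforms a flat buffer of consecutive 4x4 blocks. No block depends on
// another, so this loop is the part that could be parallelised on a GPU.
inline void transformBlocks(std::vector<int32_t>& coeffs) {
    assert(coeffs.size() % 16 == 0);
    for (std::size_t offset = 0; offset < coeffs.size(); offset += 16) {
        forwardTransform4x4(coeffs.data() + offset);
    }
}
```

A flat block of identical residuals ends up with all of its energy in the first coefficient, which is the behaviour that makes transforms a key driver of compression efficiency.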
TechnoLynx delivered a fully customised encoding solution that met the client’s requirements. By offloading the most computationally intensive work to the GPU, the CPU stayed available for other tasks—preventing overload—while compression efficiency exceeded expectations, enabling higher-quality video at lower bitrates.
Offloaded transform and prediction tasks to the GPU, keeping the CPU free for other critical system processes
Improved compression efficiency to support higher-quality video delivery at lower bitrates
Achieved consistent performance across different GPU environments and operating systems by using cross-platform C++ and CMake
Optimised refresh rates so embedded playback remained fluid and responsive during high-performance GPU workloads
Delivered embed code that integrated smoothly with hosting services and video players across a wide range of setups
Established a future-proof foundation to support advanced use cases including VR, ray tracing, and 3D graphics