Problem

Our client had a GPU-heavy application built for high performance on AMD and NVIDIA hardware. It used OpenCL and performed well on standard graphics processing units. But performance dropped on Apple hardware, where Metal is the native GPU framework.

Creative professionals often use Apple devices. With M1 and M2 chips, Apple hardware is now widely used in fields like virtual reality, 3D design, and computer graphics. Many of these tasks depend on GPU performance.

Apple’s Metal is a separate API and does not run OpenCL kernels directly. The existing code worked on Apple devices, but it was too slow. The client needed the app to perform well across devices, including Apple’s custom silicon and the Metal framework. Their goal was to maintain a single-source codebase while expanding support.

Challenges of the Conversion Process

The main challenge was GPU architecture. OpenCL and Metal use different concepts. We needed to match performance without changing the core algorithm.

The client’s code was deeply optimised for OpenCL. It assumed the GPU scheduling and address space behaviour typical of AMD and NVIDIA devices. Apple’s Metal works differently, so the code needed custom adjustments to fit the Apple GPU pipeline.

We also had to address parallelism. The software used both the central processing unit (CPU) and GPU. Moving this logic to Metal required precise control to prevent bottlenecks.

Real-time performance was key. The app needed to deliver fast results for users in fields like ray tracing, video editing, and machine learning. We couldn’t afford delays.

Another issue was GPGPU use. The app did more than graphics. It performed complex general-purpose tasks such as physics and AI computations. We had to preserve this capability.

Some tools used in the original setup also depended on specific GPU features not supported on Intel graphics or Metal. We needed fallbacks.

Solution

Our team created a tool that converts OpenCL code to Metal. The system supports nearly 100% of the OpenCL code used by the app. This means it needs minimal manual changes.

We focused on building a flexible framework. It maps OpenCL features directly to their Metal equivalents. Where Metal lacked a direct match, we wrote efficient custom implementations.

The tool preserved key structures such as address space handling, thread hierarchy, and synchronisation. This helped maintain compatibility and performance.
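To make the mapping idea concrete, here is a minimal sketch in Python of how a few OpenCL C spellings line up with their Metal Shading Language counterparts. It is an illustration only, not the client’s tool: the table names and the `translate_line` and `thread_attribute` helpers are hypothetical, the tables cover a small subset, and a real converter would work on a parsed syntax tree and not on text substitution.

```python
import re

# Hypothetical lookup tables: OpenCL C spelling -> Metal Shading Language spelling.
ADDRESS_SPACES = {
    "__global": "device",
    "__local": "threadgroup",
    "__constant": "constant",
    "__private": "thread",
}

# OpenCL work-item functions -> MSL kernel-argument attribute names.
THREAD_BUILTINS = {
    "get_global_id": "thread_position_in_grid",
    "get_local_id": "thread_position_in_threadgroup",
    "get_group_id": "threadgroup_position_in_grid",
    "get_local_size": "threads_per_threadgroup",
}

# OpenCL barriers -> MSL threadgroup barriers with the matching memory flag.
BARRIERS = {
    "barrier(CLK_LOCAL_MEM_FENCE)": "threadgroup_barrier(mem_flags::mem_threadgroup)",
    "barrier(CLK_GLOBAL_MEM_FENCE)": "threadgroup_barrier(mem_flags::mem_device)",
}


def translate_line(line: str) -> str:
    """Rewrite barriers and address-space qualifiers in one line of OpenCL C."""
    for cl_text, msl_text in BARRIERS.items():
        line = line.replace(cl_text, msl_text)
    for cl_text, msl_text in ADDRESS_SPACES.items():
        # Word boundaries keep longer identifiers that merely contain the qualifier intact.
        line = re.sub(r"\b" + re.escape(cl_text) + r"\b", msl_text, line)
    return line


def thread_attribute(cl_builtin: str) -> str:
    """Return the MSL attribute that carries the same index as an OpenCL built-in."""
    return "[[" + THREAD_BUILTINS[cl_builtin] + "]]"
```

For example, `translate_line("__local int tile[64];")` yields `threadgroup int tile[64];`. Thread indices need more than a rename, because OpenCL reads them through function calls while Metal passes them as attributed kernel arguments, which is why they are kept in a separate table here.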

We also built a dual-backend setup. The software runs on both OpenCL and Metal. The same source can be built for multiple targets. This helped the client avoid maintaining separate versions.
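A dual-backend setup needs some rule for picking the backend at run time. The sketch below shows one possible rule in Python, under assumptions of our own: the `Backend` enum and `select_backend` function are hypothetical names, and the policy (prefer Metal on macOS, OpenCL elsewhere, fall back to whatever is present) is illustrative and not taken from the project.

```python
from enum import Enum
from typing import Set


class Backend(Enum):
    OPENCL = "opencl"
    METAL = "metal"


def select_backend(os_name: str, available: Set[Backend]) -> Backend:
    """Pick a GPU backend for the given OS name.

    Assumed policy: Metal is preferred on macOS ("Darwin"), OpenCL elsewhere.
    If the preferred backend is missing, use any backend that is available.
    """
    preferred = Backend.METAL if os_name == "Darwin" else Backend.OPENCL
    if preferred in available:
        return preferred
    if available:
        # Sort by name so the fallback choice is deterministic.
        return sorted(available, key=lambda backend: backend.value)[0]
    raise RuntimeError("no GPU backend available")
```

In a real application the first argument could come from `platform.system()`, which returns `"Darwin"` on macOS, and the set of available backends would come from probing the installed drivers.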

This solution helped the software meet GPU performance targets on Apple M1 and M2 chips. It now supports applications including ray tracing, virtual reality, and advanced computer graphics.

Results

Our framework improved GPU speed significantly. On the M1, performance increased by over 300%. On M2, results were even better.

Real-time GPU tasks ran at smooth frame rates. This made the software viable for video rendering, design, and scientific computing on Apple devices.

The app could now run on AMD and NVIDIA hardware with OpenCL, and on Apple hardware with Metal. It also worked on devices with Intel graphics and on machines with dedicated graphics cards, covering the major desktop platforms.

The dual-backend approach kept development simple. It ensured cross-platform compatibility without rewriting the entire codebase.

Long-Term Impact and Future-Proofing

The solution will scale with future Apple devices. As M-series chips evolve, the framework can adapt. The tool supports extensions and improvements.

The codebase now handles a wide range of GPU devices. It supports performance tuning across platforms. The team can test features on one system and apply improvements across others.

The client can now support GPU workloads like ray tracing, graphics applications, and machine learning on Apple. They are no longer limited by Metal’s unique structure.

Practical Use and Broader Adoption

The performance gain from this project opened new use cases. The software, once limited to specific hardware, became usable across different devices. This included laptops, desktops, and workstations with varying GPU setups. Professionals in media production, data science, and machine learning could now access the same application with consistent performance.

Many creative users work in environments that switch between macOS and Windows. By supporting both Metal and OpenCL, the client met this demand. Artists working on visual effects, animation, or video editing no longer had to switch tools when changing systems. This reduced friction and improved workflow.

Support for Metal also meant better use of Apple’s unified memory. Unlike discrete GPUs, which have their own separate memory, M1 and M2 chips share one memory pool between the CPU and GPU. Our tool adapted the code to work efficiently within this shared address space. This helped reduce memory copying and sped up execution.

The new GPU support also made a difference for virtual reality developers. They often rely on real-time feedback and low latency. Metal’s direct hardware access and our optimisations enabled smooth VR performance on Apple hardware. Developers could now build and test VR applications using just an Apple device.

The same was true for ray tracing. This task needs both speed and accuracy. Real-time ray tracing depends on efficient use of graphics processing units. Our ported code allowed these features to run well on Apple GPUs, which are known for handling large workloads under tight energy constraints.

Another area of growth was machine learning. Many AI tasks need strong GPU performance. The application could now support tasks such as feature extraction, model training, and result visualisation.

All of this could run on a machine without needing an external video card. This was important for mobile teams and solo developers.

We also ensured compatibility with other GPU brands, including Intel graphics. This expanded the app’s reach further. It no longer depended on a dedicated graphics card. Users with integrated graphics could still run important features, though at slightly reduced speeds.

To support future updates, we included logging and performance tracking. This allowed the client to see how the app performed on different hardware. It also helped them identify where more improvements could be made. These logs were easy to interpret and helped the client maintain quality across updates.
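As a rough idea of what per-hardware performance tracking can look like, here is a small Python sketch. The `PerfLog` class and its method names are hypothetical and do not describe the logging shipped to the client. It measures wall-clock time, so it only reflects GPU time if the measured block waits for the GPU work to finish.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import mean


class PerfLog:
    """Collect timing samples keyed by (device, kernel) and report averages."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, device: str, kernel: str, seconds: float) -> None:
        """Store one timing sample, in seconds."""
        self._samples[(device, kernel)].append(seconds)

    @contextmanager
    def measure(self, device: str, kernel: str):
        """Time the enclosed block with a monotonic clock and record the result."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(device, kernel, time.perf_counter() - start)

    def summary(self) -> dict:
        """Return the mean time in seconds for each (device, kernel) pair."""
        return {key: mean(values) for key, values in self._samples.items()}
```

A call site would wrap each dispatch, for example `with log.measure("M1", "blur"): run_blur()`, where `run_blur` stands in for whatever function launches the kernel and waits for it. Comparing the summaries from different machines then shows which kernels lag on which hardware.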

Adding Metal support also improved energy use. Apple chips are known for their power efficiency. Our port respected that.

By matching the GPU’s job sizes to the chip’s preferred patterns, we reduced heat and power draw. This helped users working on laptops or mobile workstations.
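One common way to match job sizes to the hardware is to make each threadgroup a whole multiple of the GPU’s SIMD width. The Python sketch below shows that arithmetic only. The function names are hypothetical, and the two limits are passed in as plain numbers; in Metal they would typically be read from a compute pipeline state’s `threadExecutionWidth` and `maxTotalThreadsPerThreadgroup` properties. The actual tuning used in the project may differ.

```python
def choose_threadgroup_size(work_items: int, execution_width: int, max_threads: int) -> int:
    """Pick a threadgroup size that is a whole multiple of the SIMD width.

    The result never exceeds max_threads and is no larger than the work
    rounded up to the next multiple of execution_width.
    """
    if work_items <= 0 or execution_width <= 0 or max_threads < execution_width:
        raise ValueError("invalid sizes")
    # Round the work up to the next multiple of the SIMD width.
    needed = -(-work_items // execution_width) * execution_width
    # Largest multiple of the SIMD width that fits the per-threadgroup limit.
    limit = (max_threads // execution_width) * execution_width
    return min(needed, limit)


def threadgroup_count(work_items: int, group_size: int) -> int:
    """Number of threadgroups needed to cover all work items (ceiling division)."""
    return -(-work_items // group_size)
```

With an assumed SIMD width of 32 and a limit of 1024 threads, 100 work items get a single threadgroup of 128 threads, while one million work items get threadgroups of 1024 threads each. Keeping sizes aligned to the SIMD width avoids partly empty SIMD groups that consume power without doing useful work.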

We also considered broader industry shifts. As more developers adopt Metal, support for cross-platform frameworks will matter more. Our client now has a solid base. They can support OpenCL, Metal, and future APIs without major rewrites.

This case shows how adapting GPU software can bring long-term gains. Better support, faster speeds, and wider access all came from one major update. With the right tools, even complex GPU tasks can work well across systems.

Conclusion

This project shows how to move GPU code from OpenCL to Metal without losing performance. It highlights the challenge of balancing CPU and GPU work, especially in high-performance computing.

With our help, the client gained real-time performance, cross-platform support, and a flexible setup. They reached users on Apple hardware, expanded their market, and kept development costs low.

At TechnoLynx, we build GPU solutions that work. We help businesses improve performance across AMD and NVIDIA, Apple Metal, Intel graphics, and dedicated graphics cards. Whether it’s graphics rendering, machine learning, or general GPU work, we make it fast, portable, and reliable.

Contact us to find out more!

Image by Freepik