Case-Study: V-Nova - GPU Porting from OpenCL to Metal

Case study on moving a GPU application from OpenCL to Metal for our client V-Nova. Boosts performance, adds support for real-time apps, VR, and machine learning on Apple M1/M2 chips.

Written by TechnoLynx Published on 15 Dec 2023

Background

V-Nova approached TechnoLynx with a clear need. They had a large, well-structured GPU codebase written in OpenCL. Their solution worked well on most platforms, but performance on Apple devices using M1 and M2 chips needed improvement.

Apple supports Metal as the primary GPU framework. To avoid rewriting everything from scratch, V-Nova wanted a way to reuse their existing code on Apple hardware.

This was not just about code conversion. It was about making sure the existing programming model stayed intact. The task also included maintaining performance across different platforms without fragmenting the codebase.

GPU porting can be complex, especially when frameworks differ in memory models, syntax, and supported features. V-Nova needed a solution that worked across all platforms, handled shared memory correctly, and delivered performance improvements without compromising functionality.

Problem

V-Nova had a GPU-heavy application built for high performance on AMD and NVIDIA hardware. It used OpenCL and performed well on standard graphics processing units. But performance dropped on Apple devices, where Metal is the native GPU framework.

Creative professionals often use Apple devices. With M1 and M2 chips, Apple hardware is now widely used in fields like virtual reality, 3D design, and computer graphics. Many of these tasks depend on GPU performance.

V-Nova wanted to reach these users, which meant supporting Apple’s custom Metal framework as well. Their goal was to maintain a single-source codebase while expanding support.

We also ran into differences in thread indexing and address space usage. OpenCL uses keywords like __local and __private, while Metal uses threadgroup and thread. Barriers, event handling, and memory access also worked differently.
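To make the keyword differences concrete, here is a minimal sketch of a textual qualifier mapping. It is an illustration only, not the tool described in this case study: the function name `translate_qualifiers` and both lookup tables are our own assumptions, and a real translator would need to parse the kernel rather than substitute strings.

```python
import re

# Assumed mapping of OpenCL C address-space qualifiers to their
# Metal Shading Language counterparts.
ADDRESS_SPACES = {
    "__global": "device",
    "__local": "threadgroup",
    "__private": "thread",
    "__constant": "constant",
}

# Assumed mapping of the two common OpenCL barrier calls to Metal barriers.
BARRIERS = {
    "barrier(CLK_LOCAL_MEM_FENCE)":
        "threadgroup_barrier(mem_flags::mem_threadgroup)",
    "barrier(CLK_GLOBAL_MEM_FENCE)":
        "threadgroup_barrier(mem_flags::mem_device)",
}

# Match a qualifier only when it is a whole token, not part of an identifier.
_QUALIFIER = re.compile(
    r"(?<!\w)(" + "|".join(map(re.escape, ADDRESS_SPACES)) + r")(?!\w)"
)


def translate_qualifiers(opencl_src: str) -> str:
    """Rewrite barrier calls and address-space qualifiers in kernel source text."""
    out = opencl_src
    for cl_call, metal_call in BARRIERS.items():
        out = out.replace(cl_call, metal_call)
    return _QUALIFIER.sub(lambda match: ADDRESS_SPACES[match.group(1)], out)


sample = "__local float tile[64]; __private int i; barrier(CLK_LOCAL_MEM_FENCE);"
translated = translate_qualifiers(sample)
```

A string substitution like this covers only the simplest cases. It shows the shape of the mapping, not how to handle macros, comments, or qualifiers hidden behind typedefs.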

The aim was not just to make things run. We needed GPU computing to work reliably and efficiently, with consistent outputs across all platforms. This meant solving the problem in a new way.

Challenges

The main challenge was GPU architecture. OpenCL and Metal use different concepts. We needed to match performance without changing the core algorithm.

V-Nova’s code was deeply optimised for OpenCL. The code assumed GPU scheduling and address space behaviour typical of AMD and NVIDIA devices. Apple’s Metal works differently. It needed custom adjustments to work with the Apple GPU pipeline.

We also had to address parallelism. The software used both the central processing unit (CPU) and GPU. Moving this logic to Metal required precise control to prevent bottlenecks.

Real-time performance was key. The app needed to deliver fast results for users in fields like ray tracing, video editing, and machine learning. We couldn’t afford delays.

Another issue was GPGPU use. The app did more than graphics. It performed complex general-purpose tasks such as physics and AI computations. We had to preserve this capability.

Some tools used in the original setup also depended on specific GPU features not supported on Intel graphics. We needed fallbacks.

Solution

We created a tool that could port code at runtime. It reads OpenCL code and outputs a working Metal version. This made it possible to reuse most of the existing kernels with almost no rewriting.

We kept key GPU constructs intact. Address space handling, thread hierarchy, and synchronisation were preserved. This made the programming model behave in a familiar way across both systems.

The tool caches compiled kernels using checksums. If a kernel hasn’t changed, we don’t recompile it. This speeds up app startup and improves development time. In cases where Metal lacked a direct match for OpenCL features, we wrote custom code to fill the gap.
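The caching idea can be sketched as follows. This is a hypothetical, in-memory version: the class name `KernelCache`, the choice of SHA-256 as the checksum, and the `compile_fn` callback are all assumptions made for illustration, and the real tool may well persist compiled kernels to disk instead.

```python
import hashlib
from typing import Callable, Dict


class KernelCache:
    """In-memory cache keyed by a SHA-256 checksum of the kernel source."""

    def __init__(self, compile_fn: Callable[[str], object]) -> None:
        self._compile_fn = compile_fn          # hypothetical compile step
        self._compiled: Dict[str, object] = {}
        self.compile_count = 0                 # how often we really compiled

    def get(self, source: str) -> object:
        """Return the compiled kernel, compiling only on a checksum miss."""
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._compiled:
            self._compiled[key] = self._compile_fn(source)
            self.compile_count += 1
        return self._compiled[key]


# Stand-in for a real compiler call: returns a tuple describing the "binary".
cache = KernelCache(lambda source: ("compiled", len(source)))
first = cache.get("kernel void f() {}")
second = cache.get("kernel void f() {}")   # unchanged source: cache hit
third = cache.get("kernel void g() {}")    # changed source: recompiled
```

Keying on the source checksum means any edit to a kernel invalidates only that kernel's entry, while unchanged kernels are returned without recompiling.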

For example, instead of calling get_global_id() like in OpenCL, thread indices are passed as arguments in Metal. Global memory and shared memory concepts were mapped carefully. In Metal, buffers are often shared by default between CPU and GPU, so unmap() becomes a no-op.
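The index change can be illustrated with a small rewrite step. This sketch is not the actual porting tool: the function `port_global_id` and the argument name `gid` are hypothetical, and it deliberately handles only the index rewrite, leaving qualifiers and buffer attributes untouched. It relies on Metal's `[[thread_position_in_grid]]` attribute to receive the thread index as a kernel argument.

```python
import re


def port_global_id(opencl_kernel: str, arg_name: str = "gid") -> str:
    """Replace get_global_id(n) with a component of an injected argument."""
    axes = "xyz"
    body = re.sub(
        r"get_global_id\(\s*([012])\s*\)",
        lambda match: f"{arg_name}.{axes[int(match.group(1))]}",
        opencl_kernel,
    )
    # Append the thread-position parameter to the kernel's parameter list.
    # Assumes the first ')' in the rewritten text closes that list.
    extra = f"uint3 {arg_name} [[thread_position_in_grid]]"
    head, sep, tail = body.partition(")")
    if not sep:
        return body  # no parameter list found; leave the source untouched
    joiner = "" if head.rstrip().endswith("(") else ", "
    return f"{head}{joiner}{extra}){tail}"


source = (
    "__kernel void scale(__global float* data, float k) "
    "{ size_t i = get_global_id(0); data[i] *= k; }"
)
ported = port_global_id(source)
```

The kernel body keeps its original shape. Only the source of the index changes, which is one reason a single kernel can serve both backends.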

We also handled small but important differences. Logical operators return different values for true and false in the two systems. The syntax for memory barriers is not the same.
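One example of such a difference, as we understand the two languages, is vector comparison: OpenCL C yields an integer vector with -1 (all bits set) in each true lane, while Metal yields a vector of booleans. The sketch below models both behaviours with plain Python lists. The function names are hypothetical, and whether this is the exact case the tool handles is an assumption.

```python
from typing import List, Sequence


def opencl_vector_less(a: Sequence[float], b: Sequence[float]) -> List[int]:
    """Model of an OpenCL C vector comparison: -1 for true, 0 for false."""
    return [-1 if x < y else 0 for x, y in zip(a, b)]


def metal_vector_less(a: Sequence[float], b: Sequence[float]) -> List[bool]:
    """Model of a Metal vector comparison: one bool per lane."""
    return [x < y for x, y in zip(a, b)]


def as_opencl_mask(bool_lanes: Sequence[bool]) -> List[int]:
    """Convert Metal-style bool lanes into OpenCL-style integer masks."""
    return [-1 if lane else 0 for lane in bool_lanes]


lhs, rhs = [1.0, 5.0], [2.0, 3.0]
opencl_mask = opencl_vector_less(lhs, rhs)
metal_mask = as_opencl_mask(metal_vector_less(lhs, rhs))
```

Code that uses the comparison result arithmetically, for instance as a bit mask, would silently produce different numbers unless a conversion like `as_opencl_mask` is applied.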

Vector field access needed adjustment. Event handling was rebuilt to fit the Metal system.

This GPU porting tool means developers can write one kernel and run it on both OpenCL and Metal. No need for two separate codebases. It saves time, prevents bugs, and keeps development simpler.

The work also included testing. We built a tool that records all memory usage, parameters, and results before and after running a kernel. This data goes into a file that can be replayed later.
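A capture file of this kind might look like the following sketch. The JSON layout, the function names `record_run` and `replay_run`, and the toy `scale` kernel are all assumptions for illustration. Kernels are modelled as plain Python functions that take buffers and parameters and return output buffers.

```python
import json
import os
import tempfile


def record_run(name, kernel, buffers, params, path):
    """Run `kernel`, then save its inputs, parameters and outputs as JSON."""
    inputs = {key: list(values) for key, values in buffers.items()}
    # Give the kernel its own copy so the recorded inputs stay pristine.
    outputs = kernel({key: list(values) for key, values in inputs.items()}, params)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(
            {"kernel": name, "params": params, "inputs": inputs, "outputs": outputs},
            handle,
        )
    return outputs


def replay_run(kernel, path):
    """Re-run `kernel` on the recorded inputs; True if the outputs match."""
    with open(path, encoding="utf-8") as handle:
        capture = json.load(handle)
    return kernel(capture["inputs"], capture["params"]) == capture["outputs"]


def scale(buffers, params):
    """Toy stand-in for a GPU kernel: multiply every element by a factor."""
    return {"data": [value * params["factor"] for value in buffers["data"]]}


capture_path = os.path.join(tempfile.mkdtemp(), "scale_capture.json")
recorded = record_run("scale", scale, {"data": [1.0, 2.0]}, {"factor": 2.0}, capture_path)
matches = replay_run(scale, capture_path)
```

Because the capture is self-contained, the same file can be replayed against a different backend later without rerunning the rest of the application.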

Using this file, we can compare the behaviour of OpenCL and Metal kernels. We can even pass output from one backend into the next one in the pipeline. This helps us find where differences begin. We can spot which kernel causes incorrect results and fix it fast.
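The comparison strategy can be sketched as a small search for the first diverging stage. This is an illustrative model rather than the actual tool: kernels are plain Python functions over lists of floats, and the name `first_divergent_stage` and the tolerance value are assumptions.

```python
from typing import Callable, List, Optional, Sequence

Kernel = Callable[[List[float]], List[float]]


def first_divergent_stage(
    reference: Sequence[Kernel],
    candidate: Sequence[Kernel],
    data: List[float],
    tolerance: float = 1e-6,
) -> Optional[int]:
    """Return the index of the first stage whose outputs differ, or None."""
    current = list(data)
    for index, (ref_kernel, cand_kernel) in enumerate(zip(reference, candidate)):
        expected = ref_kernel(list(current))
        actual = cand_kernel(list(current))
        mismatch = len(expected) != len(actual) or any(
            abs(e - a) > tolerance for e, a in zip(expected, actual)
        )
        if mismatch:
            return index
        # Feed the reference output forward so every later stage
        # is tested on known-good input.
        current = expected
    return None


def double(values: List[float]) -> List[float]:
    return [value * 2 for value in values]


def increment(values: List[float]) -> List[float]:
    return [value + 1 for value in values]


def faulty_increment(values: List[float]) -> List[float]:
    return [value + 1.5 for value in values]  # deliberately wrong


bad_stage = first_divergent_stage(
    [double, increment, double], [double, faulty_increment, double], [1.0, 2.0]
)
```

Feeding the reference output into each next stage isolates the faulty kernel, so one early error does not mask or cascade into every later comparison.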

Results

The final solution worked well. V-Nova did not have to split their GPU codebase. They could use the same source code on Apple devices with Metal and on other platforms with OpenCL. Our framework improved GPU speed significantly. On the M1, performance increased by over 300%. On M2, results were even better.

Real-time GPU tasks ran at smooth frame rates. This made the software viable for video rendering, design, and scientific computing on Apple devices.

The runtime porting tool made it easier to test and update the code. Developers could focus on improving features without worrying about backend differences.

The GPU computing performance on Apple M1 and M2 chips improved. By porting code directly to Metal and avoiding emulation, we reduced execution time and increased throughput.

Shared memory access was handled efficiently. Global memory usage was mapped properly. This helped avoid memory access errors, which are common when switching between GPU frameworks.

The tool also made testing much easier. With output replay and buffer tracking, debugging became faster and more accurate. Differences in kernel output were tracked to the exact point of failure.

The solution even allowed for some creativity. We worked within the limitations of Metal but still achieved the same results as OpenCL.

We didn’t forget the basics. Even though we dealt with advanced GPU frameworks, we still respected the programming language rules and worked from clear goals. At times, it felt like working through the periodic table: picking and adapting elements like iron (for structure) and pure metals (for clarity), and adjusting weight where needed.

Figure: Replay Tool

Final Thoughts

This case showed that smart GPU porting can solve complex issues. With the right tools and planning, moving from OpenCL to Metal can be smooth.

We made sure that shared memory, global memory, and kernel logic matched across both platforms. This allowed V-Nova to keep their performance high and avoid code duplication.

TechnoLynx delivered a solution that was simple to maintain, efficient to run, and easy to extend. The programming model stayed consistent, which kept developers happy.

And most importantly, the software now works well on Apple hardware—without extra work from the client’s side. GPU computing on M1 and M2 is no longer a limitation. It’s just part of the process.
