Case-Study: Performance-porting of GPU application from OpenCL to Metal

This case study describes how we ported a client’s GPU application from OpenCL to Metal and optimised its performance on Apple hardware.

15/12/2023

Problem

Our client had a highly specialised GPU application, specifically designed and optimised for AMD and NVIDIA hardware. The client had invested significant resources in developing a high-quality algorithm, which was performing exceptionally well on these platforms. However, despite the software’s ability to run on Apple devices, the performance on Apple’s Metal framework was far below optimal levels.

This was a significant issue for the client because many creative professionals prefer Apple hardware. Apple devices, particularly those using the M1 and M2 chips, have become popular due to their power and design, making them a staple in the creative industry.

However, Apple’s GPU programming interface is its proprietary Metal framework rather than the cross-vendor OpenCL standard, which posed a problem for software written against OpenCL. Therefore, optimising the software to run smoothly on Apple devices became a high-priority task for the client, as it had the potential to unlock more business opportunities and expand their customer base.

The sub-par performance on Apple hardware was primarily due to the way the algorithm was optimised. The original code relied heavily on OpenCL, a framework that works well with AMD and NVIDIA graphics cards but is not as effective on Apple hardware, where Metal is the native GPU framework. This difference in how GPU tasks are handled across platforms led to significant inefficiencies when the software ran on Apple hardware, making it unsuitable for creative professionals who rely on high performance for tasks like video editing, 3D rendering, and other GPU-intensive work.

Solution

Our team approached this project with the goal of ensuring the GPU application could be fully optimised for Apple’s Metal framework while maintaining the performance levels the client had achieved on AMD and NVIDIA hardware. This required building a flexible, highly adaptable framework that would convert the existing OpenCL codebase into Metal, Apple’s proprietary system.

To solve the problem, we developed a generic and flexible framework that covers the subset of OpenCL functionality used by the application and maps these features to Metal. Our approach involved moving the execution of the algorithm closer to Metal’s native functionality, allowing for improved performance on Apple GPUs, particularly the M1 and M2 chips.

This conversion was not just a one-time process; it resulted in a sophisticated tool that could automatically convert nearly 100% of the original OpenCL code into Metal functionality.
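
To give a feel for what mapping OpenCL features to Metal involves, here is a minimal Python sketch of a table-driven rewrite. It is an illustration only and not the tool built for the client: the names QUALIFIER_MAP, BARRIER_MAP and translate_snippet are hypothetical, and only a handful of well-known correspondences between OpenCL C and the Metal Shading Language are covered. A real converter would also have to add argument attributes such as [[buffer(n)]] and replace calls like get_global_id(0) with a [[thread_position_in_grid]] kernel parameter, which this sketch does not attempt.

```python
import re

# Hypothetical mapping table: OpenCL C address-space and kernel qualifiers
# and their Metal Shading Language (MSL) counterparts.
QUALIFIER_MAP = {
    "__kernel": "kernel",
    "__global": "device",
    "__local": "threadgroup",
    "__constant": "constant",
}

# Work-group barriers map onto MSL threadgroup barriers.
BARRIER_MAP = {
    "barrier(CLK_LOCAL_MEM_FENCE)":
        "threadgroup_barrier(mem_flags::mem_threadgroup)",
    "barrier(CLK_GLOBAL_MEM_FENCE)":
        "threadgroup_barrier(mem_flags::mem_device)",
}


def translate_snippet(opencl_src: str) -> str:
    """Rewrite a few OpenCL C spellings into their MSL equivalents.

    This is plain text substitution, not a parser, so it only handles
    the simple cases listed in the two tables above.
    """
    out = opencl_src
    for cl_text, msl_text in BARRIER_MAP.items():
        out = out.replace(cl_text, msl_text)
    for cl_word, msl_word in QUALIFIER_MAP.items():
        # Match the qualifier only as a whole identifier.
        pattern = r"(?<!\w)" + re.escape(cl_word) + r"(?!\w)"
        out = re.sub(pattern, msl_word, out)
    return out


if __name__ == "__main__":
    src = ("__kernel void f(__global float* a, __local float* t) "
           "{ barrier(CLK_LOCAL_MEM_FENCE); }")
    print(translate_snippet(src))
```

The table-driven form keeps each correspondence in one place, which is one way a converter could stay maintainable as more OpenCL features are added.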

By doing so, we ensured that the graphics processing units (GPUs) on Apple hardware could handle the same tasks as their AMD and NVIDIA counterparts with minimal performance loss. Our framework was specifically designed to address the computational needs of high-performance computing, ensuring real-time efficiency even in resource-intensive tasks.

Challenges of the Conversion Process

One of the key challenges in this project was maintaining the high-performance characteristics of the algorithm during the conversion process. OpenCL and Metal operate differently, and it was essential to ensure that the functionality provided by OpenCL could be replicated within Metal without sacrificing efficiency.
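
One concrete example of how the two APIs operate differently is kernel launch geometry. OpenCL's clEnqueueNDRangeKernel takes the total number of work-items per dimension plus a work-group size, while Metal's dispatchThreadgroups call takes the number of threadgroups per dimension plus the threads per threadgroup. The Python sketch below shows the arithmetic under that assumption; the function name ndrange_to_threadgroups is hypothetical and is not taken from the project.

```python
def ndrange_to_threadgroups(global_size, local_size):
    """Convert an OpenCL NDRange into a Metal-style threadgroup count.

    global_size: total work-items per dimension (OpenCL global_work_size)
    local_size:  work-items per work-group (OpenCL local_work_size)
    Returns a 3-tuple of threadgroups per grid, padded with 1s.
    """
    if len(global_size) != len(local_size):
        raise ValueError("global_size and local_size must have equal rank")
    if not 1 <= len(global_size) <= 3:
        raise ValueError("only 1 to 3 dimensions are supported")

    groups = []
    for total, per_group in zip(global_size, local_size):
        if total <= 0 or per_group <= 0:
            raise ValueError("sizes must be positive")
        # Round up so every work-item is covered. If total is not a
        # multiple of per_group, the kernel needs a bounds check.
        groups.append((total + per_group - 1) // per_group)

    # Metal sizes are always three-dimensional.
    while len(groups) < 3:
        groups.append(1)
    return tuple(groups)


if __name__ == "__main__":
    print(ndrange_to_threadgroups((1024, 768), (16, 16)))
```

Rounding up is a design choice in this sketch: it guarantees coverage of the whole problem, at the cost of a few idle threads in the last threadgroup of each dimension.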

Another difficulty lay in balancing parallel processing tasks between the CPU (Central Processing Unit) and GPU. GPUs are designed for parallel processing, which is what made the original algorithm so effective on AMD and NVIDIA hardware. In moving to Apple’s Metal framework, we had to ensure that the balance between central processing units and GPUs was maintained, especially when the algorithm was performing real-time computations in applications that required immediate feedback, such as machine learning models and graphics rendering.

In many ways, this was a project that required careful consideration of general purpose computing on GPUs (GPGPU). Ensuring that the algorithm could handle specific tasks efficiently without overwhelming the system was key to achieving the performance boost our client required. The new framework we developed needed to handle a wide range of tasks, from graphics rendering to artificial intelligence computations, all while performing at a level that met industry standards for creative professionals.

Results

After implementing our conversion tool, we achieved dramatic performance improvements. On the smaller M1 GPU, the algorithm saw speed improvements of over 300%, and the larger M2 GPU delivered even better results. These speed improvements not only made the client’s software viable on Apple hardware but also positioned it as a top-tier option for users who require high-performance solutions for their creative work.

The conversion framework we developed was more than just a quick-fix solution. It was designed with the future in mind, ensuring that the client’s software could be easily updated and maintained. The internal structure of the framework retained the single-source approach used in the original project, which made future development more streamlined. With the support of two backends (OpenCL and Metal), the client now had the flexibility to continue offering their solution across different platforms without additional overhead or complexity.
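
The single-source idea with two backends can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the toy kernel, the KERNEL_TEMPLATE and BACKENDS names, and the use of text templates are ours and do not describe the internal structure of the actual framework. The point is only that one kernel description can be rendered as either OpenCL C or Metal Shading Language source.

```python
from string import Template

# One kernel description shared by both backends (hypothetical toy kernel
# that doubles every element of a buffer).
KERNEL_TEMPLATE = Template(
    "${prelude}"
    "${kernel} void scale(${buffer_param}${extra_params}) {\n"
    "    data[${index}] *= 2.0f;\n"
    "}\n"
)

# Backend-specific spellings for the placeholders above.
BACKENDS = {
    "opencl": {
        "prelude": "",
        "kernel": "__kernel",
        "buffer_param": "__global float* data",
        "extra_params": "",
        "index": "get_global_id(0)",
    },
    "metal": {
        "prelude": "#include <metal_stdlib>\nusing namespace metal;\n",
        "kernel": "kernel",
        "buffer_param": "device float* data [[buffer(0)]]",
        "extra_params": ", uint gid [[thread_position_in_grid]]",
        "index": "gid",
    },
}


def render(backend: str) -> str:
    """Return the kernel source text for the requested backend."""
    if backend not in BACKENDS:
        raise KeyError(f"unknown backend: {backend}")
    return KERNEL_TEMPLATE.substitute(BACKENDS[backend])


if __name__ == "__main__":
    for name in ("opencl", "metal"):
        print(f"--- {name} ---")
        print(render(name))
```

With this arrangement the kernel logic is written once, and supporting another platform means adding one more entry to the backend table rather than maintaining a second copy of the kernel.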

As a result, the client was able to expand their market to include Apple users, a significant business opportunity given the large number of creative professionals who rely on Apple devices for their work. The client’s software, previously optimised only for AMD and NVIDIA GPUs, could now deliver comparable performance on Apple’s Metal framework, thus broadening their appeal and increasing their customer base.

Long-Term Impact and Future-Proofing

One of the significant advantages of this project was its long-term impact. The solution we developed was not just a one-time fix; it was a high-performance computing solution that could continue to evolve with new hardware releases from Apple.

As Apple continues to develop its M-series chips, the software will be able to scale accordingly, thanks to the flexible nature of the conversion framework. The fact that the tool converts nearly 100% of the original codebase ensures that future updates and optimisations will be straightforward, saving the client time and money in future development cycles.

Moreover, this flexible framework provides the client with the ability to optimise for different GPUs, meaning they can easily expand their software’s compatibility across other platforms if needed.

Conclusion

This case study demonstrates the real-world challenges of optimising GPU technology for a wide range of hardware, particularly when dealing with proprietary frameworks like Apple’s Metal. By developing a flexible, high-performance computing solution, we were able to deliver a product that not only met the client’s immediate needs but also positioned them for long-term success in the evolving GPU market.

The project highlighted the importance of understanding the nuances of different GPUs and their related frameworks. Whether working with OpenCL, Metal, or any other system, the ability to map parallel processing tasks efficiently across central processing units and GPUs is essential for achieving optimal performance.

For businesses looking to optimise their software for multiple platforms, this case study offers a detailed example of how to approach the problem. The key takeaway is that real-time performance improvements are possible, even when working with fundamentally different GPU frameworks, as long as the project is approached with flexibility and future-proofing in mind.

At TechnoLynx, we specialise in tackling these kinds of high-performance computing challenges, helping businesses optimise their software for a wide range of platforms and hardware. Whether you’re working with AMD, NVIDIA, Apple, or any other hardware provider, our team has the expertise to ensure your software runs at its best, delivering the performance your customers expect.

Contact us to find out more!
